Lab 05 — Camera Calibration & Pose Estimation
Phase 1: Classical Computer Vision | Week 4
Every computer vision system that interacts with the physical world — robots, AR, autonomous vehicles — needs to know its camera model. This lab teaches you to calibrate cameras and estimate 3D pose.
Learning Objectives
- Understand the pinhole camera model and projection equations
- Perform camera calibration using the Zhang method (chessboard)
- Estimate camera rotation and translation from 2D-3D correspondences (the PnP problem)
- Understand lens distortion and how to correct it
- Use reprojection error to evaluate calibration quality
Theory
Pinhole Camera Model
A 3D point $\mathbf{P}_W = [X, Y, Z]^T$ in world coordinates projects to pixel $\mathbf{p} = [u, v]^T$, where $Z_c$ is the point's depth in the camera frame (the third component of $R\mathbf{P}_W + \mathbf{t}$):
$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \frac{1}{Z_c} \underbrace{\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}}_{\mathbf{K}} \underbrace{\begin{bmatrix} R & \mathbf{t} \end{bmatrix}}_{[R|\mathbf{t}]} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$
- $f_x, f_y$: focal lengths in pixels
- $c_x, c_y$: principal point (usually near image center)
- $[R|\mathbf{t}]$: extrinsic matrix, the rigid transform from world to camera coordinates (camera pose)
- $\mathbf{K}$: camera intrinsic matrix
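To make the projection equation concrete, here is a minimal NumPy sketch that applies $[R|\mathbf{t}]$, divides by the camera-frame depth $Z_c$, and then applies $\mathbf{K}$. The function name `project_points` and the values in `K`, `R`, `t` are illustrative assumptions, and lens distortion is ignored here (OpenCV's `cv2.projectPoints()` also applies the distortion model).

```python
import numpy as np


def project_points(P_world, K, R, t):
    """Pinhole projection of Nx3 world points to Nx2 pixels (no distortion)."""
    P_world = np.asarray(P_world, dtype=float)
    P_cam = P_world @ R.T + t          # world -> camera: R P_W + t
    Z_c = P_cam[:, 2:3]                # depth of each point in the camera frame
    uv1 = (P_cam / Z_c) @ K.T          # apply K to the normalized coordinates
    return uv1[:, :2]


# Illustrative intrinsics and pose (not from a real camera)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])          # world origin 5 units in front of the camera
pts = np.array([[0.0, 0.0, 0.0],       # lies on the optical axis
                [1.0, 0.5, 0.0]])
print(project_points(pts, K, R, t))    # (320, 240) and (480, 320)
```

A point on the optical axis lands exactly on the principal point $(c_x, c_y)$, which is a quick sanity check for any projection code.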
Lens Distortion Model
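Real lenses introduce radial and tangential distortion. For a camera-frame point $(X_c, Y_c, Z_c)$ with normalized coordinates $x_n = X_c/Z_c$, $y_n = Y_c/Z_c$:
$$r^2 = x_n^2 + y_n^2$$
$$x_d = x_n(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2p_1 x_n y_n + p_2(r^2 + 2x_n^2)$$
$$y_d = y_n(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1(r^2 + 2y_n^2) + 2p_2 x_n y_n$$
The distorted point is then mapped to pixels with $\mathbf{K}$: $u = f_x x_d + c_x$, $v = f_y y_d + c_y$. The coefficients $(k_1, k_2, p_1, p_2, k_3)$ are estimated during calibration; OpenCV returns them in this order as the `dist` vector.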
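The radial-plus-tangential model can be sketched in plain Python as below. The model has no simple closed-form inverse, so the sketch undoes it with fixed-point iteration, which is one common approach and converges quickly for moderate distortion. The function names and coefficient values are hypothetical, chosen only for illustration.

```python
def distort(xn, yn, k1, k2, p1, p2, k3=0.0):
    """Apply radial + tangential distortion to normalized coordinates."""
    r2 = xn**2 + yn**2
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    xd = xn * radial + 2 * p1 * xn * yn + p2 * (r2 + 2 * xn**2)
    yd = yn * radial + p1 * (r2 + 2 * yn**2) + 2 * p2 * xn * yn
    return xd, yd


def undistort(xd, yd, k1, k2, p1, p2, k3=0.0, iters=20):
    """Invert distort() by fixed-point iteration, starting from the distorted point."""
    xn, yn = xd, yd
    for _ in range(iters):
        r2 = xn**2 + yn**2
        radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
        dx = 2 * p1 * xn * yn + p2 * (r2 + 2 * xn**2)
        dy = p1 * (r2 + 2 * yn**2) + 2 * p2 * xn * yn
        xn = (xd - dx) / radial
        yn = (yd - dy) / radial
    return xn, yn


coeffs = dict(k1=-0.2, k2=0.05, p1=0.001, p2=-0.0005)  # illustrative values
xd, yd = distort(0.3, -0.2, **coeffs)
print(xd, yd)                        # k1 < 0 (barrel) pulls the point toward the center
print(undistort(xd, yd, **coeffs))   # approximately (0.3, -0.2)
```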
Zhang's Calibration Method (1998)
- Observe a planar pattern (chessboard) from $\geq 3$ different orientations
- Compute homography $H_i$ between pattern plane and image for each view
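- Because the board lies in the plane $Z = 0$, each homography has the form $H_i = \lambda_i \mathbf{K}[\mathbf{r}_1\ \mathbf{r}_2\ \mathbf{t}]$; since $\mathbf{r}_1$ and $\mathbf{r}_2$ are orthonormal, its columns give 2 constraints on $B = \mathbf{K}^{-T}\mathbf{K}^{-1}$: $\mathbf{h}_1^T B \mathbf{h}_2 = 0$ and $\mathbf{h}_1^T B \mathbf{h}_1 = \mathbf{h}_2^T B \mathbf{h}_2$
- With $\geq 3$ views: solve for $\mathbf{K}$ in closed form, then refine all parameters (intrinsics, distortion, per-view poses) via Levenberg-Marquardt by minimizing reprojection error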
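The closed-form step can be sketched in NumPy as follows, assuming noise-free homographies synthesized from a made-up `K_true`. Each view contributes the two linear constraints $\mathbf{h}_1^T B \mathbf{h}_2 = 0$ and $\mathbf{h}_1^T B \mathbf{h}_1 = \mathbf{h}_2^T B \mathbf{h}_2$ on $B = \mathbf{K}^{-T}\mathbf{K}^{-1}$; the stacked system is solved by SVD. As a deliberate substitution, a Cholesky factorization of $B$ replaces Zhang's explicit formulas for the intrinsics, and the `scale` rescaling of pixel coordinates is an added conditioning step. In a real pipeline each $H_i$ would be estimated from detected corners (for example with `cv2.findHomography()`), and distortion and Levenberg-Marquardt refinement would follow.

```python
import numpy as np


def rodrigues(axis, angle):
    """Rotation matrix from an axis-angle pair (Rodrigues' formula)."""
    k = np.asarray(axis, dtype=float) / np.linalg.norm(axis)
    Kx = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(angle) * Kx + (1 - np.cos(angle)) * Kx @ Kx


def v_ij(H, i, j):
    """Row vector with h_i^T B h_j = v_ij . b, where b = [B11, B12, B22, B13, B23, B33]."""
    hi, hj = H[:, i], H[:, j]
    return np.array([hi[0] * hj[0],
                     hi[0] * hj[1] + hi[1] * hj[0],
                     hi[1] * hj[1],
                     hi[2] * hj[0] + hi[0] * hj[2],
                     hi[2] * hj[1] + hi[1] * hj[2],
                     hi[2] * hj[2]])


def intrinsics_from_homographies(Hs, scale=1000.0):
    """Closed-form K from >= 3 board-to-image homographies (noise-free sketch)."""
    N = np.diag([1.0 / scale, 1.0 / scale, 1.0])  # rescale pixels for conditioning
    rows = []
    for H in Hs:
        Hn = N @ H
        Hn = Hn / np.linalg.norm(Hn)
        rows.append(v_ij(Hn, 0, 1))                   # h1^T B h2 = 0
        rows.append(v_ij(Hn, 0, 0) - v_ij(Hn, 1, 1))  # h1^T B h1 = h2^T B h2
    b = np.linalg.svd(np.array(rows))[2][-1]  # null vector of the stacked system
    B = np.array([[b[0], b[1], b[3]],
                  [b[1], b[2], b[4]],
                  [b[3], b[4], b[5]]])
    if B[0, 0] < 0:                 # b is only defined up to sign
        B = -B
    L = np.linalg.cholesky(B)       # B = s K^-T K^-1 = L L^T, so L = sqrt(s) K^-T
    Kn = np.linalg.inv(L.T)         # proportional to the rescaled K
    Kn = Kn / Kn[2, 2]
    return np.linalg.inv(N) @ Kn    # undo the pixel rescaling


K_true = np.array([[800.0, 0.0, 320.0],
                   [0.0, 780.0, 240.0],
                   [0.0, 0.0, 1.0]])
Hs = []
for axis, angle in [([1, 0, 0], 0.4), ([0, 1, 0], 0.5), ([1, 1, 0], 0.6), ([1, -1, 0.2], 0.3)]:
    R = rodrigues(axis, angle)
    t = np.array([0.1, -0.2, 5.0])
    Hs.append(K_true @ np.column_stack([R[:, 0], R[:, 1], t]))  # H = K [r1 r2 t]

print(np.round(intrinsics_from_homographies(Hs), 3))  # compare with K_true
```

Views must differ in orientation: homographies from parallel board positions add no new constraints, which is one reason calibration guides ask for tilted views.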
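Reprojection error (lower is better; < 0.5 px is excellent) measures the pixel distance between each detected corner $\mathbf{p}_i$ and its prediction $\hat{\mathbf{p}}_i$ from the estimated intrinsics $\mathbf{K}$, distortion coefficients $\mathbf{d} = (k_1, k_2, p_1, p_2, k_3)$, and the pose $(R_i, \mathbf{t}_i)$ of the view containing that corner:
$$\text{err} = \frac{1}{N}\sum_{i=1}^{N} \left\|\mathbf{p}_i - \hat{\mathbf{p}}_i(\mathbf{K}, \mathbf{d}, R_i, \mathbf{t}_i)\right\|_2$$
`cv2.calibrateCamera()` reports the root-mean-square variant, $\sqrt{\frac{1}{N}\sum_{i=1}^{N} \|\mathbf{p}_i - \hat{\mathbf{p}}_i\|_2^2}$, which weights large residuals more heavily.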
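A small plain-Python sketch contrasts the mean and RMS forms of the reprojection error; the function name and the pixel values are hypothetical.

```python
import math


def reprojection_errors(observed, projected):
    """Mean and RMS Euclidean distance between matching (u, v) pixel pairs."""
    dists = [math.dist(p, q) for p, q in zip(observed, projected)]
    mean_err = sum(dists) / len(dists)
    rms_err = math.sqrt(sum(d * d for d in dists) / len(dists))
    return mean_err, rms_err


observed = [(100.0, 200.0), (300.0, 400.0)]
projected = [(103.0, 204.0), (300.0, 400.0)]    # residuals of 5 px and 0 px
print(reprojection_errors(observed, projected))  # (2.5, 3.5355...)
```

Because RMS squares each residual, a single badly detected corner raises it more than it raises the mean, so compare numbers computed the same way.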
PnP Problem (Perspective-n-Point)
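Given $n \geq 4$ 2D-3D correspondences and known intrinsics ($\mathbf{K}$, $\mathbf{d}$), solve for the camera pose $[R|\mathbf{t}]$:
- EPnP (`cv2.SOLVEPNP_EPNP`, O(n)): efficient closed-form solution that expresses the 3D points via four virtual control points. Note that `cv2.solvePnP()` defaults to `cv2.SOLVEPNP_ITERATIVE` (Levenberg-Marquardt minimization of reprojection error), not EPnP
- RANSAC + PnP (`cv2.solvePnPRansac()`): fits poses to random minimal subsets and keeps the one with the most inliers, so pose estimation stays robust when some correspondences are wrong (e.g. mismatched features in the wild)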
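EPnP itself is too long for a short example, so the sketch below swaps in the simpler classic linear DLT solution to show the core idea of PnP: each correspondence gives two equations that are linear in the entries of $P \propto [R|\mathbf{t}]$, solved by SVD and then projected back onto a valid rotation. It assumes noise-free data, at least 6 non-coplanar points, and no outliers; the function name and the synthetic pose are illustrative. In practice, `cv2.solvePnP()` or `cv2.solvePnPRansac()` is the tool to use.

```python
import numpy as np


def pnp_dlt(obj_pts, img_pts, K):
    """Linear pose [R|t] from >= 6 non-coplanar 2D-3D matches (noise-free sketch)."""
    obj_pts = np.asarray(obj_pts, dtype=float)
    img_pts = np.asarray(img_pts, dtype=float)
    ones = np.ones((len(obj_pts), 1))
    xn = np.hstack([img_pts, ones]) @ np.linalg.inv(K).T  # normalized image coords
    Xh = np.hstack([obj_pts, ones])                        # homogeneous 3D points
    rows = []
    for X, (x, y, _) in zip(Xh, xn):
        # x = (P1.X)/(P3.X) and y = (P2.X)/(P3.X), rearranged to be linear in P
        rows.append(np.concatenate([X, np.zeros(4), -x * X]))
        rows.append(np.concatenate([np.zeros(4), X, -y * X]))
    P = np.linalg.svd(np.array(rows))[2][-1].reshape(3, 4)  # P = s [R|t]
    if np.linalg.det(P[:, :3]) < 0:    # choose the sign that gives det(R) = +1
        P = -P
    U, S, Vt = np.linalg.svd(P[:, :3])
    R = U @ Vt                         # closest rotation to the 3x3 block
    t = P[:, 3] / S.mean()             # remove the unknown scale s
    return R, t


rng = np.random.default_rng(0)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R_true = Q * np.sign(np.linalg.det(Q))        # random proper rotation
t_true = np.array([0.1, -0.2, 4.0])
obj = rng.uniform(-1.0, 1.0, size=(10, 3))    # non-coplanar points, all in front
cam = obj @ R_true.T + t_true
img = ((cam / cam[:, 2:3]) @ K.T)[:, :2]      # noise-free pixel observations
R_est, t_est = pnp_dlt(obj, img, K)
print(np.abs(R_est - R_true).max(), np.abs(t_est - t_true).max())
```

With noisy or contaminated matches this linear estimate is only a starting point, which is why practical pipelines wrap a minimal solver in RANSAC and refine the result by minimizing reprojection error.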
What the Lab Covers
| Function | Concept |
|---|---|
| synthesize_chessboard_views() | Generate calibration data with known ground truth |
| calibrate_camera_opencv() | Zhang method via cv2.calibrateCamera() |
| evaluate_reprojection_error() | Per-view error, residual histogram |
| undistort_demo() | Apply distortion correction |
| pose_estimation_pnp() | Estimate 6-DOF pose with RANSAC |
| draw_axes_on_frame() | Visualize 3D coordinate frame projected onto image |
Key OpenCV Functions
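```python
import cv2
import numpy as np

board_size = (9, 6)  # inner corners per row and column (example value)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3)
# Board corners in 3D (Z = 0 plane, one unit per square)
objp = np.zeros((board_size[0] * board_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:board_size[0], 0:board_size[1]].T.reshape(-1, 2)
obj_points, img_points = [], []  # one entry per accepted view

# Find chessboard corners (repeat for each 8-bit grayscale calibration image `gray`)
ret, corners = cv2.findChessboardCorners(gray, board_size)
if ret:
    # Sub-pixel refinement, then keep the view
    corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
    obj_points.append(objp)
    img_points.append(corners)

# Calibrate (img_size = (width, height); rms = overall RMS reprojection error)
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, img_size, None, None)

undistorted = cv2.undistort(frame, K, dist)  # undistort a single image
# Or compute the maps once and reuse them (faster for video)
h, w = frame.shape[:2]
map1, map2 = cv2.initUndistortRectifyMap(K, dist, None, K, (w, h), cv2.CV_32FC1)
dst = cv2.remap(frame, map1, map2, cv2.INTER_LINEAR)

# PnP pose estimation with RANSAC (obj_pts: Nx3, img_pts: Nx2, float32)
success, rvec, tvec, inliers = cv2.solvePnPRansac(
    obj_pts, img_pts, K, dist,
    iterationsCount=100, reprojectionError=8.0)
```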
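`cv2.solvePnPRansac()` returns the rotation as an axis-angle vector `rvec`; `cv2.Rodrigues(rvec)` converts it to a 3x3 matrix. The NumPy sketch below shows what that conversion computes, plus the camera position in world coordinates, $\mathbf{C} = -R^T\mathbf{t}$, which is often what you actually want to plot. The helper names and the example pose are hypothetical.

```python
import numpy as np


def rvec_to_matrix(rvec):
    """Axis-angle vector (direction = axis, norm = angle) -> 3x3 rotation matrix."""
    rvec = np.asarray(rvec, dtype=float).reshape(3)
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    Kx = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * Kx + (1 - np.cos(theta)) * Kx @ Kx


def camera_center(rvec, tvec):
    """Camera position in world coordinates: C = -R^T t."""
    R = rvec_to_matrix(rvec)
    return -R.T @ np.asarray(tvec, dtype=float).reshape(3)


rvec = np.array([0.0, 0.0, np.pi / 2])   # 90 degrees about the z-axis
tvec = np.array([0.0, 0.0, 2.0])
print(rvec_to_matrix(rvec) @ [1.0, 0.0, 0.0])  # x-axis maps to about (0, 1, 0)
print(camera_center(rvec, tvec))               # (0, 0, -2)
```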
Interview Questions
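Q: How many chessboard views do you need for calibration? Why? A: Minimum 3 when all five intrinsics (including skew) are unknown: each view gives 2 constraints and K has 5 DOF (2 views suffice if skew is fixed to zero). In practice 15-30 views are used: more views average out corner-detection noise, and varied tilts and positions (including near the image corners) are needed to separate intrinsics from extrinsics and to constrain the distortion coefficients.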
Q: What's the difference between intrinsic and extrinsic parameters? A: Intrinsic ($\mathbf{K}$, $\mathbf{d}$) are properties of the camera itself, fixed for a given camera/lens as long as zoom and focus do not change. Extrinsic ($R$, $\mathbf{t}$) define the camera's pose relative to the world frame, so they change whenever the camera or the world frame moves (e.g. every frame for a moving camera).
Q: What does reprojection error tell you? A: The RMS pixel distance between the observed 2D points and the projections of their corresponding 3D points through the estimated camera model. < 0.5px is excellent; > 2px suggests bad views or wrong corner detection.
Q: How would you calibrate a stereo camera system?
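A: Calibrate each camera independently first, then use cv2.stereoCalibrate() on synchronized views of the same board (e.g. with `cv2.CALIB_FIX_INTRINSIC` to keep those intrinsics fixed) to jointly optimize and find the relative pose $R, T$ between cameras. cv2.stereoRectify() then computes rectifying rotations and projection matrices that make epipolar lines horizontal, so corresponding points share an image row and matching becomes a 1D search.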
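The relative pose has a simple form: if $(R_1, \mathbf{t}_1)$ and $(R_2, \mathbf{t}_2)$ map board coordinates into each camera's frame, then $R = R_2 R_1^T$ and $T = \mathbf{t}_2 - R\,\mathbf{t}_1$ map camera-1 coordinates to camera-2 coordinates, the convention OpenCV documents for the `R`, `T` outputs of `cv2.stereoCalibrate()`. The sketch below computes it from a single view; `stereoCalibrate()` instead optimizes it over all views jointly. The function name and the example poses are hypothetical.

```python
import numpy as np


def relative_pose(R1, t1, R2, t2):
    """(R, T) mapping camera-1 coordinates to camera-2 coordinates: X2 = R X1 + T."""
    R = R2 @ R1.T
    T = t2 - R @ t1
    return R, T


# Board -> camera extrinsics for two parallel cameras viewing the same board
R1, t1 = np.eye(3), np.array([0.0, 0.0, 1.0])
R2, t2 = np.eye(3), np.array([-0.1, 0.0, 1.0])
R, T = relative_pose(R1, t1, R2, t2)
print(T)  # (-0.1, 0, 0): camera 2 sits 0.1 units along camera 1's +x axis
```

The norm of `T` is the stereo baseline, expressed in the same units as the board's square size.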
Run
```bash
pip install -r requirements.txt
python solution.py
# Outputs saved to outputs/
```