Lab 05 — Camera Calibration & Pose Estimation

Phase 1: Classical Computer Vision | Week 4

Every computer vision system that interacts with the physical world — robots, AR, autonomous vehicles — needs to know its camera model. This lab teaches you to calibrate cameras and estimate 3D pose.


Learning Objectives

  • Understand the pinhole camera model and projection equations
  • Perform camera calibration using the Zhang method (chessboard)
  • Estimate rotation/translation (PnP problem)
  • Understand lens distortion and how to correct it
  • Apply reprojection error to evaluate calibration quality

Theory

Pinhole Camera Model

A 3D point $\mathbf{P}_W = [X, Y, Z]^T$ in world coordinates projects to pixel $\mathbf{p} = [u, v]^T$:

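$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \frac{1}{Z_c} \underbrace{\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}}_{\mathbf{K}} \underbrace{\begin{bmatrix} R & \mathbf{t} \end{bmatrix}}_{[R|\mathbf{t}]} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

Here $Z_c$ is the point's depth in the camera frame, i.e. the third component of $R\,\mathbf{P}_W + \mathbf{t}$. It equals $Z$ only when the world and camera frames coincide.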

  • $f_x, f_y$: focal lengths in pixels
  • $c_x, c_y$: principal point (usually near image center)
  • $[R|\mathbf{t}]$: extrinsic matrix (camera pose)
  • $\mathbf{K}$: camera intrinsic matrix
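
The projection equation can be checked numerically with a few lines of NumPy. The sketch below is illustrative only: the function name `project_points` and the example intrinsics (800 px focal length, 640x480 principal point at the center) are assumptions, not part of the lab code, and lens distortion is ignored.

```python
import numpy as np

def project_points(points_world, K, R, t):
    """Project Nx3 world points to Nx2 pixel coordinates (no distortion)."""
    points_world = np.asarray(points_world, dtype=float)
    # World -> camera frame: P_c = R @ P_w + t
    points_cam = points_world @ R.T + np.asarray(t, dtype=float).reshape(1, 3)
    # Perspective division by the depth in the camera frame
    normalized = points_cam[:, :2] / points_cam[:, 2:3]
    # Apply intrinsics: u = f_x * x_n + c_x, v = f_y * y_n + c_y
    u = K[0, 0] * normalized[:, 0] + K[0, 2]
    v = K[1, 1] * normalized[:, 1] + K[1, 2]
    return np.stack([u, v], axis=1)

# Hypothetical camera: identity pose, so world and camera frames coincide
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)

pixels = project_points([[0.0, 0.0, 2.0], [0.5, -0.25, 2.0]], K, R, t)
print(pixels)
```

A point on the optical axis lands on the principal point (320, 240); the second point, offset by 0.5 and -0.25 at depth 2, lands at (520, 140).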

Lens Distortion Model

Real lenses introduce radial and tangential distortion. For a normalized point $(x_n, y_n)$:

$$r^2 = x_n^2 + y_n^2$$

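$$x_d = x_n(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2p_1 x_n y_n + p_2(r^2 + 2x_n^2)$$

$$y_d = y_n(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1(r^2 + 2y_n^2) + 2p_2 x_n y_n$$

The coefficients, in OpenCV's order $(k_1, k_2, p_1, p_2, k_3)$, are estimated during calibration.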
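
As a rough illustration, the distortion model can be written as a small pure-Python function. The name `distort_normalized` is hypothetical, and the $y_d$ line uses the standard OpenCV counterpart of the $x_d$ equation (same radial factor, with the roles of $p_1$ and $p_2$ swapped in the tangential terms).

```python
def distort_normalized(x_n, y_n, k1, k2, p1, p2, k3=0.0):
    """Apply radial + tangential distortion to a normalized image point."""
    r2 = x_n * x_n + y_n * y_n
    # Radial factor: 1 + k1*r^2 + k2*r^4 + k3*r^6
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    # Tangential terms come from lens/sensor misalignment
    x_d = x_n * radial + 2.0 * p1 * x_n * y_n + p2 * (r2 + 2.0 * x_n * x_n)
    y_d = y_n * radial + p1 * (r2 + 2.0 * y_n * y_n) + 2.0 * p2 * x_n * y_n
    return x_d, y_d

# Mild barrel distortion (negative k1) pulls points toward the center
x_d, y_d = distort_normalized(0.5, 0.0, k1=-0.2, k2=0.0, p1=0.0, p2=0.0)
print(x_d, y_d)
```

With all coefficients set to zero the function returns its input unchanged, which is a quick sanity check when wiring up a calibration pipeline.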

Zhang's Calibration Method (1998)

  1. Observe a planar pattern (chessboard) from $\geq 3$ different orientations
  2. Compute homography $H_i$ between pattern plane and image for each view
  3. Each homography gives 2 constraints on $\mathbf{K}$
  4. With $\geq 3$ views: solve for $\mathbf{K}$, then refine all parameters via Levenberg-Marquardt
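
Step 2 above can be sketched with the basic Direct Linear Transform (DLT). This is a minimal illustration on noise-free synthetic data, not the lab's implementation: `estimate_homography` and `H_true` are made-up names, and the coordinate normalization that real implementations apply for numerical stability is omitted.

```python
import numpy as np

def estimate_homography(plane_pts, image_pts):
    """Estimate 3x3 H with image ~ H @ plane (homogeneous) via basic DLT."""
    plane_pts = np.asarray(plane_pts, dtype=float)
    image_pts = np.asarray(image_pts, dtype=float)
    rows = []
    for (X, Y), (u, v) in zip(plane_pts, image_pts):
        # Each correspondence contributes two linear equations in the 9 entries of H
        rows.append([-X, -Y, -1.0, 0.0, 0.0, 0.0, u * X, u * Y, u])
        rows.append([0.0, 0.0, 0.0, -X, -Y, -1.0, v * X, v * Y, v])
    A = np.array(rows)
    # The solution is the right singular vector with the smallest singular value
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the arbitrary scale

# Synthetic check: a 3x3 grid of pattern points (in units of squares)
H_true = np.array([[1.2, 0.1, 5.0],
                   [-0.05, 1.1, 3.0],
                   [0.01, 0.02, 1.0]])
plane_pts = np.array([[x, y] for y in range(3) for x in range(3)], dtype=float)
homog = np.hstack([plane_pts, np.ones((len(plane_pts), 1))]) @ H_true.T
image_pts = homog[:, :2] / homog[:, 2:3]

H_est = estimate_homography(plane_pts, image_pts)
print(np.round(H_est, 4))
```

With exact correspondences the estimate matches `H_true` up to floating-point error. With detected corners it is only a starting point, which is why step 4 ends with a nonlinear refinement.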

Reprojection error (lower is better; < 0.5 px is excellent):

$$\text{err} = \frac{1}{N}\sum_{i=1}^{N} \left\| \mathbf{p}_i - \hat{\mathbf{p}}_i(\mathbf{K}, \mathbf{d}, R_i, \mathbf{t}_i) \right\|_2$$
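
A small sketch of how this metric might be computed once observed and reprojected points are available. The helper name `reprojection_errors` and the sample points are assumptions for illustration. It returns both the mean distance from the formula above and the RMS variant, since, as far as I know, the first value returned by cv2.calibrateCamera() is an RMS figure.

```python
import numpy as np

def reprojection_errors(observed, projected):
    """Return (mean, RMS) Euclidean pixel distance between two Nx2 point sets."""
    observed = np.asarray(observed, dtype=float)
    projected = np.asarray(projected, dtype=float)
    # Per-point Euclidean distance in pixels
    distances = np.linalg.norm(observed - projected, axis=1)
    mean_err = distances.mean()
    rms_err = np.sqrt(np.mean(distances**2))
    return mean_err, rms_err

# Two hypothetical corners: one is off by (0.3, 0.4) px, the other is exact
observed = [[100.0, 100.0], [200.0, 150.0]]
projected = [[100.3, 100.4], [200.0, 150.0]]
mean_err, rms_err = reprojection_errors(observed, projected)
print(mean_err, rms_err)
```

The RMS value is never smaller than the mean and weights large residuals more heavily, so it is worth stating which of the two a reported number refers to.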

PnP Problem (Perspective-n-Point)

Given $n \geq 4$ 2D-3D correspondences, solve for camera pose $[R|\mathbf{t}]$:

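  • EPnP (O(n)): efficient non-iterative solution via four virtual control points. Note that OpenCV's solvePnP() defaults to the iterative Levenberg-Marquardt solver (SOLVEPNP_ITERATIVE); EPnP is selected with flags=cv2.SOLVEPNP_EPNP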
  • RANSAC + PnP: handles outliers for robust pose estimation in the wild

What the Lab Covers

| Function | Concept |
|---|---|
| synthesize_chessboard_views() | Generate calibration data with known ground truth |
| calibrate_camera_opencv() | Zhang method via cv2.calibrateCamera() |
| evaluate_reprojection_error() | Per-view error, residual histogram |
| undistort_demo() | Apply distortion correction |
| pose_estimation_pnp() | Estimate 6-DOF pose with RANSAC |
| draw_axes_on_frame() | Visualize 3D coordinate frame projected onto image |

Key OpenCV Functions

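import cv2

# Find chessboard corners (board_size = inner corners per row and column, e.g. (9, 6))
ret, corners = cv2.findChessboardCorners(gray, board_size)
# Sub-pixel refinement: stop after 30 iterations or when the update drops below 0.001 px
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
if ret:
    corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)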

# Calibrate
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, img_size, None, None)

# Undistort
undistorted = cv2.undistort(frame, K, dist)
# Or compute map once (faster for video)
map1, map2 = cv2.initUndistortRectifyMap(K, dist, None, K, size, cv2.CV_32FC1)
dst = cv2.remap(frame, map1, map2, cv2.INTER_LINEAR)

# PnP pose estimation
success, rvec, tvec, inliers = cv2.solvePnPRansac(
    obj_pts, img_pts, K, dist,
    iterationsCount=100, reprojectionError=8.0)
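
The rvec returned by the PnP call is an axis-angle (Rodrigues) vector, and together with tvec it maps world points into the camera frame. The sketch below shows, in plain NumPy, how it could be turned into a rotation matrix and a camera position in world coordinates. In practice cv2.Rodrigues() does the conversion; the helper names and sample pose here are illustrative assumptions.

```python
import numpy as np

def rodrigues_to_matrix(rvec):
    """Convert an axis-angle vector to a 3x3 rotation matrix."""
    rvec = np.asarray(rvec, dtype=float).reshape(3)
    theta = np.linalg.norm(rvec)  # rotation angle in radians
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta  # unit rotation axis
    # Skew-symmetric cross-product matrix of the axis
    k_x = np.array([[0.0, -k[2], k[1]],
                    [k[2], 0.0, -k[0]],
                    [-k[1], k[0], 0.0]])
    # Rodrigues' formula: R = I + sin(theta) [k]x + (1 - cos(theta)) [k]x^2
    return np.eye(3) + np.sin(theta) * k_x + (1.0 - np.cos(theta)) * (k_x @ k_x)

def camera_center(rvec, tvec):
    """Camera position in world coordinates: C = -R^T t."""
    R = rodrigues_to_matrix(rvec)
    return -R.T @ np.asarray(tvec, dtype=float).reshape(3)

# Hypothetical pose: 90 degree rotation about the z-axis, translation (1, 0, 5)
rvec = np.array([0.0, 0.0, np.pi / 2])
tvec = np.array([1.0, 0.0, 5.0])
R = rodrigues_to_matrix(rvec)
C = camera_center(rvec, tvec)
print(np.round(R, 3))
print(np.round(C, 3))
```

A common mistake is to read tvec as the camera position. It is the world origin expressed in the camera frame, so the position has to be recovered as $-R^T\mathbf{t}$.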

Interview Questions

Q: How many chessboard views do you need for calibration? Why? A: At least 3 in the general case: each view gives 2 constraints, and K has 5 DOF once skew is included. In practice 15-30 views are used. More views average out corner-detection noise and cover the diverse angles needed to separate intrinsics from extrinsics.

Q: What's the difference between intrinsic and extrinsic parameters? A: Intrinsics ($\mathbf{K}$, $\mathbf{d}$) are properties of the camera itself and stay fixed for a given camera/lens as long as focus and zoom do not change. Extrinsics ($R$, $\mathbf{t}$) define the camera's pose in the world and can change every frame.

Q: What does reprojection error tell you? A: The pixel distance between observed 2D points and the same 3D points projected through the estimated camera model, averaged over all points (cv2.calibrateCamera() reports it as an RMS value). Below 0.5 px is excellent; above 2 px suggests poor views or faulty corner detection.

Q: How would you calibrate a stereo camera system? A: Calibrate each camera independently first, then use cv2.stereoCalibrate() to jointly optimize and find the relative pose $R, T$ between cameras. cv2.stereoRectify() then aligns epipolar lines to be horizontal for efficient matching.


Run

pip install -r requirements.txt
python solution.py
# Outputs saved to outputs/