Lab 05 — Camera Calibration & Pose Estimation

Phase 1: Classical Computer Vision | Week 4

Every computer vision system that interacts with the physical world (robots, AR, autonomous vehicles) needs an accurate model of how its camera maps 3D points to pixels. This lab teaches you to calibrate a camera and estimate its 3D pose.

Learning Objectives

Understand the pinhole camera model and projection equations
Perform camera calibration using the Zhang method (chessboard)
Estimate camera rotation and translation from 2D-3D correspondences (the PnP problem)
Understand lens distortion and how to correct it
Use reprojection error to evaluate calibration quality

Theory

Pinhole Camera Model

A 3D point $\mathbf{P}_W = [X, Y, Z]^T$ in world coordinates projects to pixel $\mathbf{p} = [u, v]^T$:

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \frac{1}{Z_c} \underbrace{\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}}_{\mathbf{K}} \underbrace{\begin{bmatrix} R & \mathbf{t} \end{bmatrix}}_{[R|\mathbf{t}]} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

$f_x, f_y$: focal lengths in pixels
$c_x, c_y$: principal point (usually near the image center)
$[R|\mathbf{t}]$: extrinsic matrix (camera pose), the rotation and translation that map world coordinates into the camera frame
$\mathbf{K}$: camera intrinsic matrix
$Z_c$: depth of the point in the camera frame, i.e. the third component of $R\mathbf{P}_W + \mathbf{t}$ (not the world coordinate $Z$)
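The projection equation maps onto a few NumPy operations. Below is a minimal sketch, assuming distortion-free points and zero skew; `project_points` is a hypothetical helper name, not the lab's own code:

```python
import numpy as np

def project_points(P_w, K, R, t):
    """Project N x 3 world points to N x 2 pixels (pinhole model, no distortion)."""
    P_c = P_w @ R.T + t            # world -> camera frame: R P_W + t for every row
    Z_c = P_c[:, 2:3]              # depth of each point in the camera frame
    uvw = (P_c / Z_c) @ K.T        # divide by depth, then apply the intrinsics
    return uvw[:, :2]

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                      # camera axes aligned with the world axes
t = np.array([0.0, 0.0, 5.0])      # world origin sits 5 units in front of the camera
P_w = np.array([[0.0, 0.0, 0.0],
                [1.0, 0.5, 0.0]])
pixels = project_points(P_w, K, R, t)
# the origin lands on the principal point (320, 240); the second point lands at (480, 320)
```

Dividing by $Z_c$ before multiplying by $\mathbf{K}$ is equivalent to the $\frac{1}{Z_c}$ factor in the equation, because the last row of $\mathbf{K}$ is $[0, 0, 1]$.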

Lens Distortion Model

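Real lenses introduce radial and tangential distortion. For a normalized image point $(x_n, y_n) = (X_c/Z_c,\ Y_c/Z_c)$, where $(X_c, Y_c, Z_c)$ are the point's camera-frame coordinates:

$$r^2 = x_n^2 + y_n^2$$

$$x_d = x_n(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2p_1 x_n y_n + p_2(r^2 + 2x_n^2)$$

$$y_d = y_n(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1(r^2 + 2y_n^2) + 2p_2 x_n y_n$$

The distorted point is then mapped to pixels by $\mathbf{K}$: $u = f_x x_d + c_x$, $v = f_y y_d + c_y$. The coefficients $(k_1, k_2, p_1, p_2, k_3)$ are estimated during calibration; OpenCV returns them in this order as the `dist` vector.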
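A direct NumPy transcription of the distortion model, as a sketch: `distort_normalized` and `to_pixels` are hypothetical helpers, zero skew is assumed, and `dist` follows OpenCV's $(k_1, k_2, p_1, p_2, k_3)$ ordering.

```python
import numpy as np

def distort_normalized(x_n, y_n, dist):
    """Apply radial + tangential distortion to normalized coordinates.

    dist is ordered like OpenCV's vector: (k1, k2, p1, p2, k3).
    """
    k1, k2, p1, p2, k3 = dist
    r2 = x_n**2 + y_n**2
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x_n * radial + 2 * p1 * x_n * y_n + p2 * (r2 + 2 * x_n**2)
    y_d = y_n * radial + p1 * (r2 + 2 * y_n**2) + 2 * p2 * x_n * y_n
    return x_d, y_d

def to_pixels(x_d, y_d, K):
    """Map distorted normalized coordinates to pixels (zero skew assumed)."""
    return K[0, 0] * x_d + K[0, 2], K[1, 1] * y_d + K[1, 2]

# Barrel distortion (k1 < 0) pulls a point at radius r = 0.5 toward the center
x_d, y_d = distort_normalized(0.3, 0.4, (-0.2, 0.0, 0.0, 0.0, 0.0))
# x_d ≈ 0.285, y_d ≈ 0.38 (radial factor 1 - 0.2 * 0.25 = 0.95)
```

Correcting an image only needs this forward direction. For every pixel of the undistorted output, cv2.initUndistortRectifyMap() computes where that pixel came from in the distorted input, and cv2.remap() samples it there.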

Zhang's Calibration Method (1998)

Observe a planar pattern (chessboard) from $\geq 3$ different orientations
Compute homography $H_i$ between pattern plane and image for each view
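Each homography gives 2 constraints on $\mathbf{K}$ (through $B = \mathbf{K}^{-T}\mathbf{K}^{-1}$, because the first two columns of each $R_i$ are orthonormal)
With $\geq 3$ views: solve for $\mathbf{K}$ in closed form, recover each view's $R_i, \mathbf{t}_i$, then refine all parameters (including distortion) via Levenberg-Marquardt by minimizing reprojection error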
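The closed-form step can be sketched in NumPy, assuming exact, distortion-free homographies $H_i \propto \mathbf{K}[\mathbf{r}_1\ \mathbf{r}_2\ \mathbf{t}]$. With $B = \mathbf{K}^{-T}\mathbf{K}^{-1}$, each view contributes the rows $\mathbf{h}_1^T B \mathbf{h}_2 = 0$ and $\mathbf{h}_1^T B \mathbf{h}_1 = \mathbf{h}_2^T B \mathbf{h}_2$, and the null vector of the stacked system gives $B$ up to scale. Instead of Zhang's explicit element formulas, this sketch recovers $\mathbf{K}$ from a Cholesky factorization of $B$, and it rescales pixels by a hypothetical factor `s` for numerical conditioning. Homography estimation, extrinsics, and LM refinement are left out; `v_ij` and `intrinsics_from_homographies` are hypothetical names.

```python
import numpy as np

def v_ij(H, i, j):
    """Row such that h_i^T B h_j = v_ij . b, with b = (B11, B12, B22, B13, B23, B33)."""
    hi, hj = H[:, i], H[:, j]
    return np.array([
        hi[0] * hj[0],
        hi[0] * hj[1] + hi[1] * hj[0],
        hi[1] * hj[1],
        hi[2] * hj[0] + hi[0] * hj[2],
        hi[2] * hj[1] + hi[1] * hj[2],
        hi[2] * hj[2],
    ])

def intrinsics_from_homographies(Hs, s=1000.0):
    """Closed-form K from >= 3 plane-to-image homographies (no distortion, no refinement)."""
    N = np.diag([1.0 / s, 1.0 / s, 1.0])      # rescale pixels to roughly unit range
    V = []
    for H in Hs:
        Hn = N @ H                            # homography for the rescaled camera N K
        V.append(v_ij(Hn, 0, 1))              # r1 . r2 = 0
        V.append(v_ij(Hn, 0, 0) - v_ij(Hn, 1, 1))  # |r1| = |r2|
    b = np.linalg.svd(np.asarray(V))[2][-1]   # null vector: b up to scale and sign
    B = np.array([[b[0], b[1], b[3]],
                  [b[1], b[2], b[4]],
                  [b[3], b[4], b[5]]])
    if B[0, 0] < 0:                           # B must be positive definite
        B = -B
    L = np.linalg.cholesky(B)                 # B = L L^T, L proportional to (N K)^-T
    K_n = np.linalg.inv(L.T)
    K_n = K_n / K_n[2, 2]
    return np.linalg.inv(N) @ K_n             # undo the pixel rescaling
```

Each view adds 2 rows to the system while $B$ has 5 degrees of freedom, which is where the "at least 3 views" requirement comes from.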

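Reprojection error (lower is better; below 0.5 px is excellent) is the mean pixel distance between the detected corners $\mathbf{p}_i$ and their reprojections $\hat{\mathbf{p}}_i$ through the estimated model, with $R_i, \mathbf{t}_i$ the pose of the view that contains point $i$: $$\text{err} = \frac{1}{N}\sum_{i=1}^{N} \lVert \mathbf{p}_i - \hat{\mathbf{p}}_i(\mathbf{K}, \mathbf{d}, R_i, \mathbf{t}_i) \rVert_2$$ Note that cv2.calibrateCamera() returns the RMS variant, $\sqrt{\frac{1}{N}\sum_i \lVert \mathbf{p}_i - \hat{\mathbf{p}}_i \rVert_2^2}$, which weights large residuals more heavily.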
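A small sketch computing both the mean distance and the root-mean-square of the distances from matched observed and reprojected points (`reprojection_errors` is a hypothetical helper; the arrays are toy values, not lab output):

```python
import numpy as np

def reprojection_errors(observed, projected):
    """Return (mean Euclidean error, RMS error) in pixels for N x 2 point arrays."""
    d = np.linalg.norm(observed - projected, axis=1)   # per-point pixel distance
    return d.mean(), np.sqrt(np.mean(d**2))

observed = np.array([[100.0, 100.0], [200.0, 50.0]])
projected = np.array([[100.3, 100.4], [200.0, 50.0]])
mean_err, rms_err = reprojection_errors(observed, projected)
# distances are 0.5 and 0.0 px, so mean_err ≈ 0.25 and rms_err ≈ 0.354
```

Because RMS squares the distances, a single bad corner raises it more than it raises the mean, which makes it a sensitive indicator of mis-detected points.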

PnP Problem (Perspective-n-Point)

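Given the intrinsics $\mathbf{K}$, $\mathbf{d}$ and $n \geq 4$ correspondences between known 3D points and their 2D projections, solve for the camera pose $[R|\mathbf{t}]$:

EPnP (O(n)): non-iterative closed-form solution that expresses the 3D points through 4 virtual control points (select it with cv2.SOLVEPNP_EPNP; cv2.solvePnP() defaults to cv2.SOLVEPNP_ITERATIVE)
RANSAC + PnP: fits poses to random minimal subsets and keeps the pose with the most inliers, so mismatched correspondences (outliers) do not corrupt the estimate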
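To make the RANSAC idea concrete, here is a sketch that swaps in a plain linear DLT solver for EPnP. DLT needs $n \geq 6$ non-coplanar points and this version assumes no distortion; `project`, `pnp_dlt`, and `pnp_ransac` are hypothetical names, and in practice cv2.solvePnPRansac() does this job with better minimal solvers and a final refinement.

```python
import numpy as np

def project(P_w, K, R, t):
    """Pinhole projection of N x 3 world points to N x 2 pixels (no distortion)."""
    P_c = P_w @ R.T + t
    return ((P_c / P_c[:, 2:3]) @ K.T)[:, :2]

def pnp_dlt(P_w, uv, K):
    """Linear DLT pose from n >= 6 non-coplanar 3D-2D correspondences."""
    # Normalized image coordinates: the 3x4 null-space solution is then s * [R|t]
    xy = np.c_[uv, np.ones(len(uv))] @ np.linalg.inv(K).T
    Ph = np.c_[P_w, np.ones(len(P_w))]           # homogeneous 3D points
    A = np.zeros((2 * len(P_w), 12))
    A[0::2, 0:4] = Ph                            # m1 . P - x * (m3 . P) = 0
    A[0::2, 8:12] = -xy[:, 0:1] * Ph
    A[1::2, 4:8] = Ph                            # m2 . P - y * (m3 . P) = 0
    A[1::2, 8:12] = -xy[:, 1:2] * Ph
    M = np.linalg.svd(A)[2][-1].reshape(3, 4)    # singular vector of the smallest sigma
    if np.linalg.det(M[:, :3]) < 0:              # pick the sign that gives det(R) = +1
        M = -M
    U, S, Vt = np.linalg.svd(M[:, :3])
    R = U @ Vt                                   # closest rotation to the 3x3 block
    t = M[:, 3] / S.mean()                       # remove the unknown scale s
    return R, t

def pnp_ransac(P_w, uv, K, iters=200, thresh=2.0, seed=0):
    """Basic RANSAC: DLT on random 6-point subsets, keep the largest inlier set, refit."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(P_w), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(P_w), size=6, replace=False)
        R, t = pnp_dlt(P_w[idx], uv[idx], K)
        with np.errstate(divide="ignore", invalid="ignore"):
            err = np.linalg.norm(project(P_w, K, R, t) - uv, axis=1)
            inliers = err < thresh               # thresh is in pixels
        if inliers.sum() > best.sum():
            best = inliers
    R, t = pnp_dlt(P_w[best], uv[best], K)       # refit on all inliers
    return R, t, best
```

The `thresh` argument plays the same role as `reprojectionError` in cv2.solvePnPRansac(): it decides how far (in pixels) a reprojected point may land from its observation and still count as an inlier.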

What the Lab Covers

Function	Concept
`synthesize_chessboard_views()`	Generate calibration data with known ground truth
`calibrate_camera_opencv()`	Zhang method via `cv2.calibrateCamera()`
`evaluate_reprojection_error()`	Per-view error, residual histogram
`undistort_demo()`	Apply distortion correction
`pose_estimation_pnp()`	Estimate 6-DOF pose with RANSAC
`draw_axes_on_frame()`	Visualize 3D coordinate frame projected onto image

Key OpenCV Functions

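```python
import cv2

# Placeholders from your own pipeline: gray (8-bit grayscale image), frame (BGR image),
# obj_points / img_points (one array per view), obj_pts / img_pts (one view's matches)
board_size = (9, 6)  # inner corners per row and per column (example value)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3)

# Find chessboard corners
ret, corners = cv2.findChessboardCorners(gray, board_size)
if ret:
    # Sub-pixel refinement in an 11x11 window around each corner
    corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)

# Calibrate (img_size is (width, height))
img_size = gray.shape[::-1]
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, img_size, None, None)

# Undistort a single frame
undistorted = cv2.undistort(frame, K, dist)
# Or compute the maps once and reuse them (faster for video)
map1, map2 = cv2.initUndistortRectifyMap(K, dist, None, K, img_size, cv2.CV_32FC1)
dst = cv2.remap(frame, map1, map2, cv2.INTER_LINEAR)

# PnP pose estimation with RANSAC (reprojectionError = inlier threshold in pixels)
success, rvec, tvec, inliers = cv2.solvePnPRansac(
    obj_pts, img_pts, K, dist,
    iterationsCount=100, reprojectionError=8.0)
```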
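cv2.calibrateCamera() needs, for every view, the 3D board coordinates of the detected corners. A common way to build them, sketched here with hypothetical names `chessboard_object_points` and `square_size`, places the board on the $Z = 0$ plane, which is exactly the planar setup Zhang's method assumes:

```python
import numpy as np

def chessboard_object_points(board_size, square_size):
    """3D corner coordinates on the Z = 0 board plane, x varying fastest (row by row)."""
    cols, rows = board_size
    objp = np.zeros((rows * cols, 3), np.float32)
    # mgrid gives (col, row) index pairs; scale them to physical units
    objp[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2) * square_size
    return objp

objp = chessboard_object_points((9, 6), square_size=0.025)  # 25 mm squares, in metres
# objp[1] is (0.025, 0, 0): the next corner along the first row
```

With one detection per view, `obj_points = [objp] * len(img_points)` pairs the same board model with every image; this ordering matches the one used in OpenCV's calibration tutorial. The translations `tvecs` come out in the same units as `square_size`.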

Interview Questions

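Q: How many chessboard views do you need for calibration? Why? A: At least 3 for Zhang's closed-form solution: each view's homography gives 2 constraints, and $\mathbf{K}$ has 5 DOF ($f_x, f_y, c_x, c_y$, skew), so 3 views give 6 constraints (2 views suffice if skew is fixed to zero). In practice 15-30 views are used: more views average out corner-detection noise, and diverse tilts and board positions (including near the image corners, where distortion is strongest) are needed to separate intrinsics from extrinsics and to constrain the distortion coefficients.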

Q: What's the difference between intrinsic and extrinsic parameters? A: Intrinsic parameters ($\mathbf{K}$, $\mathbf{d}$) are properties of the camera itself and stay fixed for a given camera and lens at a fixed focus and zoom. Extrinsic parameters ($R$, $\mathbf{t}$) define the camera's pose relative to the world and change whenever the camera or the scene moves, typically every frame.

Q: What does reprojection error tell you? A: The RMS pixel distance between observed 2D points and the same 3D points projected through the estimated camera model. < 0.5px is excellent; > 2px suggests bad views or wrong corner detection.

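Q: How would you calibrate a stereo camera system? A: Calibrate each camera independently first, then use cv2.stereoCalibrate() to estimate the relative pose $R, T$ between the cameras, either refining the intrinsics jointly or holding them fixed with cv2.CALIB_FIX_INTRINSIC. cv2.stereoRectify() then computes rectifying rotations and projection matrices that make epipolar lines horizontal (applied with cv2.initUndistortRectifyMap() and cv2.remap()), so stereo matching reduces to a 1D search along image rows.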

Run

```bash
pip install -r requirements.txt
python solution.py
# Outputs saved to outputs/
```

AI Engineer — Role-Based Learning Hub