Lab 04 — Optical Flow & Object Tracking
Phase 1: Classical Computer Vision | Week 3-4
Learn how images change over time — the foundation of video understanding, autonomous driving, and surveillance systems.
Learning Objectives
- Derive and implement Lucas-Kanade optical flow from the brightness constancy constraint
- Understand dense vs sparse optical flow and when to use each
- Implement a simple tracker using Kalman filtering concepts
- Know the math behind why optical flow fails at edges (aperture problem)
Theory
Optical Flow — Core Equation
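Brightness constancy assumption: $$I(x, y, t) = I(x + dx, y + dy, t + dt)$$
First-order Taylor expansion of the right side: $$I(x + dx, y + dy, t + dt) \approx I(x,y,t) + I_x\,dx + I_y\,dy + I_t\,dt$$
Substituting into the assumption, cancelling $I(x,y,t)$, and dividing by $dt$: $$I_x u + I_y v + I_t = 0$$
where $u = dx/dt$, $v = dy/dt$ are the flow components and $I_x$, $I_y$, $I_t$ are the partial derivatives of intensity. This is one equation in two unknowns, which is the aperture problem.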
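The constraint can be checked numerically on a frame pair with known motion. The sketch below is illustrative only: `make_frame` and the blob parameters are assumptions made for this example, not part of the lab code. It shifts a smooth Gaussian blob by a sub-pixel amount and compares the size of the residual $I_x u + I_y v + I_t$ with the size of $I_t$ itself.

```python
import numpy as np

def make_frame(shift_x=0.0, shift_y=0.0, size=64, sigma=8.0):
    """Smooth synthetic image: a Gaussian blob translated by (shift_x, shift_y)."""
    y, x = np.mgrid[0:size, 0:size].astype(np.float64)
    cx, cy = size / 2 + shift_x, size / 2 + shift_y
    return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))

u_true, v_true = 0.3, -0.2            # ground-truth flow in pixels per frame
frame0 = make_frame()
frame1 = make_frame(u_true, v_true)

# np.gradient returns the derivative along axis 0 (rows, i.e. y) first.
I_y, I_x = np.gradient(frame0)
I_t = frame1 - frame0                 # temporal derivative as a frame difference

residual = I_x * u_true + I_y * v_true + I_t
print("max |I_t|      :", np.abs(I_t).max())
print("max |residual| :", np.abs(residual).max())
```

The residual is not exactly zero because the Taylor expansion is truncated at first order; it should stay small relative to $I_t$ as long as the motion is small compared with the scale of the image structure.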
Lucas-Kanade (Sparse, Local)
Assume flow is constant within a window $W$ of $N$ pixels, which gives one constraint per pixel:
$$\underbrace{\begin{bmatrix} I_{x1} & I_{y1} \\ \vdots & \vdots \\ I_{xN} & I_{yN} \end{bmatrix}}_{\mathbf{A}} \begin{bmatrix} u \\ v \end{bmatrix} = \underbrace{-\begin{bmatrix} I_{t1} \\ \vdots \\ I_{tN} \end{bmatrix}}_{\mathbf{b}}$$
Least-squares solution (normal equations):
$$\mathbf{A}^T\mathbf{A} \begin{bmatrix} u \\ v \end{bmatrix} = \mathbf{A}^T \mathbf{b}$$
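Note: $\mathbf{A}^T\mathbf{A}$ is the same 2×2 structure tensor that the Harris corner detector analyzes (Harris typically adds Gaussian weighting). The system is well-conditioned only where both eigenvalues are large, i.e. at corners. On edges one eigenvalue ≈ 0 and in flat regions both are ≈ 0, so LK fails there.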
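A minimal single-window sketch of the least-squares solve, assuming NumPy only. The function name `lucas_kanade_window`, the `min_eig` threshold, and the two test patterns are assumptions for illustration; this is not the lab's `solution.py` and not how OpenCV implements it. It builds the gradient matrix and the negated temporal derivatives for one window, checks the smaller eigenvalue of the 2×2 normal matrix, and solves only when the system is well-conditioned.

```python
import numpy as np

def lucas_kanade_window(frame0, frame1, row, col, half=7, min_eig=1e-3):
    """Estimate (u, v) at one pixel from a (2*half+1)^2 window.

    Returns (flow, eigenvalues); flow is None when the window is ill-conditioned.
    """
    f0 = frame0.astype(np.float64)
    I_y, I_x = np.gradient(f0)                    # axis 0 is y, axis 1 is x
    I_t = frame1.astype(np.float64) - f0
    win = (slice(row - half, row + half + 1), slice(col - half, col + half + 1))
    A = np.stack([I_x[win].ravel(), I_y[win].ravel()], axis=1)   # N x 2
    b = -I_t[win].ravel()                                        # N
    ATA = A.T @ A
    eigvals = np.linalg.eigvalsh(ATA)             # ascending order
    if eigvals[0] < min_eig:                      # edge or flat region
        return None, eigvals
    return np.linalg.solve(ATA, A.T @ b), eigvals

yy, xx = np.mgrid[0:64, 0:64].astype(np.float64)
u_true, v_true = 0.4, -0.3

def texture(x, y):          # gradients in both directions
    return np.sin(0.3 * x) + np.sin(0.25 * y)

def vertical_edge(x, y):    # intensity varies along x only
    return np.tanh((x - 32.0) / 3.0)

results = {}
for name, pattern in [("texture", texture), ("edge", vertical_edge)]:
    f0 = pattern(xx, yy)
    f1 = pattern(xx - u_true, yy - v_true)        # content moved by (u_true, v_true)
    flow, eig = lucas_kanade_window(f0, f1, 32, 32)
    results[name] = flow
    print(name, "eigenvalues:", eig, "flow:", flow)
```

On the textured pattern the solve should recover a flow close to the true one; on the vertical edge the smaller eigenvalue collapses to zero and the function declines to return an estimate.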
Farneback Dense Optical Flow
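Approximates each pixel's neighborhood with a quadratic polynomial, then estimates displacement from how the polynomial coefficients change between frames. Produces a flow vector for every pixel. Slower than sparse LK, but covers the whole image rather than a set of selected features.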
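The core idea can be illustrated in one dimension. If a neighborhood is modelled as $f_1(x) = a x^2 + b_1 x + c_1$ and the second frame is the same signal shifted by $d$, then $f_2(x) = f_1(x - d)$ has linear coefficient $b_2 = b_1 - 2ad$, so $d = -(b_2 - b_1)/(2a)$. The sketch below is a deliberately simplified 1D toy using `np.polyfit`; the actual method works in 2D with weighted fits and iterative refinement, none of which is shown here.

```python
import numpy as np

x = np.arange(-5, 6, dtype=np.float64)    # local neighborhood coordinates
true_d = 0.75                             # displacement to recover

def signal(x):
    return 0.5 * x ** 2 - 1.2 * x + 3.0   # exactly quadratic, for illustration

f1 = signal(x)
f2 = signal(x - true_d)                   # same signal, shifted by true_d

# np.polyfit returns coefficients from highest degree to lowest: [a, b, c].
a1, b1, _ = np.polyfit(x, f1, 2)
a2, b2, _ = np.polyfit(x, f2, 2)

a = 0.5 * (a1 + a2)                       # the quadratic term is shared by both frames
d_est = -(b2 - b1) / (2.0 * a)
print("estimated displacement:", d_est)
```

Because the toy signal is exactly quadratic, the estimate matches the true shift up to floating-point error. Real image patches are only approximately quadratic, which is one reason practical implementations average over neighborhoods and iterate.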
Farneback vs Lucas-Kanade Comparison
| Method | Type | Speed | Accuracy | Use Case |
|---|---|---|---|---|
| Lucas-Kanade | Sparse | Fast | High on corners | Track specific features |
| Farneback | Dense | Medium | Medium everywhere | Full motion analysis |
| DeepFlow/RAFT | Dense DL | Slow (GPU) | Best | Production video |
What the Lab Covers
| Function | Concept | Complexity |
|---|---|---|
| create_synthetic_video() | Controlled ground-truth motion | - |
| lucas_kanade_demo() | Sparse LK with goodFeaturesToTrack | Medium |
| farneback_dense_demo() | Dense flow + HSV visualization | Medium |
| optical_flow_magnitude_demo() | Motion heatmap, background subtraction | Easy |
| multi_scale_pyramid_demo() | Pyramid LK for large motion | Hard |
Key OpenCV Functions
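import cv2

# prev_gray, next_gray: consecutive single-channel 8-bit frames (assumed loaded elsewhere)
# Detect corners to track; returns float32 points of shape (N, 1, 2), or None if none are found
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=100, qualityLevel=0.3, minDistance=7)
# Sparse pyramidal LK optical flow
next_pts, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None,
                                                 winSize=(15, 15), maxLevel=3)
# status[i] == 1 where point i was tracked successfully
good_new = next_pts[status.ravel() == 1]
# Dense Farneback flow
flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                    pyr_scale=0.5, levels=3, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.1, flags=0)
# flow.shape: (H, W, 2), holding (dx, dy) per pixel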
Interview Questions
Q: What is the aperture problem? A: At a single pixel, you can only measure the component of flow perpendicular to the local edge direction. Without a window constraint (LK) or regularization (Horn-Schunck), the problem is underdetermined.
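A small numeric illustration of this answer, assuming a synthetic vertical edge (the helper `edge_frame` is made up for the example). The edge moves diagonally, but the single-pixel constraint can only recover the component along the image gradient, often called the normal flow: $-I_t \nabla I / \lVert \nabla I \rVert^2$.

```python
import numpy as np

size = 64
y, x = np.mgrid[0:size, 0:size].astype(np.float64)

def edge_frame(shift_x, shift_y):
    # Vertical edge: intensity depends on x only, so shift_y leaves the image unchanged.
    return np.tanh((x - shift_x - size / 2) / 3.0)

true_flow = np.array([0.5, 0.5])          # diagonal motion
frame0 = edge_frame(0.0, 0.0)
frame1 = edge_frame(*true_flow)

I_y, I_x = np.gradient(frame0)
I_t = frame1 - frame0

r, c = 32, 32                             # a pixel on the edge
grad = np.array([I_x[r, c], I_y[r, c]])
normal_flow = -I_t[r, c] * grad / (grad @ grad)
print("true flow  :", true_flow)
print("normal flow:", normal_flow)
```

The recovered vector should have an x component near 0.5 and a y component of zero: the motion along the edge is invisible at this pixel.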
Q: Why does optical flow break down at occlusion boundaries? A: The brightness constancy assumption fails when a pixel in frame $t$ corresponds to a different surface in frame $t+1$ (occlusion). The Taylor expansion is also invalid for large displacements — hence pyramid schemes.
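The large-displacement failure can be seen in a 1D toy, kept one-dimensional for brevity. The sketch below is an illustration under stated assumptions: the signal, the frequency-domain Gaussian low-pass, and the helper names are all made up for this example. It shows only the coarsest pyramid level; a full pyramidal scheme would additionally warp and refine the estimate at each finer level.

```python
import numpy as np

def signal(n, num_samples=256):
    """Slow wave (wavelength 64 samples) plus fine texture (wavelength about 5.3 samples)."""
    return (np.cos(2 * np.pi * 4 * n / num_samples)
            + 0.5 * np.cos(2 * np.pi * 48 * n / num_samples))

def lk_shift_1d(s0, s1):
    """One linearized least-squares shift estimate over the whole periodic signal."""
    grad = (np.roll(s0, -1) - np.roll(s0, 1)) / 2.0      # periodic central difference
    dt = s1 - s0
    return -np.sum(grad * dt) / np.sum(grad * grad)

def coarse_level(s, factor=4, sigma=4.0):
    """Gaussian low-pass applied in the frequency domain, then subsampling."""
    freqs = 2 * np.pi * np.fft.rfftfreq(s.size)          # radians per sample
    smoothed = np.fft.irfft(np.fft.rfft(s) * np.exp(-0.5 * (sigma * freqs) ** 2), n=s.size)
    return smoothed[::factor]

n = np.arange(256, dtype=np.float64)
true_shift = 5.0                                         # close to the fine texture's wavelength
s0, s1 = signal(n), signal(n - true_shift)

fine_estimate = lk_shift_1d(s0, s1)
# A shift measured on the coarse grid is scaled back by the subsampling factor.
coarse_estimate = 4 * lk_shift_1d(coarse_level(s0), coarse_level(s1))
print("true:", true_shift, "single level:", fine_estimate, "from coarse level:", coarse_estimate)
```

At full resolution the fine texture dominates the gradients and the shift is comparable to its wavelength, so the linearized estimate lands far from the truth. After smoothing and subsampling, only the slow wave remains and the same estimator lands close to the true shift.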
Q: How does RAFT (2020) improve over classical methods? A: RAFT builds a 4D correlation volume over all pairs of pixels and iteratively updates a single dense flow field with a recurrent GRU-based update operator. Because every pixel pair is already compared, it handles large displacements without the coarse-to-fine flow pyramid that classical methods rely on.
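The all-pairs correlation volume is just a dot product between every feature vector in frame 1 and every feature vector in frame 2. The NumPy sketch below shows the shape bookkeeping only: the random features stand in for learned encoder outputs, the function name is hypothetical, and the $\sqrt{D}$ scaling is a normalization assumed for this example. The recurrent update operator is not shown.

```python
import numpy as np

def all_pairs_correlation(feat1, feat2):
    """feat1, feat2: (D, H, W) feature maps. Returns an (H, W, H, W) correlation volume."""
    d = feat1.shape[0]
    # corr[i, j, k, l] = <feat1[:, i, j], feat2[:, k, l]> / sqrt(D)
    return np.einsum("dij,dkl->ijkl", feat1, feat2) / np.sqrt(d)

rng = np.random.default_rng(0)
feat1 = rng.standard_normal((128, 8, 8))
# Frame-2 features: frame-1 features moved by 2 rows and 3 columns (with wrap-around).
feat2 = np.roll(feat1, shift=(2, 3), axis=(1, 2))

corr = all_pairs_correlation(feat1, feat2)
i, j = 1, 1
k, l = np.unravel_index(np.argmax(corr[i, j]), corr[i, j].shape)
print("volume shape:", corr.shape)
print("best match for pixel", (i, j), "is", (int(k), int(l)))
```

For each source pixel, the slice `corr[i, j]` is a 2D map of similarity to every target location, so a large displacement shows up as a distant peak rather than as a failure of a local linearization.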
Q: How is optical flow used in action recognition? A: Two-stream networks: one stream on RGB frames, one stream on stacked optical flow fields. The flow stream provides motion cues that appearance alone can't capture.
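"Stacked optical flow fields" means the horizontal and vertical components of several consecutive flow fields are concatenated along the channel axis. A minimal sketch, assuming each flow field has shape (H, W, 2) as returned by a dense method; the function name and the choice of 10 fields are illustrative, not prescribed by this lab.

```python
import numpy as np

def stack_flows(flows):
    """flows: list of L arrays of shape (H, W, 2).

    Returns a float32 array of shape (2L, H, W) with channels ordered u1, v1, u2, v2, ...
    """
    stacked = np.concatenate([f.transpose(2, 0, 1) for f in flows], axis=0)
    return stacked.astype(np.float32)

# Ten placeholder flow fields standing in for flow between consecutive frame pairs.
rng = np.random.default_rng(0)
flows = [rng.standard_normal((48, 64, 2)).astype(np.float32) for _ in range(10)]
flow_input = stack_flows(flows)
print(flow_input.shape)
```

The resulting tensor plays the same role for the motion stream that an RGB image plays for the appearance stream, only with 2L channels instead of 3.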
Run
pip install -r requirements.txt
python solution.py
# Outputs saved to outputs/