Lab 04 — Optical Flow & Object Tracking

Phase 1: Classical Computer Vision | Week 3-4

Learn how images change over time — the foundation of video understanding, autonomous driving, and surveillance systems.

Learning Objectives

Derive and implement Lucas-Kanade optical flow from the brightness constancy constraint
Understand dense vs sparse optical flow and when to use each
Implement a simple tracker using Kalman filtering concepts
Know the math behind why optical flow fails at edges (aperture problem)

Theory

Optical Flow — Core Equation

Brightness constancy assumption: $$I(x, y, t) = I(x + dx, y + dy, t + dt)$$

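Taylor expand the right side to first order: $$I(x + dx, y + dy, t + dt) \approx I(x,y,t) + I_x\,dx + I_y\,dy + I_t\,dt$$

Substituting into the brightness constancy equation cancels $I(x,y,t)$, and dividing by $dt$ gives:

$$I_x u + I_y v + I_t = 0$$

where $u = dx/dt$ and $v = dy/dt$ are the components of the flow vector. This is one equation in two unknowns, which is the aperture problem.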
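
The constraint can be checked numerically. The sketch below is an illustration, not part of the lab code: it moves a smooth Gaussian blob by a small, known sub-pixel displacement and evaluates $I_x u + I_y v + I_t$ with finite differences. The blob, its size, and the displacement values are all assumptions chosen for the demo.

```python
import numpy as np

def blob(cx, cy, size=64, sigma=8.0):
    """Smooth synthetic image: a 2D Gaussian centred at (cx, cy)."""
    y, x = np.mgrid[0:size, 0:size].astype(np.float64)
    return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))

# Assumed ground-truth motion in pixels per frame (small, so the
# first-order Taylor expansion is a good approximation).
u_true, v_true = 0.3, -0.2

frame0 = blob(32.0, 32.0)
frame1 = blob(32.0 + u_true, 32.0 + v_true)

# np.gradient returns the derivative along rows (y) first, then columns (x).
I_y, I_x = np.gradient(frame0)
I_t = frame1 - frame0

# Left-hand side of the optical flow constraint at every pixel.
residual = I_x * u_true + I_y * v_true + I_t

print("max |I_t|      :", np.abs(I_t).max())
print("max |residual| :", np.abs(residual).max())
```

The residual is not exactly zero because the expansion drops second-order terms and the derivatives are discretized, but it should be much smaller than $I_t$ itself.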

Lucas-Kanade (Sparse, Local)

Assume flow is constant within a window $W$ of pixels:

$$\begin{bmatrix} I_{x1} & I_{y1} \\ \vdots & \vdots \\ I_{xN} & I_{yN} \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = -\begin{bmatrix} I_{t1} \\ \vdots \\ I_{tN} \end{bmatrix}$$

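Write this system as $\mathbf{A}\begin{bmatrix} u \\ v \end{bmatrix} = \mathbf{b}$, where $\mathbf{A}$ is the $N \times 2$ matrix of spatial gradients and $\mathbf{b}$ is the negated vector of temporal derivatives. The least-squares solution satisfies the normal equations:

$$\mathbf{A}^T\mathbf{A} \begin{bmatrix} u \\ v \end{bmatrix} = \mathbf{A}^T \mathbf{b}$$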

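Note: $\mathbf{A}^T\mathbf{A}$ is the same structure tensor (second-moment matrix) that the Harris corner detector analyzes. The system is ill-conditioned in flat regions (both eigenvalues ≈ 0) and along edges (one eigenvalue ≈ 0), so LK is reliable only at corner-like points where both eigenvalues are large.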
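
A minimal NumPy sketch of the per-window solve follows, assuming the gradients for one window are already available. The function name `lucas_kanade_window` and the absolute eigenvalue threshold are hypothetical choices for illustration; the gradients are synthetic rather than taken from real frames.

```python
import numpy as np

def lucas_kanade_window(I_x, I_y, I_t, eig_threshold=1e-6):
    """Solve A^T A [u, v]^T = A^T b for one window.

    Returns (flow, eigenvalues). flow is None when the smallest eigenvalue
    of A^T A falls below eig_threshold (an assumed, illustrative cut-off).
    """
    A = np.stack([I_x.ravel(), I_y.ravel()], axis=1)  # N x 2 gradient matrix
    b = -I_t.ravel()                                  # negated temporal derivatives
    ATA = A.T @ A
    eigvals = np.linalg.eigvalsh(ATA)                 # ascending order
    if eigvals[0] < eig_threshold:
        return None, eigvals                          # edge or flat region
    return np.linalg.solve(ATA, A.T @ b), eigvals

rng = np.random.default_rng(0)
true_flow = np.array([0.5, -0.25])

# Corner-like window: gradients vary in both directions.
gx = rng.normal(size=(15, 15))
gy = rng.normal(size=(15, 15))
gt = -(gx * true_flow[0] + gy * true_flow[1])  # consistent with the constraint
corner_flow, corner_eigs = lucas_kanade_window(gx, gy, gt)

# Edge-like window: no gradient along y, so v is unobservable.
gy_edge = np.zeros_like(gx)
gt_edge = -(gx * true_flow[0])
edge_flow, edge_eigs = lucas_kanade_window(gx, gy_edge, gt_edge)

print("corner flow:", corner_flow, "eigenvalues:", corner_eigs)
print("edge flow:  ", edge_flow, "eigenvalues:", edge_eigs)
```

In the corner-like window the solve recovers the assumed flow. In the edge-like window the smallest eigenvalue is zero and the function declines to return an estimate.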

Farneback Dense Optical Flow

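Farneback's method approximates each pixel neighborhood with a quadratic polynomial, then estimates displacement from how the polynomial coefficients change between frames. It produces a flow vector for every pixel, so it is slower than sparse LK but covers the whole image.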
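
The idea is easiest to see in one dimension. If $f_1(x) = a x^2 + b_1 x + c_1$ and the second signal is the same polynomial shifted by $d$, so $f_2(x) = f_1(x - d)$, then the linear coefficient becomes $b_2 = b_1 - 2ad$ and the shift is $d = (b_1 - b_2) / (2a)$. The sketch below is a simplified 1D illustration with assumed coefficients, not the full algorithm, which fits weighted polynomials around every pixel and iterates over a pyramid.

```python
import numpy as np

# Local neighbourhood coordinates and an assumed ground-truth shift.
x = np.linspace(-5.0, 5.0, 41)
d_true = 0.8

# A quadratic "signal" and the same signal displaced by d_true.
f1 = 0.5 * x**2 + 1.5 * x + 2.0
f2 = 0.5 * (x - d_true) ** 2 + 1.5 * (x - d_true) + 2.0

# Polynomial expansion: fit a quadratic to each neighbourhood.
a1, b1, _ = np.polyfit(x, f1, 2)
a2, b2, _ = np.polyfit(x, f2, 2)

# Displacement from the change in the linear coefficient.
d_est = (b1 - b2) / (2.0 * a1)
print("estimated shift:", d_est)
```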

Farneback vs Lucas-Kanade Comparison

| Method | Type | Speed | Accuracy | Use Case |
| --- | --- | --- | --- | --- |
| Lucas-Kanade | Sparse | Fast | High on corners | Track specific features |
| Farneback | Dense | Medium | Medium everywhere | Full motion analysis |
| DeepFlow/RAFT | Dense DL | Slow (GPU) | Best | Production video |

What the Lab Covers

| Function | Concept | Complexity |
| --- | --- | --- |
| `create_synthetic_video()` | Controlled ground-truth motion | - |
| `lucas_kanade_demo()` | Sparse LK with `goodFeaturesToTrack` | Medium |
| `farneback_dense_demo()` | Dense flow + HSV visualization | Medium |
| `optical_flow_magnitude_demo()` | Motion heatmap, background subtraction | Easy |
| `multi_scale_pyramid_demo()` | Pyramid LK for large motion | Hard |

Key OpenCV Functions

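import cv2

# prev_gray, next_gray: consecutive 8-bit single-channel (grayscale) frames

# Detect corners to track in the previous frame; result is an (N, 1, 2) float32 array
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=100, qualityLevel=0.3, minDistance=7)

# Sparse pyramidal LK optical flow; status[i] == 1 where point i was tracked
next_pts, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None,
                            winSize=(15, 15), maxLevel=3)
good_new = next_pts[status.ravel() == 1]  # keep only successfully tracked points

# Dense Farneback flow (names changed from prev/next to avoid shadowing the builtin next)
flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.1, flags=0)
# flow.shape: (H, W, 2), holding (dx, dy) per pixel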
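
For the HSV visualization of a dense flow field, a common convention maps flow direction to hue and flow magnitude to brightness. The sketch below uses NumPy only, so it runs without OpenCV. `flow_to_hsv` is a hypothetical helper name, and the hue range of 0 to 179 is chosen to match OpenCV's 8-bit HSV convention, so the result could then be passed to `cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)` for display.

```python
import numpy as np

def flow_to_hsv(flow):
    """Map an (H, W, 2) flow field to an (H, W, 3) uint8 HSV image.

    Hue encodes direction (0..179, OpenCV's 8-bit hue range),
    saturation is fixed at 255, value encodes normalised magnitude.
    """
    dx, dy = flow[..., 0], flow[..., 1]
    magnitude = np.hypot(dx, dy)
    angle_deg = np.degrees(np.arctan2(dy, dx)) % 360.0   # [0, 360)

    hsv = np.zeros(flow.shape[:2] + (3,), dtype=np.uint8)
    hsv[..., 0] = np.round(angle_deg / 2.0).astype(np.uint8) % 180
    hsv[..., 1] = 255
    max_mag = magnitude.max()
    if max_mag > 0:
        hsv[..., 2] = (255.0 * magnitude / max_mag).astype(np.uint8)
    return hsv

# Synthetic flow: everything moves 1 px right, one pixel moves 2 px down.
demo_flow = np.zeros((4, 4, 2), dtype=np.float32)
demo_flow[..., 0] = 1.0
demo_flow[0, 0] = (0.0, 2.0)

hsv = flow_to_hsv(demo_flow)
print("hue at (0, 0):", hsv[0, 0, 0], "hue at (1, 1):", hsv[1, 1, 0])
```

Note that positive `dy` points down in image coordinates, so a 90 degree angle here means downward motion.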

Q: What is the aperture problem? A: At a single pixel, you can only measure the component of flow perpendicular to the local edge direction. Without a window constraint (LK) or regularization (Horn-Schunck), the problem is underdetermined.
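
As a small illustration of that answer, the measurable part of the flow at one pixel (often called normal flow) is $-I_t \nabla I / \lVert \nabla I \rVert^2$. The sketch below uses hypothetical gradient values for a vertical edge and an assumed true motion to show that the component along the edge is lost.

```python
import numpy as np

def normal_flow(I_x, I_y, I_t, eps=1e-12):
    """Flow component along the image gradient; eps avoids division by zero."""
    scale = -I_t / (I_x**2 + I_y**2 + eps)
    return scale * I_x, scale * I_y

# Vertical edge: intensity changes only along x (assumed values).
I_x, I_y = 1.0, 0.0
u_true, v_true = 2.0, 3.0                 # assumed true motion
I_t = -(I_x * u_true + I_y * v_true)      # what the constraint predicts

u_n, v_n = normal_flow(I_x, I_y, I_t)
print("recovered flow:", (u_n, v_n))      # v_true cannot be recovered
```

Any value of `v_true` produces the same `I_t` here, which is why a single pixel on an edge cannot determine the full flow vector.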

Q: Why does optical flow break down at occlusion boundaries? A: The brightness constancy assumption fails when a pixel in frame $t$ corresponds to a different surface in frame $t+1$ (occlusion). The Taylor expansion is also invalid for large displacements — hence pyramid schemes.
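
The large-displacement failure can be seen in a one-dimensional sketch. The code below applies a single linearized LK step, $d = -\sum I_x I_t / \sum I_x^2$, to a Gaussian bump shifted by a small and by a large amount. The signal, its width, and the shift values are assumptions chosen for the demo.

```python
import numpy as np

def bump(center, n=128, sigma=4.0):
    """1D Gaussian bump used as a stand-in for an image row."""
    x = np.arange(n, dtype=np.float64)
    return np.exp(-((x - center) ** 2) / (2 * sigma ** 2))

def lk_shift_1d(f0, f1):
    """One linearised LK step over the whole signal."""
    I_x = np.gradient(f0)
    I_t = f1 - f0
    return -np.sum(I_x * I_t) / np.sum(I_x ** 2)

f0 = bump(50.0)
small = lk_shift_1d(f0, bump(50.0 + 0.5))    # shift well below the bump width
large = lk_shift_1d(f0, bump(50.0 + 12.0))   # shift of three bump widths

print("true 0.5  -> estimated", small)
print("true 12.0 -> estimated", large)
```

The small shift is recovered closely, while the large shift is badly underestimated because the two bumps barely overlap. A pyramid reduces the displacement measured in pixels at coarse levels, which brings it back into the range where the linearization holds.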

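Q: How does RAFT (2020) improve over classical methods? A: RAFT builds a 4D all-pairs correlation volume (every pixel's features in frame 1 compared against every pixel's features in frame 2) and refines a single dense flow field iteratively with a recurrent GRU update operator. Because it looks up correlations at multiple scales rather than estimating flow coarse-to-fine, it handles large displacements without a coarse-to-fine flow pyramid.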
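
To make the all-pairs correlation volume concrete, the NumPy sketch below computes the dot product between every feature vector in one map and every feature vector in another, scaled by the square root of the channel count. Random features stand in for the learned encoder output, and `all_pairs_correlation` is a hypothetical name, so this illustrates the data structure only, not the RAFT implementation.

```python
import numpy as np

def all_pairs_correlation(fmap1, fmap2):
    """fmap1, fmap2: (H, W, C) feature maps -> (H, W, H, W) correlation volume."""
    channels = fmap1.shape[-1]
    return np.einsum("ijc,klc->ijkl", fmap1, fmap2) / np.sqrt(channels)

rng = np.random.default_rng(0)
f1 = rng.normal(size=(6, 8, 128))

# Frame 2 features: frame 1 moved 1 px down and 2 px right (with wrap-around).
f2 = np.roll(f1, shift=(1, 2), axis=(0, 1))

corr = all_pairs_correlation(f1, f2)

# For a query pixel in frame 1, the best match in frame 2 is the shifted location.
i, j = 2, 3
k, l = np.unravel_index(np.argmax(corr[i, j]), corr[i, j].shape)
print("pixel", (i, j), "matches", (int(k), int(l)))
```

The displacement between the query pixel and its best match is the assumed shift of (1, 2). RAFT does not take a hard argmax like this; it looks up values from the volume around the current flow estimate and feeds them to the update operator.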

Q: How is optical flow used in action recognition? A: Two-stream networks: one stream on RGB frames, one stream on stacked optical flow fields. The flow stream provides motion cues that appearance alone can't capture.

Run

pip install -r requirements.txt
python solution.py
# Outputs saved to outputs/
