Lab 02 — Spatial Filtering, Edge Detection & Morphology
Phase: 1 — Classical CV | Difficulty: ⭐⭐⭐☆☆
Spatial Filtering (Convolution-based)
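A spatial filter (or kernel) recomputes each pixel from its neighborhood. For linear filters this is the same sliding-window operation used in CNN layers (strictly, both compute cross-correlation, i.e. the kernel is not flipped), so understanding it classically makes deep learning more intuitive.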
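As a rough illustration of the sliding-window idea, here is a minimal NumPy sketch of a linear spatial filter. The function name `filter2d` and the edge-replicating border are choices made for this example, not part of the lab code, and the loop is written for clarity rather than speed.

```python
import numpy as np

def filter2d(image, kernel):
    """Naive sliding-window filter (cross-correlation, no kernel flip)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    # Replicate border pixels so the output has the same size as the input.
    padded = np.pad(image.astype(np.float64), ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros(image.shape, dtype=np.float64)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            window = padded[y:y + kh, x:x + kw]
            out[y, x] = np.sum(window * kernel)   # weighted sum of the neighborhood
    return out

# 5x5 image with a single bright pixel in the centre.
image = np.zeros((5, 5))
image[2, 2] = 9.0
box = np.full((3, 3), 1.0 / 9.0)   # 3x3 mean (box) kernel
blurred = filter2d(image, box)
```

The single bright pixel is spread evenly over its 3x3 neighborhood (each of those nine outputs is 1.0), while the total intensity stays at 9.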
Gaussian Blur
$$G(x, y) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2+y^2}{2\sigma^2}}$$
Properties:
- Separable: $G_{2D} = G_{1D} \otimes G_{1D}^T$ → apply a 1D kernel horizontally, then vertically (cuts the work from $k^2$ to $2k$ multiply-adds per pixel for a $k \times k$ kernel)
- Isotropic: same blur in all directions
- Removes high-frequency noise (blurs sharp edges too)
Sigma vs kernel size: $\sigma$ controls the spread. Kernel size should be odd and $\geq 6\sigma + 1$, so that the kernel spans $\pm 3\sigma$ (about 99.7% of the mass of a 1D Gaussian). In OpenCV: cv2.GaussianBlur(img, (ksize, ksize), sigma); if sigma=0, it is computed from ksize as 0.3*((ksize-1)*0.5 - 1) + 0.8.
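The separability property and the $\pm 3\sigma$ sizing rule can be checked numerically. The sketch below builds a normalized 1D kernel and compares its outer product with a directly sampled 2D Gaussian. The helper name `gaussian_kernel_1d` is made up for this example, and this is not how OpenCV constructs its kernels internally.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Normalized 1D Gaussian sampled on an odd-length grid covering +/- 3 sigma."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

sigma = 1.5
g1 = gaussian_kernel_1d(sigma)          # length 2*ceil(3*sigma) + 1 = 11
g2 = np.outer(g1, g1)                   # 2D kernel as an outer product

# Direct 2D sampling of G(x, y), normalized to sum to 1.
r = len(g1) // 2
yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
g2_direct = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
g2_direct /= g2_direct.sum()

separable = np.allclose(g2, g2_direct)  # True: both constructions agree
```

For this 11x11 kernel, the separable form needs 22 multiply-adds per pixel instead of 121.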
Median Filter
Replaces each pixel with the median of its neighborhood. Non-linear — not expressible as a convolution.
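Key advantage: Robust to outliers (salt-and-pepper noise). An isolated extreme pixel almost never becomes the median of its neighborhood, so it is removed outright, whereas Gaussian blur would smear it into its neighbors. Edges are also preserved better than with Gaussian blur.
Disadvantage: Computationally expensive. A naive implementation sorts every window, costing $O(k^2 \log k^2)$ per pixel for a $k \times k$ window; histogram-based implementations are faster.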
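To make the outlier argument concrete, the following sketch runs a 3x3 median and a 3x3 mean over a flat patch containing one "salt" pixel. The helper `filter_windows` is a name chosen for this example; the mean filter stands in for any linear blur.

```python
import numpy as np

def filter_windows(image, k, reducer):
    """Apply `reducer` (e.g. np.median or np.mean) to every k x k neighborhood."""
    p = k // 2
    padded = np.pad(image, p, mode="edge")
    out = np.empty(image.shape, dtype=np.float64)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            out[y, x] = reducer(padded[y:y + k, x:x + k])
    return out

# Flat gray patch (value 100) with one "salt" pixel at 255.
patch = np.full((5, 5), 100.0)
patch[2, 2] = 255.0

median_out = filter_windows(patch, 3, np.median)   # outlier removed everywhere
mean_out = filter_windows(patch, 3, np.mean)       # outlier smeared over 3x3
```

The median output is 100 everywhere, while the mean output rises to 1055/9 (about 117.2) at every pixel whose window contains the outlier.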
Bilateral Filter
Combines spatial proximity (like Gaussian) with color/intensity similarity:
$$I_{filtered}(x) = \frac{1}{W} \sum_{x' \in \Omega} I(x') \cdot G_s(x'-x) \cdot G_r(I(x')-I(x))$$
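- $G_s$: spatial Gaussian (penalizes distant pixels)
- $G_r$: range Gaussian (penalizes pixels with different intensity)
- $W$: normalization factor, the sum of $G_s \cdot G_r$ over the window $\Omega$, so the weights sum to 1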
Effect: Smooths flat regions (same intensity) while preserving edges (large intensity difference). Used in portrait photography, HDR tone mapping.
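The formula above translates almost line by line into a brute-force implementation. The sketch below assumes a 2D grayscale float image and uses illustrative parameter names (`radius`, `sigma_s`, `sigma_r`); it is far slower than cv2.bilateralFilter and only meant to show where the two Gaussians enter.

```python
import numpy as np

def bilateral_filter(image, radius, sigma_s, sigma_r):
    """Brute-force bilateral filter for a 2D grayscale float image."""
    h, w = image.shape
    size = 2 * radius + 1
    padded = np.pad(image, radius, mode="edge")
    # Spatial weights depend only on the offset, so compute them once.
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    g_s = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma_s ** 2))
    out = np.empty((h, w), dtype=np.float64)
    for y in range(h):
        for x in range(w):
            window = padded[y:y + size, x:x + size]
            # Range weights depend on the intensity difference to the centre pixel.
            g_r = np.exp(-((window - image[y, x]) ** 2) / (2 * sigma_r ** 2))
            weights = g_s * g_r
            out[y, x] = np.sum(weights * window) / np.sum(weights)  # 1/W normalization
    return out

# Step edge: left half 50, right half 200, plus a small deterministic ripple.
image = np.full((8, 8), 50.0)
image[:, 4:] = 200.0
image[::2, :] += 4.0   # ripple of amplitude 4 on every other row

smoothed = bilateral_filter(image, radius=2, sigma_s=2.0, sigma_r=20.0)
```

With these settings the ripple inside each half is strongly reduced, while the jump of roughly 150 gray levels across the edge survives because the range weight for such a difference is practically zero.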
Gradient-based Edge Detection
Image Gradients
The gradient of a continuous image $I$: $$\nabla I = \left(\frac{\partial I}{\partial x}, \frac{\partial I}{\partial y}\right)$$
Discrete approximations (Sobel operators): $$S_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix}, \quad S_y = S_x^T$$
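Gradient magnitude: $\|\nabla I\| = \sqrt{G_x^2 + G_y^2}$, where $G_x = S_x * I$ and $G_y = S_y * I$ (often approximated as $|G_x| + |G_y|$ for speed)
Gradient direction: $\theta = \operatorname{atan2}(G_y, G_x)$, which, unlike $\arctan(G_y / G_x)$, is defined when $G_x = 0$ and distinguishes opposite directions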
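A small worked example may help here: applying the Sobel kernels to a synthetic vertical step edge. The helper `correlate3x3` is written for this sketch (it applies the kernel without flipping, as cv2.Sobel and cv2.filter2D do) and replicates border pixels.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def correlate3x3(image, kernel):
    """Correlate a 2D float image with a 3x3 kernel (edge-replicated border)."""
    h, w = image.shape
    padded = np.pad(image, 1, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

# Vertical step edge: dark on the left, bright on the right.
image = np.zeros((5, 6))
image[:, 3:] = 10.0

gx = correlate3x3(image, SOBEL_X)
gy = correlate3x3(image, SOBEL_Y)
magnitude = np.hypot(gx, gy)            # sqrt(gx^2 + gy^2)
magnitude_l1 = np.abs(gx) + np.abs(gy)  # cheaper approximation
direction = np.arctan2(gy, gx)          # radians, covers the full circle
```

Here `gx` is 40 in the two columns adjacent to the step and 0 elsewhere, `gy` is 0 everywhere, and the direction at the edge is 0 rad (pointing from dark to bright along x). The response is two pixels wide, which is the thickness that non-maximum suppression later addresses.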
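Laplacian (sum of second derivatives; edges lie at its zero-crossings): $$\nabla^2 I = \frac{\partial^2 I}{\partial x^2} + \frac{\partial^2 I}{\partial y^2}$$
LoG (Laplacian of Gaussian): Gaussian blur followed by the Laplacian, which suppresses the noise that a second derivative would otherwise amplify. The combined kernel is, up to sign and scale, the Mexican hat (Ricker) wavelet.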
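The zero-crossing idea is easiest to see in one dimension. The sketch below uses a hand-made, already blurred intensity profile across an edge (the values are invented for illustration) and locates the sign change of the discrete second derivative.

```python
import numpy as np

# Blurred 1D step edge (intensity profile across an edge).
profile = np.array([0, 0, 0, 1, 4, 8, 11, 12, 12, 12], dtype=np.float64)

# Central first difference and discrete 1D Laplacian; with mode="valid",
# output sample i is centred on profile[i + 1].
first = np.convolve(profile, [1, 0, -1], mode="valid") / 2.0
second = np.convolve(profile, [1, -2, 1], mode="valid")

# A zero crossing is a sign change between neighbouring second-derivative samples.
signs = np.sign(second)
crossings = np.where(signs[:-1] * signs[1:] < 0)[0]
```

The second derivative is `[0, 1, 2, 1, -1, -2, -1, 0]`, so the only sign change lies between output samples 3 and 4, i.e. between `profile[4]` and `profile[5]`. That is exactly where the first derivative peaks (3.5), which is why zero-crossings of the Laplacian mark edges.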
Canny Edge Detection — Step by Step
Canny is the gold standard classical edge detector. Steps:
- Gaussian blur: $I_{\sigma} = G_\sigma * I$ — suppress noise
- Gradient computation: $G_x, G_y$ via Sobel. Compute magnitude $M$ and direction $\theta$
- Non-maximum suppression (NMS): Thin edges. For each pixel, keep it only if it's a local maximum along the gradient direction. Suppresses thick "fat" edges.
- Double thresholding: Classify pixels as:
  - Strong edge: $M > T_{high}$
  - Weak edge: $T_{low} < M \leq T_{high}$
  - Non-edge: $M \leq T_{low}$
- Hysteresis edge tracking: Keep a weak edge pixel only if it's connected to a strong edge pixel (8-connectivity). This retains real edges that have low gradient at some points while eliminating isolated noise pixels.
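The last two steps of the list above (double thresholding and hysteresis) can be sketched as a breadth-first search that starts from the strong pixels. This is an illustrative implementation with a made-up function name and toy data, not the code cv2.Canny runs internally.

```python
import numpy as np
from collections import deque

def hysteresis(magnitude, t_low, t_high):
    """Double threshold plus 8-connected hysteresis on a gradient-magnitude map."""
    strong = magnitude > t_high
    weak = (magnitude > t_low) & ~strong
    edges = strong.copy()
    queue = deque(zip(*np.nonzero(strong)))   # start the search from strong pixels
    h, w = magnitude.shape
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and weak[ny, nx] and not edges[ny, nx]:
                    edges[ny, nx] = True      # weak pixel reachable from a strong one
                    queue.append((ny, nx))
    return edges

# Toy magnitude map (assumed to be already thinned by NMS).
M = np.array([
    [0,  0,  0,  0, 0,  0],
    [0, 90, 60, 55, 0,  0],   # strong pixel followed by a weak chain
    [0,  0,  0,  0, 0,  0],
    [0,  0,  0,  0, 0, 60],   # isolated weak pixel
], dtype=np.float64)

edges = hysteresis(M, t_low=50, t_high=80)
```

The weak pixels with values 60 and 55 are kept because they chain back to the strong pixel (90), while the isolated weak pixel in the bottom-right corner is discarded.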
Aperture parameter: cv2.Canny computes its gradients with a Sobel kernel of size apertureSize, which must be 3, 5, or 7 (default 3). A larger aperture responds to smoother, larger-scale edges and filters out fine detail.
Contour Analysis
After binary segmentation or edge detection, contours are the boundaries of connected foreground regions.
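import cv2

# OpenCV 4.x returns (contours, hierarchy); 3.x returned three values
contours, hierarchy = cv2.findContours(
    binary_mask,               # 8-bit single-channel image; non-zero pixels are foreground
    cv2.RETR_EXTERNAL,         # only outer contours (vs RETR_TREE for full hierarchy)
    cv2.CHAIN_APPROX_SIMPLE    # compress horizontal/vertical/diagonal runs to their end points
)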
Contour features:
import numpy as np

area = cv2.contourArea(cnt)                          # px^2, via Green's formula (not a pixel count)
perimeter = cv2.arcLength(cnt, closed=True)          # px, length of the closed polygon
circularity = 4*np.pi*area / (perimeter**2 + 1e-8)   # 1.0 = perfect circle; epsilon avoids 0/0
x, y, w, h = cv2.boundingRect(cnt)                   # axis-aligned bounding box
(cx, cy), radius = cv2.minEnclosingCircle(cnt)       # min enclosing circle
hull = cv2.convexHull(cnt)                           # convex hull
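The circularity score is easier to interpret with a few analytic shapes. The sketch below evaluates the same $4\pi A / P^2$ formula on ideal geometry (no contours involved), so the numbers are exact up to rounding. Values measured on real pixel contours will usually deviate somewhat because the perimeter is that of a polygon on the pixel grid.

```python
import math

def circularity(area, perimeter):
    """Isoperimetric quotient 4*pi*A / P^2: 1 for a circle, smaller for other shapes."""
    return 4 * math.pi * area / (perimeter ** 2)

r = 10.0
circle = circularity(math.pi * r ** 2, 2 * math.pi * r)   # 1.0
square = circularity(10.0 * 10.0, 4 * 10.0)               # pi / 4, about 0.785
thin_rect = circularity(100.0 * 1.0, 2 * (100.0 + 1.0))   # about 0.031
```

A threshold such as "circularity above 0.8" (an arbitrary value used only as an example) would therefore accept round blobs and reject squares and elongated regions.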
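Shape matching: cv2.matchShapes(cnt1, cnt2, cv2.CONTOURS_MATCH_I1, 0) compares two contours via their Hu moments, seven moment invariants that are unchanged by translation, scale, and rotation. The first six are also invariant to reflection; the seventh changes sign. A lower score means more similar shapes.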
Hough Transform
Detects lines and circles by voting in parameter space.
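Line detection: Each edge pixel $(x, y)$ votes for every line that could pass through it. Lines are parameterized as $\rho = x\cos\theta + y\sin\theta$ rather than $y = mx + b$, because the slope $m$ is unbounded for vertical lines. Peaks in the $(\rho, \theta)$ accumulator → lines. The call below uses cv2.HoughLinesP, the probabilistic variant, which samples edge pixels and returns finite line segments instead of $(\rho, \theta)$ pairs.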
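# Returns an array of shape (N, 1, 4) holding x1, y1, x2, y2 per segment, or None if nothing is found
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi/180, threshold=100,
                        minLineLength=50, maxLineGap=10)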
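To show what "voting in parameter space" means, here is a minimal sketch of the standard (non-probabilistic) Hough transform for lines. The function name, the 1 degree and 1 pixel bin sizes, and the synthetic points are all choices made for this example; cv2.HoughLines and cv2.HoughLinesP are considerably more elaborate.

```python
import numpy as np

def hough_lines(edge_points, diag, n_theta=180):
    """Vote in (rho, theta) space; returns the accumulator and the theta grid.

    rho is rounded to the nearest integer and offset by `diag` so indices are >= 0.
    """
    thetas = np.deg2rad(np.arange(n_theta))            # 0..179 degrees, 1 degree steps
    acc = np.zeros((2 * diag + 1, n_theta), dtype=np.int64)
    for x, y in edge_points:
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + diag, np.arange(n_theta)] += 1      # one vote per theta bin
    return acc, thetas

# Edge pixels on the horizontal line y = 7, plus two stray noise pixels.
diag = 80                                              # upper bound on |rho| for these points
points = [(x, 7) for x in range(0, 60, 3)] + [(3, 15), (12, 1)]
acc, thetas = hough_lines(points, diag)

rho_idx, theta_idx = np.unravel_index(np.argmax(acc), acc.shape)
rho, theta_deg = rho_idx - diag, theta_idx             # theta index equals degrees here
```

The accumulator peak collects all 20 votes from the collinear points at $\rho = 7$, $\theta = 90°$, which is the line $y = 7$; the two noise pixels spread their votes over other bins and do not form a peak.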
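Circle detection: Conceptually, each edge pixel votes in $(x_c, y_c, r)$ space. cv2.HoughCircles() avoids the full 3D accumulator with the Hough gradient method: edge pixels vote for candidate centers along their gradient direction in a 2D accumulator, and a radius is then estimated for each candidate center.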
Interview Questions
Q: Walk me through Canny edge detection step by step.
A: (1) Gaussian blur to reduce noise. (2) Compute x and y gradients via Sobel, then magnitude and direction. (3) Non-maximum suppression: for each pixel, zero it out if it's not the local maximum along its gradient direction, which thins edges from fat ridges to single-pixel lines. (4) Double thresholding: strong edges above the high threshold, weak edges between low and high, everything at or below low discarded. (5) Hysteresis: follow connectivity outward from the strong pixels; a weak edge pixel is kept only if it is connected to a strong edge pixel. Canny recommended a high-to-low threshold ratio between 2:1 and 3:1 (e.g., low 50, high 150).
Q: What is the difference between Gaussian and bilateral filtering? When would you use each?
A: Gaussian blur is a linear filter that weights neighbors by spatial distance only, so it always blurs edges. Bilateral filter additionally weights by intensity similarity, so pixels across a sharp edge contribute little (intensity very different), while pixels within a smooth region contribute a lot (intensity similar). Use Gaussian for simple noise removal where edge preservation doesn't matter. Use bilateral for portrait smoothing, medical image preprocessing, or any case where you need to denoise while keeping edges sharp. Bilateral is considerably slower than Gaussian, often by an order of magnitude or more: its weights depend on pixel values, so they must be recomputed at every pixel and the filter is not separable.
Q: What does NMS (non-maximum suppression) do in Canny? Why is it needed?
A: After computing gradient magnitude, edges appear as thick ridges (multiple pixels wide) rather than thin lines. NMS thins edges to single-pixel width by suppressing pixels that are not local maxima along the gradient direction. For each pixel, we compare it with its two neighbors on either side along the gradient direction (i.e., across the edge) and keep it only if its magnitude is at least as large as both. Without NMS, all subsequent thresholding would operate on thick noisy blobs, making precise edge localization impossible.
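The comparison described in this answer can be sketched as follows. This version quantizes the gradient direction to four orientations and skips the one-pixel border; production implementations differ in details such as interpolation and tie handling, so treat the function as an illustration with invented names.

```python
import numpy as np

def non_max_suppression(magnitude, direction):
    """Keep a pixel only if it is >= both neighbours along its gradient direction.

    `direction` is in radians; it is quantized to 0, 45, 90, or 135 degrees.
    """
    h, w = magnitude.shape
    out = np.zeros_like(magnitude)
    angle = np.rad2deg(direction) % 180.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            a = angle[y, x]
            if a < 22.5 or a >= 157.5:      # gradient points along x
                n1, n2 = magnitude[y, x - 1], magnitude[y, x + 1]
            elif a < 67.5:                  # 45 degrees (image y axis points down)
                n1, n2 = magnitude[y - 1, x - 1], magnitude[y + 1, x + 1]
            elif a < 112.5:                 # gradient points along y
                n1, n2 = magnitude[y - 1, x], magnitude[y + 1, x]
            else:                           # 135 degrees
                n1, n2 = magnitude[y - 1, x + 1], magnitude[y + 1, x - 1]
            if magnitude[y, x] >= n1 and magnitude[y, x] >= n2:
                out[y, x] = magnitude[y, x]
    return out

# A vertical ridge five pixels wide, with the gradient pointing along x everywhere.
M = np.tile(np.array([0, 2, 5, 9, 5, 2, 0], dtype=np.float64), (5, 1))
theta = np.zeros_like(M)
thin = non_max_suppression(M, theta)
```

Only the ridge crest (value 9, column 3) survives in the three interior rows; the flanking values 2 and 5 are zeroed. Because the test uses `>=`, a plateau of equal maxima would remain two pixels wide, which is one of the details real implementations handle differently.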