Lab 01 — Image Basics: Color Spaces, Histograms, Pixel Operations

Phase: 1 — Classical CV | Difficulty: ⭐⭐☆☆☆

Color Spaces

An image is a function $I: \mathbb{R}^2 \rightarrow \mathbb{R}^C$ mapping spatial coordinates to color values. Different color spaces parameterize color differently, and each color space is suited to different tasks.

BGR / RGB

The default representation. Each pixel is a 3-tuple $(B, G, R)$ or $(R, G, B)$, with each channel in [0, 255] for 8-bit images.

OpenCV stores color images in BGR order by default: cv2.imread returns BGR for color images. Convert to RGB before showing with matplotlib or passing to PyTorch models.
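As a minimal illustration of what that conversion does at the pixel level, the sketch below reverses the channel order in plain Python. The nested-list image and the helper name `bgr_to_rgb` are assumptions made for this example; on a real array returned by cv2.imread, cv2.cvtColor(img, cv2.COLOR_BGR2RGB) does the same job.

```python
def bgr_to_rgb(image):
    """Reverse the channel order of every pixel in a nested-list image."""
    return [[tuple(reversed(pixel)) for pixel in row] for row in image]


# Hypothetical 1x2 image stored as (B, G, R): one pure-blue and one pure-red pixel.
bgr_image = [[(255, 0, 0), (0, 0, 255)]]
rgb_image = bgr_to_rgb(bgr_image)
print(rgb_image)
```

Skipping this step does not raise an error. Red and blue are silently swapped, which is why the bug often goes unnoticed until a plot looks wrong.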

HSV (Hue, Saturation, Value)

$$H \in [0, 179], \quad S \in [0, 255], \quad V \in [0, 255]$$

(OpenCV uses H range of 0–179 to fit in uint8; multiply by 2 for 0–360°)

Hue: color (red=0°, green=120°, blue=240°)
Saturation: color purity (0=gray, 255=fully saturated)
Value: brightness
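To make the 8-bit scaling concrete, here is a small sketch that converts RGB triples to HSV with the standard-library colorsys module and rescales the result to the ranges listed above. The helper name `rgb_to_opencv_hsv` is hypothetical, and its rounding is an assumption, so values may differ by one from what cv2.cvtColor produces.

```python
import colorsys


def rgb_to_opencv_hsv(r, g, b):
    """Convert 8-bit RGB to HSV with 8-bit scaling (H: 0-179, S and V: 0-255)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # colorsys returns h, s, v in [0, 1]; hue is stored as degrees / 2,
    # and the modulo wraps a rounded 180 back to 0.
    return (round(h * 180) % 180, round(s * 255), round(v * 255))


for name, rgb in [("red", (255, 0, 0)), ("green", (0, 255, 0)), ("blue", (0, 0, 255))]:
    print(name, rgb_to_opencv_hsv(*rgb))
```

Pure red, green, and blue land at H = 0, 60, and 120, which is half of 0°, 120°, and 240°.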

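Why HSV matters: Color-based object segmentation is much simpler in HSV. To detect red objects, threshold the hue channel. Red sits at the wrap-around point of the hue circle (0° and 360°), so it needs two ranges, one at each end of OpenCV's 0 to 179 scale:

mask = cv2.inRange(hsv, (0, 100, 100), (10, 255, 255)) | cv2.inRange(hsv, (170, 100, 100), (179, 255, 255))

In RGB, "red" spans a complex 3D region. In HSV, most colors reduce to a single range on the hue channel, with lower bounds on S and V to reject gray and dark pixels.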
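One caveat when thresholding red: hue is circular, so red occupies both ends of OpenCV's 0 to 179 hue axis. The sketch below mirrors the per-pixel logic of cv2.inRange in plain Python and combines two hue bands. The band limits (0 to 10 and 170 to 179), the S and V floors, and the helper names are illustrative assumptions that would need tuning per camera and lighting.

```python
def in_range(pixel, lower, upper):
    """Return True if every channel lies within [lower, upper], checked per pixel."""
    return all(lo <= c <= hi for c, lo, hi in zip(pixel, lower, upper))


def is_red(hsv_pixel):
    """Check both ends of the hue axis, since red straddles H = 0."""
    low_band = in_range(hsv_pixel, (0, 100, 100), (10, 255, 255))
    high_band = in_range(hsv_pixel, (170, 100, 100), (179, 255, 255))
    return low_band or high_band


# Hypothetical OpenCV-style HSV pixels (H in 0-179).
pixels = {"orange-red": (5, 200, 200), "crimson": (175, 200, 200), "green": (60, 200, 200)}
results = {name: is_red(p) for name, p in pixels.items()}
print(results)
```

A single low band would miss the "crimson" pixel at H = 175 even though it looks red.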

LAB (L*a*b*)

L*: perceptual lightness (0=black, 100=white)
a*: green (−) to red (+) axis
b*: blue (−) to yellow (+) axis

Why LAB matters:

Approximately perceptually uniform: Euclidean distance in LAB roughly tracks human-perceived color difference
Separates luminance from chrominance: the L* channel carries lightness only, so it can be processed (for example, equalized) without shifting colors
Used in skin detection: skin tones cluster in a small region of the a*b* plane
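The perceptual-uniformity point can be sketched with the CIE76 color difference, which is simply the Euclidean distance between two LAB triples. The function name and the sample values below are assumptions chosen to keep the arithmetic easy to follow.

```python
import math


def delta_e_cie76(lab1, lab2):
    """CIE76 color difference: Euclidean distance between two (L*, a*, b*) triples."""
    return math.dist(lab1, lab2)


# Hypothetical LAB values in true L*a*b* units, not measurements.
reference = (50.0, 0.0, 0.0)
sample = (50.0, 3.0, 4.0)
difference = delta_e_cie76(reference, sample)
print(difference)
```

The inputs are assumed to be in true L*a*b* units. OpenCV's 8-bit LAB output rescales the channels (L* stretched to 0 to 255, a* and b* offset by 128), so such values would need converting back before distances are compared.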

YCrCb (Luminance + Chroma)

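JPEG and most video codecs store images in a luma-plus-chroma space (usually written YCbCr; OpenCV names its version YCrCb after its channel order). The Y channel carries most visual information because human eyes are more sensitive to luminance detail than to chroma detail, which enables chroma subsampling (4:2:0 or 4:2:2) for compression.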
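A rough sketch of 4:2:0 subsampling follows: each chroma plane is reduced to one sample per 2x2 block while Y stays at full resolution. Plain averaging is an assumption made here for simplicity; real codecs differ in filtering and sample positioning. The function name and the sample plane are hypothetical.

```python
def subsample_420(plane):
    """Average each 2x2 block of a chroma plane (height and width must be even)."""
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


# Hypothetical 4x4 Cr plane; the Y plane would be kept at full resolution.
cr = [
    [100, 102, 200, 202],
    [104, 106, 204, 206],
    [50, 50, 60, 60],
    [50, 50, 60, 60],
]
cr_small = subsample_420(cr)
print(cr_small)
```

For this 4x4 image, full-resolution storage needs 48 samples (16 per channel), while 4:2:0 needs 16 + 4 + 4 = 24, half as many before any further compression.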

Grayscale

$$\text{Gray} = 0.114 \cdot B + 0.587 \cdot G + 0.299 \cdot R$$

These are the BT.601 luma weights, not a simple average. Green is weighted most heavily because human eyes are most sensitive to green light.
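The weighting is easy to check on single pixels. The sketch below applies the formula to one BGR pixel; the function name is an assumption, and rounding to the nearest integer is a choice made for this example.

```python
def bgr_to_gray(b, g, r):
    """Weighted BT.601 luma of one BGR pixel, rounded to an 8-bit value."""
    return round(0.114 * b + 0.587 * g + 0.299 * r)


white = bgr_to_gray(255, 255, 255)
pure_green = bgr_to_gray(0, 255, 0)
pure_red = bgr_to_gray(0, 0, 255)
pure_blue = bgr_to_gray(255, 0, 0)
print(white, pure_green, pure_red, pure_blue)
```

A simple average would map all three pure primaries to 85. With the weighted sum, pure green comes out far brighter than pure blue, which matches how they look.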

Histograms

A histogram counts how many pixels have each intensity value. For an 8-bit image, there are 256 bins.
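A histogram can be built in a few lines. The sketch below uses plain Python on a small nested-list image; the function name and the sample image are assumptions. On real images, cv2.calcHist([img], [0], None, [256], [0, 256]) computes the same counts.

```python
def histogram(pixels, bins=256):
    """Count occurrences of each intensity in an iterable of 8-bit values."""
    counts = [0] * bins
    for value in pixels:
        counts[value] += 1
    return counts


# Hypothetical 2x3 grayscale image, flattened row by row.
image = [[10, 10, 20], [20, 20, 30]]
hist = histogram(value for row in image for value in row)
print(hist[10], hist[20], hist[30])
```

The counts always sum to the number of pixels, which is a handy sanity check.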

Uses:

Exposure analysis: a histogram bunched left = underexposed, right = overexposed
Thresholding: Otsu's method finds the optimal threshold by maximizing inter-class variance
Image matching: histogram similarity as a lightweight retrieval metric
Normalization: histogram equalization improves contrast for dark images

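Histogram Equalization: Spreads the histogram to cover the full range. The mapping is the image's CDF (cumulative distribution function), rescaled to [0, 255]:

$$h_{eq}(v) = \text{round}\left(\frac{\text{CDF}(v) - \text{CDF}_{min}}{(H \times W) - \text{CDF}_{min}} \times 255\right)$$

where $\text{CDF}(v)$ is the number of pixels with intensity at most $v$, $\text{CDF}_{min}$ is its smallest nonzero value, and $H \times W$ is the total pixel count.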
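The equalization formula translates almost directly into code. This is a minimal plain-Python sketch under a few assumptions: the image is a nested list of 8-bit values, the CDF is kept in raw pixel counts, and a constant image is returned unchanged to avoid dividing by zero. The function name is hypothetical.

```python
def equalize(image):
    """Global histogram equalization of a nested-list 8-bit grayscale image."""
    flat = [v for row in image for v in row]
    hist = [0] * 256
    for v in flat:
        hist[v] += 1

    # A running sum of the histogram gives the CDF in pixel counts.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)

    cdf_min = next(c for c in cdf if c > 0)  # CDF at the darkest value present
    denom = len(flat) - cdf_min              # (H x W) - CDF_min
    if denom == 0:                           # constant image: nothing to stretch
        return [row[:] for row in image]

    # Lookup table; entries below the darkest value are clamped to 0 and never used.
    lut = [max(0, round((c - cdf_min) / denom * 255)) for c in cdf]
    return [[lut[v] for v in row] for row in image]


image = [[10, 10, 20], [20, 20, 30]]
equalized = equalize(image)
print(equalized)
```

The input spans only 10 to 30, while the output spans the full 0 to 255 range: the darkest value present maps to 0 and the brightest to 255.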

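CLAHE (Contrast Limited Adaptive Histogram Equalization): Equalizes small tiles separately and clips each tile's histogram at a set limit before computing its CDF, which caps how much noise in flat regions gets amplified. Neighboring tiles are blended by interpolation to avoid visible seams. It usually works better than global equalization on images with uneven lighting and is widely used in medical imaging.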
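The contrast-limiting step is the part that distinguishes CLAHE, so the sketch below shows only that step for a single tile: bins are clipped and the excess is spread evenly over all bins. This is a simplified one-pass redistribution under assumed names, not OpenCV's exact procedure. In practice one would call cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)) and then its apply method on a grayscale image.

```python
def clip_histogram(hist, clip_limit):
    """Clip each bin at clip_limit and spread the excess evenly over all bins."""
    excess = sum(max(0, count - clip_limit) for count in hist)
    clipped = [min(count, clip_limit) for count in hist]
    share, remainder = divmod(excess, len(hist))
    # Every bin gets the same share; the first `remainder` bins get one extra count.
    return [c + share + (1 if i < remainder else 0) for i, c in enumerate(clipped)]


# Hypothetical 4-bin tile histogram with one dominant bin.
tile_hist = [1, 2, 20, 1]
limited = clip_histogram(tile_hist, clip_limit=8)
print(limited)
```

The total count is preserved, but the dominant bin is flattened, so the tile's CDF rises less steeply and the resulting contrast stretch is milder. After redistribution a bin can end up slightly above the limit again; implementations differ in how they handle that.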

Morphological Operations

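Morphology is most often applied to binary images (the definitions below assume one), though the same operations extend to grayscale. Each operation probes the image with a structuring element (SE), a small shape that plays a role similar to a convolution kernel: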

Operation	Definition	Use case
Erosion	A pixel survives only if all pixels in SE are foreground	Remove small noise, thin objects
Dilation	A pixel becomes foreground if any pixel in SE is foreground	Fill holes, thicken objects
Opening	Erosion then dilation	Remove small isolated foreground regions (noise)
Closing	Dilation then erosion	Fill small holes in foreground regions
Gradient	Dilation − Erosion	Extract object edges/borders
Top-hat	Image − Opening	Highlight bright details on dark background
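The definitions in the table can be sketched directly for a binary image and a 3x3 square SE. The plain-Python code below is illustrative only: the function names are assumptions, and neighbors outside the image are simply skipped. With OpenCV the equivalents are cv2.erode, cv2.dilate, and cv2.morphologyEx with flags such as cv2.MORPH_OPEN.

```python
def _apply(image, combine):
    """Slide a 3x3 square SE over a binary image; out-of-bounds neighbors are skipped."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            neighborhood = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(1 if combine(neighborhood) else 0)
        out.append(row)
    return out


def erode(image):
    return _apply(image, all)   # survives only if every neighbor is foreground


def dilate(image):
    return _apply(image, any)   # foreground if any neighbor is foreground


def opening(image):
    return dilate(erode(image))


def closing(image):
    return erode(dilate(image))


# Hypothetical image: a 3x3 object plus one isolated noise pixel at the top-left.
noisy = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
cleaned = opening(noisy)
for row in cleaned:
    print(row)
```

Opening removes the isolated pixel and restores the 3x3 object to its original size, because the erosion that deletes the noise is undone for anything large enough to survive it.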

Otsu's Thresholding

Finds the optimal global threshold $t^*$ that minimizes intra-class variance (equivalently, maximizes inter-class variance):

$$\sigma_B^2(t) = \omega_0(t)\omega_1(t)[\mu_0(t) - \mu_1(t)]^2$$

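where $\omega_0, \omega_1$ are the probabilities (pixel fractions) of the two classes separated by $t$, and $\mu_0, \mu_1$ are their mean intensities. $\sigma_B^2$ is the inter-class variance, so $t^* = \arg\max_t \sigma_B^2(t)$.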
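The search for the best threshold can be sketched as an exhaustive scan over all 256 candidates, evaluating $\sigma_B^2(t)$ for each. The function name, the convention that class 0 holds intensities at or below $t$, and the synthetic histogram are assumptions of this example. With OpenCV, cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU) returns the chosen threshold together with the binarized image.

```python
def otsu_threshold(hist):
    """Return the threshold t maximizing between-class variance for a 256-bin histogram.

    Class 0 holds intensities <= t and class 1 holds intensities > t.
    """
    total = sum(hist)
    total_sum = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    count0, sum0 = 0, 0
    for t, count in enumerate(hist):
        count0 += count
        sum0 += t * count
        count1 = total - count0
        if count0 == 0 or count1 == 0:
            continue  # one class is empty, so the class means are undefined
        w0, w1 = count0 / total, count1 / total
        mu0, mu1 = sum0 / count0, (total_sum - sum0) / count1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


# Hypothetical bimodal histogram: a dark cluster near 50 and a bright cluster near 200.
hist = [0] * 256
for value, count in [(48, 30), (50, 40), (52, 30), (198, 30), (200, 40), (202, 30)]:
    hist[value] = count
threshold = otsu_threshold(hist)
print(threshold)
```

Any threshold in the empty gap between the two clusters separates them equally well; this sketch returns the first such value it encounters.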

When to use: Works well when the histogram is bimodal. Fails on unimodal histograms or images with spatially varying illumination (use CLAHE + Otsu or adaptive thresholding instead).

Interview Questions

Q: Why does OpenCV use BGR instead of RGB?
A: It is a historical artifact. BGR channel order was common among early camera manufacturers and in Windows software (the bitmap format stores pixels as BGR), and OpenCV kept that default for backward compatibility. Convert with cv2.cvtColor(img, cv2.COLOR_BGR2RGB) before displaying with matplotlib or passing to PyTorch/TF models, most of which are trained on RGB input.

Q: In HSV, why is detecting a color range simpler than in RGB?
A: In RGB, a "red" object under varying illumination spans an elongated 3D region of color space, because brightness changes move all three channels together. In HSV, brightness changes land mostly in V (value), while H (hue) carries the color itself. The same object under different lighting therefore stays within a narrow H range: you threshold H for color and keep the S and V bounds loose to tolerate lighting changes. The separation is not perfect (colored light, shadows, and sensor clipping shift H and S too), but it is good enough that many real-time color trackers (robot competitions, simple object trackers) work in HSV.
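A small experiment illustrates the point in the idealized case where a lighting change scales all three channels by the same factor. The sketch uses the standard-library colorsys module; the helper name and the sample color are assumptions, and real lighting changes are rarely this clean.

```python
import colorsys


def hsv_of(rgb):
    """HSV of an 8-bit RGB triple, with each component in [0, 1]."""
    r, g, b = (c / 255.0 for c in rgb)
    return colorsys.rgb_to_hsv(r, g, b)


# Hypothetical orange surface, then the same surface at half the brightness.
bright = hsv_of((240, 120, 0))
dim = hsv_of((120, 60, 0))
print("bright:", bright)
print("dim:   ", dim)
```

All three RGB channels change between the two pixels, yet hue and saturation stay the same and only V drops. A hue threshold tuned on the bright pixel therefore still matches the dim one.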

Q: What is CLAHE and when would you use it over regular histogram equalization?
A: CLAHE (Contrast Limited Adaptive HE) divides the image into tiles and equalizes each tile with its own histogram, then interpolates between neighboring tiles to hide the seams. A clip limit caps each histogram bin before the CDF is computed, which limits contrast amplification and keeps noise in flat regions from being blown up. Use it over global equalization when: (1) the image has spatially varying illumination (faces in shadow, medical X-rays), (2) you need local detail enhanced rather than one global remapping, or (3) the image has large uniform regions that would dominate global HE.
