Lab 02 — NumPy & Matplotlib for Computer Vision

Phase: 0 — Foundations | Difficulty: ⭐⭐⭐☆☆
Files: lab.py, solution.py, exploration.ipynb

Every image processing operation in computer vision reduces to array arithmetic. OpenCV and scikit-image represent images as NumPy arrays directly, and PyTorch and TensorFlow tensors convert to and from NumPy arrays. Understanding NumPy deeply means you can debug shape mismatches, write efficient preprocessing, and avoid silent numerical bugs.

Images as NumPy Arrays

Image shape conventions:

Library	Shape	Channel order	Dtype
OpenCV	`(H, W, C)`	BGR	uint8
PyTorch	`(C, H, W)`	RGB	float32 [0,1]
TensorFlow/Keras	`(H, W, C)`	RGB	float32 [0,1]
Matplotlib	`(H, W, C)`	RGB	uint8 or float32

Converting between formats is a constant task:

# OpenCV BGR → PyTorch-style array (C, H, W) float32
import cv2
import numpy as np

img_bgr = cv2.imread("image.jpg")          # (H, W, 3) uint8 BGR, or None if the read fails
if img_bgr is None:
    raise FileNotFoundError("image.jpg could not be read")
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)  # RGB
img_f32 = img_rgb.astype(np.float32) / 255.0         # [0, 1]
tensor = img_f32.transpose(2, 0, 1)                  # (C, H, W) view, not C-contiguous
# Or: np.moveaxis(img_f32, -1, 0)

Broadcasting

Broadcasting is NumPy's rule for performing operations on arrays with different shapes. The rule is:

Two dimensions are compatible if they are equal, or one of them is 1.

Dimensions are compared element-wise from the right (trailing dimensions).

Shape A:  (H, W, 3)
Shape B:       (3,)   ← treated as (1, 1, 3)
Result:   (H, W, 3)   ← each pixel's 3 channels scaled by B

Real-world example — channel-wise normalization:

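mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)  # shape (3,), ImageNet mean (RGB order)
std  = np.array([0.229, 0.224, 0.225], dtype=np.float32)  # shape (3,), ImageNet std (RGB order)

image_normalized = (image_f32 - mean) / std
# image_f32: (H, W, 3) RGB, mean: (3,) → broadcasts to (H, W, 3) ✓
# float32 stats keep the result float32; float64 stats would silently upcast it to float64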

Broadcasting rules step-by-step:

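If the shapes have a different number of dimensions, prepend 1s to the shorter shape.
Compare the shapes dimension by dimension: each pair must be equal or one of them must be 1, otherwise NumPy raises a ValueError.
Size-1 dimensions are "stretched" (without copying data) to match the other array.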
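
The three rules above can be checked directly in NumPy. The sketch below uses made-up array sizes and `np.broadcast_shapes` (available in NumPy 1.20 and later) to show one compatible pair and one incompatible pair; the variable names are illustrative only.

```python
import numpy as np

image = np.ones((4, 5, 3), dtype=np.float32)           # (H, W, C), illustrative size
scale = np.array([0.5, 1.0, 2.0], dtype=np.float32)    # (3,) is treated as (1, 1, 3)

# Rules 1 and 2: (3,) is padded to (1, 1, 3); every pair is equal or contains a 1.
result_shape = np.broadcast_shapes(image.shape, scale.shape)
scaled = image * scale  # Rule 3: the size-1 axes are stretched to H and W

# A per-row vector of shape (4,) lines up with the trailing axis (3), so it fails.
try:
    np.broadcast_shapes(image.shape, (4,))
    row_ok = True
except ValueError:
    row_ok = False

# Adding explicit size-1 axes states the intent: (4, 1, 1) broadcasts along rows.
per_row = np.arange(4, dtype=np.float32).reshape(4, 1, 1)
row_scaled = image * per_row
```

When a shape mismatch is hard to see, printing `np.broadcast_shapes(a.shape, b.shape)` before the operation is usually faster than reading the traceback.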

Memory Layout: C-contiguous vs Fortran-contiguous

Why this matters: CUDA kernels, ONNX runtimes, and C extensions expect contiguous arrays. Non-contiguous arrays (from slicing or transpose) can silently cause performance degradation or errors.

# C-contiguous (row-major): default, elements stored row by row
img = np.zeros((480, 640, 3), dtype=np.uint8)  # C-contiguous
img.strides  # (640*3, 3, 1) = (1920, 3, 1) bytes

# After transpose: (3, 480, 640) — no longer C-contiguous!
t = img.transpose(2, 0, 1)
t.flags['C_CONTIGUOUS']  # False!

# Fix: make contiguous copy
t_contiguous = np.ascontiguousarray(t)
# Or: t.copy()

Strides: A stride is the number of bytes to step in a given dimension.

Array shape (4, 3) of int32 (4 bytes):
  Row stride:    3 * 4 = 12 bytes  (step 12 bytes to move to next row)
  Column stride: 1 * 4 = 4 bytes   (step 4 bytes to move to next col)

Understanding strides enables zero-copy operations like np.lib.stride_tricks.sliding_window_view.
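
As a small illustration of the stride arithmetic above, the following sketch builds the same (4, 3) int32 array, inspects its strides, and takes 2x2 patches with `sliding_window_view` (NumPy 1.20 and later). The array contents are arbitrary.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(12, dtype=np.int32).reshape(4, 3)
row_stride, col_stride = a.strides      # bytes per step along each axis

# Transposing swaps the strides instead of moving any data.
t = a.T
shares = np.shares_memory(a, t)

# Every 2x2 patch of `a` as a read-only view with shape (3, 2, 2, 2).
windows = sliding_window_view(a, (2, 2))
patch_sums = windows.sum(axis=(-2, -1))  # (3, 2): sum of each patch
```

The window view itself costs no extra memory, but any reduction or copy over it touches every overlapping element, so it is best suited to small kernels.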

Fancy Indexing & Boolean Masking

These are the backbone of ROI extraction, masking, and conditional image editing:

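# Boolean masking: select all red-ish pixels (HSV space)
# OpenCV stores 8-bit hue as 0-179, and red wraps around both ends of that range
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
hue, sat = hsv[:, :, 0], hsv[:, :, 1]
mask = ((hue <= 10) | (hue >= 170)) & (sat > 100)  # thresholds are a typical choice; tune per dataset
red_pixels = img[mask]      # shape: (N, 3), a copy of the selected pixels
img[mask] = [0, 255, 0]    # paint them green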

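# Basic slicing: ROI extraction (boxes assumed to be in (x1, y1, x2, y2) order)
boxes = np.array([[0,0,50,50],[100,100,200,200]])  # (N, 4)
rois = [img[y1:y2, x1:x2] for x1, y1, x2, y2 in boxes]  # each slice is a view, not a copy
# ROIs of different sizes cannot be stacked; iterate as above or use torchvision.ops.roi_align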

Key NumPy Functions for CV

Function	Use Case
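`np.clip`	Clamp values to a valid range after arithmetic (e.g., after adding noise); cast uint8 to a wider dtype first, because uint8 arithmetic wraps around before the clip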
`np.pad`	Add padding before convolution
`np.roll`	Circular shift (useful for augmentation)
`np.einsum`	Efficient batched dot products, attention scores
`np.linalg.svd`	PCA, image compression, denoising
`np.fft.fft2`	Frequency domain analysis, filtering
`np.lib.stride_tricks.sliding_window_view`	Efficient convolution preprocessing
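
The `np.clip` row deserves a concrete example, because clipping alone does not stop uint8 wraparound. The sketch below uses three made-up pixel values and shows the difference between clipping after uint8 arithmetic and clipping after widening to int16.

```python
import numpy as np

pixels = np.array([250, 128, 5], dtype=np.uint8)

# uint8 arithmetic wraps modulo 256, so clipping afterwards is too late.
wrapped = np.clip(pixels + np.uint8(10), 0, 255)

# Widen first, do the arithmetic, clip, then cast back to uint8.
safe = np.clip(pixels.astype(np.int16) + 10, 0, 255).astype(np.uint8)
darker = np.clip(pixels.astype(np.int16) - 10, 0, 255).astype(np.uint8)

print(wrapped)  # [  4 138  15]
print(safe)     # [255 138  15]
print(darker)   # [240 118   0]
```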

Matplotlib for CV Visualization

import matplotlib.pyplot as plt

# Show image correctly (OpenCV is BGR, matplotlib expects RGB)
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
axes[0].imshow(cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB))
axes[0].set_title("Original")

# Show grayscale
axes[1].imshow(gray, cmap='gray')

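# Show heatmap / attention map overlaid on the image (heatmap assumed resized to (H, W))
axes[2].imshow(cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB))
axes[2].imshow(heatmap, cmap='jet', alpha=0.5)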

# Draw bounding boxes
import matplotlib.patches as patches
rect = patches.Rectangle((x1,y1), x2-x1, y2-y1,
                          linewidth=2, edgecolor='red', facecolor='none')
axes[0].add_patch(rect)
plt.tight_layout()
plt.savefig("output.png", dpi=150, bbox_inches='tight')

SVD and PCA for Images

Singular Value Decomposition (SVD) of an $m \times n$ matrix M: $$M = U \Sigma V^T$$ where:

$U$: left singular vectors as columns (shape $m \times m$, orthonormal)
$\Sigma$: $m \times n$ diagonal matrix of non-negative singular values (sorted descending)
$V^T$: right singular vectors as rows (shape $n \times n$, orthonormal)

For a grayscale image M of shape $(H, W)$, the rank-$k$ approximation retains only the top $k$ singular values: $$M_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$$

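This is image compression. The compressed image uses $k(H + W + 1)$ numbers instead of $H \times W$, which is a saving only when $k < HW / (H + W + 1)$. For a $480 \times 640$ image with $k = 50$, that is 56,050 numbers instead of 307,200.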
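
A minimal sketch of the rank-$k$ approximation, using a random matrix as a stand-in for a grayscale image (the sizes and `k` are arbitrary choices for illustration). It also checks the standard result that the relative Frobenius error is determined by the discarded singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, k = 60, 80, 10
M = rng.random((H, W))  # synthetic stand-in for a grayscale image

# Thin SVD: U is (H, r), s is (r,), Vt is (r, W), with r = min(H, W).
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Rank-k approximation: keep only the k largest singular values.
M_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

stored = k * (H + W + 1)   # numbers needed for the compressed form
original = H * W           # numbers in the full image

# Relative error, measured directly and predicted from the discarded values.
rel_error = np.linalg.norm(M - M_k) / np.linalg.norm(M)
expected_error = np.sqrt(np.sum(s[k:] ** 2) / np.sum(s ** 2))
```

Random noise compresses poorly; natural images usually have a much faster-decaying spectrum, so the same `k` gives a far smaller error on real data.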

CV Applications of SVD:

PCA for face recognition (Eigenfaces)
Background subtraction (low-rank + sparse decomposition)
Denoising (truncate the small singular values, which mostly carry noise)
Essential/Fundamental matrix computation in stereo vision

Interview Questions

Q: What is the difference between a copy (ndarray.copy()) and a view (ndarray.view(), basic slicing)? Why does this matter in ML pipelines?
A: .copy() allocates new memory. A view (from basic slicing, transpose, or reshape when possible) shares memory with the original array, so modifying the view modifies the original. Indexing with an integer or boolean array returns a copy instead. This matters because in-place augmentations on views can corrupt original data if you're not careful. Call .copy() first when you intend to modify a subset of a batch and still need the original.
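
The distinction can be verified with `np.shares_memory`. This sketch assumes a hypothetical batch of two tiny images; the names are illustrative.

```python
import numpy as np

batch = np.zeros((2, 4, 4), dtype=np.float32)  # hypothetical batch of 2 tiny images

view = batch[0]      # basic slicing returns a view
view += 1.0          # the in-place edit also changes batch[0]

independent = batch[1].copy()
independent += 5.0   # batch[1] is untouched

fancy = batch[[0]]   # integer-array (fancy) indexing returns a copy

view_shares = np.shares_memory(batch, view)          # True
copy_shares = np.shares_memory(batch, independent)   # False
fancy_shares = np.shares_memory(batch, fancy)        # False
```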

Q: A model expects input shape (N, C, H, W) float32 in [0, 1]. You receive a batch of OpenCV images (list of (H, W, 3) uint8 BGR). Write the conversion.

batch = np.stack([
    cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    for img in images
])  # (N, H, W, C)
batch = batch.transpose(0, 3, 1, 2)  # (N, C, H, W)
# Or: np.moveaxis(batch, -1, 1)

Q: Why should you call np.ascontiguousarray() before passing to a C extension or CUDA kernel?
A: Non-contiguous arrays (e.g., after transpose) have strides that do not match a packed row-major layout. C/CUDA code that reads the raw buffer and ignores strides assumes that layout, so passing a non-contiguous array can produce wrong results or a segfault; stricter APIs raise an error or make a hidden copy instead. np.ascontiguousarray() creates a contiguous copy only if needed (it returns the input unchanged when the array is already C-contiguous).

Pandas for CV Data Management

Pandas is the standard tool for managing datasets, annotations, experiment results, and metrics in CV pipelines. You will use it constantly in real projects.

Why Pandas in CV?

Load and filter annotation CSVs (COCO, Open Images, custom)
Track per-image / per-class metrics across experiments
Join predictions with ground truth for error analysis
Export benchmark results for reporting

Core Operations

import pandas as pd
import numpy as np

# ── Loading annotation files ──────────────────────────────────────────────────
# Many datasets ship as CSV or can be converted to one
df = pd.read_csv("annotations.csv")
# Common columns: image_id, class, xmin, ymin, xmax, ymax, confidence

# ── Exploring the dataset ─────────────────────────────────────────────────────
print(df.shape)           # (N, cols)
print(df.dtypes)          # column types
print(df.head())          # first 5 rows
print(df["class"].value_counts())  # class distribution

# ── Filtering ─────────────────────────────────────────────────────────────────
# Only high-confidence detections
high_conf = df[df["confidence"] > 0.5]
# Specific classes
persons = df[df["class"] == "person"]
# Multiple conditions (use & not 'and')
filtered = df[(df["confidence"] > 0.3) & (df["class"].isin(["car", "truck"]))]

# ── Computing bounding box area ───────────────────────────────────────────────
df["area"] = (df["xmax"] - df["xmin"]) * (df["ymax"] - df["ymin"])
df["aspect_ratio"] = (df["xmax"] - df["xmin"]) / (df["ymax"] - df["ymin"])

# ── Per-class statistics ──────────────────────────────────────────────────────
stats = df.groupby("class").agg(
    count=("image_id", "count"),
    mean_conf=("confidence", "mean"),
    mean_area=("area", "mean"),
)
print(stats)

# ── Per-image metrics ─────────────────────────────────────────────────────────
per_image = df.groupby("image_id").agg(
    n_objects=("class", "count"),
    classes=("class", lambda x: list(x.unique())),
)

# ── Joining predictions with ground truth ────────────────────────────────────
preds_df = pd.read_csv("predictions.csv")  # image_id, class, confidence, bbox...
gt_df    = pd.read_csv("ground_truth.csv")

# Joining on image_id alone pairs every prediction with every ground-truth box in that image
merged = pd.merge(preds_df, gt_df, on="image_id", suffixes=("_pred", "_gt"))

# ── Saving results ────────────────────────────────────────────────────────────
metrics = pd.DataFrame([
    {"model": "YOLOv8", "mAP@50": 0.723, "latency_ms": 12.1},
    {"model": "DETR",   "mAP@50": 0.748, "latency_ms": 38.4},
    {"model": "FCOS",   "mAP@50": 0.701, "latency_ms": 18.7},
])
import os
os.makedirs("outputs", exist_ok=True)  # to_csv does not create missing directories
metrics.to_csv("outputs/experiment_results.csv", index=False)
print(metrics.to_string(index=False))

Pandas + NumPy Bridge

# Convert DataFrame column to NumPy array for math
scores = df["confidence"].to_numpy()                # 1D array
boxes  = df[["xmin", "ymin", "xmax", "ymax"]].to_numpy()  # (N, 4) array

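# Convert NumPy results back to DataFrame
# compute_iou is not a NumPy or pandas function; supply your own pairwise IoU helper
iou_matrix = compute_iou(boxes_pred, boxes_gt)     # (M, N) array
iou_df = pd.DataFrame(iou_matrix, index=pred_ids, columns=gt_ids)  # rows: predictions, columns: ground truth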
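
The snippet above calls `compute_iou` without defining it. One possible implementation, sketched here with broadcasting, assumes boxes in `[xmin, ymin, xmax, ymax]` format; the example boxes and the `pred_*` / `gt_*` labels are made up for illustration.

```python
import numpy as np
import pandas as pd

def compute_iou(boxes_a, boxes_b):
    """Pairwise IoU between (M, 4) and (N, 4) boxes in [xmin, ymin, xmax, ymax] format."""
    a = np.asarray(boxes_a, dtype=np.float64)[:, None, :]  # (M, 1, 4)
    b = np.asarray(boxes_b, dtype=np.float64)[None, :, :]  # (1, N, 4)

    # Width and height of each intersection, broadcast to (M, N); negative means no overlap.
    inter_w = np.clip(np.minimum(a[..., 2], b[..., 2]) - np.maximum(a[..., 0], b[..., 0]), 0, None)
    inter_h = np.clip(np.minimum(a[..., 3], b[..., 3]) - np.maximum(a[..., 1], b[..., 1]), 0, None)
    inter = inter_w * inter_h

    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])  # (M, 1)
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])  # (1, N)
    union = area_a + area_b - inter

    # Return 0 where the union is empty (degenerate boxes) instead of dividing by zero.
    return np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)

boxes_pred = np.array([[0, 0, 10, 10], [20, 20, 30, 30]])
boxes_gt = np.array([[0, 0, 10, 10], [5, 0, 15, 10], [100, 100, 110, 110]])

iou_matrix = compute_iou(boxes_pred, boxes_gt)  # (2, 3)
iou_df = pd.DataFrame(iou_matrix, index=["pred_0", "pred_1"], columns=["gt_0", "gt_1", "gt_2"])
```

The intermediate arrays have shape (M, N), so for very large M and N it may be necessary to process the predictions in chunks.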

Typical CV Evaluation Workflow with Pandas

# After running inference on a validation set:
results = []
for image_id, pred_boxes, pred_scores, pred_classes, gt_boxes, gt_classes in val_results:
    for box, score, cls in zip(pred_boxes, pred_scores, pred_classes):
        results.append({
            "image_id": image_id,
            "class": cls,
            "confidence": score,
            "xmin": box[0], "ymin": box[1], "xmax": box[2], "ymax": box[3],
        })

df = pd.DataFrame(results)
# Sort by confidence descending (needed for mAP calculation)
df = df.sort_values("confidence", ascending=False)

# Per-class AP
for cls in df["class"].unique():
    cls_df = df[df["class"] == cls]
    # compute precision/recall curve, integrate for AP
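
The per-class loop above leaves the AP integration as a comment. The sketch below fills in that step under two assumptions that are not part of the original workflow: each detection has already been matched against ground truth and flagged in a hypothetical boolean `is_tp` column, and `n_gt` is the number of ground-truth boxes for the class. It uses all-point interpolation (a monotone precision envelope), which is one common convention and not necessarily the one used in solution.py.

```python
import numpy as np
import pandas as pd

def average_precision(cls_df, n_gt):
    """AP for one class from detections with a boolean `is_tp` column (hypothetical)."""
    if n_gt == 0 or len(cls_df) == 0:
        return 0.0
    ordered = cls_df.sort_values("confidence", ascending=False)
    tp = ordered["is_tp"].to_numpy(dtype=np.float64)
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / n_gt
    precision = cum_tp / (cum_tp + cum_fp)

    # Pad both ends, then make precision non-increasing as recall grows.
    recall = np.concatenate([[0.0], recall, [1.0]])
    precision = np.concatenate([[0.0], precision, [0.0]])
    precision = np.maximum.accumulate(precision[::-1])[::-1]

    # Sum rectangle areas at the points where recall increases.
    step = np.nonzero(recall[1:] != recall[:-1])[0]
    return float(np.sum((recall[step + 1] - recall[step]) * precision[step + 1]))

# Toy detections for a class with 2 ground-truth boxes.
dets = pd.DataFrame({
    "confidence": [0.9, 0.8, 0.7, 0.6],
    "is_tp": [True, False, True, False],
})
ap = average_precision(dets, n_gt=2)
```

For the toy data the precision envelope is 1.0 up to recall 0.5 and 2/3 up to recall 1.0, so `ap` is 0.5 * 1.0 + 0.5 * 2/3 = 5/6.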

Interview Questions

Q: You have a CSV with 1M detection predictions and you need the top-100 highest confidence detections per class. How do you do it efficiently in pandas?

top100 = (
    df.sort_values("confidence", ascending=False)
      .groupby("class")
      .head(100)
      .reset_index(drop=True)
)

Q: How do you find images in your dataset that have no annotations (hard negatives)?

all_image_ids = pd.read_csv("images.csv")["image_id"]
annotated_ids = df["image_id"].unique()
hard_negatives = all_image_ids[~all_image_ids.isin(annotated_ids)]

Q: What is the difference between loc and iloc?
A: loc selects by label (index name or column name); iloc selects by integer position (0-based row/column index). Use iloc when you need positional slicing, loc when filtering by value or named index.
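
The difference is easiest to see when the index labels are integers that do not match the row positions. The DataFrame below is a made-up example.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"class": ["car", "person", "truck"], "confidence": [0.9, 0.4, 0.7]},
    index=[10, 20, 30],  # integer labels that differ from positions 0, 1, 2
)

by_label = df_demo.loc[20, "class"]   # row labeled 20
by_position = df_demo.iloc[0, 0]      # first row, first column

# loc slices include the end label; iloc slices exclude the end position.
loc_slice = df_demo.loc[10:20]        # 2 rows
iloc_slice = df_demo.iloc[0:1]        # 1 row

# loc also accepts a boolean mask together with a column label.
confident = df_demo.loc[df_demo["confidence"] > 0.5, "class"]
```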
