Capstone 01: Real-Time Object Detection Pipeline

Project Goal

Build an end-to-end real-time object detection system:

Synthetic video frame generator (no camera required)
YOLOv8-style (anchor-free) detection model: a lightweight custom CNN with an FCOS-style head
FastAPI inference server with dynamic batching
Latency and throughput benchmark dashboard

Architecture

Synthetic Frame Generator
    │ (30 fps, 640×480)
    ▼
Preprocessing Service       ← resize, normalize
    │
    ▼
Detection Model             ← anchor-free FCOS-style head
    │ (bounding boxes + class scores)
    ▼
NMS Post-processing         ← torchvision.ops.nms
    │
    ▼
FastAPI /predict endpoint   ← JSON response
    │
    ▼
Performance Dashboard       ← matplotlib saved plots
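
The "bounding boxes" step of the detection model can be illustrated with a minimal sketch of FCOS-style decoding, where each feature-map cell predicts its distances to the left, top, right, and bottom edges of a box. The function names, the pure-Python list layout, and the stride value are assumptions made for illustration; a real head would operate on tensors and on several feature levels.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2) in input-image pixels
LTRB = Tuple[float, float, float, float]  # distances: left, top, right, bottom


def decode_fcos_location(col: int, row: int, stride: int, ltrb: LTRB) -> Box:
    """Turn one cell's (left, top, right, bottom) distances into a box."""
    # Map the feature-map cell back to its centre in input-image coordinates.
    cx = col * stride + stride // 2
    cy = row * stride + stride // 2
    left, top, right, bottom = ltrb
    return (cx - left, cy - top, cx + right, cy + bottom)


def decode_grid(distances: List[List[LTRB]], stride: int) -> List[Box]:
    """Decode every cell of a rows x cols grid of ltrb predictions (hypothetical layout)."""
    return [
        decode_fcos_location(c, r, stride, ltrb)
        for r, row_vals in enumerate(distances)
        for c, ltrb in enumerate(row_vals)
    ]


if __name__ == "__main__":
    # Cell (col=2, row=1) at stride 8 is centred at (20, 12).
    print(decode_fcos_location(2, 1, 8, (10.0, 4.0, 6.0, 8.0)))
    # (10.0, 8.0, 26.0, 20.0)
```

Because every cell regresses distances directly, no anchor shapes have to be chosen or tuned. The decoded boxes and their class scores are what the NMS stage receives.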

Key Metrics to Report

Model parameters: < 5M (keeps the model small enough to target real-time inference)
p50 inference latency: target < 30 ms (CPU), < 5 ms (GPU)
Throughput: target > 30 fps (CPU), > 200 fps (GPU)
mAP@0.5 on the synthetic dataset
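
One way to collect the latency and throughput numbers is sketched below. The `benchmark` helper, its parameters, and the stand-in workload are hypothetical and not part of the project code. It times a callable with `time.perf_counter` after a few warm-up calls and reports the median (p50) latency and the implied frames per second.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], warmup: int = 5, iters: int = 50) -> Dict[str, float]:
    """Time `fn` and return p50 latency, mean latency, and throughput."""
    # Warm-up calls are excluded so one-off costs (caches, lazy init) do not skew results.
    for _ in range(warmup):
        fn()

    latencies_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        # For GPU inference, call torch.cuda.synchronize() before reading the
        # clock, since CUDA kernels are launched asynchronously.
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    total_s = sum(latencies_ms) / 1000.0
    return {
        "p50_ms": statistics.median(latencies_ms),
        "mean_ms": statistics.fmean(latencies_ms),
        "fps": iters / total_s if total_s > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a call that runs the model on one frame.
    stats = benchmark(lambda: sum(range(100_000)))
    print({name: round(value, 3) for name, value in stats.items()})
```

With one frame per call, a p50 latency under 30 ms corresponds to roughly 33 fps or more, which is consistent with the CPU throughput target above.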

What You Learn

Anchor-free detection head (FCOS-style) vs anchor-based (YOLOv5-style)
Non-maximum suppression implemented from scratch (torchvision.ops.nms serves as the reference in the pipeline)
End-to-end pipeline integration
Performance profiling with torch.profiler
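
As a starting point for the from-scratch NMS item, here is a minimal pure-Python sketch of greedy non-maximum suppression. The function names and the list-of-tuples box format are assumptions made for illustration. It follows the same convention as torchvision.ops.nms: boxes are (x1, y1, x2, y2), a box is dropped when its IoU with a higher-scoring kept box exceeds the threshold, and kept indices are returned in order of decreasing score.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 with IoU 81/119
```

This version is quadratic in the number of boxes, which is acceptable for a learning exercise. Comparing its output with torchvision.ops.nms on the same inputs is a simple correctness check.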