Capstone 01: Real-Time Object Detection Pipeline

Project Goal

Build an end-to-end real-time object detection system:

  • Synthetic video frame generator (no camera required)
  • YOLOv8-style detection model (lightweight custom CNN, anchor-free head)
  • FastAPI inference server with dynamic batching
  • Latency + throughput benchmark dashboard
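
Dynamic batching, as named in the list above, could look roughly like the sketch below. It uses only asyncio from the standard library. DynamicBatcher, infer_batch, max_batch and max_wait_s are hypothetical names chosen for illustration, not part of FastAPI or of this project's code. Concurrent requests are queued, and a worker collects them until the batch is full or a short deadline passes, then makes one model call for the whole group.

```python
import asyncio
import contextlib


class DynamicBatcher:
    """Groups concurrent requests into one model call (hypothetical helper)."""

    def __init__(self, infer_batch, max_batch=8, max_wait_s=0.005):
        self.infer_batch = infer_batch  # callable: list of inputs -> list of outputs
        self.max_batch = max_batch
        self.max_wait_s = max_wait_s
        self.queue = asyncio.Queue()
        self.batch_sizes = []           # recorded so the benchmark can plot them

    async def submit(self, item):
        """Called once per request; resolves when that request's result is ready."""
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut

    async def run(self):
        """Worker loop: wait for one request, then fill the batch until the deadline."""
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]
            deadline = loop.time() + self.max_wait_s
            while len(batch) < self.max_batch:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            outputs = self.infer_batch([item for item, _ in batch])
            self.batch_sizes.append(len(batch))
            for (_, fut), out in zip(batch, outputs):
                if not fut.done():      # the caller may have gone away
                    fut.set_result(out)


async def demo():
    # Stand-in "model" that doubles each input; a real one would run the detector.
    batcher = DynamicBatcher(lambda xs: [x * 2 for x in xs], max_batch=4, max_wait_s=0.05)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(i) for i in range(5)))
    worker.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await worker
    return results, batcher.batch_sizes
```

Running asyncio.run(demo()) sends five concurrent requests through a batcher capped at four per batch. In this sketch infer_batch runs synchronously inside the event loop; a real server would probably move the model call to a worker thread, for example with loop.run_in_executor, so that new requests can still be accepted while a batch is running.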

Architecture

Synthetic Frame Generator
    │ (30 fps, 640×480)
    ▼
Preprocessing Service       ← resize, normalize
    │
    ▼
Detection Model             ← anchor-free FCOS-style head
    │ (bounding boxes + class scores)
    ▼
NMS Post-processing         ← torchvision.ops.nms
    │
    ▼
FastAPI /predict endpoint   ← JSON response
    │
    ▼
Performance Dashboard       ← matplotlib saved plots
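
The first stage of the diagram above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: frames are grayscale bytearrays, each object is a filled rectangle whose brightness encodes its class, and the names make_frame, frame_stream and INTENSITIES are hypothetical. The actual project would more likely produce NumPy arrays or tensors with three colour channels.

```python
import random

WIDTH, HEIGHT, FPS = 640, 480, 30
INTENSITIES = (80, 140, 200)  # assumed: one brightness level per class


def make_frame(rng, num_objects=3):
    """Return (frame, boxes) for one synthetic grayscale frame.

    frame is a bytearray of WIDTH * HEIGHT pixels (row-major, black background).
    boxes is a list of (x1, y1, x2, y2, class_id) ground-truth tuples.
    """
    frame = bytearray(WIDTH * HEIGHT)
    boxes = []
    for _ in range(num_objects):
        w = rng.randint(20, 120)
        h = rng.randint(20, 120)
        x1 = rng.randint(0, WIDTH - w)   # keep the rectangle inside the frame
        y1 = rng.randint(0, HEIGHT - h)
        cls = rng.randrange(len(INTENSITIES))
        row = bytes([INTENSITIES[cls]]) * w
        for y in range(y1, y1 + h):
            start = y * WIDTH + x1
            frame[start:start + w] = row  # later rectangles may cover earlier ones
        boxes.append((x1, y1, x1 + w, y1 + h, cls))
    return frame, boxes


def frame_stream(num_frames, seed=0):
    """Yield (timestamp_s, frame, boxes) at a nominal 30 fps."""
    rng = random.Random(seed)  # seeded so benchmark runs are repeatable
    for i in range(num_frames):
        frame, boxes = make_frame(rng)
        yield i / FPS, frame, boxes
```

Because the generator is seeded, the same frames and ground-truth boxes come back on every run, which keeps latency numbers and mAP comparable across experiments.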

Key Metrics to Report

  • Model parameters: < 5M (small enough to keep real-time inference feasible)
  • p50 inference latency: target < 30 ms (CPU), < 5 ms (GPU)
  • Throughput: target > 30 fps (CPU), > 200 fps (GPU)
  • mAP@0.5 on the synthetic dataset
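
One way to collect the latency and throughput figures listed above is sketched here. The benchmark function and its infer argument are hypothetical: infer stands for whatever callable runs a single frame through the pipeline. The sketch times each call with time.perf_counter, skips nothing but runs a few untimed warm-up calls first, and reports the median (p50), the 95th percentile, and frames per second.

```python
import statistics
import time


def benchmark(infer, frames, warmup=5):
    """Time infer(frame) over frames and summarise latency and throughput."""
    for frame in frames[:warmup]:
        infer(frame)  # warm-up calls are not timed

    latencies_ms = []
    start = time.perf_counter()
    for frame in frames:
        t0 = time.perf_counter()
        infer(frame)
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    elapsed_s = time.perf_counter() - start

    return {
        "p50_ms": statistics.median(latencies_ms),
        # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile
        "p95_ms": statistics.quantiles(latencies_ms, n=20)[18],
        "fps": len(frames) / elapsed_s,
    }
```

On a GPU the model call returns before the kernels finish, so infer would need to call torch.cuda.synchronize() before returning, otherwise the measured latency is too low. statistics.quantiles needs Python 3.8 or later and at least two timed frames.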

What You Learn

  • Anchor-free detection head (FCOS-style) vs anchor-based (YOLOv5-style)
  • Non-maximum suppression implemented from scratch, checked against torchvision.ops.nms
  • End-to-end pipeline integration
  • Performance profiling with torch.profiler
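
As a starting point for the from-scratch NMS item above, here is a minimal pure-Python sketch of greedy, class-agnostic non-maximum suppression. It assumes boxes are (x1, y1, x2, y2) tuples; the function names iou and nms are illustrative. Like torchvision.ops.nms, it drops a box when its IoU with a higher-scoring kept box is greater than the threshold, and it returns the kept indices in order of decreasing score.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the second box overlaps the first (IoU = 81/119) and is dropped
```

This version is quadratic in the number of boxes and ignores class labels. For per-class suppression, one option is to run it once per class; torchvision.ops.batched_nms does the equivalent on tensors.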