Capstone 01: Real-Time Object Detection Pipeline
Project Goal
Build an end-to-end real-time object detection system:
- Synthetic video frame generator (no camera required)
- YOLOv8-style detection model (lightweight custom CNN)
- FastAPI inference server with dynamic batching
- Latency + throughput benchmark dashboard
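Dynamic batching, as named in the server item above, can be sketched with the standard library alone. The following is a minimal illustration, not the project's actual server code: the `DynamicBatcher` class, its `max_batch_size` and `max_wait_ms` parameters, and the `infer_batch` callback are hypothetical names introduced here. The idea is that concurrent requests are queued, and a background worker groups them until the batch is full or a short deadline passes.

```python
import asyncio
from typing import Any, Callable, List


class DynamicBatcher:
    """Groups concurrent requests into batches (illustrative sketch)."""

    def __init__(self, infer_batch: Callable[[List[Any]], List[Any]],
                 max_batch_size: int = 8, max_wait_ms: float = 5.0) -> None:
        self.infer_batch = infer_batch          # hypothetical batched model call
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_ms / 1000.0
        self.queue: asyncio.Queue = asyncio.Queue()

    async def predict(self, item: Any) -> Any:
        """Enqueue one request and wait for its individual result."""
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut

    async def run(self) -> None:
        """Worker loop: build a batch, run the model once, fan out results."""
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]    # block until the first request
            deadline = loop.time() + self.max_wait_s
            while len(batch) < self.max_batch_size:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            try:
                outputs = self.infer_batch([item for item, _ in batch])
                for (_, fut), out in zip(batch, outputs):
                    if not fut.done():
                        fut.set_result(out)
            except Exception as exc:            # propagate failures to callers
                for _, fut in batch:
                    if not fut.done():
                        fut.set_exception(exc)


async def demo():
    batch_sizes = []

    def fake_model(items):                      # stand-in for the detector
        batch_sizes.append(len(items))
        return [x * 2 for x in items]

    batcher = DynamicBatcher(fake_model, max_batch_size=4, max_wait_ms=20)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.predict(i) for i in range(10)))
    worker.cancel()
    return results, batch_sizes


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In a FastAPI app, the /predict handler would presumably just `await batcher.predict(frame)`. Note that `infer_batch` is called synchronously here and blocks the event loop while it runs; a real server would likely move it to a worker thread or executor.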
Architecture
Synthetic Frame Generator
│ (30 fps, 640×480)
▼
Preprocessing Service ← resize, normalize
│
▼
Detection Model ← anchor-free FCOS-style head
│ (bounding boxes + class scores)
▼
NMS Post-processing ← torchvision.ops.nms
│
▼
FastAPI /predict endpoint ← JSON response
│
▼
Performance Dashboard ← matplotlib saved plots
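The FCOS-style head in the diagram predicts, for each feature-map location, four distances (left, top, right, bottom) to the box edges rather than offsets relative to anchors. A minimal decoding sketch follows, assuming a single feature level with a known stride and distances already expressed in pixels; the function name `decode_fcos` and the nested-list input layout are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def decode_fcos(ltrb: Sequence[Sequence[Tuple[float, float, float, float]]],
                stride: int, img_w: int = 640, img_h: int = 480) -> List[Box]:
    """Turn per-location (l, t, r, b) distances into corner-format boxes.

    ltrb[row][col] holds the distances predicted at that feature-map cell.
    """
    boxes: List[Box] = []
    for row, row_vals in enumerate(ltrb):
        for col, (l, t, r, b) in enumerate(row_vals):
            # Map the cell back to image coordinates (centre of its receptive stride).
            cx = col * stride + stride // 2
            cy = row * stride + stride // 2
            # Subtract/add the distances, then clamp to the frame.
            x1 = min(max(cx - l, 0.0), float(img_w))
            y1 = min(max(cy - t, 0.0), float(img_h))
            x2 = min(max(cx + r, 0.0), float(img_w))
            y2 = min(max(cy + b, 0.0), float(img_h))
            boxes.append((x1, y1, x2, y2))
    return boxes


if __name__ == "__main__":
    grid = [[(4, 4, 4, 4), (2, 1, 3, 5)]]   # one row, two cells, stride 8
    print(decode_fcos(grid, stride=8))
```

Every location yields a candidate box, so the output is dense; the class scores and the NMS stage further down the pipeline are what reduce it to final detections.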
Key Metrics to Report
- Model parameters: < 5M (small enough to keep real-time inference feasible)
- p50 inference latency: target < 30 ms (CPU), < 5 ms (GPU)
- Throughput: target > 30 fps (CPU), > 200 fps (GPU)
- mAP@0.5 on synthetic dataset
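The latency and throughput targets above could be measured with a small timing harness like the one below. This is a sketch under stated assumptions: `benchmark` is a hypothetical helper, `fn` stands in for a single forward pass on one frame, and throughput is derived from sequential single-frame calls rather than batched inference. It uses only the standard library (`statistics.quantiles` requires Python 3.8+).

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], n_warmup: int = 5,
              n_runs: int = 50) -> Dict[str, float]:
    """Time repeated calls to fn and summarise latency and throughput."""
    for _ in range(n_warmup):               # warm caches before measuring
        fn()

    latencies_ms = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    total_s = sum(latencies_ms) / 1000.0
    return {
        "p50_ms": statistics.median(latencies_ms),
        # n=20 yields 19 cut points; the last one is the 95th percentile.
        "p95_ms": statistics.quantiles(latencies_ms, n=20)[-1],
        "fps": n_runs / total_s,
    }


if __name__ == "__main__":
    stats = benchmark(lambda: sum(range(100_000)))
    print({k: round(v, 3) for k, v in stats.items()})
```

For GPU measurements, CUDA kernels run asynchronously, so `fn` would need to call `torch.cuda.synchronize()` before returning; otherwise the timer captures only the kernel launch, not the computation.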
What You Learn
- Anchor-free detection head (FCOS-style) vs anchor-based (YOLOv5-style)
- Non-maximum suppression implemented from scratch (the pipeline itself uses torchvision.ops.nms)
- End-to-end pipeline integration
- Performance profiling with torch.profiler
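As a starting point for the from-scratch NMS exercise listed above, here is a minimal pure-Python sketch of greedy non-maximum suppression. The function names `iou` and `nms` and the `(x1, y1, x2, y2)` box layout are choices made for this illustration; the suppression rule (drop a box when its IoU with an already kept, higher-scoring box exceeds the threshold) follows the same convention as torchvision.ops.nms.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep box i only if it does not overlap any kept box too strongly.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))  # the second box overlaps the first and is dropped
```

This version is quadratic in the number of boxes and handles a single class; a per-class variant would run it separately for each class label or offset boxes by class before calling it.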