Phase 05: Computer Vision Deep Learning
Object detection, segmentation, and modern architectures — the heart of the CV engineer role.
Labs
| Lab | Topic | Key Papers |
|---|---|---|
| lab-01 | YOLOv8 — training, evaluation, TensorRT export | Ultralytics YOLOv8 (2023) |
| lab-02 | Faster R-CNN — two-stage detection from scratch | Ren et al., 2015 |
| lab-03 | U-Net — semantic segmentation | Ronneberger et al., 2015 |
| lab-04 | Mask R-CNN — instance segmentation | He et al., 2017 |
Prerequisites
- Phase 3 complete (PyTorch training loops, ResNet)
- Phase 4 recommended (TensorFlow/Keras) but not required
Learning Path
1. Start with YOLOv8 (lab-01) to get end-to-end detection working fast.
2. Study Faster R-CNN theory (lab-02) to understand two-stage detectors in depth.
3. Move on to U-Net (lab-03), the most relevant lab for medical and industrial CV work.
4. Finish with Mask R-CNN (lab-04), which combines detection and segmentation.
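To give a feel for the lab-01 workflow (training, evaluation, TensorRT export), here is a minimal sketch using the Ultralytics Python API. It is not the lab's actual code: the function name `train_eval_export`, the `yolov8n.pt` checkpoint, the `coco8.yaml` sample dataset, and the epoch count are all assumptions chosen to keep the run small. TensorRT export additionally assumes an NVIDIA GPU with TensorRT installed, so it is off by default.

```python
def train_eval_export(data_yaml="coco8.yaml", epochs=1, export_trt=False):
    """Hypothetical lab-01 smoke test: train, validate, optionally export.

    Assumes `pip install ultralytics`. The import is done lazily so that
    defining this function does not require the package.
    """
    from ultralytics import YOLO

    # Assumption: start from the smallest pretrained YOLOv8 checkpoint.
    model = YOLO("yolov8n.pt")

    # Train on a tiny sample dataset; swap in your own dataset YAML.
    model.train(data=data_yaml, epochs=epochs, imgsz=640)

    # Evaluate on the validation split and report box mAP.
    metrics = model.val()
    print("mAP50-95:", metrics.box.map)
    print("mAP50:   ", metrics.box.map50)

    # Export to a TensorRT engine only when explicitly requested,
    # since this step needs an NVIDIA GPU and TensorRT.
    if export_trt:
        return model.export(format="engine")
    return None
```

Calling `train_eval_export()` runs a one-epoch training and validation pass; pass `export_trt=True` on a machine with TensorRT to also produce an engine file.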
Hardware Requirements
- GPU strongly recommended (8 GB+ VRAM)
- CPU fallback works, but expect labs 02-04 to run roughly 20-50× slower
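A quick way to check a machine against this recommendation is sketched below. It assumes PyTorch (from Phase 3) and falls back to CPU when PyTorch or CUDA is unavailable. The helper names `pick_device` and `meets_recommendation` are hypothetical, and the 0.5 GB tolerance is an assumption to account for cards marketed as 8 GB that report slightly less.

```python
try:
    import torch
except ImportError:  # lets the check run even without PyTorch installed
    torch = None

RECOMMENDED_VRAM_GB = 8  # from the hardware recommendation above


def meets_recommendation(vram_gb, minimum_gb=RECOMMENDED_VRAM_GB, tolerance_gb=0.5):
    """Return True if the reported VRAM is close to or above the minimum."""
    return vram_gb >= minimum_gb - tolerance_gb


def pick_device():
    """Return ("cuda", vram_in_gb) if a CUDA GPU is usable, else ("cpu", 0.0)."""
    if torch is None or not torch.cuda.is_available():
        return "cpu", 0.0
    total_bytes = torch.cuda.get_device_properties(0).total_memory
    return "cuda", total_bytes / 1024**3


device, vram_gb = pick_device()
if device == "cpu":
    print("No CUDA GPU found: labs will run on CPU and be much slower.")
elif meets_recommendation(vram_gb):
    print(f"GPU with {vram_gb:.1f} GB VRAM: meets the recommendation.")
else:
    print(f"GPU with {vram_gb:.1f} GB VRAM: consider smaller batch sizes.")
```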