Phase 05: Computer Vision Deep Learning

Object detection, segmentation, and modern architectures — the heart of the CV engineer role.

Labs

Lab    | Topic                                           | Key reference
lab-01 | YOLOv8: training, evaluation, TensorRT export   | Ultralytics YOLOv8 (2023)
lab-02 | Faster R-CNN: two-stage detection from scratch  | Ren et al., 2015
lab-03 | U-Net: semantic segmentation                    | Ronneberger et al., 2015
lab-04 | Mask R-CNN: instance segmentation               | He et al., 2017
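Detection metrics such as mAP are built on intersection-over-union (IoU), and both YOLOv8 and Faster R-CNN prune duplicate boxes with non-maximum suppression (NMS) before results are scored. Below is a minimal pure-Python sketch of box IoU and greedy NMS. It assumes corner-format (x1, y1, x2, y2) boxes, and the function names are illustrative. Real pipelines would use a vectorized routine such as torchvision.ops.nms instead.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), corner format (assumed)


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Overlap rectangle; width/height clamp to 0 when the boxes do not intersect
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_thresh: float = 0.5) -> List[int]:
    """Greedy NMS: keep the best-scoring box, drop its heavy overlaps, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard remaining boxes whose IoU with the kept box exceeds the threshold
        order = [i for i in order if box_iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 with IoU ~0.68
```

Greedy NMS is class-agnostic as written. Detectors usually run it per class, or offset boxes by class index so that boxes from different classes never suppress each other.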

Prerequisites

  • Phase 3 complete (PyTorch training loops, ResNet)
  • Phase 4 recommended (TensorFlow/Keras) but not required

Learning Path

  1. Start with YOLOv8 (lab-01): get end-to-end detection working quickly.
  2. Study Faster R-CNN (lab-02): understand two-stage detectors in depth.
  3. Build U-Net (lab-03): especially important for medical and industrial CV.
  4. Finish with Mask R-CNN (lab-04): it adds a per-region mask branch to Faster R-CNN, combining detection and segmentation.

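The original U-Net (Ronneberger et al., 2015) uses unpadded 3×3 convolutions, so every level shrinks the feature map. The encoder maps therefore have to be center-cropped before they are concatenated in the decoder, and the output tile (388×388) is smaller than the input tile (572×572). The sketch below traces that arithmetic. The function name, its parameters and the even-size check are illustrative assumptions, not code from the paper. Many modern implementations use padded convolutions instead, which keeps input and output sizes equal.

```python
from typing import List, Tuple


def unet_sizes(input_size: int, depth: int = 4, convs_per_level: int = 2,
               k: int = 3) -> Tuple[int, List[Tuple[int, int]]]:
    """Trace spatial size through an unpadded U-Net (valid convs, 2x2 pool, 2x2 up-conv).

    Returns the output size and, for each decoder level, the pair
    (encoder feature size, size it must be center-cropped to).
    """
    shrink = convs_per_level * (k - 1)  # each valid k x k conv removes k-1 pixels
    size = input_size
    skips: List[int] = []
    for _ in range(depth):
        size -= shrink           # two valid 3x3 convs: -4 pixels
        skips.append(size)       # this map feeds the skip connection
        if size % 2:
            raise ValueError("feature size must be even before 2x2 max-pooling")
        size //= 2               # 2x2 max-pool halves the size
    size -= shrink               # bottleneck convs
    crops: List[Tuple[int, int]] = []
    for skip in reversed(skips):
        size *= 2                # 2x2 up-convolution doubles the size
        crops.append((skip, size))
        size -= shrink           # convs after concatenation
    return size, crops


out, crops = unet_sizes(572)
print(out)    # 388
print(crops)  # [(64, 56), (136, 104), (280, 200), (568, 392)]
```

The same function is a quick way to check that a chosen input size survives every pooling step without an odd dimension.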
Hardware Requirements

  • An NVIDIA GPU with 8 GB+ VRAM is strongly recommended (the TensorRT export in lab-01 requires one)
  • CPU fallback works, but expect labs 02-04 to run roughly 20-50× slower
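A lab script can pick the GPU when one is present and fall back to the CPU otherwise. The minimal sketch below does this with PyTorch's torch.cuda.is_available and torch.cuda.get_device_properties. The helper name pick_device and its min_vram_gb parameter are hypothetical. The script also runs without PyTorch installed, in which case it simply reports "cpu".

```python
def pick_device(min_vram_gb: float = 8.0) -> str:
    """Prefer CUDA GPU 0; warn if it has less VRAM than recommended (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch not installed: CPU is the only option
    if not torch.cuda.is_available():
        return "cpu"  # no usable CUDA device: fall back to CPU
    # total_memory is reported in bytes; convert to GiB
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    if total_gb < min_vram_gb:
        # A small GPU is usually still faster than the CPU, but batch sizes may need reducing
        print(f"warning: GPU 0 has {total_gb:.1f} GiB VRAM, "
              f"below the recommended {min_vram_gb:g} GB")
    return "cuda"


device = pick_device()
print(f"using device: {device}")
```

The returned string can be passed straight to model.to(device) and tensor.to(device). The choice to warn rather than refuse on a small GPU reflects the README's own advice: the CPU path works, only more slowly.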