Phase 11 — Capstone Projects

Difficulty: ⭐⭐⭐⭐⭐ | Estimated Time: 2–4 weeks per capstone | Roles supported: All

The capstone is what hiring managers actually click on.

Capstone Philosophy

A capstone is not another lab. It is a single, polished, public GitHub repo with:

A README that a stranger can understand in 90 seconds
An architecture diagram (Excalidraw / Mermaid / draw.io)
Reproducible benchmarks (numbers, not adjectives)
A "tradeoffs" and "what I'd do next" section
A live demo or screencast where applicable

Pick at least 2 of the 4 capstones below to ship publicly. Pick the ones aligned with your target role.

Capstone 1 — Mini-GPT Pretrained on a Custom Corpus

Target roles: Research Engineer Pretraining, Foundation Model Engineer.

| Field | Value |
|-------|-------|
| Goal | End-to-end pretraining: your tokenizer → your data pipeline → your transformer → your training loop → your eval. |
| Pipeline | Data scrape/clean → BPE training → packing → nanoGPT training (≥ 50M params) → eval (perplexity + 2 downstream tasks via Phase 8 harness) → model card. |
| Hardware | 1× A100 for ~10 GPU-hours (~$15 on RunPod) |
| Deliverables | GitHub repo, W&B run, model card, blog post |
| Resume Bullet | "Pre-trained a 60M-parameter decoder-only transformer end-to-end (custom BPE tokenizer + 4 GB cleaned corpus + FSDP training + Phase 8 eval harness); achieved val perplexity 6.4 in 9 GPU-hours, reproducible from scratch in <$20 of cloud compute." |
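
Two steps in this pipeline are easy to get subtly wrong: packing and the perplexity calculation. The following is a minimal sketch, assuming documents are already tokenized into lists of integer IDs. The names `pack_sequences` and `perplexity` and the `eos_id` value are illustrative, not part of nanoGPT.

```python
import math
from typing import Iterable, List


def pack_sequences(docs: Iterable[List[int]], block_size: int, eos_id: int) -> List[List[int]]:
    """Concatenate tokenized docs (each followed by eos_id) and cut into fixed-size blocks."""
    stream: List[int] = []
    for doc in docs:
        stream.extend(doc)
        stream.append(eos_id)  # separator so the model sees document boundaries
    n_blocks = len(stream) // block_size  # the trailing partial block is dropped
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]


def perplexity(total_nll_nats: float, n_tokens: int) -> float:
    """Perplexity = exp(mean negative log-likelihood per token, measured in nats)."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    return math.exp(total_nll_nats / n_tokens)


if __name__ == "__main__":
    blocks = pack_sequences([[1, 2, 3], [4, 5]], block_size=3, eos_id=0)
    print(blocks)
    print(round(perplexity(total_nll_nats=185.63, n_tokens=100), 2))
```

Perplexity is a per-token number, so it is only comparable between runs that share a tokenizer. State the tokenizer next to the number in the model card.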

Capstone 2 — Production RAG with Eval

Target roles: Applied AI Engineer, LLM Inference Engineer.

| Field | Value |
|-------|-------|
| Goal | A RAG service good enough to put in front of users, with quantified quality. |
| Pipeline | Real corpus (≥ 5k docs) → chunking → hybrid retrieval (BM25 + dense) → cross-encoder re-ranker → generation with citations → SSE streaming → RAGAS eval → A/B harness comparing retrievers. |
| Stack | FastAPI, Qdrant, sentence-transformers, BGE-reranker, Llama-3-8B (vLLM) or hosted, RAGAS |
| Deliverables | Repo + live demo (Gradio / web) + RAGAS scorecard + ablation table |
| Resume Bullet | "Built a production RAG service (Qdrant + BM25 + RRF + BGE reranker + vLLM-served Llama-3-8B) over a 12k-document corpus, exposed via FastAPI/SSE; quantified quality with RAGAS (faithfulness 0.87, context precision 0.81) and ran 6 documented design ablations." |
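
The resume bullet names RRF (reciprocal rank fusion) as the way to merge the BM25 and dense result lists before re-ranking. Here is a minimal sketch of that fusion step in plain Python. The function name and document IDs are made up for illustration, and `k = 60` is a commonly used default rather than something this curriculum prescribes.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]], k: int = 60, top_n: int = 10
) -> List[str]:
    """Fuse several ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by doc ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]


if __name__ == "__main__":
    bm25_hits = ["a", "b", "c"]    # hypothetical BM25 ranking
    dense_hits = ["b", "d", "a"]   # hypothetical dense-retriever ranking
    print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
```

RRF uses only ranks, so it avoids having to normalize BM25 scores against cosine similarities. That makes it a reasonable baseline row in the ablation table before trying weighted score fusion.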

Capstone 3 — LLM Inference Gateway (the Hire-Magnet for Infra Roles)

Target roles: LLM Inference Engineer, ML Systems Engineer.

| Field | Value |
|-------|-------|
| Goal | A multi-model inference gateway with all the production features. |
| Features | (1) Continuous batching, (2) KV-cache + prefix caching, (3) INT4 AWQ quantization, (4) SSE streaming, (5) per-tenant rate limits, (6) OpenTelemetry tracing, (7) Prometheus metrics + Grafana dashboard, (8) admission control under load, (9) graceful drain on shutdown, (10) `/v1/chat/completions` OpenAI-compatible API. |
| Stack | vLLM under the hood, FastAPI gateway, Redis (rate limit), Prometheus, Grafana, OpenTelemetry, Docker Compose |
| Benchmark | TTFT P50/P99, TPOT, max sustained tok/s, $/M-tokens; all reported in README |
| Deliverables | Repo + Docker Compose stack + benchmark report + architecture diagram |
| Resume Bullet | "Designed and shipped an OpenAI-compatible LLM inference gateway (vLLM core + FastAPI + Redis rate limit + OpenTelemetry tracing + Prometheus/Grafana) achieving sustained 1,420 tok/s at P99 TTFT 230 ms on a single A100; reduced $/M-tokens by 58% vs naive HuggingFace serving." |
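
The benchmark row is only credible if the README says how each number was computed. The following sketch shows one way to derive them from client-side timestamps. The `RequestTrace` schema is hypothetical, percentiles use the nearest-rank method, and TPOT here excludes the first token (a common convention, but some tools define it differently, so state yours).

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RequestTrace:
    """Client-side timestamps in seconds for one streamed request (hypothetical schema)."""
    sent_at: float
    first_token_at: float
    finished_at: float
    output_tokens: int


def percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    if not values:
        raise ValueError("percentile of an empty sample is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def ttft(trace: RequestTrace) -> float:
    """Time to first token."""
    return trace.first_token_at - trace.sent_at


def tpot(trace: RequestTrace) -> float:
    """Time per output token, averaged over the tokens after the first."""
    if trace.output_tokens < 2:
        raise ValueError("TPOT needs at least two output tokens")
    return (trace.finished_at - trace.first_token_at) / (trace.output_tokens - 1)


def sustained_tokens_per_second(traces: Sequence[RequestTrace]) -> float:
    """Total output tokens divided by the wall-clock span of the whole run."""
    wall = max(t.finished_at for t in traces) - min(t.sent_at for t in traces)
    return sum(t.output_tokens for t in traces) / wall


def usd_per_million_tokens(gpu_usd_per_hour: float, tokens_per_second: float) -> float:
    """Cost of generating one million tokens at a given sustained throughput."""
    return gpu_usd_per_hour / (tokens_per_second * 3600) * 1_000_000


if __name__ == "__main__":
    traces = [RequestTrace(0.0, 0.1, 1.1, 11), RequestTrace(0.0, 0.3, 2.3, 21)]
    ttfts = [ttft(t) for t in traces]
    print("TTFT P50/P99:", percentile(ttfts, 50), percentile(ttfts, 99))
    print("tok/s:", sustained_tokens_per_second(traces))
    # 1.50 USD/hour is an assumed GPU price for illustration only.
    print("$/M-tokens:", usd_per_million_tokens(1.50, 1000.0))
```

Report the load pattern (concurrency, prompt and output lengths) alongside these numbers, since TTFT and throughput both shift with it.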

Capstone 4 — Domain Assistant: SFT + DPO + Eval

Target roles: Post-training Engineer, Production Model Post-Training.

| Field | Value |
|-------|-------|
| Goal | Take a base 7B → SFT on domain data → DPO on preferences → measurable improvement. |
| Pipeline | Domain pick (legal, medical, finance, code) → 5k synthetic instruction set (Phase 6 Lab 3) → QLoRA SFT (Phase 6 Lab 2) → 1k preference pairs → DPO (Phase 6 Lab 4) → Phase 8 eval comparing base vs SFT vs SFT+DPO. |
| Stack | `trl`, `peft`, `bitsandbytes`, your Phase 8 harness |
| Deliverables | Adapters on HF Hub, eval scorecard, model card with intended use + limitations |
| Resume Bullet | "Trained a domain assistant (Llama-3-8B QLoRA SFT + DPO) on 5k synthetic instructions and 1k preference pairs; preference-win-rate vs base improved 23% → 71% (SFT) → 78% (DPO) measured on a held-out 200-pair eval, with full model card." |
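
In practice `trl` computes the DPO objective for you, but it helps to be able to write the per-pair loss and the win-rate arithmetic on a whiteboard. The sketch below takes summed sequence log-probabilities as plain floats. The function names, `beta = 0.1`, and the rule that ties count as half a win are illustrative assumptions, not settings taken from the Phase 6 labs.

```python
import math
from typing import Sequence


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    x = -beta * margin
    # -log(sigmoid(-x)) == softplus(x), written in a numerically stable form.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def win_rate(outcomes: Sequence[str]) -> float:
    """Fraction of pairwise judgments won; each outcome is 'win', 'loss', or 'tie'."""
    if not outcomes:
        raise ValueError("no outcomes to score")
    points = {"win": 1.0, "tie": 0.5, "loss": 0.0}  # assumption: a tie is half a win
    return sum(points[o] for o in outcomes) / len(outcomes)


def win_rate_stderr(rate: float, n_pairs: int) -> float:
    """Binomial standard error of a win rate estimated from n_pairs judgments."""
    return math.sqrt(rate * (1.0 - rate) / n_pairs)


if __name__ == "__main__":
    # Policy equal to the reference gives zero margin, so the loss is log(2).
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    print(win_rate(["win", "win", "loss", "tie"]))
    print(win_rate_stderr(0.78, 200))
```

On a 200-pair eval the standard error of a win rate near 0.78 is roughly 3 percentage points, so a gap of a few points between two checkpoints is worth flagging as within noise in the scorecard.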

Capstone Repo README Template

Every capstone repo's README should follow this skeleton:

# <Project Name> — <One-Sentence Pitch>

![Architecture](docs/architecture.png)

## What This Is
<2 paragraphs>

## Headline Results
| Metric | Baseline | This Project | Δ |
|--------|----------|--------------|---|
| ...    | ...      | ...          | ...|

## Quickstart
```bash
make build && make run && make eval
```

## Architecture

## Design Decisions & Tradeoffs

- Why X over Y: ...
- Why we chose this chunking strategy: ...

## Benchmarks

## Limitations

## What I'd Do Next

## Reproducing


---

## Final Interview Prep Loop

Once your capstones are shipped, do this for each one **before** going on-site:

1. Write a **5-minute talk** explaining the project (no slides — just talking).
2. Identify **3 design decisions** you'd defend in interviews and **3 tradeoffs** you'd debate.
3. Identify **2 things you'd change** if you had another month — and articulate why.
4. Identify **1 unsolved problem** in the project that you'd love to discuss with the interviewer.

This converts your capstones into interview ammunition.