Phase 11 — Capstone Projects
Difficulty: ⭐⭐⭐⭐⭐ | Estimated Time: 2–4 weeks per capstone | Roles supported: All. The capstone is what hiring managers actually click on.
Capstone Philosophy
A capstone is not another lab. It is a single, polished, public GitHub repo with:
- A README that a stranger can understand in 90 seconds
- An architecture diagram (Excalidraw / Mermaid / draw.io)
- Reproducible benchmarks (numbers, not adjectives)
- A "tradeoffs" and "what I'd do next" section
- A live demo or screencast where applicable
Ship at least 2 of the 4 capstones below publicly, choosing the ones aligned with your target role.
Capstone 1 — Mini-GPT Pretrained on a Custom Corpus
Target roles: Research Engineer (Pretraining), Foundation Model Engineer.
| Field | Value |
|---|---|
| Goal | End-to-end pretraining: your tokenizer → your data pipeline → your transformer → your training loop → your eval. |
| Pipeline | Data scrape/clean → BPE training → packing → nanoGPT training (≥ 50M params) → eval (perplexity + 2 downstream tasks via Phase 8 harness) → model card. |
| Hardware | 1× A100 for |
| Deliverables | GitHub repo, W&B run, model card, blog post |
| Resume Bullet | "Pre-trained a 60M-parameter decoder-only transformer end-to-end (custom BPE tokenizer + 4 GB cleaned corpus + FSDP training + Phase 8 eval harness); achieved val perplexity 6.4 in 9 GPU-hours, reproducible from scratch in <$20 of cloud compute." |
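The packing and perplexity steps in this pipeline are easy to get subtly wrong. Here is a minimal plain-Python sketch of both. It assumes documents are already tokenized into lists of integer IDs by your BPE tokenizer, that an EOS token ID separates documents, and that any leftover tail shorter than one block is dropped. The function names are hypothetical, not nanoGPT's.

```python
import math
from typing import Iterable, Iterator, List


def pack_sequences(
    docs: Iterable[List[int]], block_size: int, eos_id: int
) -> Iterator[List[int]]:
    """Stream tokenized docs into fixed-length training blocks.

    Documents are concatenated with an EOS separator so the model sees
    document boundaries; any tail shorter than block_size is dropped.
    """
    buffer: List[int] = []
    for doc in docs:
        buffer.extend(doc)
        buffer.append(eos_id)  # mark the document boundary
        while len(buffer) >= block_size:
            yield buffer[:block_size]
            buffer = buffer[block_size:]


def perplexity(token_nlls: Iterable[float]) -> float:
    """exp of the mean per-token negative log-likelihood (natural log)."""
    nlls = list(token_nlls)
    if not nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(nlls) / len(nlls))


if __name__ == "__main__":
    docs = [[5, 6, 7], [8, 9], [10, 11, 12, 13]]
    print(list(pack_sequences(docs, block_size=4, eos_id=0)))
    # [[5, 6, 7, 0], [8, 9, 0, 10], [11, 12, 13, 0]]
    print(perplexity([2.0, 1.5, 2.2]))
```

Perplexity is only comparable across runs that share a tokenizer. A different vocabulary changes what counts as one token.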
Capstone 2 — Production RAG with Eval
Target roles: Applied AI Engineer, LLM Inference Engineer.
| Field | Value |
|---|---|
| Goal | A RAG service good enough to put in front of users, with quantified quality. |
| Pipeline | Real corpus (≥ 5k docs) → chunking → hybrid retrieval (BM25 + dense) → cross-encoder re-ranker → generation with citations → SSE streaming → RAGAS eval → A/B harness comparing retrievers. |
| Stack | FastAPI, Qdrant, sentence-transformers, BGE-reranker, Llama-3-8B (vLLM) or hosted, RAGAS |
| Deliverables | Repo + live demo (Gradio / web) + RAGAS scorecard + ablation table |
| Resume Bullet | "Built a production RAG service (Qdrant + BM25 + RRF + BGE reranker + vLLM-served Llama-3-8B) over a 12k-document corpus, exposed via FastAPI/SSE; quantified quality with RAGAS (faithfulness 0.87, context precision 0.81) and ran 6 documented design ablations." |
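The resume bullet names RRF (reciprocal rank fusion) as the way BM25 and dense results are merged. Below is a minimal sketch. It assumes each retriever returns a ranked list of document IDs. Each document's fused score is the sum of 1/(k + rank) over the lists it appears in, with k = 60 as the commonly used constant. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]], k: int = 60
) -> List[str]:
    """Fuse several ranked lists of doc IDs into one ranking.

    Each document's fused score is the sum of 1 / (k + rank) over every
    list it appears in (ranks start at 1). A document missing from a list
    simply gets no contribution from that list.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; break ties by doc ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    bm25_hits = ["d1", "d2", "d3"]   # lexical retriever, best first
    dense_hits = ["d3", "d1", "d4"]  # embedding retriever, best first
    print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
    # ['d1', 'd3', 'd2', 'd4']
```

RRF uses only ranks, so it sidesteps the fact that BM25 scores and cosine similarities live on incomparable scales. The fused list is then what the cross-encoder re-ranker rescores.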
Capstone 3 — LLM Inference Gateway (the Hire-Magnet for Infra Roles)
Target roles: LLM Inference Engineer, ML Systems Engineer.
| Field | Value |
|---|---|
| Goal | A multi-model inference gateway implementing the production features listed below. |
| Features | (1) Continuous batching, (2) KV-cache + prefix caching, (3) INT4 AWQ quantization, (4) SSE streaming, (5) per-tenant rate limits, (6) OpenTelemetry tracing, (7) Prometheus metrics + Grafana dashboard, (8) admission control under load, (9) graceful drain on shutdown, (10) /v1/chat/completions OpenAI-compatible API. |
| Stack | vLLM under the hood, FastAPI gateway, Redis (rate limit), Prometheus, Grafana, OpenTelemetry, Docker Compose |
| Benchmark | TTFT P50/P99, TPOT, max sustained tok/s, $/M-tokens — all reported in README |
| Deliverables | Repo + Docker Compose stack + benchmark report + architecture diagram |
| Resume Bullet | "Designed and shipped an OpenAI-compatible LLM inference gateway (vLLM core + FastAPI + Redis rate limit + OpenTelemetry tracing + Prometheus/Grafana) achieving sustained 1,420 tok/s at P99 TTFT 230 ms on a single A100; reduced $/M-tokens by 58% vs naive HuggingFace serving." |
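The benchmark row asks for TTFT, TPOT, throughput, and $/M-tokens. The sketch below shows one way to derive all four from per-request timestamps collected by a load generator. Several choices here are assumptions of this sketch, not a standard: the hypothetical `RequestTiming` record, the nearest-rank percentile, TPOT defined as decode time divided by the tokens after the first, and pricing from a single hourly GPU cost.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class RequestTiming:  # hypothetical record produced by your load generator
    start: float        # seconds: request sent
    first_token: float  # seconds: first streamed token received
    end: float          # seconds: last token received
    output_tokens: int  # tokens generated for this request


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def summarize(timings: List[RequestTiming], gpu_usd_per_hour: float) -> Dict[str, float]:
    ttfts = [t.first_token - t.start for t in timings]
    # TPOT: decode time spread over the tokens after the first one.
    tpots = [
        (t.end - t.first_token) / (t.output_tokens - 1)
        for t in timings
        if t.output_tokens > 1
    ]
    wall_s = max(t.end for t in timings) - min(t.start for t in timings)
    tok_per_s = sum(t.output_tokens for t in timings) / wall_s
    # Cost of one GPU-hour divided by tokens produced in that hour, scaled to 1M.
    usd_per_m_tokens = gpu_usd_per_hour / (tok_per_s * 3600) * 1_000_000
    return {
        "ttft_p50_s": percentile(ttfts, 50),
        "ttft_p99_s": percentile(ttfts, 99),
        "tpot_mean_s": sum(tpots) / len(tpots) if tpots else float("nan"),
        "throughput_tok_s": tok_per_s,
        "usd_per_m_tokens": usd_per_m_tokens,
    }


if __name__ == "__main__":
    runs = [
        RequestTiming(start=0.0, first_token=0.25, end=2.0, output_tokens=8),
        RequestTiming(start=0.0, first_token=0.5, end=2.0, output_tokens=4),
    ]
    print(summarize(runs, gpu_usd_per_hour=2.16))
```

To report "max sustained tok/s", one approach is to repeat the run at increasing concurrency. Then report the highest throughput whose P99 TTFT still meets your latency target.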
Capstone 4 — Domain Assistant: SFT + DPO + Eval
Target roles: Post-Training Engineer, Production Model Post-Training.
| Field | Value |
|---|---|
| Goal | Take a base 7–8B model → SFT on domain data → DPO on preference pairs → measurable improvement over the base. |
| Pipeline | Domain pick (legal, medical, finance, code) → 5k synthetic instruction set (Phase 6 Lab 3) → QLoRA SFT (Phase 6 Lab 2) → 1k preference pairs → DPO (Phase 6 Lab 4) → Phase 8 eval comparing base vs SFT vs SFT+DPO. |
| Stack | trl, peft, bitsandbytes, your Phase 8 harness |
| Deliverables | Adapters on HF Hub, eval scorecard, model card with intended use + limitations |
| Resume Bullet | "Trained a domain assistant (Llama-3-8B QLoRA SFT + DPO) on 5k synthetic instructions and 1k preference pairs; win rate on a held-out 200-pair preference eval improved from 23% (base) → 71% (SFT) → 78% (SFT+DPO), documented with a full model card." |
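trl ships a DPO trainer that computes this loss for you, but being able to write the objective down is useful in interviews. Below is a minimal per-pair sketch in plain Python. It assumes you already have the summed sequence log-probabilities of the chosen and rejected responses under both the policy and the frozen reference (SFT) model. β = 0.1 is an illustrative value, and the function names are hypothetical.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative value, tune for your run
) -> float:
    """DPO loss for one preference pair.

    Each *_logp is the summed token log-probability of a full response
    given the prompt. Loss = -log(sigmoid(beta * margin)), where margin is
    how much more the policy (relative to the reference) prefers the chosen
    response over the rejected one.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = chosen_logratio - rejected_logratio
    # -log(sigmoid(z)) == softplus(-z), which avoids overflow for large |z|.
    return softplus(-beta * margin)


if __name__ == "__main__":
    print(dpo_loss(-40.0, -42.0, -40.0, -42.0))  # policy == reference, so loss is log 2
    print(dpo_loss(-35.0, -45.0, -40.0, -42.0))  # policy now favors the chosen response
```

When the policy equals the reference, the loss is log 2. It falls as the policy raises the chosen response's likelihood relative to the rejected one, with both measured against the reference.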
Capstone Repo README Template
Every capstone repo's README should follow this skeleton:
# <Project Name> — <One-Sentence Pitch>

## What This Is
<2 paragraphs>
## Headline Results
| Metric | Baseline | This Project | Δ |
|--------|----------|--------------|---|
| ... | ... | ... | ...|
## Quickstart
```bash
make build && make run && make eval
```
## Architecture
<Diagram + 3-paragraph explanation>
## Design Decisions & Tradeoffs
- Why X over Y: ...
- Why we chose this chunking strategy: ...
## Benchmarks
<Tables and plots — reproducibility command included>
## Limitations
- ...
## What I'd Do Next
- ...
## Reproducing
<Exact commands, expected hardware, expected runtime, expected cost>
---
## Final Interview Prep Loop
Once your capstones are shipped, do this for each one **before** going on-site:
1. Write a **5-minute talk** explaining the project (no slides — just talking).
2. Identify **3 design decisions** you'd defend in interviews and **3 tradeoffs** you'd debate.
3. Identify **2 things you'd change** if you had another month — and articulate why.
4. Identify **1 unsolved problem** in the project that you'd love to discuss with the interviewer.
This converts your capstones into interview ammunition.