System Design Walkthroughs (LLM / Foundation Models)
Six end-to-end walkthroughs in the format expected in Senior+ infra/foundation-model interviews at Anthropic, OpenAI, DeepMind, Meta, xAI, Mistral, Cohere, and Databricks.
| # | Doc | Target Roles |
|---|---|---|
| 01 | LLM Inference Gateway @ 100k QPS | LLM Inference / LLM Infrastructure |
| 02 | Distributed Pretraining (8B → 70B) | Research Engineer Pretraining |
| 03 | RAG at Scale (100M docs, 1k QPS) | Applied AI / Search |
| 04 | Fine-Tuning Platform | Post-training Engineer |
| 05 | Eval Platform (continuous + LLM-judge) | Model Evaluation Engineer |
| 06 | Pretraining Data Pipeline (10TB → tokens) | Pretraining Data Engineer |
Standard Structure
Every walkthrough uses the same template so you can practice the rhythm:
- Clarifying questions (functional + non-functional)
- Capacity estimation (QPS, storage, GPU-hours, rough cost)
- API & data model
- High-level architecture (ASCII diagram)
- Deep dives (3-5 key subsystems)
- Bottlenecks & scaling
- Failure modes & mitigation
- Observability
- Cost model
- Tradeoffs & alternatives
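
The capacity-estimation step is mostly quick arithmetic, so it helps to rehearse it in code. The sketch below sizes the raw embedding storage for the RAG scenario in the table (100M docs). The class name `VectorIndexEstimate`, the 10 chunks per document, the 1024-dimension embeddings, and float32 storage are illustrative assumptions, not figures taken from the walkthroughs.

```python
from dataclasses import dataclass


@dataclass
class VectorIndexEstimate:
    """Back-of-envelope size of a flat (uncompressed) embedding index."""

    num_docs: int             # from the scenario: 100M docs
    chunks_per_doc: int       # assumption: average chunks per document
    embedding_dim: int        # assumption: embedding dimensionality
    bytes_per_value: int = 4  # assumption: float32 components

    def num_vectors(self) -> int:
        # One embedding vector per chunk.
        return self.num_docs * self.chunks_per_doc

    def raw_bytes(self) -> int:
        # Vectors only; ignores index overhead, metadata, and replication.
        return self.num_vectors() * self.embedding_dim * self.bytes_per_value


def to_tib(n_bytes: int) -> float:
    """Convert bytes to tebibytes (2**40 bytes)."""
    return n_bytes / 2**40


estimate = VectorIndexEstimate(
    num_docs=100_000_000,
    chunks_per_doc=10,
    embedding_dim=1024,
)

print(f"vectors: {estimate.num_vectors():,}")
print(f"raw embedding storage: {to_tib(estimate.raw_bytes()):.2f} TiB")
```

Changing a single assumption (for example, halving `bytes_per_value` to model float16) shows how sensitive the estimate is, which is the kind of reasoning worth saying out loud at the whiteboard.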
How To Use
For each doc:
- Cover the answer with your hand. Spend 45 minutes whiteboarding it cold.
- Compare your design to the doc.
- Note three things you missed, then redo the exercise a week later.