System Design Walkthroughs (LLM / Foundation Models)

Six end-to-end walkthroughs in the format expected in Senior+ infra/foundation-model interviews at Anthropic, OpenAI, DeepMind, Meta, xAI, Mistral, Cohere, and Databricks.

#    Doc                                          Target Roles
01   LLM Inference Gateway @ 100k QPS             LLM Inference / LLM Infrastructure
02   Distributed Pretraining (8B → 70B)           Research Engineer Pretraining
03   RAG at Scale (100M docs, 1k QPS)             Applied AI / Search
04   Fine-Tuning Platform                         Post-training Engineer
05   Eval Platform (continuous + LLM-judge)       Model Evaluation Engineer
06   Pretraining Data Pipeline (10TB → tokens)    Pretraining Data Engineer

Standard Structure

Every walkthrough uses the same template so you can practice the rhythm:

  1. Clarifying questions (functional + non-functional)
  2. Capacity estimation (QPS, storage, GPU-hours, $$$)
  3. API & data model
  4. High-level architecture (ASCII diagram)
  5. Deep dives (3-5 key subsystems)
  6. Bottlenecks & scaling
  7. Failure modes & mitigation
  8. Observability
  9. Cost model
  10. Tradeoffs & alternatives
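
Step 2 (capacity estimation) is the easiest one to rehearse as plain arithmetic. The sketch below is a minimal illustration in Python and is not taken from any of the walkthroughs. The function names are hypothetical, and every input except the 100k QPS figure (from the title of doc 01) is an assumed placeholder to be replaced with numbers you can justify in the interview.

```python
import math


def gpus_needed(qps, tokens_per_request, tokens_per_sec_per_gpu, target_utilization):
    """Estimate GPUs required to sustain a given token throughput.

    All inputs are back-of-envelope assumptions, not measured values.
    """
    required_tokens_per_sec = qps * tokens_per_request
    # Leave headroom: plan against a fraction of peak per-GPU throughput.
    effective_per_gpu = tokens_per_sec_per_gpu * target_utilization
    return math.ceil(required_tokens_per_sec / effective_per_gpu)


def monthly_cost_usd(gpu_count, usd_per_gpu_hour, hours_per_month=730):
    """Estimate the monthly GPU bill, assuming GPUs run around the clock."""
    return gpu_count * usd_per_gpu_hour * hours_per_month


if __name__ == "__main__":
    gpus = gpus_needed(
        qps=100_000,                  # from the title of doc 01
        tokens_per_request=500,       # assumed average output length
        tokens_per_sec_per_gpu=2500,  # assumed per-GPU decode throughput
        target_utilization=0.5,       # assumed headroom for bursts
    )
    cost = monthly_cost_usd(gpus, usd_per_gpu_hour=2.0)  # assumed price
    print(f"GPUs: {gpus}, monthly cost: ${cost:,.0f}")
```

Saying each assumption out loud before running the numbers matters more than the result: the interviewer can then correct an input without the rest of the estimate falling apart.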

How To Use

For each doc:

  1. Cover the answer with your hand. Spend 45 minutes whiteboarding it cold.
  2. Compare your design to the doc.
  3. Note 3 things you missed. Re-do in 1 week.