System Design Walkthroughs (LLM / Foundation Models)

Six end-to-end walkthroughs in the format expected in Senior+ infrastructure and foundation-model interviews at Anthropic, OpenAI, DeepMind, Meta, xAI, Mistral, Cohere, and Databricks.

#	Doc	Target Roles
01	LLM Inference Gateway @ 100k QPS	LLM Inference / LLM Infrastructure
02	Distributed Pretraining (8B → 70B)	Research Engineer Pretraining
03	RAG at Scale (100M docs, 1k QPS)	Applied AI / Search
04	Fine-Tuning Platform	Post-training Engineer
05	Eval Platform (continuous + LLM-judge)	Model Evaluation Engineer
06	Pretraining Data Pipeline (10TB → tokens)	Pretraining Data Engineer

Standard Structure

Every walkthrough uses the same template so you can practice the rhythm:

Clarifying questions (functional + non-functional)
Capacity estimation (QPS, storage, GPU-hours, $$$)
API & data model
High-level architecture (ASCII diagram)
Deep dives (3-5 key subsystems)
Bottlenecks & scaling
Failure modes & mitigation
Observability
Cost model
Tradeoffs & alternatives
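
As an illustration of the capacity-estimation step, here is a minimal back-of-envelope sketch for walkthrough 03 (RAG at Scale). Only the document count (100M) and query rate (1k QPS) come from the table above; the embedding dimension, float32 storage, and one vector per document are assumptions chosen for the example, and the names `RagWorkload`, `embedding_storage_gb`, and `queries_per_day` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RagWorkload:
    """Inputs for a rough estimate. Only num_docs and qps come from walkthrough 03."""
    num_docs: int = 100_000_000   # from the walkthrough title
    qps: int = 1_000              # from the walkthrough title
    embedding_dim: int = 768      # assumption for illustration
    bytes_per_value: int = 4      # assumption: float32 vectors
    vectors_per_doc: int = 1      # assumption: one embedding per document


def embedding_storage_gb(w: RagWorkload) -> float:
    """Raw vector storage in decimal GB, ignoring index overhead and replication."""
    total_bytes = w.num_docs * w.vectors_per_doc * w.embedding_dim * w.bytes_per_value
    return total_bytes / 1e9


def queries_per_day(w: RagWorkload) -> int:
    """Sustained daily query volume at the stated QPS (86,400 seconds per day)."""
    return w.qps * 86_400


if __name__ == "__main__":
    w = RagWorkload()
    print(f"raw vector storage: {embedding_storage_gb(w):.1f} GB")
    print(f"queries per day:    {queries_per_day(w):,}")
```

Under these assumptions the raw vectors take about 307 GB. In an interview, state each assumption aloud and show how the total changes if the interviewer picks a different value, for example several chunks per document.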

How To Use

For each doc:

Cover the answer with your hand. Spend 45 minutes whiteboarding it cold.
Compare your design to the doc.
Note three things you missed, then redo the walkthrough one week later.

