System design interview cheat sheet

One dense reference page for a system design interview: the framework to run the 45 minutes, the latency numbers and capacity math to estimate with, the building blocks to assemble, the database choices, and the trade-offs that separate a senior answer from a junior one. Skim it the night before, or use it as a checklist while you practise.

The framework: how to run the 45 minutes

A system design interview is open-ended on purpose, and candidates who freeze are usually the ones without a process. Run these six steps in order and narrate as you go. The timings are a guide for a 45-minute round, not a rule, but the order matters: each step feeds the next.

1. Clarify requirements (5 min)
Separate functional (what it does) from non-functional (scale, latency, availability, consistency). Pin down read/write ratio, expected users, and the one or two features the interview is really about. Do not start drawing until you have agreed scope.
2. Estimate scale (5 min)
Back-of-the-envelope: daily active users, requests per second, storage per year, and bandwidth. These numbers decide whether you need sharding, a cache, or a CDN, so they are not busywork - they drive every later decision.
3. Define the API (5 min)
Sketch the handful of endpoints (or RPCs) the core features need. Naming the API forces you to commit to the data that flows in and out, which makes the data model fall out naturally.
4. Data model & storage (5-10 min)
Choose SQL vs NoSQL from the access patterns, not from habit. Define the main entities, the primary keys, and how you would shard or index them at the scale you just estimated.
5. High-level design (10 min)
Draw the request path: clients, load balancer, application servers, cache, database, and any async workers or queues. Walk one read and one write through the diagram out loud.
6. Deep dive & trade-offs (10-15 min)
Pick the one or two hardest parts the interviewer cares about and go deep: the hot-key problem, consistency on writes, the cache eviction policy, or how you scale the bottleneck. Name the trade-off for each choice.

Numbers every engineer should know

You will never be asked to recite these, but you use them constantly to justify a cache, a CDN, or a different storage tier. Know the orders of magnitude. The latency figures are the classic "latency numbers every programmer should know" set, rounded to the scale that matters in an interview.

Latency, by order of magnitude
L1 cache reference	~1 ns
Branch mispredict	~3 ns
L2 cache reference	~4 ns
Main memory (RAM) reference	~100 ns
Read 1 MB sequentially from memory	~3 µs
SSD random read	~16 µs
Read 1 MB sequentially from SSD	~49 µs
Round trip within the same datacenter	~0.5 ms
Read 1 MB sequentially from disk (HDD)	~825 µs
Disk (HDD) seek	~2-10 ms
Round trip CA to Netherlands and back	~150 ms
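
To see how these figures justify a cache, here is a minimal sketch that turns a hit ratio into an expected read latency. The constants come from the table above, except `HDD_SEEK_US`, which is an assumed value picked from inside the ~2-10 ms range; the function name and the request paths in the usage lines are illustrative assumptions, not a model of any particular system.

```python
# Rough latencies in microseconds, taken from the table above.
DATACENTER_ROUND_TRIP_US = 500.0   # ~0.5 ms
HDD_SEEK_US = 5_000.0              # assumed: a value inside the ~2-10 ms range


def average_read_latency_us(hit_ratio, hit_cost_us, miss_cost_us):
    """Expected latency when a fraction `hit_ratio` of reads is served from cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hit_cost_us + (1.0 - hit_ratio) * miss_cost_us


# Assumed request paths:
#   hit  = one round trip to an in-memory cache
#   miss = round trip to the cache, then a round trip to the database plus a disk seek
hit_cost = DATACENTER_ROUND_TRIP_US
miss_cost = 2 * DATACENTER_ROUND_TRIP_US + HDD_SEEK_US
no_cache_cost = DATACENTER_ROUND_TRIP_US + HDD_SEEK_US

with_cache = average_read_latency_us(0.9, hit_cost, miss_cost)
print(f"no cache: {no_cache_cost:.0f} us, 90% hit ratio: {with_cache:.0f} us")
```

Under these assumptions a miss costs slightly more than having no cache at all, so the cache only pays off when the hit ratio is high. That is the kind of trade-off worth saying out loud.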

Capacity math & rules of thumb
Seconds in a day	~86,400 (~10^5)
Requests/sec from 1M daily users at one request each (even spread)	~12 RPS
Requests/sec from 1M daily users at one request each (peak ~5x)	~60 RPS
Size of a typical tweet/post (text only)	~140-280 bytes
One modern server	thousands of QPS, ~64-256 GB RAM
Read-heavy systems	cache aggressively, replicate reads
Write-heavy systems	shard, batch, use a queue
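
The arithmetic behind these rules of thumb can be sketched as a small helper. The function name, its parameters, and the example inputs (one request and one 280-byte write per user per day) are assumptions for illustration; in an interview you would plug in whatever numbers you agreed during requirements.

```python
SECONDS_PER_DAY = 86_400


def estimate_scale(daily_users, requests_per_user_per_day,
                   writes_per_user_per_day, bytes_per_write, peak_factor=5):
    """Back-of-the-envelope request rate and yearly storage (hypothetical helper)."""
    avg_rps = daily_users * requests_per_user_per_day / SECONDS_PER_DAY
    peak_rps = avg_rps * peak_factor
    storage_bytes_per_year = (
        daily_users * writes_per_user_per_day * bytes_per_write * 365
    )
    return {
        "avg_rps": avg_rps,
        "peak_rps": peak_rps,
        "storage_bytes_per_year": storage_bytes_per_year,
    }


# Example inputs (assumed): 1M daily users, one request and one 280-byte post each.
result = estimate_scale(
    daily_users=1_000_000,
    requests_per_user_per_day=1,
    writes_per_user_per_day=1,
    bytes_per_write=280,
)
print(f"average: {result['avg_rps']:.1f} RPS, peak: {result['peak_rps']:.1f} RPS")
print(f"storage: {result['storage_bytes_per_year'] / 1e9:.1f} GB/year")
```

Multiply the per-user request count by whatever is realistic for the product: at 100 requests per user per day the same 1M users produce roughly 1,200 RPS on average, which changes the design conversation.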

Building blocks and when to reach for them

Most designs are assembled from the same small kit. Know what each piece does and the specific problem it solves, so you add it for a reason rather than out of reflex.

Load balancer: Spread traffic across stateless app servers (L4 for raw throughput, L7 for routing on path/header). Adds horizontal scale and a single health-checked entry point.
CDN: Serve static assets and cacheable responses from the edge, close to users. Cuts latency and origin load; the first thing to reach for in any global, read-heavy, media-serving system.
Cache (Redis / Memcached): Put hot reads in memory in front of the database. Decide a policy: cache-aside is the default; choose an eviction strategy (LRU is the usual starting point) and a TTL, and have an answer for stale data and cache stampedes.
Message queue (Kafka / SQS / RabbitMQ): Decouple producers from consumers and absorb write spikes. Turns a synchronous, fragile path into an async, retryable one; the backbone of most write-heavy and event-driven designs.
Database replication: Add read replicas to scale reads and improve availability. Be explicit about replication lag: with asynchronous replication, replicas are eventually consistent, so a read straight after a write may not see it.
Sharding / partitioning: Split data across nodes when one machine cannot hold or serve it. Pick a shard key that spreads load evenly and avoids hotspots; consistent hashing limits reshuffling when nodes change.
Rate limiter: Protect the system from abuse and overload. Token bucket is the usual answer; place it at the edge or in a shared store so the limit holds across servers.
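
Of the building blocks above, the token bucket is small enough to sketch in full. This is a minimal single-process version: the class name, its parameters, and the injectable `clock` are assumptions made for illustration. A limit that has to hold across servers would keep the bucket state in a shared store, as the rate limiter entry says.

```python
import time


class TokenBucket:
    """In-memory token bucket: refills at `rate_per_sec`, holds at most `capacity`."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate_per_sec = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)   # start full so an initial burst is allowed
        self.clock = clock              # injectable so tests can control time
        self.last_refill = clock()

    def allow(self, cost=1.0):
        """Return True and spend `cost` tokens if enough are available."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill in proportion to elapsed time, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_sec)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Usage: allow bursts of up to 10 requests, refilling 5 tokens per second.
bucket = TokenBucket(rate_per_sec=5, capacity=10)
if bucket.allow():
    print("request accepted")
else:
    print("request rejected (over the limit)")
```

The capacity sets how large a burst is tolerated and the rate sets the sustained limit. Naming both is the trade-off to state when you propose it.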

Choosing a database

Pick storage from the access pattern, not from familiarity. Say what you are optimising for and which store fits, then name the cost.

Type	Reach for it when
Relational (PostgreSQL, MySQL)	Strong consistency, transactions, and rich queries with joins. Default choice unless scale or access patterns force otherwise. Scales vertically, then with read replicas and careful sharding.
Key-value (Redis, DynamoDB)	Simple, high-throughput lookups by key. Great for sessions, caches, and counters; predictable single-digit-millisecond reads at scale.
Document (MongoDB)	Flexible, nested records with no fixed schema. Good when the entity shape varies or evolves and queries are mostly by a single document.
Wide-column (Cassandra, Bigtable)	Massive write throughput and horizontal scale with tunable consistency. Reach for it on write-heavy, time-series, or feed workloads.
Search (Elasticsearch)	Full-text search, ranking, and aggregations. A secondary index alongside the source-of-truth database, not a replacement for it.

The trade-offs interviewers listen for

The component is the easy half; the trade-off is what earns the senior signal. For every choice you make, say what you are giving up.

CAP theorem: During a network partition you must choose between consistency and availability; you cannot have both. Most user-facing systems pick availability and accept eventual consistency (AP); systems like payments and inventory pick consistency (CP).
Consistency models: Strong consistency means every read sees the latest write (simpler to reason about, harder to scale). Eventual consistency means reads may briefly be stale (cheaper, more available). State which you choose and why for each data path.
SQL vs NoSQL: SQL gives transactions, joins, and a fixed schema; NoSQL gives horizontal scale and schema flexibility. The honest answer is usually 'SQL until the access pattern or scale forces a specific NoSQL store', not a blanket preference.
Latency vs throughput: Latency is how long one request takes; throughput is how many you handle per second. Batching and queues raise throughput at the cost of per-request latency; optimise for whichever the requirements name.
Push vs pull (fan-out): Fan-out-on-write (push) precomputes feeds for fast reads but is expensive for users with millions of followers. Fan-out-on-read (pull) is cheaper to write but slower to read. Real systems use a hybrid: push for normal users, pull for celebrities.
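
The hybrid fan-out in the last item can be sketched in a few lines. Everything here is an in-memory stand-in under stated assumptions: `FeedService`, its method names, and the follower threshold are hypothetical, and a real system would back these dictionaries with a cache and a database.

```python
from collections import defaultdict


class FeedService:
    """Hybrid fan-out: push posts from normal authors, pull posts from celebrities."""

    def __init__(self, celebrity_threshold=10_000):  # threshold is an assumed value
        self.celebrity_threshold = celebrity_threshold
        self.followers = defaultdict(set)   # author -> users who follow them
        self.following = defaultdict(set)   # user -> authors they follow
        self.posts = defaultdict(list)      # author -> [(timestamp, text)]
        self.inbox = defaultdict(list)      # user -> precomputed [(timestamp, author, text)]

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= self.celebrity_threshold

    def publish(self, author, timestamp, text):
        self.posts[author].append((timestamp, text))
        if not self.is_celebrity(author):
            # Fan-out-on-write: one inbox append per follower.
            for follower in self.followers[author]:
                self.inbox[follower].append((timestamp, author, text))

    def feed(self, user, limit=20):
        # Start from the precomputed inbox; a set removes duplicates if an
        # author's posts were pushed before they crossed the threshold.
        items = set(self.inbox[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                # Fan-out-on-read: merge celebrity posts at request time.
                items.update((ts, author, text) for ts, text in self.posts[author])
        return sorted(items, reverse=True)[:limit]
```

The sketch skips backfilling the inbox when a user follows someone new. What it does show is where the cost moves: writes stay cheap for celebrities, and reads pay one extra merge per followed celebrity.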

Frequently asked questions

How do I structure a 45-minute system design interview?
Spend the first five minutes clarifying functional and non-functional requirements, five on back-of-the-envelope scale estimates, five sketching the API, five to ten on the data model and storage choice, ten on the high-level diagram, and the rest on one or two deep dives the interviewer steers you toward. The single biggest mistake is jumping to a diagram before agreeing on scope and scale, because every later decision depends on those numbers.

Do I need to memorise the latency numbers?
You do not need them to the nanosecond, but you should know the orders of magnitude: memory is nanoseconds, SSD is microseconds, a same-datacenter round trip is sub-millisecond, and a cross-continent round trip is over a hundred milliseconds. These ratios are what let you justify a cache or a CDN with real reasoning instead of hand-waving, which is exactly what interviewers are listening for.

What separates a senior answer from a junior one?
Juniors describe components; seniors name trade-offs. A strong answer does not just say 'add a cache' - it says why, which eviction policy, what the staleness window is, and what happens on a cache miss or a stampede. Seniors also scope ruthlessly, estimate before designing, and proactively identify the system's bottleneck rather than waiting to be asked. The cheat sheet above is the vocabulary; the seniority is in the justification.

Is a system design round only for senior engineers?
It is most heavily weighted for senior and staff loops, but plenty of mid-level loops now include a scaled-down design round, and new-grad loops sometimes include a lightweight one. Even if your target level does not formally test it, the thinking - estimating scale, choosing storage, naming trade-offs - makes you better in the coding and behavioural rounds too, so it is rarely wasted preparation.

Go deeper

A cheat sheet is the index, not the whole course. Work the coding lists, the worked system-design guides, and the questions specific companies ask.
