Step 1 — Clarify the requirements
Never start drawing boxes. A strong candidate spends the first few minutes scoping the problem so the design that follows is justified. For a distributed cache, the questions worth asking are:
- What are we caching (query results, objects, sessions) and the read/write mix?
- What is the consistency requirement: can reads be slightly stale?
- What is the working-set size vs available memory?
- Is durability needed, or is the cache purely a volatile accelerator?
Functional requirements
- get(key) and set(key, value, ttl) with low latency.
- Distribute data across nodes and scale by adding nodes.
- Evict under memory pressure and expire by TTL.
Non-functional requirements
- Sub-millisecond reads for cache hits.
- Graceful behaviour on node failure (lose a slice, not everything).
- High hit ratio for the working set.
Step 2 — Back-of-the-envelope estimates
Sizing the system tells you which parts are hard. Round aggressively and state your assumptions out loud; the numbers matter less than showing you can reason about scale.
| Metric | Estimate | Reasoning |
|---|---|---|
| Hit ratio target | > 90% | The cache only pays for itself if most reads hit it; size memory to the hot set. |
| Latency | sub-ms on hit | In-memory access is the whole point; a miss falls through to the slower store. |
Step 3 — Data model and API
A compact data model and a small API surface anchor the rest of the discussion. Keep both minimal; you can always extend them when the interviewer pushes.
Core entities
cache entry
key, value, ttl, last_accessed
In-memory; metadata drives eviction (LRU/LFU) and expiry.
hash ring
node_id, virtual nodes, key range
Consistent hashing maps keys to cache nodes.
API sketch
- GET
cache.get(key)— Return value or miss. - PUT
cache.set(key, value, ttl)— Store with expiry. - DELETE
cache.delete(key)— Invalidate a key.
Step 4 — High-level design
Sketch the happy path end to end before optimising anything. This is the architecture you would draw on the whiteboard first:
- 1Clients hash a key to a node via a consistent-hashing ring and talk to that node directly.
- 2Each node stores entries in memory with an eviction policy and TTLs.
- 3The application uses a write strategy (cache-aside is most common) to keep cache and DB in step.
- 4Replication and graceful degradation limit the blast radius of a node failure.
Step 5 — Deep dives that separate strong answers
The high-level design is table stakes. Interviewers spend most of the time here, probing the decisions that actually carry the system. These are the ones to be ready for.
Eviction and write strategies
Memory is finite, so the cache evicts under pressure: LRU (evict least recently used) suits temporal locality, LFU (least frequently used) suits stable hot sets, and TTL bounds staleness regardless of access. The write strategy decides how cache and database stay consistent. Cache-aside (lazy loading): the app reads cache, on a miss loads from the DB and populates the cache; writes go to the DB and invalidate the key. Write-through: writes go through the cache to the DB synchronously, keeping them consistent at the cost of write latency. Write-back: writes hit the cache and flush to the DB asynchronously, fast but risking loss on a crash. Name the one you would use and the staleness it implies.
Sharding and node failure
A cache outgrows one machine, so partition keys across nodes. Plain modulo hashing remaps almost every key when the node count changes, which would cold-start the whole cache, so use consistent hashing with virtual nodes: adding or removing a node only moves that node's share of keys. On node failure you lose only that node's slice, and traffic for those keys falls through to the database temporarily; optional replication of hot keys reduces even that. This is the difference between a node failure being a blip and being an outage-causing thundering herd on the DB.
Invalidation, hot keys, and stampedes
Invalidation is the famous hard problem: stale data is the usual cause of cache bugs, so be explicit about TTLs and on-write invalidation. Hot keys (a single viral item) can overload one node; mitigate by replicating that key across nodes or adding a small local cache in the client. A cache stampede happens when a popular key expires and a flood of concurrent misses all hit the database at once; defend with a mutex/single-flight so one request repopulates while others wait, or with probabilistic early expiry. These three (invalidation, hot keys, stampede) are the questions an interviewer drills into.
Step 6 — Bottlenecks and how to scale past them
Naming where the design breaks, and the specific fix, is what signals seniority. For a distributed cache the pressure points are:
Node failure cold-starts a slice of the cache.
Consistent hashing limits the loss to one node; replicate hot keys.
Hot key overloads a single node.
Replicate the key or add a client-side local cache.
Cache stampede on expiry.
Single-flight repopulation or probabilistic early refresh.
Step 7 — Key tradeoffs
There is rarely one right answer. State the tradeoff, then commit to a side with a reason tied to the requirements you clarified in step one.
Write strategy
Write-through (consistent, slower writes)
Write-back (fast, risk loss)
Guidance: Cache-aside by default; write-through when reads must not be stale; write-back only with durability guards.
Eviction policy
LRU (temporal locality)
LFU (stable hot set)
Guidance: LRU is the safe default; LFU when access frequency is stable and skewed.
Common follow-up questions
When you finish the core design, expect the interviewer to pull on one of these threads. Have a one-paragraph answer ready for each.
- How do you prevent a cache stampede on a popular expiring key?
- Memory is finite, so the cache evicts under pressure: LRU (evict least recently used) suits temporal locality, LFU (least frequently used) suits stable hot sets, and TTL bounds staleness regardless of access. The write strategy decides how cache and database stay consistent. Sketch the change against the high-level design above and tie your choice back to the requirements you clarified, rather than reaching for the most complex option.
- How would you make the cache survive a node restart (persistence)?
- A cache outgrows one machine, so partition keys across nodes. Plain modulo hashing remaps almost every key when the node count changes, which would cold-start the whole cache, so use consistent hashing with virtual nodes: adding or removing a node only moves that node's share of keys. Sketch the change against the high-level design above and tie your choice back to the requirements you clarified, rather than reaching for the most complex option.
- How do you keep two caches in two regions roughly consistent?
- Invalidation is the famous hard problem: stale data is the usual cause of cache bugs, so be explicit about TTLs and on-write invalidation. Hot keys (a single viral item) can overload one node; mitigate by replicating that key across nodes or adding a small local cache in the client. Sketch the change against the high-level design above and tie your choice back to the requirements you clarified, rather than reaching for the most complex option.
- When is caching the wrong answer?
- Memory is finite, so the cache evicts under pressure: LRU (evict least recently used) suits temporal locality, LFU (least frequently used) suits stable hot sets, and TTL bounds staleness regardless of access. The write strategy decides how cache and database stay consistent. Sketch the change against the high-level design above and tie your choice back to the requirements you clarified, rather than reaching for the most complex option.