Design Search Autocomplete (Typeahead)

Step 1 - Clarify the requirements

Never start drawing boxes. A strong candidate spends the first few minutes scoping the problem so the design that follows is justified. For a search autocomplete system, the questions worth asking are:

How many suggestions do we return, and ranked by what (popularity, personalisation)?
What latency budget per keystroke (typically under ~100 ms end-to-end)?
How fresh must suggestions be: do trending queries need to appear quickly?
Do we support typo tolerance and multi-language input?

Functional requirements

Return the top-k completions for a prefix as the user types.
Rank suggestions by popularity (and optionally personalisation).
Update the suggestion set from real query traffic.

Non-functional requirements

Very low latency per keystroke; the UX feels instant or not at all.
Scale to high query volume across many users.
Reasonable freshness so trending terms surface.

Step 2 - Back-of-the-envelope estimates

Sizing the system tells you which parts are hard. Round aggressively and state your assumptions out loud; the numbers matter less than showing you can reason about scale.

Metric	Estimate	Reasoning
Latency budget	< 100 ms per keystroke	Each character typed is a request; slow suggestions are worse than none.
Read:write ratio	massively read-heavy	Serving suggestions vastly outnumbers updating the index from logs.

Step 3 - Data model and API

A compact data model and a small API surface anchor the rest of the discussion. Keep both minimal; you can always extend them when the interviewer pushes.

Core entities

trie nodes

prefix -> top-k suggestions (precomputed) with scores

Cache top-k at each node so a query is a single lookup, not a subtree walk.

query_logs

query, count, last_seen

Aggregated source of popularity, fed by the build pipeline.

API sketch

GET/api/v1/suggest?q={prefix}- Return ranked completions for the prefix.

Step 4 - High-level design

Sketch the happy path end to end before optimising anything. This is the architecture you would draw on the whiteboard first:

1Build path: aggregate query logs offline to compute each term's popularity and assemble a trie.
2Precompute and store the top-k completions at every prefix node so serving is one lookup.
3Serve path: a request resolves the prefix and returns the cached top-k from an in-memory store.
4Periodically rebuild/merge the trie to fold in new and trending queries.

Step 5 - Deep dives that separate strong answers

The high-level design is table stakes. Interviewers spend most of the time here, probing the decisions that actually carry the system. These are the ones to be ready for.

Trie with precomputed top-k

A trie stores strings as a tree of characters, so finding all completions of a prefix means walking to the prefix node and collecting its subtree. Doing that subtree walk on every keystroke is too slow at scale, so the key optimisation is to precompute and cache the top-k suggestions directly at each node. Then a query is: navigate to the prefix node (O(prefix length)) and return its stored list, with no traversal at serve time. This trades extra storage and build cost for the strict latency the feature demands.

Build pipeline and ranking

Popularity comes from real usage, so a batch (or near-real-time streaming) pipeline aggregates query logs into counts, optionally time-decayed so trending terms rise and stale ones fade. The pipeline rebuilds the trie (or updates it incrementally) with refreshed top-k lists. Because rebuilding on every query is impossible, accept that suggestions lag traffic by minutes to hours for the bulk index, and layer a faster path for trending terms if real-time freshness is required. Ranking can blend raw frequency with recency and, for logged-in users, personalisation.

Serving at scale, sharding, and caching

The serving tier keeps the trie (or its top-k tables) in memory for sub-millisecond lookups, fronted by an aggressive cache because prefixes are heavily skewed (a tiny set of prefixes covers most traffic). Shard the trie by first character or prefix range across nodes so it fits in memory and load spreads, routing each request to the owning shard. A CDN or edge cache can even serve the most common prefixes. The result is a system where the expensive work is all offline and the online path is a cached memory read.

Step 6 - Bottlenecks and how to scale past them

Naming where the design breaks, and the specific fix, is what signals seniority. For a search autocomplete system the pressure points are:

Subtree traversal per keystroke is too slow.

Precompute top-k at each trie node; serve is then a single lookup.

Trie too large for one node's memory.

Shard by prefix; keep shards in memory and route per request.

Trending terms lag the batch rebuild.

Add a streaming fast-path layer for recent/trending queries.

Step 7 - Key tradeoffs

There is rarely one right answer. State the tradeoff, then commit to a side with a reason tied to the requirements you clarified in step one.

Freshness vs build cost

Batch rebuild (cheap, lags traffic)

Streaming updates (fresh, complex)

Guidance: Batch for the bulk index; stream a small trending layer if real-time matters.

Storage vs latency

Walk subtree at query time (compact, slow)

Precompute top-k per node (fast, larger)

Guidance: Precompute: the latency requirement dominates the storage cost.

Common follow-up questions

When you finish the core design, expect the interviewer to pull on one of these threads. Have a one-paragraph answer ready for each.

How would you add typo tolerance (fuzzy matching)?: A trie stores strings as a tree of characters, so finding all completions of a prefix means walking to the prefix node and collecting its subtree. Doing that subtree walk on every keystroke is too slow at scale, so the key optimisation is to precompute and cache the top-k suggestions directly at each node. Sketch the change against the high-level design above and tie your choice back to the requirements you clarified, rather than reaching for the most complex option.
How do you personalise suggestions per user?: Popularity comes from real usage, so a batch (or near-real-time streaming) pipeline aggregates query logs into counts, optionally time-decayed so trending terms rise and stale ones fade. The pipeline rebuilds the trie (or updates it incrementally) with refreshed top-k lists. Sketch the change against the high-level design above and tie your choice back to the requirements you clarified, rather than reaching for the most complex option.
How do you handle multi-word and multi-language prefixes?: The serving tier keeps the trie (or its top-k tables) in memory for sub-millisecond lookups, fronted by an aggressive cache because prefixes are heavily skewed (a tiny set of prefixes covers most traffic). Shard the trie by first character or prefix range across nodes so it fits in memory and load spreads, routing each request to the owning shard. Sketch the change against the high-level design above and tie your choice back to the requirements you clarified, rather than reaching for the most complex option.
How quickly can a brand-new trending query appear, and how?: A trie stores strings as a tree of characters, so finding all completions of a prefix means walking to the prefix node and collecting its subtree. Doing that subtree walk on every keystroke is too slow at scale, so the key optimisation is to precompute and cache the top-k suggestions directly at each node. Sketch the change against the high-level design above and tie your choice back to the requirements you clarified, rather than reaching for the most complex option.

Last reviewed by the site editor: June 2026

Step 1 - Clarify the requirements

How many suggestions do we return, and ranked by what (popularity, personalisation)?
What latency budget per keystroke (typically under ~100 ms end-to-end)?
How fresh must suggestions be: do trending queries need to appear quickly?
Do we support typo tolerance and multi-language input?

Functional requirements

Return the top-k completions for a prefix as the user types.
Rank suggestions by popularity (and optionally personalisation).
Update the suggestion set from real query traffic.

Non-functional requirements

Very low latency per keystroke; the UX feels instant or not at all.
Scale to high query volume across many users.
Reasonable freshness so trending terms surface.

Step 2 - Back-of-the-envelope estimates

Sizing the system tells you which parts are hard. Round aggressively and state your assumptions out loud; the numbers matter less than showing you can reason about scale.

Metric	Estimate	Reasoning
Latency budget	< 100 ms per keystroke	Each character typed is a request; slow suggestions are worse than none.
Read:write ratio	massively read-heavy	Serving suggestions vastly outnumbers updating the index from logs.

Step 3 - Data model and API

A compact data model and a small API surface anchor the rest of the discussion. Keep both minimal; you can always extend them when the interviewer pushes.

Core entities

trie nodes

prefix -> top-k suggestions (precomputed) with scores

Cache top-k at each node so a query is a single lookup, not a subtree walk.

query_logs

query, count, last_seen

Aggregated source of popularity, fed by the build pipeline.

API sketch

GET/api/v1/suggest?q={prefix}- Return ranked completions for the prefix.

Step 4 - High-level design

Sketch the happy path end to end before optimising anything. This is the architecture you would draw on the whiteboard first:

1Build path: aggregate query logs offline to compute each term's popularity and assemble a trie.
2Precompute and store the top-k completions at every prefix node so serving is one lookup.
3Serve path: a request resolves the prefix and returns the cached top-k from an in-memory store.
4Periodically rebuild/merge the trie to fold in new and trending queries.

Step 5 - Deep dives that separate strong answers

The high-level design is table stakes. Interviewers spend most of the time here, probing the decisions that actually carry the system. These are the ones to be ready for.

Trie with precomputed top-k

Build pipeline and ranking

Serving at scale, sharding, and caching

Step 6 - Bottlenecks and how to scale past them

Naming where the design breaks, and the specific fix, is what signals seniority. For a search autocomplete system the pressure points are:

Subtree traversal per keystroke is too slow.

Precompute top-k at each trie node; serve is then a single lookup.

Trie too large for one node's memory.

Shard by prefix; keep shards in memory and route per request.

Trending terms lag the batch rebuild.

Add a streaming fast-path layer for recent/trending queries.

Step 7 - Key tradeoffs

There is rarely one right answer. State the tradeoff, then commit to a side with a reason tied to the requirements you clarified in step one.

Freshness vs build cost

Batch rebuild (cheap, lags traffic)

Streaming updates (fresh, complex)

Guidance: Batch for the bulk index; stream a small trending layer if real-time matters.

Storage vs latency

Walk subtree at query time (compact, slow)

Precompute top-k per node (fast, larger)

Guidance: Precompute: the latency requirement dominates the storage cost.

Common follow-up questions

When you finish the core design, expect the interviewer to pull on one of these threads. Have a one-paragraph answer ready for each.

How would you add typo tolerance (fuzzy matching)?: A trie stores strings as a tree of characters, so finding all completions of a prefix means walking to the prefix node and collecting its subtree. Doing that subtree walk on every keystroke is too slow at scale, so the key optimisation is to precompute and cache the top-k suggestions directly at each node. Sketch the change against the high-level design above and tie your choice back to the requirements you clarified, rather than reaching for the most complex option.
How do you personalise suggestions per user?: Popularity comes from real usage, so a batch (or near-real-time streaming) pipeline aggregates query logs into counts, optionally time-decayed so trending terms rise and stale ones fade. The pipeline rebuilds the trie (or updates it incrementally) with refreshed top-k lists. Sketch the change against the high-level design above and tie your choice back to the requirements you clarified, rather than reaching for the most complex option.
How do you handle multi-word and multi-language prefixes?: The serving tier keeps the trie (or its top-k tables) in memory for sub-millisecond lookups, fronted by an aggressive cache because prefixes are heavily skewed (a tiny set of prefixes covers most traffic). Shard the trie by first character or prefix range across nodes so it fits in memory and load spreads, routing each request to the owning shard. Sketch the change against the high-level design above and tie your choice back to the requirements you clarified, rather than reaching for the most complex option.
How quickly can a brand-new trending query appear, and how?: A trie stores strings as a tree of characters, so finding all completions of a prefix means walking to the prefix node and collecting its subtree. Doing that subtree walk on every keystroke is too slow at scale, so the key optimisation is to precompute and cache the top-k suggestions directly at each node. Sketch the change against the high-level design above and tie your choice back to the requirements you clarified, rather than reaching for the most complex option.

Last reviewed by the site editor: June 2026

Step 1 - Clarify the requirements

Functional requirements

Non-functional requirements

Step 2 - Back-of-the-envelope estimates

Step 3 - Data model and API

Core entities

API sketch

Step 4 - High-level design

Step 5 - Deep dives that separate strong answers

Trie with precomputed top-k

Build pipeline and ranking

Serving at scale, sharding, and caching

Step 6 - Bottlenecks and how to scale past them

Step 7 - Key tradeoffs

Common follow-up questions

Keep practising

Design Search Autocomplete (Typeahead)

Step 1 - Clarify the requirements

Functional requirements

Non-functional requirements

Step 2 - Back-of-the-envelope estimates

Step 3 - Data model and API

Core entities

API sketch

Step 4 - High-level design

Step 5 - Deep dives that separate strong answers

Trie with precomputed top-k

Build pipeline and ranking

Serving at scale, sharding, and caching

Step 6 - Bottlenecks and how to scale past them

Step 7 - Key tradeoffs

Common follow-up questions

Keep practising