Step 1 — Clarify the requirements
Never start by drawing boxes. A strong candidate spends the first few minutes scoping the problem so the design that follows is justified. For a URL shortener, the questions worth asking are:
- How short does the alias need to be, and can users supply custom aliases?
- Do links expire, and can they be edited or deleted after creation?
- Do we need analytics (click counts, geography) on each link?
- What is the expected read-to-write ratio and traffic volume?
Functional requirements
- Create a short URL from a long URL, optionally with a custom alias.
- Redirect a short URL to its original long URL.
- Optionally support link expiry and basic click analytics.
Non-functional requirements
- Redirects must be low-latency (single-digit milliseconds) because every click pays the cost.
- High availability: a dead redirect breaks every link already shared.
- Short codes must be unique and hard enough to guess that trivial enumeration is impractical.
Step 2 — Back-of-the-envelope estimates
Sizing the system tells you which parts are hard. Round aggressively and state your assumptions out loud; the numbers matter less than showing you can reason about scale.
| Metric | Estimate | Reasoning |
|---|---|---|
| Read:write ratio | ~100:1 | Each created link is clicked many times; the system is read-dominated. |
| Writes | ~100 M new URLs / month | Assume 100 M creations monthly, which is ~40 writes/sec averaged out. |
| Reads | ~4 K redirects / sec | At 100:1 the redirect path handles roughly 4 K requests/sec, with hot keys far higher. |
| Storage | ~3 TB over 5 years | ~6 B URLs (100 M/month × 60 months) at ~500 bytes each (long URL + metadata) totals roughly 3 TB before replication and indexes. |
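The arithmetic behind these estimates can be checked in a few lines. This is a rough sketch that assumes a 30-day month and decimal terabytes; the variable names are illustrative, and the inputs are the assumptions stated in the table.

```python
# Inputs taken from the table: 100 M new URLs/month, 100:1 read:write,
# ~500 bytes per stored link, 5 years of retention.
# Assumption: a 30-day month, decimal terabytes (1 TB = 1e12 bytes).
SECONDS_PER_MONTH = 30 * 24 * 60 * 60

writes_per_month = 100_000_000
read_write_ratio = 100
bytes_per_link = 500
retention_months = 5 * 12

writes_per_sec = writes_per_month / SECONDS_PER_MONTH
reads_per_sec = writes_per_sec * read_write_ratio
total_links = writes_per_month * retention_months
storage_tb = total_links * bytes_per_link / 1e12

print(f"writes/sec ~ {writes_per_sec:.0f}")
print(f"reads/sec  ~ {reads_per_sec:.0f}")
print(f"links      ~ {total_links / 1e9:.0f} B")
print(f"storage    ~ {storage_tb:.0f} TB")
```

With these inputs the averages come out to roughly 39 writes/sec and about 3.9 K reads/sec, with 6 B links occupying about 3 TB of raw storage. Peak traffic, replication, and indexes would all push the real numbers higher.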
Step 3 — Data model and API
A compact data model and a small API surface anchor the rest of the discussion. Keep both minimal; you can always extend them when the interviewer pushes.
Core entities
links
short_code (PK), long_url, created_at, expires_at, owner_id, click_count
short_code is the partition key; the redirect path looks up by it.
users
user_id (PK), email, plan, created_at
Only needed if links are owned/managed.
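As one way to pin the entities down, the two records could be sketched as Python dataclasses. The field names come from the model above; the types, the defaults, and the `is_expired` helper are assumptions added for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Link:
    short_code: str                        # primary/partition key
    long_url: str
    created_at: datetime
    expires_at: Optional[datetime] = None  # None means the link never expires
    owner_id: Optional[str] = None         # None for anonymous links
    click_count: int = 0

    def is_expired(self, now: datetime) -> bool:
        # Hypothetical helper: a link with no expiry is never expired.
        return self.expires_at is not None and now >= self.expires_at


@dataclass
class User:
    user_id: str   # primary key
    email: str
    plan: str
    created_at: datetime
```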
API sketch
- POST /api/v1/links: Create a short URL (body: long_url, optional alias).
- GET /{short_code}: Redirect (302/301) to the original long URL.
- GET /api/v1/links/{code}/stats: Return click analytics for a link.
Step 4 — High-level design
Sketch the happy path end to end before optimising anything. This is the architecture you would draw on the whiteboard first:
1. Client hits an application server behind a load balancer.
2. On create, generate a unique short code, persist the mapping, and return the short URL.
3. On redirect, look up the code, return a 301/302 to the long URL, and asynchronously record the click.
4. Front the read path with a cache because the same hot links are clicked repeatedly.
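The happy path above can be sketched in memory to make the flow concrete. Everything here is a stand-in: dictionaries play the database and cache, a list plays the asynchronous click queue, the `ShortenerSketch` class and the `https://short.example` domain are hypothetical, and the code scheme is a placeholder rather than a real generator.

```python
import itertools
from typing import Dict, List, Optional, Tuple


class ShortenerSketch:
    def __init__(self, base_url: str = "https://short.example") -> None:
        self.base_url = base_url
        self.db: Dict[str, str] = {}      # stand-in for the links table
        self.cache: Dict[str, str] = {}   # stand-in for Redis/Memcached
        self.click_queue: List[str] = []  # stand-in for an async event queue
        self._ids = itertools.count(1)

    def create(self, long_url: str) -> str:
        # Placeholder code scheme; real generation is a deep-dive topic.
        code = f"c{next(self._ids)}"
        self.db[code] = long_url
        return f"{self.base_url}/{code}"

    def redirect(self, code: str) -> Tuple[int, Optional[str]]:
        # Try the cache first, then fall back to the database.
        long_url = self.cache.get(code)
        if long_url is None:
            long_url = self.db.get(code)
            if long_url is None:
                return 404, None
            self.cache[code] = long_url
        # Record the click off the request path (here: just enqueue it).
        self.click_queue.append(code)
        return 302, long_url
```

A quick usage example: `ShortenerSketch().create("https://example.com/a")` returns `https://short.example/c1`, and a later `redirect("c1")` returns `(302, "https://example.com/a")` while appending `"c1"` to the click queue.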
Step 5 — Deep dives that separate strong answers
The high-level design is table stakes. Interviewers spend most of the time here, probing the decisions that actually carry the system. These are the ones to be ready for.
Generating the short code
Three defensible strategies:
- Base62-encode an auto-incrementing counter: simple and collision-free, but a single counter is a bottleneck and the codes are sequential and guessable. Distribute it with a ranged ID service (each app server claims a block of IDs).
- Hash the long URL (e.g. MD5) and take the first N characters: deterministic but requires collision handling, and identical URLs collapse to one code, which may or may not be desired.
- Pre-generate a pool of random unique codes in a key-generation service and hand them out: fast at write time and non-sequential, at the cost of running that service.
State the length math: base62 with 7 characters yields 62^7 ≈ 3.5 trillion codes, far more than the ~6 B URLs estimated for five years.
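The counter strategy and its ranged variant can be sketched as follows. This is a minimal illustration, assuming an in-process allocator; in practice the block counter would live in a durable shared store. The class names `RangeAllocator` and `AppServerIds`, the alphabet order, and the block size are all hypothetical choices.

```python
import string
import threading

# 62 symbols: 0-9, a-z, A-Z (the ordering is an arbitrary choice).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode(n: int) -> str:
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def base62_decode(code: str) -> int:
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n


class RangeAllocator:
    """Hands out disjoint blocks of IDs; the only shared, contended state."""

    def __init__(self, block_size: int = 1000) -> None:
        self._next_start = 0
        self._block_size = block_size
        self._lock = threading.Lock()

    def claim_block(self) -> range:
        with self._lock:
            start = self._next_start
            self._next_start += self._block_size
        return range(start, start + self._block_size)


class AppServerIds:
    """Per-server generator that only contacts the allocator once per block."""

    def __init__(self, allocator: RangeAllocator) -> None:
        self._allocator = allocator
        self._block = iter(())

    def next_code(self) -> str:
        try:
            n = next(self._block)
        except StopIteration:
            self._block = iter(self._allocator.claim_block())
            n = next(self._block)
        return base62_encode(n)


print(62 ** 7)  # size of the 7-character code space
```

Each server consumes its block locally, so the shared allocator is touched once per block rather than once per write. The codes are still sequential within a block, so this addresses the bottleneck but not guessability.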
Read path and caching
Because reads dominate, the redirect lookup should usually hit an in-memory cache (Redis/Memcached) keyed by short code, not the database. Use cache-aside: on a miss, read from the database and populate the cache with a TTL. The 80/20 rule applies hard here, so a relatively small cache absorbs most traffic. A CDN or edge function can shortcut the most popular links even closer to the user.
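Cache-aside with a TTL can be sketched without any external dependency. The `TTLCache` class below is a hypothetical in-process stand-in for Redis/Memcached, and `db_lookup` is an assumed callable that returns the long URL or `None`; the injectable clock exists only to make expiry testable.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for a cache with per-entry expiry."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict the stale entry
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._data[key] = (value, self._clock() + self._ttl)


def resolve(code: str, cache: TTLCache,
            db_lookup: Callable[[str], Optional[str]]) -> Optional[str]:
    # Cache-aside: check the cache, fall back to the database on a miss,
    # then populate the cache so the next read is served from memory.
    long_url = cache.get(code)
    if long_url is not None:
        return long_url
    long_url = db_lookup(code)
    if long_url is not None:
        cache.set(code, long_url)
    return long_url
```

Note that this sketch does not cache misses, so repeated lookups of unknown codes always reach the database; whether to cache negative results is a separate design choice.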
301 vs 302 redirect
A 301 (permanent) lets browsers and intermediaries cache the redirect, which cuts load on your servers, but cached clicks never reach you, so per-click analytics become incomplete and a changed destination will not reach clients that already cached the old one. A 302 (temporary) sends every click through your servers, preserving analytics and editability at the cost of more traffic. The right choice depends on whether analytics and editability matter; say so explicitly.
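The decision can be reduced to a small function that picks the status code and caching headers. This is a sketch under assumptions: the function name and flag are hypothetical, and the one-day `max-age` is an arbitrary illustrative value, not a recommendation.

```python
from typing import Dict, Tuple


def build_redirect(long_url: str,
                   needs_analytics_or_edits: bool) -> Tuple[int, Dict[str, str]]:
    if needs_analytics_or_edits:
        # Temporary redirect, explicitly uncacheable: every click reaches us.
        return 302, {"Location": long_url, "Cache-Control": "no-store"}
    # Permanent redirect that clients and intermediaries may cache.
    return 301, {"Location": long_url, "Cache-Control": "public, max-age=86400"}
```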
Step 6 — Bottlenecks and how to scale past them
Naming where the design breaks, and the specific fix, is what signals seniority. For a URL shortener the pressure points are:
- Single ID counter becomes a write hotspot. Fix: use a distributed ID allocator that hands out ranges, or pre-generated random codes.
- Redirect latency under hot-key load. Fix: cache aggressively (Redis + CDN); the access pattern is extremely cache-friendly.
- Database size growth. Fix: shard by short_code with consistent hashing; archive expired links.
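Sharding by short_code with consistent hashing can be illustrated with a small hash ring. This is a minimal sketch, not a production router: the `HashRing` class, the use of SHA-256 for ring positions, and the 100 virtual nodes per shard are all assumptions chosen for illustration.

```python
import bisect
import hashlib
from typing import List, Sequence, Tuple


class HashRing:
    def __init__(self, shards: Sequence[str], vnodes: int = 100) -> None:
        # Each shard is placed at many points on the ring to even out load.
        ring: List[Tuple[int, str]] = []
        for shard in shards:
            for i in range(vnodes):
                ring.append((self._hash(f"{shard}#{i}"), shard))
        ring.sort()
        self._ring = ring
        self._points = [point for point, _ in ring]

    @staticmethod
    def _hash(key: str) -> int:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def shard_for(self, short_code: str) -> str:
        # Walk clockwise to the first ring point after the key's hash,
        # wrapping around to the start of the ring if needed.
        idx = bisect.bisect_right(self._points, self._hash(short_code))
        return self._ring[idx % len(self._ring)][1]
```

The property that matters in the interview is that adding a shard only moves keys onto the new shard; keys never shuffle between the existing ones, which keeps rebalancing proportional to the new shard's share.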
Step 7 — Key tradeoffs
There is rarely one right answer. State the tradeoff, then commit to a side with a reason tied to the requirements you clarified in step one.
Code generation: counter + base62 (no collisions, sequential) vs. random / hash (non-guessable, needs collision check).
Guidance: Use ranged counters for simplicity at scale; switch to random codes if guessability is a concern.
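The random side of this tradeoff, including its collision check, might look like the sketch below. The function names are hypothetical, and the `taken` set is a stand-in for the datastore; in a real system the check-and-insert would need to be atomic, for example via a unique constraint on short_code.

```python
import secrets
import string
from typing import Set

ALPHABET = string.digits + string.ascii_letters  # 62 symbols


def random_code(length: int = 7) -> str:
    # secrets draws from a cryptographically strong source, so codes
    # are not predictable from previously issued ones.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def allocate_code(taken: Set[str], length: int = 7, max_attempts: int = 5) -> str:
    # Retry on collision; `taken` stands in for a uniqueness check
    # against the links table.
    for _ in range(max_attempts):
        code = random_code(length)
        if code not in taken:
            taken.add(code)
            return code
    raise RuntimeError("no free code found; consider a longer code length")
```

With a 7-character code space of about 3.5 trillion values, collisions stay rare until a meaningful fraction of the space is used, so the retry loop should almost always succeed on the first attempt.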
Datastore: SQL (simple, transactional) vs. NoSQL key-value (scales horizontally).
Guidance: The access pattern is a pure key lookup, so a wide-column or KV store fits, but SQL is fine until you outgrow one node.
Common follow-up questions
When you finish the core design, expect the interviewer to pull on one of these threads. Have a one-paragraph answer ready for each.
- How would you support custom aliases without breaking uniqueness?
- How do you handle link expiry and cleanup at scale?
- How would you add real-time click analytics?
- How do you prevent the service from being used to mask malicious URLs?
For each one, sketch the change against the high-level design above and tie your choice back to the requirements you clarified, rather than reaching for the most complex option.