Start with the product, not the diagram
A Twitter clone is a canonical system design question because it looks simple and then exposes almost every backend tradeoff: fanout, hot users, timelines, storage, ranking, caching, rate limits, moderation and observability. The interview is rarely about inventing Twitter. It is about showing that you can move from product requirements to a defensible architecture.
A good first answer is not a diagram. It is a requirements negotiation:
- Users can post short text updates.
- Users can follow other users.
- Users can view a home timeline from followed accounts.
- Users can view a profile timeline for one account.
- Users can like, repost and reply.
- The system should handle high read traffic and uneven write traffic.
- Timeline freshness matters, but perfect ordering is not required for every user.
Then set explicit non-goals for a 45-minute interview: ads, direct messages, advanced recommendation ranking, full-text search and complex trust-and-safety workflows. You can mention that real systems need those, but do not let them swallow the design.
This format matters in 2026 because system design is appearing earlier in interview loops, not only at staff level. Candidate reports in ML and software communities describe more architecture questions, even for roles that used to focus mostly on coding. Public discussion from The Pragmatic Engineer on tech interviews, Hacker News interview-process threads, and ML interview discussions on r/MachineLearningJobs all point to the same trend: interviewers want production judgement.
Define the core model and API
Keep the model small. You need users, posts, follows and timelines. Everything else is optional.
```typescript
type User = {
  id: string;
  handle: string;
  displayName: string;
  createdAt: string;
};

type Post = {
  id: string;
  authorId: string;
  body: string;
  createdAt: string;
  replyToPostId?: string;
};

type Follow = {
  followerId: string;
  followeeId: string;
  createdAt: string;
};

type TimelineItem = {
  postId: string;
  authorId: string;
  createdAt: string;
  score?: number;
};
```
The API can be boring. Boring is good in system design:
```text
POST   /v1/posts
GET    /v1/users/{handle}/posts?cursor=...
POST   /v1/users/{id}/follow
DELETE /v1/users/{id}/follow
GET    /v1/timeline/home?cursor=...
GET    /v1/posts/{id}/replies?cursor=...
```
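The `cursor=...` parameters are easiest to reason about as keyset cursors. The sketch below shows one possible encoding, assuming items are ordered newest first by `(createdAtMs, postId)`. `CursorItem`, `encodeCursor`, `decodeCursor` and `pageAfter` are hypothetical names, and a real service would pass the decoded key to its store query rather than filter an in-memory array.

```typescript
// Hypothetical item shape, ordered newest first by createdAtMs with postId as a tie-breaker.
type CursorItem = { postId: string; createdAtMs: number };

// Encode the last item a client has seen as an opaque, URL-safe string.
export function encodeCursor(item: CursorItem): string {
  return encodeURIComponent(`${item.createdAtMs}_${item.postId}`);
}

// Decode a cursor: the timestamp is everything before the first "_", the postId is the rest.
export function decodeCursor(cursor: string): CursorItem {
  const raw = decodeURIComponent(cursor);
  const sep = raw.indexOf("_");
  const createdAtMs = Number(raw.slice(0, sep));
  if (sep <= 0 || !Number.isFinite(createdAtMs)) {
    throw new Error("malformed cursor");
  }
  return { createdAtMs, postId: raw.slice(sep + 1) };
}

// True if `a` comes after `b` in newest-first order.
function comesAfter(a: CursorItem, b: CursorItem): boolean {
  if (a.createdAtMs !== b.createdAtMs) return a.createdAtMs < b.createdAtMs;
  return a.postId < b.postId;
}

// Return up to `limit` items after the cursor, plus a cursor for the next page if more remain.
export function pageAfter<T extends CursorItem>(
  items: T[], // assumed already sorted newest first
  limit: number,
  cursor?: string,
): { items: T[]; nextCursor?: string } {
  if (limit < 1) throw new Error("limit must be at least 1");
  const after = cursor === undefined ? undefined : decodeCursor(cursor);
  const remaining = after ? items.filter((item) => comesAfter(item, after)) : items;
  const page = remaining.slice(0, limit);
  const nextCursor =
    remaining.length > limit ? encodeCursor(page[page.length - 1]) : undefined;
  return { items: page, nextCursor };
}
```

Because the cursor records a position rather than an offset, new posts arriving at the head of the timeline do not shift the next page.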
In a real interview, state the consistency you need. Posting should be durable before the API returns success. The home timeline can be eventually consistent. Follows should take effect quickly, but it is acceptable if an old cached timeline persists briefly. Likes and repost counts can be eventually consistent.
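A minimal sketch of the post path under those consistency rules, assuming hypothetical `PostStore` and `FanoutQueue` interfaces: the handler returns only after the durable insert, and follower timelines are left to an asynchronous fanout job. If `enqueue` fails after the insert succeeds, the post is saved but not fanned out; this sketch does not close that gap (a transactional outbox or a periodic repair sweep are common options).

```typescript
// Trimmed copies of the Post and FanoutJob shapes used in this article.
type Post = { id: string; authorId: string; body: string; createdAt: string };
type FanoutJob = {
  jobId: string;
  postId: string;
  authorId: string;
  followerBatchCursor?: string;
  createdAt: string;
};

// Hypothetical ports: any durable store and any job queue could sit behind these.
interface PostStore {
  insert(post: Post): Promise<void>; // resolves only after the write is durable
}
interface FanoutQueue {
  enqueue(job: FanoutJob): Promise<void>;
}

export async function createPost(
  store: PostStore,
  queue: FanoutQueue,
  input: { id: string; authorId: string; body: string },
  now: Date = new Date(),
): Promise<Post> {
  const post: Post = { ...input, createdAt: now.toISOString() };
  // Durable write first: if it throws, the caller gets an error and nothing is fanned out.
  await store.insert(post);
  // Fanout is only enqueued here; follower timelines catch up asynchronously.
  await queue.enqueue({
    jobId: `fanout:${post.id}`,
    postId: post.id,
    authorId: post.authorId,
    createdAt: post.createdAt,
  });
  return post;
}
```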
This is where many candidates overcomplicate the answer. You do not need a graph database for a basic follow graph. A relational database or wide-column store can hold follows. The hard part is read fanout, not storing an edge.
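To make that concrete, here is an in-memory sketch of a follow store, assuming a relational table keyed by `(followerId, followeeId)` with a secondary index on `(followeeId, followerId)`. `FollowStore` and its methods are hypothetical names; the two maps stand in for the two indexes.

```typescript
// In-memory stand-in for a follow-edge table and its two indexes.
export class FollowStore {
  private following = new Map<string, Set<string>>(); // followerId -> followeeIds
  private followers = new Map<string, Set<string>>(); // followeeId -> followerIds

  follow(followerId: string, followeeId: string): void {
    addEdge(this.following, followerId, followeeId);
    addEdge(this.followers, followeeId, followerId);
  }

  unfollow(followerId: string, followeeId: string): void {
    this.following.get(followerId)?.delete(followeeId);
    this.followers.get(followeeId)?.delete(followerId);
  }

  // Point lookup on the primary key: does followerId follow followeeId?
  isFollowing(followerId: string, followeeId: string): boolean {
    return this.following.get(followerId)?.has(followeeId) ?? false;
  }

  // Followers in stable, resumable batches: the last ID returned acts as the cursor.
  followerBatch(followeeId: string, limit: number, afterFollowerId?: string): string[] {
    const after = afterFollowerId;
    const ids = Array.from(this.followers.get(followeeId) ?? new Set<string>()).sort();
    const rest = after === undefined ? ids : ids.filter((id) => id > after);
    return rest.slice(0, limit);
  }
}

function addEdge(index: Map<string, Set<string>>, from: string, to: string): void {
  const targets = index.get(from) ?? new Set<string>();
  targets.add(to);
  index.set(from, targets);
}
```

Batching by the last follower ID is the same shape of resumable iteration a fanout worker needs when it processes a large follower list in chunks.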
Choose fanout on write, fanout on read, or a hybrid
The core Twitter design decision is timeline construction.
Fanout on read means you store posts once, and when a user opens their home timeline you fetch recent posts from all followed accounts, merge them and rank them. This works for users following a small number of accounts. It becomes expensive for users following thousands of accounts, and it makes every timeline request do a lot of work.
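A sketch of fanout on read, assuming a hypothetical `recentPostsByAuthor` lookup that returns one author's recent posts. The loop makes the cost visible: one lookup per followed account on every request. For simplicity it sorts all candidates; merging the already-sorted per-author lists with a heap would do less work.

```typescript
type PostRef = { postId: string; authorId: string; createdAtMs: number };

// Newest first, with postId as a deterministic tie-breaker.
function newestFirst(a: PostRef, b: PostRef): number {
  if (a.createdAtMs !== b.createdAtMs) return b.createdAtMs - a.createdAtMs;
  return a.postId < b.postId ? 1 : a.postId > b.postId ? -1 : 0;
}

// Fanout on read: assemble the home timeline at request time.
export function buildTimelineOnRead(
  followeeIds: string[],
  recentPostsByAuthor: (authorId: string) => PostRef[],
  limit: number,
): PostRef[] {
  const candidates: PostRef[] = [];
  for (const authorId of followeeIds) {
    // One lookup per followed account: request cost grows with the follow count.
    candidates.push(...recentPostsByAuthor(authorId));
  }
  return candidates.sort(newestFirst).slice(0, limit);
}
```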
Fanout on write means that when Alice posts, you push a reference to that post into each follower's home timeline. Reads become cheap because the timeline is precomputed. Writes become expensive for authors with many followers.
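The write-side counterpart, as a sketch: `timelines` is an in-memory stand-in for a per-user timeline store, and `fanoutOnWrite` and `readTimeline` are hypothetical names. The return value counts writes, one per follower, while a read is only a prefix of a list that already exists.

```typescript
type TimelineEntry = { postId: string; authorId: string; createdAtMs: number };

// Fanout on write: when an author posts, copy a reference into each follower's timeline.
export function fanoutOnWrite(
  timelines: Map<string, TimelineEntry[]>,
  followerIds: string[],
  entry: TimelineEntry,
): number {
  for (const followerId of followerIds) {
    const timeline = timelines.get(followerId) ?? [];
    timeline.unshift(entry); // assumes entries arrive roughly in time order
    timelines.set(followerId, timeline);
  }
  return followerIds.length; // writes performed: one per follower
}

// Reads are cheap: the timeline is already assembled, so take a prefix.
export function readTimeline(
  timelines: Map<string, TimelineEntry[]>,
  userId: string,
  limit: number,
): TimelineEntry[] {
  return (timelines.get(userId) ?? []).slice(0, limit);
}
```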
Most strong answers choose a hybrid:
- Normal authors use fanout on write into follower timeline stores.
- Very large accounts use fanout on read or delayed fanout.
- The timeline service merges precomputed items with recent posts from large accounts.
- Caches absorb hot timelines and hot posts.
That hybrid is realistic because social graphs are skewed. A tiny fraction of accounts cause a huge fraction of fanout. The architecture should admit that instead of pretending every user has the same load.
```typescript
const CELEBRITY_FOLLOWER_THRESHOLD = 1_000_000;

export function chooseFanoutStrategy(
  followerCount: number,
): "fanout_on_read" | "fanout_on_write" {
  if (followerCount >= CELEBRITY_FOLLOWER_THRESHOLD) {
    return "fanout_on_read";
  }
  return "fanout_on_write";
}
```
That code is intentionally small. It shows the decision boundary. In production you would use config, experiments and load metrics rather than a hard-coded threshold.
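The merge step in the hybrid design can be sketched as follows, assuming both inputs carry `postId`, `authorId` and `createdAtMs`. `mergeHybridTimeline` is a hypothetical name, and deduplicating by `postId` is an assumption about one edge case: a post that reached timelines through fanout on write before its author crossed the threshold.

```typescript
type Item = { postId: string; authorId: string; createdAtMs: number };

// Newest first, with postId as a deterministic tie-breaker.
function newestFirst(a: Item, b: Item): number {
  if (a.createdAtMs !== b.createdAtMs) return b.createdAtMs - a.createdAtMs;
  return a.postId < b.postId ? 1 : a.postId > b.postId ? -1 : 0;
}

// Combine the precomputed timeline (fanned out on write from normal authors)
// with recent posts fetched at read time from followed large accounts.
export function mergeHybridTimeline(
  precomputed: Item[],
  largeAccountPosts: Item[],
  limit: number,
): Item[] {
  if (limit <= 0) return [];
  const seen = new Set<string>();
  const merged: Item[] = [];
  for (const item of precomputed.concat(largeAccountPosts).sort(newestFirst)) {
    // Skip duplicates, e.g. a post fanned out before its author crossed the threshold.
    if (seen.has(item.postId)) continue;
    seen.add(item.postId);
    merged.push(item);
    if (merged.length === limit) break;
  }
  return merged;
}
```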
For technical background, it is worth reading system-design material that focuses on real bottlenecks rather than trivia. HackerRank has argued for testing real-world development skills, and Hacker News discussions of real-codebase interviews show why interviewers increasingly probe tradeoffs after a candidate gives an initial design.
Storage and caching choices
A reasonable storage split:
- User service: relational database for user profiles and account metadata.
- Social graph service: relational or wide-column store keyed by follower and followee.
- Post service: durable store for posts, partitioned by author or time.
- Timeline service: key-value or wide-column store for per-user home timeline item IDs.
- Cache: Redis or Memcached for hot timelines, user profiles and post payloads.
- Search: separate index for text search, not in scope for the core timeline.
For timeline storage, you can store compact references rather than full posts:
```typescript
type HomeTimelineRecord = {
  userId: string;
  postId: string;
  authorId: string;
  createdAtMs: number;
};
```
This lets you hydrate posts in batches:
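```typescript
export async function hydrateTimeline(
  records: HomeTimelineRecord[],
  loadPosts: (postIds: string[]) => Promise<Post[]>,
) {
  // One batched lookup instead of one read per timeline item.
  const posts = await loadPosts(records.map((record) => record.postId));
  const byId = new Map(posts.map((post): [string, Post] => [post.id, post]));
  // Keep timeline order; drop references whose post no longer loads (for example, deleted posts).
  return records
    .map((record) => byId.get(record.postId))
    .filter((post): post is Post => Boolean(post));
}
```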
Do not spend the whole interview naming databases. The important part is access pattern:
- "Give me the latest home timeline item IDs for user X."
- "Give me the latest posts by author Y."
- "Give me all followers of author Y in batches."
- "Check whether user A follows user B."
If you can explain the access pattern, the database choice becomes grounded.
Reliability, abuse and observability
A production social system is not only a timeline. It needs guardrails:
- Rate limit post creation, follows, likes and replies.
- Detect spammy follow bursts and duplicate content.
- Put fanout work on queues so posting does not block on every follower write.
- Retry failed fanout jobs with idempotency keys.
- Track lag between post creation and follower timeline visibility.
- Track cache hit rate, timeline p95 latency, queue depth and failed hydrations.
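For the rate limits, a token bucket is one common choice. This sketch assumes an in-process map keyed by strings such as `alice:post`, with illustrative capacity and refill values; `TokenBucketLimiter` is a hypothetical class, and a multi-server deployment would keep the buckets in a shared store rather than in process memory.

```typescript
// Token bucket keyed by an arbitrary string such as `${userId}:post`.
// Time is passed in so the logic is deterministic and testable.
export class TokenBucketLimiter {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly buckets = new Map<string, { tokens: number; updatedMs: number }>();

  constructor(capacity: number, refillPerSecond: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
  }

  // Returns true and consumes one token if the action is allowed at nowMs.
  allow(key: string, nowMs: number): boolean {
    const bucket = this.buckets.get(key) ?? { tokens: this.capacity, updatedMs: nowMs };
    const elapsedSeconds = Math.max(0, nowMs - bucket.updatedMs) / 1000;
    // Refill in proportion to elapsed time, never above capacity.
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSeconds * this.refillPerSecond);
    bucket.updatedMs = nowMs;
    const allowed = bucket.tokens >= 1;
    if (allowed) bucket.tokens -= 1;
    this.buckets.set(key, bucket);
    return allowed;
  }
}
```

Capacity sets the allowed burst and the refill rate sets the sustained rate, so post creation and follows can get different limits simply by using different keys and limiter instances.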
The queue is central:
```typescript
type FanoutJob = {
  jobId: string;
  postId: string;
  authorId: string;
  followerBatchCursor?: string;
  createdAt: string;
};
```
Make fanout idempotent by writing timeline records with a natural key such as (userId, postId). If a job retries, it should not duplicate posts in timelines.
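A sketch of that idempotency rule, assuming a hypothetical `TimelineStore` whose primary key is `(userId, postId)`. Because each write is an upsert, replaying a batch after a retry leaves exactly one row per follower.

```typescript
type HomeTimelineRecord = {
  userId: string;
  postId: string;
  authorId: string;
  createdAtMs: number;
};

// In-memory stand-in for a timeline store whose primary key is (userId, postId).
export class TimelineStore {
  private readonly rows = new Map<string, HomeTimelineRecord>();

  // Upsert by natural key: writing the same (userId, postId) again overwrites, never duplicates.
  put(record: HomeTimelineRecord): void {
    this.rows.set(JSON.stringify([record.userId, record.postId]), record);
  }

  timelineFor(userId: string): HomeTimelineRecord[] {
    return Array.from(this.rows.values())
      .filter((row) => row.userId === userId)
      .sort((a, b) => b.createdAtMs - a.createdAtMs);
  }
}

// Apply one follower batch of a fanout job. Safe to retry because every write is an upsert.
export function applyFanoutBatch(
  store: TimelineStore,
  followerIds: string[],
  post: { postId: string; authorId: string; createdAtMs: number },
): void {
  for (const userId of followerIds) {
    store.put({ userId, ...post });
  }
}
```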
Moderation is also worth naming. If a post is deleted or restricted, timeline hydration must respect that. You can either remove timeline references asynchronously or filter at read time. Filtering at read time is safer but adds read cost. Removing references is cleaner but can lag. A mature answer names that tradeoff.
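Both options, sketched under assumptions: `ModerationStatus` is a hypothetical field that the `Post` type above does not have, and timelines are simplified to arrays of post IDs. `filterVisible` is the read-time guard; `removeReferences` is the asynchronous cleanup, which touches every timeline and can lag behind the moderation decision.

```typescript
type ModerationStatus = "visible" | "deleted" | "restricted";
type HydratedPost = { id: string; authorId: string; body: string; status: ModerationStatus };

// Read-time filtering: even if a stale timeline reference survives, hidden posts never render.
export function filterVisible(posts: HydratedPost[]): HydratedPost[] {
  return posts.filter((post) => post.status === "visible");
}

// Asynchronous cleanup: strip a post's references from every timeline.
// Returns how many references were removed.
export function removeReferences(timelines: Map<string, string[]>, postId: string): number {
  let removed = 0;
  timelines.forEach((postIds, userId) => {
    const kept = postIds.filter((id) => id !== postId);
    removed += postIds.length - kept.length;
    timelines.set(userId, kept);
  });
  return removed;
}
```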
How to present the design in an interview
Use this order:
- Clarify requirements and scale assumptions.
- Define core entities and APIs.
- Choose timeline strategy and explain fanout tradeoffs.
- Describe storage by access pattern.
- Add caching, queues and hot-user handling.
- Cover reliability, abuse and observability.
- State what you would improve with more time.
Avoid three common mistakes:
- Drawing microservices before requirements.
- Treating "use Kafka" as an explanation.
- Ignoring celebrity accounts and uneven traffic.
The strongest candidates sound pragmatic. They do not claim perfect freshness, perfect ordering and cheap writes at global scale. They make tradeoffs and then explain how they would measure whether those tradeoffs work.