Step 1 — Clarify the requirements
Never start by drawing boxes. A strong candidate spends the first few minutes scoping the problem so the design that follows is justified. For a chat system, the questions worth asking are:
- One-to-one only, or group chats too, and how large can groups get?
- Do we need delivery and read receipts, and online presence?
- Is message history persisted forever, or only until delivered?
- Do we need end-to-end encryption?
Functional requirements
- Send and receive one-to-one and group messages in real time.
- Deliver messages to offline recipients when they reconnect.
- Show delivery/read receipts and online presence.
Non-functional requirements
- Low end-to-end latency for delivery while both parties are online.
- Reliable delivery and consistent per-conversation ordering.
- Scale to tens of millions of concurrent long-lived connections.
Step 2 — Back-of-the-envelope estimates
Sizing the system tells you which parts are hard. Round aggressively and state your assumptions out loud; the numbers matter less than showing you can reason about scale.
| Metric | Estimate | Reasoning |
|---|---|---|
| Concurrent connections | tens of millions | Each online user holds a persistent connection, so connection count, not RPS, is the scaling axis. |
| Messages/day | tens of billions | Active messaging products move enormous message volume; storage and fan-out must keep up. |
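As a worked example of the sizing above, the sketch below turns a set of assumed inputs into rates. The specific figures (50 million concurrent connections, 20 billion messages per day, 100 bytes per message, 100,000 sockets per gateway) are illustrative assumptions chosen to fall inside the ranges in the table; they are not numbers from the table itself.

```python
# Back-of-the-envelope sizing. Every input below is an illustrative assumption.
SECONDS_PER_DAY = 86_400

concurrent_connections = 50_000_000   # assumed: within "tens of millions"
messages_per_day = 20_000_000_000     # assumed: within "tens of billions"
avg_message_bytes = 100               # assumed average payload size
connections_per_server = 100_000      # assumed socket capacity per gateway

avg_messages_per_sec = messages_per_day / SECONDS_PER_DAY
storage_per_day_tb = messages_per_day * avg_message_bytes / 1e12
# Ceiling division: a partially filled gateway still counts as a server.
gateway_servers = -(-concurrent_connections // connections_per_server)

print(f"avg messages/sec: {avg_messages_per_sec:,.0f}")
print(f"raw storage/day:  {storage_per_day_tb:.1f} TB")
print(f"gateway servers:  {gateway_servers}")
```

Under these assumptions the average is roughly 230,000 messages per second, about 2 TB of raw message bodies per day, and 500 gateways before any headroom for peaks or failures.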
Step 3 — Data model and API
A compact data model and a small API surface anchor the rest of the discussion. Keep both minimal; you can always extend them when the interviewer pushes.
Core entities
| Entity | Fields | Notes |
|---|---|---|
| messages | message_id (sortable), conversation_id, sender_id, body, created_at, status | Partition by conversation_id; use a time-sortable id for ordering. |
| conversations | conversation_id (PK), type (1:1/group), member_ids, last_message_at | Membership list drives fan-out for group sends. |
| user_sessions | user_id -> connection server id | Routing table so a sender's server can find the recipient's gateway. |
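The core entities could be written down as plain Python types, as in the minimal sketch below. The field names come from the entities above; the enum classes, the `partition_for` helper, and the use of CRC32 as the partition hash are assumptions made for illustration.

```python
import zlib
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class MessageStatus(Enum):
    SENT = "sent"
    DELIVERED = "delivered"
    READ = "read"


class ConversationType(Enum):
    ONE_TO_ONE = "1:1"
    GROUP = "group"


@dataclass(frozen=True)
class Message:
    message_id: int        # time-sortable: a larger id means a later message
    conversation_id: str   # partition key
    sender_id: str
    body: str
    created_at: float      # Unix seconds
    status: MessageStatus = MessageStatus.SENT


@dataclass
class Conversation:
    conversation_id: str
    type: ConversationType
    member_ids: list = field(default_factory=list)
    last_message_at: Optional[float] = None


# user_sessions: user_id -> connection server id
user_sessions: dict = {}


def partition_for(conversation_id: str, num_partitions: int) -> int:
    """Hypothetical helper: map a conversation to a stable partition.

    CRC32 is used because Python's built-in hash() is salted per process.
    """
    return zlib.crc32(conversation_id.encode("utf-8")) % num_partitions
```

Partitioning by conversation keeps all messages of one conversation on one partition, so a history read is a single range scan over sortable ids.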
API sketch
- GET ws://.../connect: Open a WebSocket; the server registers the user's session.
- POST /api/v1/messages: Send a message (also flows over the socket).
- GET /api/v1/conversations/{id}/messages: Load history, paginated by message id.
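Pagination by message id for the history endpoint can be sketched as a cursor over sorted ids. The function name `load_history`, the `before_id` cursor parameter, and the newest-first page order are assumptions for illustration; the in-memory list stands in for one conversation's partition.

```python
from bisect import bisect_left
from typing import Optional


def load_history(message_ids: list, before_id: Optional[int] = None, limit: int = 50):
    """Return up to `limit` ids older than `before_id`, newest first, plus the next cursor.

    `message_ids` must be sorted ascending. A cursor of None means "start from
    the newest message" on input and "no older messages remain" on output.
    """
    end = len(message_ids) if before_id is None else bisect_left(message_ids, before_id)
    start = max(0, end - limit)
    page = message_ids[start:end][::-1]
    next_cursor = page[-1] if start > 0 else None
    return page, next_cursor


ids = [10, 20, 30, 40, 50]
page, cursor = load_history(ids, limit=2)                    # ([50, 40], 40)
page2, cursor2 = load_history(ids, before_id=cursor, limit=2)  # ([30, 20], 20)
```

A cursor based on message id stays stable while new messages arrive, which an offset-based page would not.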
Step 4 — High-level design
Sketch the happy path end to end before optimising anything. This is the architecture you would draw on the whiteboard first:
1. Clients hold a persistent WebSocket to a connection (chat) server through a load balancer.
2. A presence/session service maps each online user to the server holding their connection.
3. On send, the message is persisted, then routed to the recipient's connection server, which pushes it down the socket.
4. If the recipient is offline, the message is stored and pushed (or pulled) on reconnect; a push notification nudges them.
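The happy path above can be sketched in a single process. This is a toy model under stated assumptions: the `ChatBackend` class and its method names are hypothetical, plain lists and dicts stand in for the database, session service, gateways, and notification service, and "pushing down the socket" is recorded rather than performed.

```python
from collections import defaultdict, deque


class ChatBackend:
    def __init__(self):
        self.store = []                      # persisted messages (database stand-in)
        self.sessions = {}                   # user_id -> connection server id
        self.pushed = defaultdict(list)      # server_id -> [(user_id, message)]
        self.pending = defaultdict(deque)    # user_id -> undelivered messages, in order
        self.notifications = []              # user_ids nudged via push notification

    def connect(self, user_id, server_id):
        """Register the session, then flush any backlog in arrival order."""
        self.sessions[user_id] = server_id
        backlog = self.pending[user_id]
        while backlog:
            self.pushed[server_id].append((user_id, backlog.popleft()))

    def disconnect(self, user_id):
        self.sessions.pop(user_id, None)

    def send(self, sender_id, recipient_id, body):
        message = {"id": len(self.store) + 1, "from": sender_id,
                   "to": recipient_id, "body": body}
        self.store.append(message)           # persist before routing
        server_id = self.sessions.get(recipient_id)
        if server_id is not None:
            self.pushed[server_id].append((recipient_id, message))
            return "pushed"
        self.pending[recipient_id].append(message)
        self.notifications.append(recipient_id)
        return "queued"


backend = ChatBackend()
backend.connect("alice", "gw-1")
first = backend.send("alice", "bob", "hi")       # bob is offline, so this is queued
backend.connect("bob", "gw-2")                   # backlog is flushed to gw-2
second = backend.send("alice", "bob", "again")   # bob is online, so this is pushed
```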
Step 5 — Deep dives that separate strong answers
The high-level design is table stakes. Interviewers spend most of the time here, probing the decisions that actually carry the system. These are the ones to be ready for.
Real-time transport: WebSockets vs polling
HTTP request/response cannot push, so naive polling wastes resources and adds latency. Long polling is a stopgap. The right answer is a persistent bidirectional connection, normally a WebSocket (or a platform push channel on mobile). The server keeps the socket open and pushes messages as they arrive. This makes chat servers stateful: a given user's messages must route to the exact server holding their live connection, which is why you need a session registry mapping user to connection server.
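Because the servers are stateful, the session registry has to survive reconnects that land on a different server. The sketch below shows one way to guard against a stale disconnect erasing a newer session. The `SessionRegistry` class, the per-connection `connection_id`, and the one-connection-per-user simplification are assumptions for illustration.

```python
from typing import Optional


class SessionRegistry:
    """Maps user_id -> (server_id, connection_id). Assumes one live connection per user."""

    def __init__(self):
        self._sessions = {}

    def register(self, user_id: str, server_id: str, connection_id: str) -> None:
        # A newer connection simply overwrites the older entry.
        self._sessions[user_id] = (server_id, connection_id)

    def unregister(self, user_id: str, connection_id: str) -> bool:
        """Remove the entry only if it still belongs to this connection.

        Without this check, a late close event from an old socket would
        delete the entry for the user's newer connection.
        """
        current = self._sessions.get(user_id)
        if current is not None and current[1] == connection_id:
            del self._sessions[user_id]
            return True
        return False

    def lookup(self, user_id: str) -> Optional[str]:
        entry = self._sessions.get(user_id)
        return entry[0] if entry is not None else None


registry = SessionRegistry()
registry.register("alice", "gw-1", "conn-a")
registry.register("alice", "gw-2", "conn-b")          # reconnect lands on another server
stale_removed = registry.unregister("alice", "conn-a")  # late close from the old socket
```

In a real deployment the same compare-then-delete step would need to be atomic in whatever shared store backs the registry.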
Delivery, ordering, and offline messages
Each message gets a server-assigned, time-sortable id so a conversation has a single agreed order even if clients send concurrently. Persist the message before acknowledging the sender (so it survives a crash), then deliver. Track per-message status: sent, delivered (recipient's device received it), read (recipient opened it). For offline recipients, store undelivered messages and deliver them in order on reconnect; trigger a push notification through the notification service. Idempotent message ids let clients de-duplicate retried sends.
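Server-assigned sortable ids and idempotent sends can be sketched together. The layout below (millisecond timestamp in the high bits, a 16-bit sequence in the low bits) is one common scheme and an assumption here, as are the `MessageLog` class and the client-supplied `client_msg_id` used as the de-duplication key.

```python
import time


class MessageLog:
    """Single-conversation log: assigns increasing ids and de-duplicates retried sends."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._last_ms = 0
        self._seq = 0
        self._by_client_key = {}   # (sender_id, client_msg_id) -> stored message
        self.messages = []

    def _next_id(self) -> int:
        now_ms = int(self._clock() * 1000)
        if now_ms > self._last_ms:
            self._last_ms, self._seq = now_ms, 0
        else:
            # Same millisecond, or the clock moved backwards: bump the sequence.
            self._seq += 1
            if self._seq >= 1 << 16:
                # Sequence exhausted: borrow the next millisecond to stay monotonic.
                self._last_ms, self._seq = self._last_ms + 1, 0
        return (self._last_ms << 16) | self._seq

    def append(self, sender_id: str, client_msg_id: str, body: str) -> dict:
        key = (sender_id, client_msg_id)
        existing = self._by_client_key.get(key)
        if existing is not None:
            return existing        # retried send: return the original, store nothing new
        message = {"message_id": self._next_id(), "sender_id": sender_id,
                   "body": body, "status": "sent"}
        self.messages.append(message)   # persist before acknowledging the sender
        self._by_client_key[key] = message
        return message


log = MessageLog()
a = log.append("alice", "k1", "hi")
b = log.append("bob", "k1", "yo")
retry = log.append("alice", "k1", "hi")   # same key as `a`, so no new message is stored
```

The id generator here is per process; with several writers per conversation, uniqueness would need an extra component such as a node id, which this sketch leaves out.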
Group chat fan-out and presence
A group send looks up the member list and routes a copy to each online member's connection server, persisting once per recipient (or once with per-member delivery state). Very large groups become a fan-out problem similar to a feed and may warrant pull-based catch-up rather than pushing to thousands of sockets. Presence (online/last-seen) is high-churn and best kept in an in-memory store with a heartbeat; treat it as soft state that can be slightly stale rather than a strongly consistent fact.
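The push-versus-pull decision for group sends might look like the sketch below. The function name `plan_group_send` and the threshold of 500 recipients are assumptions for illustration; the sketch also groups online recipients by connection server so that each server receives one routed copy.

```python
from collections import defaultdict


def plan_group_send(member_ids, sender_id, sessions, push_limit=500):
    """Decide how to deliver one group message.

    `sessions` maps user_id -> connection server id for online users.
    Returns the delivery mode, the recipients to push per server, and the
    offline recipients who need stored delivery plus a notification.
    """
    recipients = [m for m in member_ids if m != sender_id]
    if len(recipients) > push_limit:
        # Very large group: persist once and let members pull on catch-up.
        return {"mode": "pull", "push": {}, "offline": []}

    push = defaultdict(list)
    offline = []
    for user_id in recipients:
        server_id = sessions.get(user_id)
        if server_id is None:
            offline.append(user_id)
        else:
            push[server_id].append(user_id)
    return {"mode": "push", "push": dict(push), "offline": offline}


sessions = {"a": "gw-1", "b": "gw-1", "c": "gw-2"}
plan = plan_group_send(["a", "b", "c", "d"], sender_id="a", sessions=sessions)
```

Here `plan` routes one copy to gw-1 for b, one to gw-2 for c, and marks d as offline.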
Step 6 — Bottlenecks and how to scale past them
Naming where the design breaks, and the specific fix, is what signals seniority. For a chat system the pressure points are:
| Bottleneck | Fix |
|---|---|
| Connection servers run out of socket capacity. | Horizontally scale gateways; route by session registry; shed and reconnect gracefully. |
| Presence updates overwhelm the store. | Heartbeat with batching; keep presence in memory with short TTLs. |
| Large-group fan-out. | Switch big groups to pull-based catch-up; cap per-message push fan-out. |
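The presence fix above (batched heartbeats, in-memory state, short TTLs) can be sketched as follows. The `PresenceStore` class, the 30-second TTL, and the injectable clock are assumptions for illustration; a production system would more likely use a shared in-memory store than a per-process dict.

```python
import time


class PresenceStore:
    """Soft-state presence: a user is online if a heartbeat arrived within the TTL."""

    def __init__(self, ttl_seconds: float = 30.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._last_seen = {}   # user_id -> time of last heartbeat

    def heartbeat_batch(self, user_ids) -> None:
        """Apply many heartbeats in one write, e.g. one batch per gateway per interval."""
        now = self._clock()
        for user_id in user_ids:
            self._last_seen[user_id] = now

    def is_online(self, user_id: str) -> bool:
        seen = self._last_seen.get(user_id)
        return seen is not None and self._clock() - seen <= self._ttl

    def sweep(self) -> int:
        """Drop expired entries and return how many were removed."""
        cutoff = self._clock() - self._ttl
        expired = [u for u, seen in self._last_seen.items() if seen < cutoff]
        for user_id in expired:
            del self._last_seen[user_id]
        return len(expired)


now = [0.0]                                  # fake clock so the example is deterministic
store = PresenceStore(ttl_seconds=30.0, clock=lambda: now[0])
store.heartbeat_batch(["alice", "bob"])
now[0] = 20.0
store.heartbeat_batch(["bob"])
now[0] = 40.0                                # alice's last heartbeat is now 40s old
```

Because a missed heartbeat only shows up after the TTL elapses, readers should treat the answer as slightly stale by design.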
Step 7 — Key tradeoffs
There is rarely one right answer. State the tradeoff, then commit to a side with a reason tied to the requirements you clarified in step one.
| Decision | Option A | Option B | Guidance |
|---|---|---|---|
| Transport | WebSocket (true push, stateful) | Long polling (simpler, higher latency) | WebSocket for real-time; long polling only as a fallback where sockets are blocked. |
| Group delivery | Push to every member (low latency) | Pull/catch-up (scales to huge groups) | Push for small groups; pull for very large ones. |
Common follow-up questions
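When you finish the core design, expect the interviewer to pull on one of these threads. Have a one-paragraph answer ready for each: sketch the change against the high-level design above and tie your choice back to the requirements you clarified, rather than reaching for the most complex option.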
- How would you add end-to-end encryption and what does it cost you (search, multi-device)?
- How do you sync history across a user's multiple devices?
- How do you guarantee ordering when a client sends while offline?
- How do you handle a thundering herd when a popular server restarts?