Step 1 — Clarify the requirements
Never start by drawing boxes. A strong candidate spends the first few minutes scoping the problem so that the design that follows is justified. For a notification system, the questions worth asking are:
- Which channels: push, SMS, email, in-app, or all of them?
- What volume do we expect, and what latency is acceptable (transactional vs marketing)?
- Do users have preferences and quiet hours we must honour?
- Do we need delivery tracking and analytics?
Functional requirements
- Accept notification requests from many internal services.
- Deliver across push, SMS, email, and in-app channels.
- Respect per-user preferences, opt-outs, and rate limits.
Non-functional requirements
- Reliable, at-least-once delivery with retries and dead-letter handling.
- High throughput with bursty load (e.g. a product launch blast).
- No duplicate notifications for the same logical event.
Step 2 — Back-of-the-envelope estimates
Sizing the system tells you which parts are hard. Round aggressively and state your assumptions out loud; the numbers matter less than showing you can reason about scale.
| Metric | Estimate | Reasoning |
|---|---|---|
| Peak send rate | bursty, 100x average | Marketing blasts and incident alerts create spikes; the pipeline must buffer them. |
| Provider latency | 100s of ms, variable | Third-party APNs/FCM/SMS/email gateways are slow and occasionally fail, so calls must be async. |
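As a worked example of this arithmetic, the sketch below turns a daily volume into an average rate, a peak rate, and the number of provider calls in flight at peak. The 10 million notifications per day and the 200 ms provider call are illustrative assumptions, not figures from the requirements; only the 100x burst factor comes from the table.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate(daily_notifications: int, burst_factor: int, provider_latency_s: float) -> dict:
    """Back-of-the-envelope sizing; every input is an assumption to state out loud."""
    average_rps = daily_notifications / SECONDS_PER_DAY
    peak_rps = average_rps * burst_factor
    # Little's law: calls in flight = arrival rate x time each call takes.
    in_flight_at_peak = peak_rps * provider_latency_s
    return {
        "average_rps": round(average_rps),
        "peak_rps": round(peak_rps),
        "in_flight_at_peak": round(in_flight_at_peak),
    }


# Assumed inputs: 10M sends/day, 100x burst, 200 ms per provider call.
sizing = estimate(daily_notifications=10_000_000, burst_factor=100, provider_latency_s=0.2)
print(sizing)
```

Under these assumptions the average is about 116 sends per second and the peak about 11,600. Sustaining that peak without buffering would need roughly 2,300 concurrent provider calls, which is the argument for putting queues in front of the workers.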
Step 3 — Data model and API
A compact data model and a small API surface anchor the rest of the discussion. Keep both minimal; you can always extend them when the interviewer pushes.
Core entities
| Entity | Fields | Notes |
|---|---|---|
| notifications | notification_id, user_id, template_id, channel, status, dedup_key, created_at | dedup_key prevents duplicates for the same logical event. |
| preferences | user_id, channel, enabled, quiet_hours, frequency_cap | Checked before queuing a send. |
| device_tokens | user_id, platform, token, updated_at | Push tokens per device; prune invalid ones on provider feedback. |
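One way to pin the entities down is as typed records. The sketch below uses Python dataclasses with the field names listed above; the enum values, the shape of quiet_hours (a start and end time), and the unit of frequency_cap (sends per day) are assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, time, timezone
from enum import Enum
from typing import Optional, Tuple


class Channel(str, Enum):
    PUSH = "push"
    SMS = "sms"
    EMAIL = "email"
    IN_APP = "in_app"


class Status(str, Enum):  # status values are assumed, not given in the text
    QUEUED = "queued"
    SENT = "sent"
    DELIVERED = "delivered"
    FAILED = "failed"


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class Notification:
    user_id: str
    template_id: str
    channel: Channel
    dedup_key: str  # identifies the logical event; repeats are rejected or coalesced
    status: Status = Status.QUEUED
    notification_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created_at: datetime = field(default_factory=_now)


@dataclass
class Preference:
    user_id: str
    channel: Channel
    enabled: bool = True
    quiet_hours: Optional[Tuple[time, time]] = None  # (start, end); assumed shape
    frequency_cap: Optional[int] = None  # max sends per day; assumed unit


@dataclass
class DeviceToken:
    user_id: str
    platform: str  # e.g. "ios" or "android"
    token: str
    updated_at: datetime = field(default_factory=_now)
```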
API sketch
- POST /api/v1/notify: Internal services enqueue a notification (idempotent via dedup_key).
- PUT /api/v1/preferences: Update a user's channel preferences.
- GET /api/v1/notifications/{id}: Check delivery status.
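The idempotency behaviour of the enqueue endpoint can be sketched without any web framework. The class below is an in-memory stand-in for the POST and GET handlers; the class name, the request body fields, and the status codes (202 for a new request, 200 for a repeat) are assumptions for illustration.

```python
import uuid
from typing import Dict, Tuple


class NotifyApi:
    """In-memory stand-in for POST /api/v1/notify and GET /api/v1/notifications/{id}."""

    def __init__(self) -> None:
        self._by_dedup_key: Dict[str, str] = {}  # dedup_key -> notification_id
        self._status: Dict[str, str] = {}        # notification_id -> status

    def post_notify(self, body: dict) -> Tuple[int, dict]:
        dedup_key = body.get("dedup_key")
        if not dedup_key or "user_id" not in body or "channel" not in body:
            return 400, {"error": "dedup_key, user_id and channel are required"}
        existing = self._by_dedup_key.get(dedup_key)
        if existing is not None:
            # A repeat of an accepted request: return the same id and enqueue nothing.
            return 200, {"notification_id": existing, "status": self._status[existing]}
        notification_id = str(uuid.uuid4())
        self._by_dedup_key[dedup_key] = notification_id
        self._status[notification_id] = "queued"
        return 202, {"notification_id": notification_id, "status": "queued"}

    def get_notification(self, notification_id: str) -> Tuple[int, dict]:
        status = self._status.get(notification_id)
        if status is None:
            return 404, {"error": "not found"}
        return 200, {"notification_id": notification_id, "status": status}
```

A caller that times out and retries with the same dedup_key gets the original notification_id back, so the retry is safe.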
Step 4 — High-level design
Sketch the happy path end to end before optimising anything. This is the architecture you would draw on the whiteboard first:
1. A notification service validates the request, applies preferences and rate limits, then enqueues per-channel jobs.
2. Channel workers pull from queues and call the relevant provider (APNs/FCM, an SMS gateway, an email provider).
3. Failures are retried with backoff; permanent failures go to a dead-letter queue for inspection.
4. Delivery callbacks update status and feed analytics.
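The happy path above can be sketched in a few lines, with in-process deques standing in for durable queues. All names here (NotificationService, submit, drain) are hypothetical, rate limiting and status callbacks are left out for brevity, and a failed job is simply set aside rather than retried.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Set, Tuple


class NotificationService:
    def __init__(self, disabled: Set[Tuple[str, str]]) -> None:
        self.disabled = disabled  # (user_id, channel) pairs the user has opted out of
        self.queues: Dict[str, Deque[dict]] = defaultdict(deque)  # one queue per channel

    def submit(self, user_id: str, channels: List[str], payload: dict) -> List[str]:
        """Intake: validate, apply preferences, enqueue one job per allowed channel."""
        if not user_id or not channels:
            raise ValueError("user_id and at least one channel are required")
        queued = []
        for channel in channels:
            if (user_id, channel) in self.disabled:
                continue  # preference check happens before anything is queued
            self.queues[channel].append({"user_id": user_id, "payload": payload})
            queued.append(channel)
        return queued


def drain(service: NotificationService, channel: str,
          send: Callable[[dict], None]) -> Tuple[int, List[dict]]:
    """Worker: pull jobs for one channel and call the provider via `send`."""
    sent, failed = 0, []
    queue = service.queues[channel]
    while queue:
        job = queue.popleft()
        try:
            send(job)
            sent += 1
        except Exception:
            failed.append(job)  # a real worker would retry with backoff, then dead-letter
    return sent, failed
```

Because each channel has its own queue and its own worker loop, a slow SMS gateway does not hold up push or email.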
Step 5 — Deep dives that separate strong answers
The high-level design is table stakes. Interviewers spend most of the time here, probing the decisions that actually carry the system. These are the ones to be ready for.
Queue-based fan-out and reliability
Decouple the request from delivery with a durable message queue per channel. The intake service is fast and just enqueues; slow, flaky provider calls happen in workers that can scale independently. Use at-least-once delivery with idempotent processing keyed on the dedup_key, so a retried job never sends twice. Apply exponential backoff with jitter on retries, cap the attempts, and route exhausted messages to a dead-letter queue rather than silently dropping or looping forever. This is the structural reason notifications are reliable despite unreliable downstreams.
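A minimal sketch of the retry policy described here: capped attempts, exponential backoff with full jitter, and a dead-letter list for exhausted or permanently failed jobs. The PermanentError class, the default base and cap values, and the function names are assumptions; a real worker would classify errors using the provider's response codes.

```python
import random
import time
from typing import Callable, List


class PermanentError(Exception):
    """A failure that retrying cannot fix (assumed classification, e.g. invalid recipient)."""


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full jitter: a random delay between 0 and min(cap, base * 2**attempt)."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def process_with_retries(job: dict, send: Callable[[dict], None], dead_letters: List[dict],
                         max_attempts: int = 5,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try to send a job; return True on success, else dead-letter it and return False."""
    for attempt in range(max_attempts):
        try:
            send(job)
            return True
        except PermanentError:
            break  # do not waste attempts on a failure that cannot succeed
        except Exception:
            if attempt + 1 < max_attempts:
                sleep(backoff_delay(attempt))  # wait before the next attempt
    dead_letters.append(job)  # exhausted or permanent: keep for inspection, never drop
    return False
```

The jitter matters during an outage: without it, every worker retries on the same schedule and the recovering provider is hit by synchronized waves of traffic.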
Deduplication and rate limiting
Two events can describe the same notification (a retry from the caller, a duplicate trigger). Store a dedup_key (event id plus user plus channel) and reject or coalesce repeats within a window. Separately, protect users from spam and providers from overload with frequency caps per user and global rate limits per provider; a notification the user has effectively already received should be suppressed. Honour quiet hours and per-channel opt-outs before anything is queued, not after.
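The checks in this paragraph can be combined into a single gate that runs before a job is queued. The sketch below keeps state in memory; the Suppressor class, the key format, and the per-day cap are assumptions, and a production system would keep this state in a shared store with expiry. It also simply suppresses during quiet hours, whereas a real system might defer the send instead.

```python
from datetime import datetime, time, timedelta
from typing import Dict, List, Optional, Tuple


def make_dedup_key(event_id: str, user_id: str, channel: str) -> str:
    """Event id plus user plus channel, as described above (separator is assumed)."""
    return f"{event_id}:{user_id}:{channel}"


def in_quiet_hours(now: time, quiet: Optional[Tuple[time, time]]) -> bool:
    if quiet is None:
        return False
    start, end = quiet
    if start <= end:
        return start <= now < end
    return now >= start or now < end  # window wraps past midnight, e.g. 22:00 to 07:00


class Suppressor:
    def __init__(self, dedup_window: timedelta, daily_cap: int) -> None:
        self.dedup_window = dedup_window
        self.daily_cap = daily_cap
        self._seen: Dict[str, datetime] = {}         # dedup_key -> time accepted
        self._sends: Dict[str, List[datetime]] = {}  # user_id -> recent send times

    def should_send(self, dedup_key: str, user_id: str, now: datetime,
                    quiet: Optional[Tuple[time, time]] = None) -> bool:
        if in_quiet_hours(now.time(), quiet):
            return False
        last = self._seen.get(dedup_key)
        if last is not None and now - last < self.dedup_window:
            return False  # same logical event seen inside the window
        recent = [t for t in self._sends.get(user_id, []) if now - t < timedelta(days=1)]
        self._sends[user_id] = recent
        if len(recent) >= self.daily_cap:
            return False  # frequency cap reached for this user
        recent.append(now)
        self._seen[dedup_key] = now
        return True
```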
Channel-specific concerns
Each channel has its own provider, payload format, and failure modes. Push requires valid device tokens that expire and must be pruned on provider feedback; SMS has strict length and cost considerations and country-specific routing; email needs templating, unsubscribe links, and deliverability/reputation management. Abstract these behind a common worker interface but keep channel adapters separate so one provider outage degrades only its channel, and design templates so the same logical notification renders correctly per channel.
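One possible shape for the common worker interface is a small protocol that each channel adapter implements. Everything below is a hypothetical sketch: the adapters record messages in an outbox instead of calling a provider, the SMS adapter truncates to a single 160-character segment as a simplification, and the unsubscribe URL is a placeholder.

```python
from typing import Dict, List, Protocol, Tuple


class ChannelAdapter(Protocol):
    def render(self, template: Dict[str, str], params: Dict[str, str]) -> str: ...
    def send(self, recipient: str, body: str) -> None: ...


class SmsAdapter:
    MAX_LEN = 160  # one GSM-7 segment; truncating is a simplification for the sketch

    def __init__(self) -> None:
        self.outbox: List[Tuple[str, str]] = []

    def render(self, template: Dict[str, str], params: Dict[str, str]) -> str:
        return template["sms"].format(**params)[: self.MAX_LEN]

    def send(self, recipient: str, body: str) -> None:
        self.outbox.append((recipient, body))  # a real adapter would call the SMS gateway


class EmailAdapter:
    def __init__(self, unsubscribe_url: str) -> None:
        self.unsubscribe_url = unsubscribe_url
        self.outbox: List[Tuple[str, str]] = []

    def render(self, template: Dict[str, str], params: Dict[str, str]) -> str:
        body = template["email"].format(**params)
        return f"{body}\n\nUnsubscribe: {self.unsubscribe_url}"

    def send(self, recipient: str, body: str) -> None:
        self.outbox.append((recipient, body))  # a real adapter would call the email provider


def dispatch(adapters: Dict[str, ChannelAdapter], channel: str, recipient: str,
             template: Dict[str, str], params: Dict[str, str]) -> None:
    """Render the same logical notification for one channel and hand it to that adapter."""
    adapter = adapters[channel]
    adapter.send(recipient, adapter.render(template, params))
```

The template carries one variant per channel, so the same logical notification renders as a short text for SMS and a longer body with an unsubscribe link for email.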
Step 6 — Bottlenecks and how to scale past them
Naming where the design breaks, and the specific fix, is what signals seniority. For a notification system the pressure points are:
| Bottleneck | Fix |
|---|---|
| Bursty load (launch blasts) overwhelms providers. | Buffer in queues and drain at a provider-safe rate; auto-scale workers. |
| Provider outage stalls a channel. | Per-channel isolation, retries with backoff, dead-letter queue, and failover providers. |
| Duplicate sends. | Idempotency via dedup_key plus a short-window suppression check. |
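Draining at a provider-safe rate is commonly done with a token bucket in front of each provider. The sketch below is one such limiter; the class name and the rate and capacity values are assumptions, and the clock is injectable so the behaviour can be checked without sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows short bursts up to `capacity`, then a steady `rate_per_s` sends per second."""

    def __init__(self, rate_per_s: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_s
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller leaves the job on the queue and tries again later
```

A worker calls try_acquire before each provider call. When it returns False the job stays queued, so a launch blast lengthens the queue instead of overwhelming the provider.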
Step 7 — Key tradeoffs
There is rarely one right answer. State the tradeoff, then commit to a side with a reason tied to the requirements you clarified in step one.
| Decision | Option A | Option B | Guidance |
|---|---|---|---|
| Delivery guarantee | At-least-once (reliable, needs dedup) | At-most-once (no dupes, may drop) | At-least-once with idempotent consumers is the standard; never silently drop transactional alerts. |
| Intake coupling | Synchronous send (simple, fragile) | Queue then async send (resilient) | Always queue: provider latency and failures must not block callers. |
Common follow-up questions
When you finish the core design, expect the interviewer to pull on one of these threads. Have a one-paragraph answer ready for each: sketch the change against the high-level design above and tie your choice back to the requirements you clarified, rather than reaching for the most complex option.
- How do you prioritise transactional alerts over marketing blasts in the same pipeline?
- How would you implement digest notifications (batch many into one)?
- How do you track delivery and click-through across channels?
- How do you handle a provider that silently drops messages?