Step 1 — Clarify the requirements
Never start by drawing boxes. A strong candidate spends the first few minutes scoping the problem so the design that follows is justified. For a payment system, the questions worth asking are:
- Are we building a wallet, a card-charging gateway, or a marketplace payout system?
- Which payment methods and currencies must we support?
- What are the consistency and auditing requirements (regulatory)?
- Do we hold balances ourselves, or only orchestrate external providers?
Functional requirements
- Charge a customer and record the result reliably.
- Maintain accurate balances/ledger entries for every party.
- Handle refunds, retries, and settlement with providers.
Non-functional requirements
- Correctness above all: no double-charge, no lost money, full auditability.
- Strong consistency on balances; eventual consistency is unacceptable for money.
- Idempotent operations and complete reconciliation.
Step 2 — Back-of-the-envelope estimates
Sizing the system tells you which parts are hard. Round aggressively and state your assumptions out loud; the numbers matter less than showing you can reason about scale.
| Metric | Estimate | Reasoning |
|---|---|---|
| Throughput | moderate, correctness-bound | Payments are not usually web-scale RPS; the hard constraint is exactness, not volume. |
| Durability | 100% (no loss tolerated) | Every transaction must be durably recorded before acknowledging. |
Step 3 — Data model and API
A compact data model and a small API surface anchor the rest of the discussion. Keep both minimal; you can always extend them when the interviewer pushes.
Core entities
payments
payment_id, idempotency_key, amount, currency, status, provider_ref, created_at
idempotency_key makes retries safe; status is a strict state machine.
ledger_entries
entry_id, account_id, debit, credit, transaction_id, created_at
Append-only double-entry ledger; sum of debits equals sum of credits.
accounts
account_id, owner, balance, currency
Balance is derived from / reconciled against the ledger.
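The three entities above could be written down as a relational schema. The following is a minimal sketch using SQLite from Python; the column types, the `CHECK` constraints, the status names, and the choice to store amounts as integer minor units (for example cents) are all assumptions made for illustration, not part of the model as stated.

```python
import sqlite3

# Hypothetical DDL for the entities above. Types, constraints and the
# status values are assumptions; amounts are integer minor units (cents).
SCHEMA = """
CREATE TABLE accounts (
    account_id TEXT PRIMARY KEY,
    owner      TEXT NOT NULL,
    balance    INTEGER NOT NULL DEFAULT 0,
    currency   TEXT NOT NULL
);
CREATE TABLE payments (
    payment_id      TEXT PRIMARY KEY,
    idempotency_key TEXT NOT NULL UNIQUE,   -- a retry cannot create a second row
    amount          INTEGER NOT NULL CHECK (amount > 0),
    currency        TEXT NOT NULL,
    status          TEXT NOT NULL
        CHECK (status IN ('PENDING', 'SUCCEEDED', 'FAILED', 'REFUNDED')),
    provider_ref    TEXT,
    created_at      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE ledger_entries (
    entry_id       INTEGER PRIMARY KEY AUTOINCREMENT,
    account_id     TEXT NOT NULL REFERENCES accounts(account_id),
    debit          INTEGER NOT NULL DEFAULT 0 CHECK (debit >= 0),
    credit         INTEGER NOT NULL DEFAULT 0 CHECK (credit >= 0),
    transaction_id TEXT NOT NULL,
    created_at     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def create_schema() -> sqlite3.Connection:
    """Create the sketch schema in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```

The `UNIQUE` constraint on `idempotency_key` is the piece that carries weight here: the database, rather than application code, refuses a second payment row for the same key.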
API sketch
- POST /api/v1/payments: Create a charge (requires an Idempotency-Key header).
- POST /api/v1/refunds: Refund a prior payment (also idempotent).
- GET /api/v1/payments/{id}: Fetch payment status.
Step 4 — High-level design
Sketch the happy path end to end before optimising anything. This is the architecture you would draw on the whiteboard first:
1. A payment request carries an idempotency key; the service records intent before doing anything external.
2. It calls the external provider; on success it writes double-entry ledger rows in a transaction.
3. Asynchronous settlement and webhooks update final status; everything is reconciled against the provider daily.
4. A strict payment state machine plus retries-with-idempotency makes the whole flow safe to repeat.
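The strict state machine in the last step can be sketched as a table of allowed transitions. The state names and the transition table below are assumptions chosen for illustration; a real system would likely have more states (authorised, captured, disputed, and so on).

```python
from enum import Enum


class Status(Enum):
    # Hypothetical state names; the source only says "strict state machine".
    CREATED = "created"
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    REFUNDED = "refunded"


# Every transition not listed here is rejected.
ALLOWED = {
    Status.CREATED: {Status.PENDING},
    Status.PENDING: {Status.SUCCEEDED, Status.FAILED},
    Status.SUCCEEDED: {Status.REFUNDED},
    Status.FAILED: set(),
    Status.REFUNDED: set(),
}


class IllegalTransition(Exception):
    pass


def transition(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the move is not allowed."""
    if target == current:
        # Replaying the same step (e.g. a duplicate webhook) is a no-op.
        return current
    if target not in ALLOWED[current]:
        raise IllegalTransition(f"{current.value} -> {target.value}")
    return target
```

Treating a repeated transition as a no-op is what lets retries and duplicate webhooks pass through harmlessly, while a late "succeeded" event for an already failed payment is surfaced as an error instead of silently overwriting the record.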
Step 5 — Deep dives that separate strong answers
The high-level design is table stakes. Interviewers spend most of the time here, probing the decisions that actually carry the system. These are the ones to be ready for.
Idempotency and exactly-once charging
Networks fail mid-request, so clients retry, and a naive design would charge twice. The fix is an idempotency key supplied by the client and stored with the first attempt: a retry with the same key returns the original result instead of charging again. Combine this with a durable record of intent written before the external call, so even a crash between charging the provider and recording the result can be reconciled rather than lost. 'Exactly-once' in payments is really 'at-least-once attempts plus idempotent processing', and saying that precisely is what an interviewer wants to hear.
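As a minimal sketch of that flow, assuming a single SQLite database and a stand-in provider: `FakeProvider`, `make_db`, and `create_charge` are hypothetical names invented for this example, and the table is trimmed to the columns the example needs.

```python
import sqlite3
import uuid


class FakeProvider:
    """Stand-in for an external payment provider (hypothetical)."""

    def __init__(self):
        self.charges = []

    def charge(self, payment_id: str, amount: int) -> str:
        self.charges.append((payment_id, amount))
        return f"prov_{len(self.charges)}"


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE payments (
               idempotency_key TEXT PRIMARY KEY,
               payment_id      TEXT NOT NULL,
               amount          INTEGER NOT NULL,
               status          TEXT NOT NULL,
               provider_ref    TEXT)"""
    )
    return conn


def create_charge(conn, provider, idempotency_key: str, amount: int) -> dict:
    payment_id = str(uuid.uuid4())
    try:
        # 1. Record intent before any external call. The primary key on
        #    idempotency_key makes this insert fail for a retry.
        with conn:
            conn.execute(
                "INSERT INTO payments VALUES (?, ?, ?, 'PENDING', NULL)",
                (idempotency_key, payment_id, amount),
            )
    except sqlite3.IntegrityError:
        # Retry: return the stored result instead of charging again.
        row = conn.execute(
            "SELECT payment_id, status, provider_ref FROM payments "
            "WHERE idempotency_key = ?",
            (idempotency_key,),
        ).fetchone()
        return {"payment_id": row[0], "status": row[1], "provider_ref": row[2]}

    # 2. Call the provider exactly once for this key.
    provider_ref = provider.charge(payment_id, amount)

    # 3. Record the outcome. A crash before this line leaves a PENDING row
    #    that reconciliation can later resolve.
    with conn:
        conn.execute(
            "UPDATE payments SET status = 'SUCCEEDED', provider_ref = ? "
            "WHERE idempotency_key = ?",
            (provider_ref, idempotency_key),
        )
    return {"payment_id": payment_id, "status": "SUCCEEDED", "provider_ref": provider_ref}
```

Calling `create_charge` twice with the same key charges the fake provider once and returns the same payment both times. A retry that arrives while the first attempt is still in flight sees the `PENDING` row and is told so, rather than triggering a second charge.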
The double-entry ledger
Money is tracked in an append-only ledger using double-entry accounting: every transaction writes balanced debit and credit entries, so the books always sum to zero and nothing is created or destroyed silently. Balances are derived from (and continuously reconciled with) the ledger rather than being a single mutable number you increment, because an immutable, auditable history is both a correctness tool and a regulatory requirement. This structure makes errors detectable (the ledger stops balancing) and gives you a complete audit trail by construction.
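A small in-memory sketch makes the invariant concrete. The `Ledger` and `Entry` names, the integer minor-unit amounts, and the sign convention (balance as credits minus debits) are assumptions for illustration; real ledgers pick a convention per account type.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Entry:
    """One immutable ledger line; amounts are integer minor units."""
    transaction_id: str
    account_id: str
    debit: int = 0
    credit: int = 0


class Ledger:
    def __init__(self):
        self._entries: List[Entry] = []  # append-only, never updated in place

    def post(self, transaction_id: str, lines: List[Tuple[str, int, int]]) -> None:
        """Append a transaction given (account_id, debit, credit) lines."""
        entries = [Entry(transaction_id, acct, d, c) for acct, d, c in lines]
        if sum(e.debit for e in entries) != sum(e.credit for e in entries):
            # Reject anything that would create or destroy money.
            raise ValueError("unbalanced transaction")
        self._entries.extend(entries)

    def balance(self, account_id: str) -> int:
        """Derive a balance from history (assumed convention: credit minus debit)."""
        return sum(
            e.credit - e.debit for e in self._entries if e.account_id == account_id
        )

    def is_balanced(self) -> bool:
        """Total debits equal total credits across the whole ledger."""
        return sum(e.debit for e in self._entries) == sum(
            e.credit for e in self._entries
        )
```

For example, `ledger.post("txn_1", [("customer", 1000, 0), ("merchant", 0, 1000)])` moves 1000 minor units from the customer account to the merchant account, and an attempt to post lines whose debits and credits differ is rejected before anything is appended.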
Coordinating failure: sagas and reconciliation
A payment touches multiple systems (your ledger, the provider, possibly a wallet and a payout) that cannot share one ACID transaction. Coordinate them with a saga: a sequence of steps each with a compensating action (if a later step fails, undo the earlier ones, e.g. refund a captured charge). Drive it from a durable state machine so a crashed flow resumes correctly. Finally, reconciliation is non-negotiable: a scheduled job compares your records against the provider's settlement reports to catch any discrepancy, because in money you assume something will eventually drift and you must detect it.
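The saga idea can be sketched as a runner that remembers which steps finished and undoes them in reverse order when a later step fails. `run_saga` and the `Step` tuple are hypothetical names; this version keeps its progress in memory only, whereas the text above calls for a durable state machine so a crashed flow can resume.

```python
from typing import Callable, List, Tuple

# (name, action, compensating action); all three are placeholders that
# the caller supplies.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: List[str] = []
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            # Undo the most recent completed step first.
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"compensated:{done_name}")
            return False, log
        completed.append(step)
        log.append(f"done:{name}")
    return True, log
```

With a "capture" step whose compensation is a refund, followed by a ledger write that raises, the runner reports failure and its log shows the capture being compensated. In practice the compensations themselves must be idempotent and retried until they succeed, which this sketch does not attempt.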
Step 6 — Bottlenecks and how to scale past them
Naming where the design breaks, and the specific fix, is what signals seniority. For a payment system the pressure points are:
- Duplicate charges from retries. Fix: client idempotency keys persisted with the first attempt.
- Partial failure across provider and ledger. Fix: saga with compensating actions plus a durable state machine.
- Silent drift from the provider's records. Fix: scheduled reconciliation against settlement reports.
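The reconciliation job reduces to a three-way diff between internal records and the provider's settlement report. The sketch below assumes both sides have already been loaded into dictionaries keyed by provider reference with amounts in minor units; the function name and the report shape are hypothetical, since real settlement files vary by provider.

```python
from typing import Dict, List


def reconcile(
    internal: Dict[str, int], provider_report: Dict[str, int]
) -> Dict[str, List[str]]:
    """Compare provider_ref -> amount maps and list every discrepancy."""
    ours = set(internal)
    theirs = set(provider_report)
    return {
        # The provider settled something we have no record of.
        "missing_internally": sorted(theirs - ours),
        # We recorded a payment the provider never settled.
        "missing_at_provider": sorted(ours - theirs),
        # Both sides know the payment but disagree on the amount.
        "amount_mismatch": sorted(
            ref for ref in ours & theirs if internal[ref] != provider_report[ref]
        ),
    }
```

An empty result in all three lists means the two sets of books agree for the period; anything else is a discrepancy to investigate rather than to auto-correct.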
Step 7 — Key tradeoffs
There is rarely one right answer. State the tradeoff, then commit to a side with a reason tied to the requirements you clarified in step one.
Consistency
- Strong, transactional ledger (correct, slower)
- Eventual consistency (faster, unacceptable for balances)
Guidance: Money demands strong consistency on the ledger; never trade it for latency.
Coordination
- Distributed transaction / 2PC (strict, fragile)
- Saga + compensation (resilient, eventual within the flow)
Guidance: Use sagas in practice; 2PC is brittle across external providers.
Common follow-up questions
When you finish the core design, expect the interviewer to pull on one of these threads. Have a one-paragraph answer ready for each.
- How exactly does an idempotency key prevent a double charge end to end?
- Networks fail mid-request, so clients retry, and a naive design would charge twice. The fix is an idempotency key supplied by the client and stored with the first attempt: a retry with the same key returns the original result instead of charging again. Sketch the change against the high-level design above and tie your choice back to the requirements you clarified, rather than reaching for the most complex option.
- How would you handle multi-currency balances and FX?
- Money is tracked in an append-only ledger using double-entry accounting: every transaction writes balanced debit and credit entries, so the books always sum to zero and nothing is created or destroyed silently. Balances are derived from (and continuously reconciled with) the ledger rather than being a single mutable number you increment, because an immutable, auditable history is both a correctness tool and a regulatory requirement. Sketch the change against the high-level design above and tie your choice back to the requirements you clarified, rather than reaching for the most complex option.
- How do you process refunds and chargebacks safely?
- A payment touches multiple systems (your ledger, the provider, possibly a wallet and a payout) that cannot share one ACID transaction. Coordinate them with a saga: a sequence of steps each with a compensating action (if a later step fails, undo the earlier ones, e.g. refund a captured charge). Sketch the change against the high-level design above and tie your choice back to the requirements you clarified, rather than reaching for the most complex option.
- How do you scale the ledger while keeping it auditable?