Step 1 — Clarify the requirements
Never start by drawing boxes. A strong candidate spends the first few minutes scoping the problem so the design that follows is justified. For a video streaming service, the questions worth asking are:
- Is this user-generated upload (YouTube) or a licensed catalogue (Netflix)?
- What scale of uploads and concurrent viewers?
- Which devices and network conditions must we support?
- Do we need recommendations, comments, and live streaming, or just VOD?
Functional requirements
- Upload a video and make it playable after processing.
- Stream video smoothly with adaptive quality based on bandwidth.
- Serve metadata (title, thumbnail, view count) and search.
Non-functional requirements
- Low startup latency and minimal rebuffering during playback.
- Global reach with consistent quality regardless of viewer location.
- Durable storage of source and derived video assets.
Step 2 — Back-of-the-envelope estimates
Sizing the system tells you which parts are hard. Round aggressively and state your assumptions out loud; the numbers matter less than showing you can reason about scale.
| Metric | Estimate | Reasoning |
|---|---|---|
| Read:write ratio | extremely read-heavy | Far more views than uploads; the watch path dominates and must be cheap per request. |
| Storage multiplier | ~5-8x source | Each video is stored in multiple resolutions and codecs, multiplying raw footage. |
| Bandwidth | petabytes/day at scale | Video bytes dwarf everything else; CDN offload is mandatory. |
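The arithmetic behind a table like this can be sketched in a few lines. Every input below is an illustrative assumption chosen to make the sums easy, not a measured figure; only the storage multiplier is taken from the ~5-8x range above.

```python
# Back-of-the-envelope sizing. All inputs are assumptions for illustration.
UPLOADS_PER_DAY = 100_000       # assumed new videos per day
AVG_SOURCE_MB = 500             # assumed average source file size
STORAGE_MULTIPLIER = 6          # total stored bytes as a multiple of source (~5-8x)
VIEWS_PER_DAY = 500_000_000     # assumed daily views
AVG_MB_PER_VIEW = 100           # assumed bytes delivered per view


def daily_storage_tb(uploads: int, source_mb: int, multiplier: int) -> float:
    """Source plus derived renditions written per day, in terabytes."""
    return uploads * source_mb * multiplier / 1_000_000


def daily_egress_pb(views: int, mb_per_view: int) -> float:
    """Video bytes served per day, in petabytes."""
    return views * mb_per_view / 1_000_000_000


def read_write_ratio(views: int, uploads: int) -> float:
    """Views per upload; a rough proxy for how read-heavy the system is."""
    return views / uploads


if __name__ == "__main__":
    print(daily_storage_tb(UPLOADS_PER_DAY, AVG_SOURCE_MB, STORAGE_MULTIPLIER), "TB/day stored")
    print(daily_egress_pb(VIEWS_PER_DAY, AVG_MB_PER_VIEW), "PB/day served")
    print(read_write_ratio(VIEWS_PER_DAY, UPLOADS_PER_DAY), "views per upload")
```

With these assumed inputs, egress lands in the tens of petabytes per day while new storage is a few hundred terabytes, which is the shape of the argument: the watch path, not the upload path, is where the cost sits.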
Step 3 — Data model and API
A compact data model and a small API surface anchor the rest of the discussion. Keep both minimal; you can always extend them when the interviewer pushes.
Core entities
| Entity | Fields | Purpose |
|---|---|---|
| videos | video_id (PK), uploader_id, title, status, duration, created_at | Metadata; status tracks the transcoding pipeline. |
| renditions | video_id, resolution, codec, segment_manifest_url | One row per encoded variant; the manifest drives adaptive playback. |
| view_events | video_id, user_id, timestamp, watch_seconds | Append-only stream feeding view counts and recommendations. |
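As a rough sketch, the three entities could be written down as plain Python dataclasses. The field names come from the model above; the field types and the unit for duration are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Video:
    video_id: str          # primary key
    uploader_id: str
    title: str
    status: str            # tracks the transcoding pipeline, e.g. "processing"
    duration: float        # assumed to be seconds
    created_at: datetime


@dataclass(frozen=True)
class Rendition:
    video_id: str
    resolution: str        # e.g. "720p"
    codec: str             # e.g. "h264"
    segment_manifest_url: str


@dataclass(frozen=True)
class ViewEvent:
    # Frozen because view events are append-only: written once, never updated.
    video_id: str
    user_id: str
    timestamp: datetime
    watch_seconds: float


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    video = Video("v1", "u1", "Demo", "processing", 93.5, now)
    rendition = Rendition("v1", "720p", "h264", "https://cdn.example/v1/720p/index.m3u8")
    event = ViewEvent("v1", "u2", now, 41.0)
    print(video, rendition, event, sep="\n")
```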
API sketch
- POST /api/v1/uploads: Initiate a resumable upload, return an upload URL.
- GET /api/v1/videos/{id}/manifest: Return the adaptive-bitrate manifest (HLS/DASH).
- GET /api/v1/videos/{id}: Fetch metadata for the watch page.
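To make the manifest endpoint concrete, here is a minimal sketch of what it might return for HLS: a master playlist with one entry per rendition. The `Variant` type, the bitrates, and the playlist paths are hypothetical; the `#EXTM3U` and `#EXT-X-STREAM-INF` tags are standard HLS, and the optional `CODECS` attribute is left out for brevity.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Variant:
    width: int
    height: int
    bandwidth_bps: int   # peak bitrate advertised to the player
    playlist_url: str    # media playlist listing this rendition's segments


def build_master_playlist(variants: Iterable[Variant]) -> str:
    """Render an HLS master playlist, lowest bitrate first."""
    lines = ["#EXTM3U"]
    for v in sorted(variants, key=lambda v: v.bandwidth_bps):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={v.bandwidth_bps},"
            f"RESOLUTION={v.width}x{v.height}"
        )
        lines.append(v.playlist_url)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_master_playlist([
        Variant(1280, 720, 2_500_000, "720p/index.m3u8"),
        Variant(640, 360, 800_000, "360p/index.m3u8"),
    ]))
```

The player reads this file once, then fetches the media playlist for whichever variant it selects.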
Step 4 — High-level design
Sketch the happy path end to end before optimising anything. This is the architecture you would draw on the whiteboard first:
1. Upload: client uploads source to blob storage via a resumable URL; an event enqueues a transcoding job.
2. Transcoding workers encode the source into multiple resolutions/codecs and segment them, writing manifests.
3. Derived segments are pushed to a CDN; the video status flips to ready.
4. Watch: the player fetches the manifest and pulls segments from the nearest CDN edge, switching bitrate adaptively.
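The status field that ties these steps together can be sketched as a small state machine. The state names and the allowed transitions here are assumptions for illustration; the source only specifies that the status ends at ready.

```python
# Hypothetical lifecycle for a video's status field.
ALLOWED_TRANSITIONS = {
    "uploading": {"processing", "failed"},
    "processing": {"ready", "failed"},
    "failed": {"processing"},   # assume a failed job may be retried
    "ready": set(),             # terminal in this sketch
}


def advance(current: str, target: str) -> str:
    """Return the new status, or raise if the transition is not allowed."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target


if __name__ == "__main__":
    status = "uploading"
    for step in ("processing", "ready"):
        status = advance(status, step)
        print(status)
```

Rejecting illegal transitions in one place keeps a late or duplicated pipeline event from flipping a ready video back to processing.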
Step 5 — Deep dives that separate strong answers
The high-level design is table stakes. Interviewers spend most of the time here, probing the decisions that actually carry the system. These are the ones to be ready for.
The transcoding pipeline
A raw upload is useless to most devices, so a pipeline transcodes it into a ladder of resolutions and codecs (e.g. 240p to 4K, H.264/H.265/AV1) and splits each into short segments. This is CPU-intensive and embarrassingly parallel, so it runs as asynchronous jobs on a worker fleet, often chunking a long video so segments encode in parallel and the start becomes watchable quickly. Model it as a DAG of stages (validate, split, encode, package, generate thumbnails) driven by a queue, with status tracked so the UI can show 'processing'. Failures retry per chunk rather than restarting the whole video.
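A minimal sketch of the chunk-and-retry idea, assuming a caller-supplied `encode` function that stands in for the real encoder. A thread pool plays the role of the worker fleet; a real pipeline would dispatch through a queue to separate machines.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Chunk = Tuple[int, int]  # (start_second, end_second)


def split_into_chunks(duration_s: int, chunk_s: int) -> List[Chunk]:
    """Cover the whole video with consecutive ranges of at most chunk_s seconds."""
    chunks = []
    start = 0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks


def encode_with_retry(encode: Callable[[Chunk], str], chunk: Chunk,
                      max_attempts: int = 3) -> str:
    """Retry a single chunk; other chunks are unaffected by its failures."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return encode(chunk)
        except Exception as exc:  # a real worker would catch narrower errors
            last_error = exc
    raise RuntimeError(f"chunk {chunk} failed after {max_attempts} attempts") from last_error


def encode_video(duration_s: int, chunk_s: int,
                 encode: Callable[[Chunk], str], workers: int = 4) -> List[str]:
    """Encode all chunks in parallel; results come back in playback order."""
    chunks = split_into_chunks(duration_s, chunk_s)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: encode_with_retry(encode, c), chunks))


if __name__ == "__main__":
    print(encode_video(25, 10, lambda c: f"seg-{c[0]}-{c[1]}"))
```

Because the retry wraps one chunk, a transient failure costs one chunk's worth of compute rather than a re-encode of the whole video.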
Adaptive bitrate streaming and the CDN
Playback uses adaptive bitrate (HLS or DASH): the player downloads a manifest listing each rendition's segments, measures available bandwidth, and picks the highest quality that will not stall, switching mid-stream as conditions change. This is why startup is fast and quality scales with the connection. Segments are static files, so they are served from a CDN close to the viewer; the origin only handles cache misses and cold content. CDN offload is what makes per-view cost viable at petabyte scale, and content-hashed segment URLs make caching safe.
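The selection step can be sketched as a simple throughput-based rule: pick the highest rung of the ladder that fits inside a safety fraction of the measured bandwidth. The ladder values and the 0.8 safety factor are assumptions; production players typically also weigh buffer level, which this sketch ignores.

```python
from typing import Sequence


def pick_bitrate(ladder_kbps: Sequence[int], measured_kbps: float,
                 safety: float = 0.8) -> int:
    """Highest rendition bitrate that fits the budget; lowest rung as a fallback."""
    budget = measured_kbps * safety
    affordable = [b for b in sorted(ladder_kbps) if b <= budget]
    return affordable[-1] if affordable else min(ladder_kbps)


if __name__ == "__main__":
    ladder = [400, 1200, 2500, 5000, 16000]  # hypothetical ladder in kbit/s
    for measured in (300, 4000, 100_000):
        print(measured, "->", pick_bitrate(ladder, measured))
```

The player re-runs this decision as each segment downloads, which is what lets quality move up or down mid-stream.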
Metadata, view counts, and the read path
The watch page needs metadata (title, thumbnail, channel, view count) which lives in a regular database fronted by a cache, separate from the heavy video bytes. View counts are high-volume writes that do not need to be exact in real time, so they are typically aggregated through a stream-processing pipeline and updated approximately rather than incrementing a row per view. Recommendations and search run as their own services off the view-event stream. Keeping metadata, counts, and bytes in separate systems lets each scale on its own axis.
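A toy version of the view-count aggregation, assuming a dict stands in for the metadata database and that `flush` is called on a timer by the stream processor. The class and method names are hypothetical.

```python
from collections import Counter
from typing import Dict


class ViewCountAggregator:
    """Buffer view events in memory and write one delta per video per flush."""

    def __init__(self, store: Dict[str, int]) -> None:
        self._pending: Counter = Counter()
        self._store = store  # stand-in for the metadata database

    def record(self, video_id: str) -> None:
        """Cheap in-memory increment; no database write happens here."""
        self._pending[video_id] += 1

    def flush(self) -> int:
        """Apply buffered deltas and return the number of writes issued."""
        writes = 0
        for video_id, delta in self._pending.items():
            self._store[video_id] = self._store.get(video_id, 0) + delta
            writes += 1
        self._pending.clear()
        return writes


if __name__ == "__main__":
    db: Dict[str, int] = {}
    agg = ViewCountAggregator(db)
    for _ in range(1000):
        agg.record("v1")
    agg.record("v2")
    print(agg.flush(), "writes for 1001 views:", db)
```

The displayed count lags by at most one flush interval, which is the approximation the design accepts in exchange for far fewer writes.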
Step 6 — Bottlenecks and how to scale past them
Naming where the design breaks, and the specific fix, is what signals seniority. For a video streaming service the pressure points are:
| Bottleneck | Mitigation |
|---|---|
| Transcoding compute for long/high-res videos. | Chunk and parallelise encoding across a worker fleet; prioritise lower resolutions first. |
| Origin bandwidth for popular videos. | Aggressive CDN caching with origin shielding; pre-warm trending content. |
| View-count write storms. | Aggregate counts via a stream pipeline; show approximate counts. |
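Prioritising lower resolutions first amounts to ordering the encode queue by output height, so a video becomes watchable at low quality before the expensive renditions finish. A minimal sketch with a heap, assuming jobs are (video_id, height) pairs:

```python
import heapq
from typing import Iterable, Iterator, Tuple


def schedule(jobs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Yield jobs in ascending resolution, first-come-first-served within a resolution."""
    heap = []
    for seq, (video_id, height) in enumerate(jobs):
        # seq breaks ties so equal heights keep their arrival order
        heapq.heappush(heap, (height, seq, video_id))
    while heap:
        height, _, video_id = heapq.heappop(heap)
        yield video_id, height


if __name__ == "__main__":
    queue = [("a", 1080), ("a", 240), ("b", 240), ("a", 720)]
    for job in schedule(queue):
        print(job)
```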
Step 7 — Key tradeoffs
There is rarely one right answer. State the tradeoff, then commit to a side with a reason tied to the requirements you clarified in step one.
| Decision | Option A | Option B | Guidance |
|---|---|---|---|
| Streaming protocol | HLS (broad device support) | DASH (open standard, flexible) | Both are adaptive-bitrate; choose by device reach and existing tooling. |
| Encode-on-upload vs on-demand | Pre-encode all renditions (fast playback, costly storage) | Encode renditions lazily on first request (cheap storage, slower first play) | Pre-encode common ladders; lazily encode rare ones for long-tail content. |
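The hybrid guidance for encoding can be sketched as a store that encodes a common ladder eagerly at upload time and anything else on first request. The contents of `COMMON_LADDER`, the class name, and the `encode` callable are all hypothetical.

```python
from typing import Callable, Dict, Tuple

COMMON_LADDER = ("360p", "720p", "1080p")  # assumed set of eagerly encoded renditions


class RenditionStore:
    def __init__(self, encode: Callable[[str, str], str]) -> None:
        self._encode = encode  # stand-in for the transcoding pipeline
        self._renditions: Dict[Tuple[str, str], str] = {}

    def on_upload(self, video_id: str) -> None:
        """Pre-encode the common ladder so typical playback never waits."""
        for resolution in COMMON_LADDER:
            self._renditions[(video_id, resolution)] = self._encode(video_id, resolution)

    def get(self, video_id: str, resolution: str) -> str:
        """Return a rendition, encoding it on first request if it is not stored yet."""
        key = (video_id, resolution)
        if key not in self._renditions:
            self._renditions[key] = self._encode(video_id, resolution)
        return self._renditions[key]


if __name__ == "__main__":
    store = RenditionStore(lambda vid, res: f"{vid}/{res}/index.m3u8")
    store.on_upload("v1")
    print(store.get("v1", "720p"))   # already encoded at upload
    print(store.get("v1", "2160p"))  # encoded lazily on this first request
```

The first viewer of a rare rendition pays the encode latency; every later request is served from storage.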
Common follow-up questions
When you finish the core design, expect the interviewer to pull on one of these threads. Have a one-paragraph answer ready for each.
- How would you add live streaming, and how does it differ from VOD?
- How do you implement resumable uploads for large files?
- How would you handle DRM and content protection?
- How do you keep view counts accurate enough without per-view DB writes?
For each, sketch the change against the high-level design above and tie your choice back to the requirements you clarified, rather than reaching for the most complex option.