SQL and Analytics Interview Questions Explained

What SQL rounds are really checking

A SQL interview is rarely about exotic syntax. It checks whether you can read a schema, translate a fuzzy business question into a correct query, and reason about edge cases like nulls, duplicates, and ties. The interviewer often cares less about the perfect query and more about how you think through it. Do you confirm the grain of each table? Do you handle the rows that break naive logic? Do you sanity-check your result before you call it done?

It helps to know what the interviewer is scoring underneath the question. Most SQL screens map to four signals: correctness, clarity of communication, handling of edge cases, and the ability to recover when the first attempt is wrong. You can pass a round with a query that needs one fix if you spot the fix yourself and talk through it calmly. You can fail a round with a query that runs but where you never questioned whether the numbers were right.

The questions cluster into four areas: joins and aggregation, window functions, date and cohort logic, and open-ended metric design questions where there is no single right answer. Prepare for all four, because analytics and data roles mix them heavily.

A reliable habit: before writing a single line, say out loud what one row of each table represents and what one row of your output should represent. A large share of SQL interview mistakes disappear the moment you do this.

What good versus weak looks like

Two candidates can write the same final query and leave very different impressions. The difference is almost entirely in the narration and the checks.

Behaviour	Weak signal	Strong signal
Reading the schema	Starts typing immediately	Confirms the grain of each table first
Joins	Joins and hopes the count is right	Predicts whether a join can fan out and why
Nulls and ties	Assumes clean data	Names the null and tie cases before being asked
First wrong answer	Gets flustered, rewrites blindly	Forms a hypothesis, tests it, fixes it
Finishing	Says "done"	Sanity checks the result against a known total

None of this requires memorising more syntax. It requires treating the query like production code that someone will trust.

Get joins and grain right

Most SQL mistakes come from misunderstanding the grain of a table, that is, what one row represents. Before writing anything, say out loud what a row in each table means. If orders is one row per order and order_items is one row per line item, joining them and summing a revenue column from orders will multiply that revenue by the number of items. Catching that fanout before it happens is a strong signal, and quietly producing inflated revenue is one of the most common ways candidates fail without realising.

Be fluent with the join types and when each is correct. A common task is finding rows in one table with no match in another, which is a classic left join with a null check.

SQL

select c.customer_id
from customers c
left join orders o on o.customer_id = c.customer_id
where o.customer_id is null;

Talk through why this works. The left join keeps every customer, and the null filter keeps only those with no matching order. Mentioning that you would verify the result count against a known total shows the habit interviewers want.
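
To make the verification step concrete, here is a minimal sketch that runs the anti-join on an in-memory SQLite database through Python's standard sqlite3 module. The three customers and three orders are made-up data, and the not exists variant is added only as a cross-check, not as something the question requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customers (customer_id integer primary key);
    create table orders (order_id integer primary key, customer_id integer);
    -- Hypothetical data: customer 2 has never ordered.
    insert into customers values (1), (2), (3);
    insert into orders values (10, 1), (11, 1), (12, 3);
""")

# Left join keeps every customer; the null filter keeps only the unmatched ones.
anti_join = conn.execute("""
    select c.customer_id
    from customers c
    left join orders o on o.customer_id = c.customer_id
    where o.customer_id is null
""").fetchall()

# Same question phrased with not exists, as an independent cross-check.
not_exists = conn.execute("""
    select c.customer_id
    from customers c
    where not exists (
        select 1 from orders o where o.customer_id = c.customer_id
    )
""").fetchall()

# Sanity check: customers with orders plus customers without equals all customers.
with_orders = conn.execute(
    "select count(distinct customer_id) from orders"
).fetchone()[0]
total = conn.execute("select count(*) from customers").fetchone()[0]

print(anti_join, not_exists, with_orders + len(anti_join) == total)
```

The last line is the spoken habit in code form: two phrasings agree, and the two halves add back up to the known total.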

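A frequent follow-up is to fix a fanout you have just created. Suppose you need total revenue per customer, but you have joined orders to order_items to get product detail. Every order now appears once per line item, so summing orders.amount counts an order's amount once for each item it contains. The clean fix is to aggregate each table to a common grain before joining, rather than joining first and deduplicating after.

SQL

with revenue as (  -- one row per customer, built from orders alone
  select customer_id, sum(amount) as revenue
  from orders
  group by customer_id
),
products as (  -- one row per customer, built from the line items
  select o.customer_id, count(distinct oi.product_id) as distinct_products
  from orders o
  join order_items oi on oi.order_id = o.order_id
  group by o.customer_id
)
select r.customer_id, r.revenue, coalesce(p.distinct_products, 0) as distinct_products
from revenue r
left join products p on p.customer_id = r.customer_id;

Revenue is summed from orders alone, so no order is repeated, and the join to order_items happens only in the branch that counts distinct products, where repeated rows are harmless. The tempting shortcut, joining orders to order_items and taking sum(o.amount) in a single pass, is correct only when every order has exactly one line item. Saying that out loud, even if you sketch the shortcut first under time pressure, signals that you understand the trap rather than having stumbled past it.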
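
The size of the fanout is easier to believe when you see it. The sketch below, which assumes an in-memory SQLite database and made-up rows, runs a join-first query and a pre-aggregated query over the same data and reconciles the result against the ungrouped total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table orders (order_id integer, customer_id integer, amount integer);
    create table order_items (order_id integer, product_id integer);
    -- Hypothetical data: customer 1 has one order of 100 with three line items.
    insert into orders values (1, 1, 100), (2, 2, 50);
    insert into order_items values (1, 7), (1, 8), (1, 9), (2, 7);
""")

# Join first, aggregate after: order 1 is repeated once per line item.
naive = conn.execute("""
    select o.customer_id, sum(o.amount), count(distinct oi.product_id)
    from orders o
    left join order_items oi on oi.order_id = o.order_id
    group by o.customer_id
    order by o.customer_id
""").fetchall()

# Aggregate each table to one row per customer, then join the summaries.
fixed = conn.execute("""
    with revenue as (
        select customer_id, sum(amount) as revenue
        from orders
        group by customer_id
    ),
    products as (
        select o.customer_id, count(distinct oi.product_id) as distinct_products
        from orders o
        join order_items oi on oi.order_id = o.order_id
        group by o.customer_id
    )
    select r.customer_id, r.revenue, coalesce(p.distinct_products, 0)
    from revenue r
    left join products p on p.customer_id = r.customer_id
    order by r.customer_id
""").fetchall()

true_total = conn.execute("select sum(amount) from orders").fetchone()[0]

print("naive:", naive)
print("fixed:", fixed)
print("reconciles:", sum(row[1] for row in fixed) == true_total)
```

Note that the distinct product count is the same in both versions. Only the summed column is inflated, which is why the bug is easy to miss when you eyeball a few rows.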

Master window functions

Window functions separate strong analytics candidates from the rest, and they appear constantly. The pattern that comes up most is "find the top N per group," which row_number handles cleanly.

SQL

select *
from (
  select
    user_id,
    order_id,
    total,
    row_number() over (
      partition by user_id
      order by total desc
    ) as rn
  from orders
) ranked
where rn = 1;

Be ready to explain the difference between row_number, rank, and dense_rank, because ties are a favourite follow-up. The quick reference below is worth having ready to recite.

Function	Behaviour on a tie	Use when
`row_number`	Assigns distinct numbers arbitrarily	You want exactly one row per group
`rank`	Same number for ties, then skips	You want competition ranking with gaps
`dense_rank`	Same number for ties, no gap	You want every row tied for first kept

So if the question says "include all rows tied for the highest total," row_number quietly drops the ties and gives a wrong answer. dense_rank() = 1 (or rank() = 1) keeps them. This is exactly the kind of detail that gets probed, and getting it right on the first pass is a strong differentiator.
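
A quick way to internalise the tie behaviour is to run all three functions over the same tied rows. This is a small sketch on an in-memory SQLite database (window functions need SQLite 3.25 or later); the three orders are invented so that two of them tie for the highest total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table orders (user_id integer, order_id integer, total integer);
    -- Hypothetical data: orders 101 and 102 tie for user 1's highest total.
    insert into orders values (1, 101, 50), (1, 102, 50), (1, 103, 20);
""")

rows = conn.execute("""
    select
        order_id,
        total,
        row_number() over (partition by user_id order by total desc) as rn,
        rank()       over (partition by user_id order by total desc) as rnk,
        dense_rank() over (partition by user_id order by total desc) as drnk
    from orders
    order by order_id
""").fetchall()

# Which orders survive a "= 1" filter under each function?
top_by_row_number = [r[0] for r in rows if r[2] == 1]
top_by_rank = [r[0] for r in rows if r[3] == 1]
top_by_dense_rank = [r[0] for r in rows if r[4] == 1]

print(top_by_row_number, top_by_rank, top_by_dense_rank)
```

row_number keeps one of the two tied orders, and which one is not guaranteed unless you add a tiebreaker such as order_id to the window's order by. rank and dense_rank both keep orders 101 and 102; they differ only on the next row, which rank numbers 3 and dense_rank numbers 2.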

Running totals and period-over-period comparisons also lean on windows. A typical task is month-on-month growth, which combines an aggregation with lag().

SQL

select
  month,
  revenue,
  revenue - lag(revenue) over (order by month) as mom_change,
  round(
    100.0 * (revenue - lag(revenue) over (order by month))
    / nullif(lag(revenue) over (order by month), 0),
    1
  ) as mom_pct
from monthly_revenue
order by month;

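Two details earn credit here. The nullif(..., 0) guards against dividing by zero when the previous month's revenue is zero; the first month needs no guard, because lag returns null there and the change is simply null. Ordering the window by month is what makes lag meaningful at all. If you forget the order by inside the window, some engines reject the query and others read the previous row in an arbitrary order, and an interviewer who knows the language will notice.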
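
The two null cases are easy to confuse, so here is a minimal sketch that separates them. It assumes an in-memory SQLite database and a made-up monthly_revenue table whose first month has zero revenue, so the second month is the one that exercises the nullif guard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table monthly_revenue (month text, revenue real);
    -- Hypothetical data: January has zero revenue.
    insert into monthly_revenue values
        ('2026-01', 0.0), ('2026-02', 200.0), ('2026-03', 250.0);
""")

rows = conn.execute("""
    select
        month,
        revenue - lag(revenue) over (order by month) as mom_change,
        round(
            100.0 * (revenue - lag(revenue) over (order by month))
            / nullif(lag(revenue) over (order by month), 0),
            1
        ) as mom_pct
    from monthly_revenue
    order by month
""").fetchall()

for month, change, pct in rows:
    print(month, change, pct)
```

January has no previous row, so both columns are null. February has a real change of 200 but a null percentage, because its denominator is zero and nullif turns that into null instead of attempting the division. March is the first month with a defined percentage.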

Handle dates, cohorts, and retention

Analytics questions love time. You will be asked to bucket users into cohorts by signup month, then measure how many return in later months. The skill is computing the gap between two dates and grouping by it.

Walk through it in pieces rather than writing one dense query, and explain the grain at each step. Interviewers would rather see a clear three-step build than a clever one-liner they cannot follow. A simple cohort retention shape looks like this.

SQL

with cohort as (
  select user_id, date_trunc('month', signup_date) as cohort_month
  from users
),
activity as (
  select distinct user_id, date_trunc('month', event_date) as active_month
  from events
)
select
  c.cohort_month,
  (extract(year from a.active_month) - extract(year from c.cohort_month)) * 12
    + (extract(month from a.active_month) - extract(month from c.cohort_month)) as month_offset,
  count(distinct a.user_id) as retained_users
from cohort c
join activity a on a.user_id = c.user_id
group by c.cohort_month, month_offset
order by c.cohort_month, month_offset;

Watch the edge cases, and name them before you are asked. A user active twice in the same month should count once, which is why activity uses distinct and the final count uses count(distinct user_id). A user who churned and returned still counts as retained in the month they returned and is not penalised for the gap. Month offset zero is the cohort itself, and it should equal the cohort size as long as signing up produces an event in the signup month; because the query uses an inner join, users with no events at all drop out, so a shortfall at offset zero is a free sanity check you can offer on the spot. Saying these out loud shows you understand the metric, not just the syntax.
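
The edge cases above can be checked on a handful of rows. The sketch below assumes an in-memory SQLite database with invented users and events. SQLite has no date_trunc, so it swaps in strftime and a year * 12 + month index to compute the offset; the shape of the query is otherwise the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table users (user_id integer, signup_date text);
    create table events (user_id integer, event_date text);
    insert into users values
        (1, '2026-01-05'), (2, '2026-01-20'), (3, '2026-02-02');
    insert into events values
        (1, '2026-01-05'), (1, '2026-01-09'),  -- active twice in the signup month
        (1, '2026-03-01'),                     -- returns after skipping February
        (2, '2026-01-20'), (2, '2026-02-11'),
        (3, '2026-02-02');
""")

rows = conn.execute("""
    with cohort as (
        select user_id,
               strftime('%Y-%m', signup_date) as cohort_month,
               cast(strftime('%Y', signup_date) as integer) * 12
                 + cast(strftime('%m', signup_date) as integer) as cohort_idx
        from users
    ),
    activity as (
        select distinct user_id,
               cast(strftime('%Y', event_date) as integer) * 12
                 + cast(strftime('%m', event_date) as integer) as active_idx
        from events
    )
    select c.cohort_month,
           a.active_idx - c.cohort_idx as month_offset,
           count(distinct a.user_id) as retained_users
    from cohort c
    join activity a on a.user_id = c.user_id
    group by c.cohort_month, month_offset
    order by c.cohort_month, month_offset
""").fetchall()

# Sanity check: offset zero should match the cohort size.
cohort_sizes = dict(conn.execute("""
    select strftime('%Y-%m', signup_date), count(*)
    from users
    group by 1
""").fetchall())
offset_zero = {month: n for month, offset, n in rows if offset == 0}

print(rows)
print(offset_zero == cohort_sizes)
```

User 1's two January events count once at offset zero, and their March return shows up at offset two even though February is empty.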

The metric design questions

Some of the hardest analytics questions have no single correct query. "How would you measure whether a new feature is successful?" "What metric would you not use to track retention?" These test product sense and statistical judgement as much as SQL, and they are where senior candidates pull ahead.

Structure your answer. A reliable framework is goal, primary metric, guardrail, computation, and trap.

Restate the goal of the feature in plain terms, so you and the interviewer agree on what success means.
Propose a primary metric that maps directly to that goal, not to general activity.
Add a guardrail metric that catches harm the primary metric would hide.
Say how you would compute it, including the denominator and the time window.
Name a trap you would deliberately avoid, and why.

A worked example

Imagine the prompt is "you launched a comments feature on a content app. How would you measure if it worked?" A strong spoken answer runs roughly like this.

The goal is deeper engagement with content, not just more clicks. My primary metric is the share of weekly active users who post at least one comment, measured per week so I can see a trend rather than a single number. As a guardrail I would watch average session length and the report or block rate, because a comments feature can drive engagement up while quietly making the experience worse. I would compute the primary metric as distinct commenters divided by weekly active users, not total comments, because total comments can be inflated by a handful of power users. The trap I would avoid is treating raw comment volume as success. It is a vanity metric. I would also be careful that any uplift is not just novelty, so I would look at week two and week four retention of commenters, not only launch week.

That answer earns marks for naming a denominator, distinguishing a primary metric from a guardrail, and explicitly rejecting a tempting but misleading metric. Survivorship bias, a metric that rises only because total usage rose, and single-user inflation are the three traps worth keeping in your back pocket.
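
If the interviewer asks you to back the spoken answer with a query, the primary metric could be sketched as follows. The active_users and comments tables, their columns, and the data are all hypothetical, chosen so that one power user writes most of the comments; the example runs on an in-memory SQLite database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: one row per active user per week.
    create table active_users (user_id integer, week text);
    -- Hypothetical schema: one row per comment.
    create table comments (user_id integer, week text);
    insert into active_users values (1, 'W1'), (2, 'W1'), (3, 'W1'), (4, 'W1');
    -- User 1 is a power user with five comments; user 2 posts one.
    insert into comments values
        (1, 'W1'), (1, 'W1'), (1, 'W1'), (1, 'W1'), (1, 'W1'), (2, 'W1');
""")

week, wau, commenters, total_comments, share = conn.execute("""
    select
        a.week,
        count(distinct a.user_id) as weekly_active_users,
        count(distinct c.user_id) as commenters,
        count(c.user_id)          as total_comments,
        round(1.0 * count(distinct c.user_id) / count(distinct a.user_id), 2)
            as commenter_share
    from active_users a
    left join comments c
      on c.user_id = a.user_id and c.week = a.week
    group by a.week
""").fetchone()

print(week, wau, commenters, total_comments, share)
```

Six comments from four active users looks like heavy adoption, but only two of the four commented. The share uses distinct users in both numerator and denominator, which is what keeps a single prolific user from inflating it.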

Reason about correctness and performance

Once your query works, interviewers may ask how you would trust it and how it performs. For correctness, describe how you would validate. Check the row count against a known total, spot-check a few users by hand, and confirm that totals reconcile with a simpler query. A quick trick is to compute the same number two ways and assert they match. If the per-group totals do not add up to the ungrouped total, something is wrong.
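
The "compute it two ways" check can be sketched in a few lines. This example assumes an in-memory SQLite database and made-up rows, including one order with a null customer_id, which is the kind of row an inner join drops without warning.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customers (customer_id integer);
    create table orders (order_id integer, customer_id integer, amount integer);
    insert into customers values (1), (2);
    -- Hypothetical data: order 3 has no customer_id.
    insert into orders values (1, 1, 100), (2, 2, 50), (3, null, 25);
""")

# Way one: the simplest possible query, straight off the fact table.
ungrouped_total = conn.execute("select sum(amount) from orders").fetchone()[0]

# Way two: the per-customer report, which goes through an inner join.
per_customer = conn.execute("""
    select c.customer_id, sum(o.amount)
    from customers c
    join orders o on o.customer_id = c.customer_id
    group by c.customer_id
""").fetchall()
grouped_total = sum(amount for _, amount in per_customer)

# A non-zero gap means some rows never made it into the report.
missing = ungrouped_total - grouped_total
print(ungrouped_total, grouped_total, missing)
```

The gap of 25 points straight at the order with no customer. Whether that row belongs in the report is a business question, but the reconciliation is what surfaces it.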

For performance, you do not need to recite an execution plan, but you should know the basics.

Filter early. Reduce rows before joining and aggregating, not after.
Avoid wrapping indexed columns in functions inside a where clause, since where date(created_at) = '2026-01-01' often cannot use an index, whereas a range on created_at can.
Select only the columns you need, especially before a sort or a window.
Consider pre-aggregation. If a dashboard recomputes a heavy query on every load, a summary table refreshed on a schedule usually serves it better.
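
The point about functions on indexed columns can be observed directly in a query plan. The sketch below uses SQLite's explain query plan on a hypothetical events table with an index on created_at. The exact plan wording differs between SQLite versions and other engines have their own plan formats, so treat it as an illustration of the idea rather than a statement about any particular production database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table and index names.
    create table events (
        event_id integer primary key,
        user_id integer,
        created_at text
    );
    create index idx_events_created_at on events (created_at);
""")

def plan(sql):
    """Return SQLite's query plan as one string (detail text is the last column)."""
    rows = conn.execute("explain query plan " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# Function wrapped around the indexed column: the index cannot be searched.
wrapped = plan("select * from events where date(created_at) = '2026-01-01'")

# Equivalent half-open range on the bare column: the index can be searched.
ranged = plan("""
    select * from events
    where created_at >= '2026-01-01' and created_at < '2026-01-02'
""")

print("wrapped:", wrapped)
print("ranged: ", ranged)
```

In SQLite the wrapped version reports a scan of the table while the range version reports a search using idx_events_created_at. The half-open range (>= start, < next day) also avoids missing rows with a time component late in the day.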

If asked about a slow query, talk about indexes on the join and filter columns first, then whether the work can be moved off the hot path. That ordering, cheap structural fixes before architectural ones, is itself a signal of judgement.

How seniority and role change the bar

The same question is graded differently depending on the level and the role you are interviewing for.

Level or role	What raises the bar
Junior	Correct query, clear narration, handles obvious nulls
Mid	Predicts fanout, picks the right ranking function, validates the result
Senior	Drives the ambiguity, designs metrics, reasons about cost and trust
Analytics engineer	Thinks in models and grain, cares about reusable, testable SQL
Data scientist	Connects the query to a hypothesis and a decision, not just a number
Data engineer	Weighs performance, partitioning, and pipeline reliability

Tailor your emphasis. A data scientist who only writes flawless SQL but never ties it to a decision can come across as junior for the level, while a data engineer who designs an elegant metric but ignores a full table scan misreads the room.

Common mistakes to avoid

Ignoring the grain and silently multiplying rows through a fanout join.
Forgetting that count(*) counts every row while count(column) ignores nulls.
Using row_number when the question wants ties included, or the reverse.
Omitting order by inside a window where the result depends on order.
Dividing without a nullif guard and crashing on the first empty period.
Answering a metric design question with a query before you have agreed what success means.
Going silent for a long stretch, so the interviewer cannot follow or help you.
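
The count(*) versus count(column) distinction from the list above takes one query to demonstrate. This is a small sketch on an in-memory SQLite database with a made-up users table in which one email is null and two are duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table users (user_id integer, email text);
    -- Hypothetical data: user 2 has no email, users 1 and 3 share one.
    insert into users values (1, 'a@example.com'), (2, null), (3, 'a@example.com');
""")

all_rows, non_null_emails, distinct_emails = conn.execute("""
    select
        count(*),              -- every row, nulls included
        count(email),          -- rows where email is not null
        count(distinct email)  -- distinct non-null emails
    from users
""").fetchone()

print(all_rows, non_null_emails, distinct_emails)
```

Three different answers from the same table is a useful prompt to ask the interviewer which one "number of users" is supposed to mean.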

Frequently asked questions

Can I ask clarifying questions, or does that look weak? Ask them. A short, specific question about grain, time zone, or what counts as "active" reads as senior, not unsure. Rambling, open-ended questions do not.

What if I do not know the exact syntax? Say what you intend, write your best attempt, and note that you would confirm the function name. Interviewers care far more about the approach than about perfect recall of a function signature.

Should I optimise the query straight away? No. Get a correct, readable version first, confirm it is right, then discuss performance if asked. Premature optimisation that introduces a bug is worse than a clear query you can reason about.

Which dialect should I write in? Use standard ANSI style unless the company names a specific engine. If you know it is BigQuery, Snowflake, or Postgres, lean into that dialect and say so.

How to practise

Work through joins, then window functions, then date and cohort logic, doing several problems in each before moving on. For every query, state the grain first and the edge cases you considered, out loud, even when practising alone, because the interview is a spoken exercise as much as a written one. Then practise three or four metric design questions out loud, structuring each answer around the goal, the primary metric, the guardrail, the computation, and the trap you would avoid. That combination of clean SQL and clear product reasoning is what lands offers in analytics and data rounds.

Continue your prep

Apply this against real role questions and templates:

Analytics engineer interview questions

Data scientist interview questions

Data engineer interview questions
