What SQL rounds are really checking
A SQL interview is rarely about exotic syntax. It checks whether you can read a schema, translate a fuzzy business question into a correct query, and reason about edge cases like nulls, duplicates, and ties. The interviewer often cares less about the perfect query and more about how you think through it: do you confirm the grain of each table, do you handle the rows that break naive logic, do you sanity check your result.
The questions cluster into a few areas. Joins and aggregation, window functions, date and cohort logic, and the open ended metric design questions where there is no single right answer. Prepare for all four, because analytics and data roles mix them heavily.
Get joins and grain right
Most SQL mistakes come from misunderstanding the grain of a table, that is, what one row represents. Before writing anything, say out loud what a row in each table means. If orders is one row per order and order_items is one row per line item, joining them and summing a revenue column from orders will multiply that revenue by the number of items. Catching that fanout before it happens is a strong signal.
Be fluent with the join types and when each is correct. A common task is finding rows in one table with no match in another, which is a classic left join with a null check.
select c.customer_id
from customers c
left join orders o on o.customer_id = c.customer_id
where o.customer_id is null;
Talk through why this works: the left join keeps every customer, and the null filter keeps only those with no matching order. Mentioning that you would verify the result count against a known total shows the habit interviewers want.
Master window functions
Window functions separate strong analytics candidates from the rest, and they appear constantly. The pattern that comes up most is "find the top N per group," which row_number handles cleanly.
select *
from (
select
user_id,
order_id,
total,
row_number() over (
partition by user_id
order by total desc
) as rn
from orders
) ranked
where rn = 1;
Be ready to explain the difference between row_number, rank, and dense_rank, because ties are a favourite follow-up. row_number breaks ties arbitrarily, rank leaves gaps after a tie, and dense_rank does not. Knowing which to use when a question says "include all rows tied for first" is exactly the kind of detail that gets probed.
Running totals and period-over-period comparisons also lean on windows. Practise sum() over (order by ...) for cumulative figures and lag() for comparing a row to the previous period.
Handle dates, cohorts, and retention
Analytics questions love time. You will be asked to bucket users into cohorts by signup month, then measure how many return in later months. The skill is computing the gap between two dates and grouping by it.
A retention query usually joins an activity table back to a cohort definition, computes the month difference, then counts distinct users per cohort per month offset. Walk through it in pieces rather than writing one dense query, and explain the grain at each step. Interviewers would rather see a clear three-step build than a clever one-liner they cannot follow.
Watch the edge cases: a user active twice in the same month should count once, so use count(distinct user_id). A user who churned and returned still counts as retained in the month they returned. Saying these out loud shows you understand the metric, not just the syntax.
The metric design questions
Some of the hardest analytics questions have no single correct query. "How would you measure whether a new feature is successful." "What metric would you not use to track retention." These test product sense and statistical judgement as much as SQL.
Structure your answer. First restate the goal of the feature in plain terms. Then propose a primary metric that maps directly to that goal, and a guardrail metric that catches harm. For a new comment feature, the primary metric might be the share of active users who comment, and the guardrail might be session length or report rate, so you notice if the feature is driving toxic behaviour.
Then say how you would compute it and what could mislead you. Survivorship bias, a metric that goes up simply because total usage went up, or a vanity metric like total comments that one power user can inflate. Naming a metric you would deliberately avoid, and why, is often the strongest part of the answer.
Reason about correctness and performance
Once your query works, interviewers may ask how you would trust it and how it performs. For correctness, describe how you would validate: check the row count against a known total, spot check a few users by hand, and confirm that totals reconcile with a simpler query.
For performance, you do not need to recite an execution plan, but you should know the basics. Filtering early, avoiding functions on indexed columns in a where clause, and selecting only the columns you need all help. If asked about a slow query, talk about indexes on the join and filter columns, and whether a pre-aggregated table would serve the dashboard better than recomputing every load.
Common mistakes to avoid
- Ignoring the grain and silently multiplying rows through a fanout join.
- Forgetting that
count(*)counts nulls differently fromcount(column). - Using
row_numberwhen the question wants ties included, or the reverse. - Answering a metric design question with a query before you have agreed what success means.
How to practise
Work through joins, then window functions, then date and cohort logic, doing several problems in each before moving on. For every query, state the grain first and the edge cases you considered. Then practise three or four metric design questions out loud, structuring each answer around the goal, the primary metric, the guardrail, and the trap you would avoid. That combination of clean SQL and clear product reasoning is what lands offers in analytics and data rounds.
Continue your prep
Apply this against real role questions and templates: