What the case study round is really for
The data scientist case study is where companies find out whether you can do the actual job, not just pass a stats quiz. A typical prompt is loose on purpose: "our subscription product has rising churn, what would you do," or "we want to test a new onboarding flow, design the experiment." The interviewer is watching whether you can turn a business problem into a precise question, choose the right method, reason about what could mislead you, and explain your thinking to someone who is not a statistician.
These rounds reward structure and judgement far more than a perfect model. Many strong technical candidates stumble here because they reach for a method before they have defined the problem. The fix is a repeatable approach you can lean on when the prompt is vague.
Frame the problem before touching methods
Spend the first few minutes converting the prompt into a sharp question. For "churn is rising," that means asking what churn means here, over what window, for which users, and what decision the analysis is meant to support. An analysis that informs a retention campaign looks different from one meant to forecast revenue.
Restate the goal out loud and check it. Something like "so we want to understand which users are likely to churn in the next thirty days and what is driving it, so the product team can intervene." That single sentence shows the interviewer you start from the decision, not the dataset. Then state your assumptions explicitly, because real problems always have gaps and naming them is a strength.
Reason about the data you would need
Before any modelling, talk through the data. What would you pull, what is the grain of each table, and what could be missing or biased. For churn you would want usage events, subscription status over time, support contacts, and billing history.
Watch for the traps that quietly break analyses.
- Survivorship bias, where you only look at users who stayed and miss the signal in those who left.
- Leakage, where a feature secretly encodes the outcome, such as "account closed" predicting churn.
- Selection effects, where the users in your sample are not representative of the population you care about.
Mentioning that you would check data quality, look at missingness, and validate the grain before modelling shows the discipline that separates someone who has done real analysis from someone who has only done tutorials.
Choose a method that fits the question
Only after framing the problem and understanding the data should you pick an approach, and you should justify it. If the goal is to understand drivers of churn for the product team, an interpretable model like logistic regression or a decision tree may be more useful than a black box that scores slightly better, because the team needs to know which levers to pull.
If the goal is pure prediction to prioritise outreach, a gradient boosted model is reasonable, but say how you would still surface feature importance so the result is actionable. The judgement signal is matching the method to the decision, and choosing the simpler tool when it serves the goal.
Be clear about how you would evaluate. For an imbalanced outcome like churn, accuracy is misleading. Talk about precision, recall, and the operating point that matches the cost of contacting a user who would have stayed versus missing one who leaves.
Experiment design questions
Many data scientist cases are really about experimentation. For "design a test for the new onboarding flow," structure your answer around the parts of a sound experiment.
Start with the hypothesis and the primary metric. State precisely what you expect to move and by how much, for example "the new flow increases seven day activation by two percentage points." Then talk about the unit of randomisation, usually the user, and why randomising at the wrong level, such as by session, would leak the treatment.
-- sanity check that assignment is balanced before trusting results
select variant, count(distinct user_id) as users
from experiment_assignment
where experiment = 'onboarding_v2'
group by variant;
Cover sample size and how long the test must run to detect the effect you care about, the guardrail metrics that catch harm, and the risks. Novelty effects where users react to anything new, the multiple comparisons problem if you check many metrics, and peeking at results early and stopping when they look good. Naming the peeking problem, and saying you would fix the duration in advance, is a strong signal.
Communicate like the analysis has a reader
A data scientist who cannot explain results to a product manager is only half doing the job, and interviewers test this directly. As you work, narrate your reasoning in plain language, and when you reach a conclusion, state it as a recommendation with a confidence level and a caveat.
Avoid drowning the answer in jargon. "The new flow lifted activation by about two points, and we are fairly confident it is real, though we should watch whether the effect holds past the first week" communicates more than a recitation of p-values. The ability to translate analysis into a decision is often what tips a case round into an offer.
Common mistakes to avoid
- Reaching for a model before defining the question and the decision it supports.
- Ignoring data quality, leakage, and bias, then trusting a result that is built on sand.
- Optimising a metric like accuracy that does not fit an imbalanced problem.
- Designing an experiment without stating the unit of randomisation or the stopping rule.
- Presenting findings as raw numbers rather than a clear, caveated recommendation.
How to practise
Run several full cases out loud on a timer, mixing the types: a churn diagnosis, a metric drop investigation, an experiment design, and a "how would you measure success" prompt. After each, check that you framed the question, reasoned about the data and its biases, justified your method, and ended with a clear recommendation a non-technical reader could act on. That combination of structure, statistical judgement, and clear communication is what data science case rounds are built to find.
Continue your prep
Apply this against real role questions and templates: