Question 1

You have a 20GB CSV that you are loading into a pandas DataFrame. A groupby-agg operation takes 40 minutes. Walk me through the steps you would take to speed it up, from quick wins to structural changes.

Accepted Answer

Quick wins: downcast numeric dtypes (float64 to float32, int64 to int32 or int16), use categorical dtype for low-cardinality string columns, and profile with memory_profiler. Structural changes: switch to chunked processing with read_csv(chunksize=), use Dask or Polars for out-of-core computation, or push the aggregation into the database with SQL. The candidate should also mention Parquet as a better storage format than CSV.

Question 2

You are fitting an ARIMA model to weekly revenue data. What tests do you run to check for stationarity, and what transformations do you apply if the series is not stationary?

Accepted Answer

Use the Augmented Dickey-Fuller (ADF) test to check for unit roots (null: series has a unit root). If non-stationary, apply first-differencing for trend and log transformation for heteroskedastic variance. For seasonal series, apply seasonal differencing. The candidate should explain that ARIMA(p,d,q) sets d to the number of differencing operations needed. KPSS test can complement ADF for a dual confirmation.

Question 3

You arrive at work and the on-call alert says conversion rate dropped 20% overnight compared to the same time last week. Walk me through your investigation step by step.

Accepted Answer

The candidate should immediately check for data pipeline issues (instrumentation change, tracking bug) before assuming a real product issue. Then segment by platform, geography, device, and traffic source to isolate the affected population. Check if any experiments launched or code was deployed overnight. Compare the raw event counts to rule out a logging failure. Only after ruling out instrumentation errors does the candidate form product hypotheses.

Question 4

Your A/B test shows that the variant increased DAUs by 5% but revenue per user dropped 8%. The net effect on total revenue is negative. What do you recommend and how do you present this to the team?

Accepted Answer

The candidate should not ship when the business-level metric is negative. They should segment the revenue drop to find the mechanism (are new users lower value, or are existing users spending less?), check if the revenue impact is statistically significant, and investigate whether there is a recoverable variant that fixes the revenue issue without sacrificing engagement. Presentation should lead with the decision recommendation, then the supporting data.

Question 5

You need to test a new feature for your highest-value users, who represent only 0.5% of your user base. A full power calculation says you would need 3 years to reach significance. What do you do?

Accepted Answer

Options: (1) switch to a more sensitive metric (revenue per user rather than conversion rate) which has smaller required sample size for the effect size that matters; (2) accept a wider confidence interval and a larger MDE; (3) use Bayesian methods that give a useful posterior even with small samples; (4) use a quasi-experimental design with matched controls from similar users outside the segment; (5) run a qualitative user study as a complement. The candidate should not just say 'wait longer'.

Question 6

After deploying a loan approval model, you notice it has a 15% higher false negative rate for applicants from one demographic group than for others. The overall AUC is 0.87. What do you do?

Accepted Answer

The candidate should immediately flag this as a potential fairness and legal compliance issue, not just a modeling problem. Steps: quantify the disparity precisely with confidence intervals, investigate whether the group is underrepresented in training data, identify whether any features are proxies for the demographic group, and consult legal or compliance. Mitigation: resampling, reweighting, or calibrating thresholds per group. The candidate should not optimize global AUC at the expense of subgroup equity.

Question 7

After two days of an experiment that should have a 50-50 split, you see 60% of traffic in control and 40% in treatment. The treatment shows a significant lift. What do you do?

Accepted Answer

SRM invalidates the experiment. The candidate must not report the lift as real and must halt or investigate before concluding. Common causes: bot filtering applied differently per variant, caching that prevents some users from being assigned, redirect issues, or a bug in the assignment logic. The investigation involves digging into the assignment service logs and comparing event counts at each pipeline stage. The experiment results are not trustworthy until SRM is resolved.

Question 8

Your churn model relied heavily on 'days since last login' as a top feature. After a data engineering team refactored the pipeline, that feature's importance drops to near zero and model AUC falls by 4 points. How do you debug this?

Accepted Answer

The candidate should first check whether the feature values themselves changed in the new pipeline (compare distributions before and after the refactor using PSI or a simple histogram). Then verify the definition: did 'last login' start being computed differently (UTC vs local time, different event type counted as login)? Check for null rate changes. This is a classic training-serving skew scenario caused by an upstream definition change. The fix involves aligning the new feature definition or reverting the pipeline change.

Question 9

A marketing lead asks you to incorporate location data and browsing history from users who did not explicitly consent to their data being used for personalization. The data exists in your warehouse. What do you do?

Accepted Answer

The candidate should not proceed without verifying the legal basis for processing this data (GDPR, CCPA, or internal data governance policy). They should escalate to a data privacy or legal team rather than making the call themselves, and document the request. A strong answer also proposes privacy-preserving alternatives (differential privacy, aggregated signals, or consent-based opt-in campaigns) that achieve the business goal without the legal risk.

Question 10

You are presenting a 90-day revenue forecast to the executive team. Your model has wide confidence intervals because the business recently pivoted. How do you communicate the uncertainty without losing credibility or causing panic?

Accepted Answer

The candidate should present the point estimate alongside explicit confidence intervals, explain in plain language what drives the uncertainty (thin historical data on the new business model), offer scenario analysis (optimistic, base, pessimistic) tied to specific assumptions, and recommend a review cadence so the forecast is updated as new data arrives. Strong answers show that communicating uncertainty is a sign of rigor, not weakness, and frame it as enabling better decisions.

Question 11

You have a 20GB CSV that you are loading into a pandas DataFrame. A groupby-agg operation takes 40 minutes. Walk me through the steps you would take to speed it up, from quick wins to structural changes.

Accepted Answer

Quick wins: downcast numeric dtypes (float64 to float32, int64 to int32 or int16), use categorical dtype for low-cardinality string columns, and profile with memory_profiler. Structural changes: switch to chunked processing with read_csv(chunksize=), use Dask or Polars for out-of-core computation, or push the aggregation into the database with SQL. The candidate should also mention Parquet as a better storage format than CSV.

Question 12

You are fitting an ARIMA model to weekly revenue data. What tests do you run to check for stationarity, and what transformations do you apply if the series is not stationary?

Accepted Answer

Use the Augmented Dickey-Fuller (ADF) test to check for unit roots (null: series has a unit root). If non-stationary, apply first-differencing for trend and log transformation for heteroskedastic variance. For seasonal series, apply seasonal differencing. The candidate should explain that ARIMA(p,d,q) sets d to the number of differencing operations needed. KPSS test can complement ADF for a dual confirmation.

Question 13

You arrive at work and the on-call alert says conversion rate dropped 20% overnight compared to the same time last week. Walk me through your investigation step by step.

Accepted Answer

The candidate should immediately check for data pipeline issues (instrumentation change, tracking bug) before assuming a real product issue. Then segment by platform, geography, device, and traffic source to isolate the affected population. Check if any experiments launched or code was deployed overnight. Compare the raw event counts to rule out a logging failure. Only after ruling out instrumentation errors does the candidate form product hypotheses.

Question 14

Your A/B test shows that the variant increased DAUs by 5% but revenue per user dropped 8%. The net effect on total revenue is negative. What do you recommend and how do you present this to the team?

Accepted Answer

The candidate should not ship when the business-level metric is negative. They should segment the revenue drop to find the mechanism (are new users lower value, or are existing users spending less?), check if the revenue impact is statistically significant, and investigate whether there is a recoverable variant that fixes the revenue issue without sacrificing engagement. Presentation should lead with the decision recommendation, then the supporting data.

Question 15

You need to test a new feature for your highest-value users, who represent only 0.5% of your user base. A full power calculation says you would need 3 years to reach significance. What do you do?

Accepted Answer

Options: (1) switch to a more sensitive metric (revenue per user rather than conversion rate) which has smaller required sample size for the effect size that matters; (2) accept a wider confidence interval and a larger MDE; (3) use Bayesian methods that give a useful posterior even with small samples; (4) use a quasi-experimental design with matched controls from similar users outside the segment; (5) run a qualitative user study as a complement. The candidate should not just say 'wait longer'.

Question 16

After deploying a loan approval model, you notice it has a 15% higher false negative rate for applicants from one demographic group than for others. The overall AUC is 0.87. What do you do?

Accepted Answer

The candidate should immediately flag this as a potential fairness and legal compliance issue, not just a modeling problem. Steps: quantify the disparity precisely with confidence intervals, investigate whether the group is underrepresented in training data, identify whether any features are proxies for the demographic group, and consult legal or compliance. Mitigation: resampling, reweighting, or calibrating thresholds per group. The candidate should not optimize global AUC at the expense of subgroup equity.

Question 17

After two days of an experiment that should have a 50-50 split, you see 60% of traffic in control and 40% in treatment. The treatment shows a significant lift. What do you do?

Accepted Answer

SRM invalidates the experiment. The candidate must not report the lift as real and must halt or investigate before concluding. Common causes: bot filtering applied differently per variant, caching that prevents some users from being assigned, redirect issues, or a bug in the assignment logic. The investigation involves digging into the assignment service logs and comparing event counts at each pipeline stage. The experiment results are not trustworthy until SRM is resolved.

Question 18

Your churn model relied heavily on 'days since last login' as a top feature. After a data engineering team refactored the pipeline, that feature's importance drops to near zero and model AUC falls by 4 points. How do you debug this?

Accepted Answer

The candidate should first check whether the feature values themselves changed in the new pipeline (compare distributions before and after the refactor using PSI or a simple histogram). Then verify the definition: did 'last login' start being computed differently (UTC vs local time, different event type counted as login)? Check for null rate changes. This is a classic training-serving skew scenario caused by an upstream definition change. The fix involves aligning the new feature definition or reverting the pipeline change.

Question 19

A marketing lead asks you to incorporate location data and browsing history from users who did not explicitly consent to their data being used for personalization. The data exists in your warehouse. What do you do?

Accepted Answer

The candidate should not proceed without verifying the legal basis for processing this data (GDPR, CCPA, or internal data governance policy). They should escalate to a data privacy or legal team rather than making the call themselves, and document the request. A strong answer also proposes privacy-preserving alternatives (differential privacy, aggregated signals, or consent-based opt-in campaigns) that achieve the business goal without the legal risk.

Question 20

You are presenting a 90-day revenue forecast to the executive team. Your model has wide confidence intervals because the business recently pivoted. How do you communicate the uncertainty without losing credibility or causing panic?

Accepted Answer

The candidate should present the point estimate alongside explicit confidence intervals, explain in plain language what drives the uncertainty (thin historical data on the new business model), offer scenario analysis (optimistic, base, pessimistic) tied to specific assumptions, and recommend a review cadence so the forecast is updated as new data arrives. Strong answers show that communicating uncertainty is a sign of rigor, not weakness, and frame it as enabling better decisions.

Questions

Optimizing a slow pandas groupby on a 20GB DataFrameRole-specificmediumCommon

As asked

Sample answer outline

Expect these follow-ups

Testing and achieving stationarity in time-series forecastingRole-specificmediumCommon

As asked

Sample answer outline

Expect these follow-ups

Investigating a sudden 20% drop in conversion rateRole-specificmediumVery common

As asked

Sample answer outline

Expect these follow-ups

North-star metric up but revenue downRole-specifichardCommon

As asked

Sample answer outline

Expect these follow-ups

Running an experiment on a rare user segmentRole-specifichardCommon

As asked

Sample answer outline

Expect these follow-ups

Discovering a model that performs poorly on a demographic subgroupRole-specifichardCommon

As asked

Sample answer outline

Expect these follow-ups

What to do when an experiment has a sample ratio mismatchRole-specifichardCommon

As asked

Sample answer outline

Expect these follow-ups

A key feature loses importance after a data pipeline changeRole-specificmediumCommon

As asked

Sample answer outline

Expect these follow-ups

Stakeholder asks to use sensitive user data in a modelRole-specificmediumOccasional

As asked

Sample answer outline

Expect these follow-ups

Communicating uncertainty in a forecast to executivesRole-specificmediumCommon

As asked

Sample answer outline

Expect these follow-ups

Optimizing a slow pandas groupby on a 20GB DataFrameRole-specificmediumCommon

As asked

Sample answer outline

Expect these follow-ups

Testing and achieving stationarity in time-series forecastingRole-specificmediumCommon

As asked

Sample answer outline

Expect these follow-ups

Investigating a sudden 20% drop in conversion rateRole-specificmediumVery common

As asked

Sample answer outline

Expect these follow-ups

North-star metric up but revenue downRole-specifichardCommon

As asked

Sample answer outline

Expect these follow-ups

Running an experiment on a rare user segmentRole-specifichardCommon

As asked

Sample answer outline

Expect these follow-ups

Discovering a model that performs poorly on a demographic subgroupRole-specifichardCommon

As asked

Sample answer outline

Expect these follow-ups

What to do when an experiment has a sample ratio mismatchRole-specifichardCommon

As asked

Sample answer outline

Expect these follow-ups

A key feature loses importance after a data pipeline changeRole-specificmediumCommon

As asked

Sample answer outline

Expect these follow-ups

Stakeholder asks to use sensitive user data in a modelRole-specificmediumOccasional

As asked

Sample answer outline

Expect these follow-ups

Communicating uncertainty in a forecast to executivesRole-specificmediumCommon

As asked

Sample answer outline