As asked
You ran an A/B test on a new feature in Google Search. The experiment group shows a 0.5% increase in click-through rate with a p-value of 0.03. Your team wants to ship it. What questions do you ask before saying yes?
Sample answer outline
A strong answer goes beyond p-value: Was the sample size pre-determined or did the team peek and stop early (inflating false positives)? What is the practical significance of 0.5%? Is there novelty effect (did engagement spike because the feature is new)? What is the confidence interval, not just the point estimate? Are there segment breakdowns showing the lift is uniform, or does it only help one demographic? Did any guardrail metrics (latency, ad revenue, user satisfaction) move negatively?
Expect these follow-ups
- What would you do if the lift was consistent across all segments but the p-value was 0.08?
- How does the multiple comparisons problem affect this if you tested 10 variants simultaneously?