As asked
A data scientist gives you a gradient boosting model that achieves great offline AUC, but you notice inference takes 200 ms per sample. Explain how gradient boosting builds an ensemble compared with random forests, which architectural properties drive that latency, and which MLOps levers you have to bring it within a 10 ms serving SLA.
Sample answer outline
A strong answer explains that random forests build trees independently, and therefore in parallel, on bootstrapped samples and average their predictions (bagging), while gradient boosting builds trees sequentially, each fitting the residuals of the previous ensemble (boosting); more generally, each tree fits the negative gradient of the loss. Gradient boosting is more prone to overfitting because each new tree corrects the errors that remain, and those eventually include noise. The key regularization knobs are learning rate (shrinkage), max depth (tree complexity), min samples per leaf, subsampling rate, and number of estimators.
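The contrast between the two training loops can be made concrete with a minimal pure-Python sketch. It assumes squared-error loss, one-dimensional inputs, and depth-1 trees (stumps); the helper names `fit_stump`, `fit_bagging`, `fit_boosting`, and `mse` are hypothetical, and the bagging loop omits the per-split feature subsampling that a real random forest adds. It is an illustration of the idea, not how any production library implements it.

```python
import math
import random

def fit_stump(xs, ys):
    """Fit a depth-1 regression tree on 1-D inputs by minimizing squared error."""
    best = None
    order = sorted(set(xs))
    for lo, hi in zip(order, order[1:]):
        t = (lo + hi) / 2  # candidate threshold between two distinct values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all inputs identical: predict the mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def fit_bagging(xs, ys, n_trees, rng):
    """Bagging: each tree sees its own bootstrap sample; trees are independent."""
    trees = []
    for _ in range(n_trees):  # iterations do not depend on each other
        idx = [rng.randrange(len(xs)) for _ in xs]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(tree(x) for tree in trees) / len(trees)

def fit_boosting(xs, ys, n_trees, learning_rate):
    """Boosting: each tree fits the residuals left by the ensemble so far."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    trees = []
    for _ in range(n_trees):  # iteration m needs the predictions from m - 1
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(tree(x) for tree in trees)

def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data: a noiseless sine curve, chosen only for illustration.
xs = [i / 10 for i in range(40)]
ys = [math.sin(x) for x in xs]

bagged = fit_bagging(xs, ys, n_trees=20, rng=random.Random(0))
boosted = fit_boosting(xs, ys, n_trees=50, learning_rate=0.1)
print("bagging train MSE: ", mse(bagged, xs, ys))
print("boosting train MSE:", mse(boosted, xs, ys))
```

In this sketch both predictors evaluate every tree for each sample, so per-sample inference cost grows with the number of trees and their depth. The difference lies in training: the bagging loop could run its iterations in any order or in parallel, whereas the boosting loop cannot start tree m until tree m - 1 has updated the predictions.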
Expect these follow-ups
- What does the learning rate multiply in the gradient boosting update equation?
- When would you choose XGBoost over LightGBM for a feature with 10 million distinct values?
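
For reference on the first follow-up, one common way to write the update is sketched below. The notation is an assumption, not taken from the question: F_m is the ensemble after m trees, h_m is the m-th tree, \nu is the learning rate, L is the loss, and r_{i,m} is the pseudo-residual for sample i. Some formulations also include a per-tree step size or per-leaf values, which this form folds into h_m.

```latex
\begin{align}
  r_{i,m} &= -\left[ \frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)} \right]_{F = F_{m-1}}
    && \text{(target that tree } h_m \text{ is fit to)} \\
  F_m(x)  &= F_{m-1}(x) + \nu \, h_m(x), \qquad 0 < \nu \le 1
    && \text{(shrunken additive update)}
\end{align}
```

Under this form, the learning rate scales the new tree's contribution before it is added to the ensemble. For squared-error loss L = \tfrac{1}{2}(y - F)^2, the pseudo-residual reduces to the ordinary residual y_i - F_{m-1}(x_i).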