The four rounds you are preparing for
A machine learning engineer loop usually spans four kinds of round, and treating them as one blurry "ML interview" is how people underprepare. There is an ML fundamentals round on concepts and modelling judgement, a coding round much like a standard software interview, an ML system design round on building a model-serving system end to end, and often a round on deployment, monitoring, and the operational side. Strong candidates know which signal each round is after and prepare for them separately.
The role sits between data science and software engineering, so interviewers want both. You should be able to reason about bias and variance and also write clean, testable code and talk about latency budgets. Lean too far toward theory and you look like you have never shipped a model. Lean too far toward engineering and you look like you do not understand what you are deploying.
ML fundamentals, judgement over recall
The fundamentals round is not a vocabulary quiz. It checks whether you can choose the right approach for a problem and explain the tradeoffs. Expect questions on the bias-variance tradeoff, overfitting and regularisation, how you would handle imbalanced classes, and when a simpler model beats a neural network.
The strongest answers connect a choice to its consequences. If asked how you would handle a fraud dataset where positives are one in a thousand, do not just say "use SMOTE." Explain why accuracy is a useless metric here (a model that never flags fraud is 99.9% accurate), why you would look at precision, recall, and the precision-recall curve, how class weighting and resampling each carry their own costs, and how the right operating point depends on the business cost of a false negative versus a false positive.
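To make the fraud example concrete, here is a minimal pure-Python sketch. The function names, the toy labels and scores, and the per-error costs are illustrative assumptions, not taken from any real system. It shows how accuracy hides a model that catches nothing, and how the cheapest threshold moves when the relative cost of a missed fraud versus a false alarm changes.

```python
def confusion_counts(y_true, scores, threshold):
    """Count (TP, FP, FN, TN) when every score >= threshold is flagged as positive."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(tp, fp, fn):
    # Convention (an assumption): report 0.0 when a denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def cheapest_threshold(y_true, scores, cost_fn, cost_fp):
    """Return (threshold, total_cost) minimising cost_fn * FN + cost_fp * FP."""
    candidates = sorted(set(scores)) + [float("inf")]  # inf means "flag nothing"
    best = None
    for t in candidates:
        tp, fp, fn, tn = confusion_counts(y_true, scores, t)
        cost = cost_fn * fn + cost_fp * fp
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


# One fraud case in 1,000: a "model" that never flags anything is 99.9% accurate but catches nothing.
labels = [1] + [0] * 999
never_flag = [0.0] * 1000
tp, fp, fn, tn = confusion_counts(labels, never_flag, threshold=0.5)
accuracy = (tp + tn) / len(labels)
_, recall = precision_recall(tp, fp, fn)
print(accuracy, recall)  # 0.999 0.0

# Hypothetical validation scores: the cheapest threshold depends on the cost ratio.
y_val = [1, 0, 0, 1, 0]
s_val = [0.9, 0.8, 0.3, 0.4, 0.1]
print(cheapest_threshold(y_val, s_val, cost_fn=100, cost_fp=1))  # (0.4, 1)
print(cheapest_threshold(y_val, s_val, cost_fn=1, cost_fp=100))  # (0.9, 1)
```

When a missed fraud is expensive the sketch lowers the threshold and accepts a false alarm; when false alarms are expensive it raises it. In practice the threshold would be chosen on held-out validation data, with costs supplied by the business.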
Be ready for "when would you not use a neural network." A good answer covers small datasets, the need for interpretability, tight latency or cost budgets, and cases where gradient-boosted trees on tabular data simply perform better with far less effort. Knowing when not to reach for the complex tool is a senior signal.
The coding round still matters
Many candidates neglect coding because they think the ML rounds carry the offer. They do not. A sloppy coding round can sink an otherwise strong loop. The format is usually standard data structures and algorithms, sometimes with a numerical or array-heavy flavour.
You may also be asked to implement a small piece of ML from scratch, which tests that you understand the maths underneath the libraries. Gradient descent is a common one.
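```python
def gradient_descent(x, y, lr=0.01, steps=1000):
    """Fit y ~ w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        preds = [w * xi + b for xi in x]
        # Gradients of MSE = (1/n) * sum((pred - y)^2) with respect to w and b.
        dw = (2 / n) * sum((preds[i] - y[i]) * x[i] for i in range(n))
        db = (2 / n) * sum(preds[i] - y[i] for i in range(n))
        # Step against the gradient, scaled by the learning rate.
        w -= lr * dw
        b -= lr * db
    return w, b
```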
Be ready to explain each line, what the learning rate controls, and what happens if it is too large or too small. Treat this like any coding round: state your approach, talk through complexity, and check edge cases.
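One way to rehearse the learning-rate question is to run the function on a toy problem with a few learning rates and compare the final loss. The sketch repeats `gradient_descent` unchanged so it runs on its own; the `mse` helper, the data (points lying exactly on y = 2x + 1) and the three learning rates are illustrative choices.

```python
def gradient_descent(x, y, lr=0.01, steps=1000):
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        preds = [w * xi + b for xi in x]
        dw = (2 / n) * sum((preds[i] - y[i]) * x[i] for i in range(n))
        db = (2 / n) * sum(preds[i] - y[i] for i in range(n))
        w -= lr * dw
        b -= lr * db
    return w, b


def mse(x, y, w, b):
    """Mean squared error of the line w * x + b on the points (x, y)."""
    total = 0.0
    for xi, yi in zip(x, y):
        err = w * xi + b - yi
        total += err * err
    return total / len(x)


# Toy data lying exactly on y = 2x + 1, so the best achievable loss is 0.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
start_loss = mse(x, y, 0.0, 0.0)

losses = {}
for lr in (0.001, 0.05, 0.5):
    w, b = gradient_descent(x, y, lr=lr, steps=100)
    losses[lr] = mse(x, y, w, b)
    print(f"lr={lr}: loss {start_loss:.1f} -> {losses[lr]:.3g}")
```

With the same step budget, 0.001 only makes partial progress, 0.05 gets close to zero loss, and 0.5 overshoots further on every step until the parameters blow up. Where "too large" begins depends on the curvature of the loss, which depends on the scale of `x`; for this toy data it is roughly 0.12, which is one reason feature scaling matters for gradient descent.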
ML system design, the round that decides offers
This is where ML engineer offers are usually won or lost. You will be given a prompt like "design a system to recommend products" or "design fraud detection for checkout," and you need to reason from the data and the product backwards, not jump to a model.
Use a repeatable structure. Clarify the goal and how success is measured, since recommendations optimised for clicks behave very differently from recommendations optimised for long-term retention. Establish the scale and the latency budget. Then walk the pipeline: data sources and labels, features and a feature store, the model and a sensible baseline, training and evaluation, serving, and finally monitoring.
Two points separate strong answers. First, always propose a simple baseline before the fancy model, because interviewers want to see that you would not reach for a deep model when logistic regression would ship faster and be easier to debug. Second, be explicit about training and serving skew. If a feature is computed one way in training and another way at serving time, the model silently degrades, and naming that risk shows real experience.
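A common way to reduce training and serving skew is to keep feature logic in one function that both the training job and the serving path import, then log the features actually served and periodically recompute them offline to compare. The sketch below assumes that setup; `compute_features`, `skew_report`, the two features and the "buggy" serving code are all hypothetical.

```python
import math


def compute_features(raw):
    """Single source of truth for feature logic, imported by training and serving alike."""
    return {
        "log_amount": math.log1p(float(raw["amount"])),
        "is_night": 1 if raw["hour"] < 6 else 0,
    }


def skew_report(served_rows, raw_events, tol=1e-6):
    """Share of logged requests where a served feature differs from its offline recomputation."""
    mismatches = {}
    for served, raw in zip(served_rows, raw_events):
        for name, offline_value in compute_features(raw).items():
            if abs(served[name] - offline_value) > tol:
                mismatches[name] = mismatches.get(name, 0) + 1
    return {name: count / len(raw_events) for name, count in mismatches.items()}


raw_events = [{"amount": 10.0, "hour": 3}, {"amount": 250.0, "hour": 14}]

# Healthy path: serving calls the shared function, so nothing is reported.
healthy = [compute_features(r) for r in raw_events]

# Hypothetical skewed path: someone re-implemented the feature with log() instead of log1p().
skewed = [
    {"log_amount": math.log(r["amount"]), "is_night": 1 if r["hour"] < 6 else 0}
    for r in raw_events
]

print(skew_report(healthy, raw_events))  # {}
print(skew_report(skewed, raw_events))   # {'log_amount': 1.0}
```

The skewed values are close enough to look plausible, which is exactly why this kind of bug degrades a model silently rather than crashing it.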
Deployment, monitoring, and the operational side
A model that scores well offline can still fail in production, and interviewers increasingly probe this. Be ready to talk about how you would deploy a model safely and watch it once it is live.
- Roll out behind a shadow deployment or a canary release so you can compare the new model against the current one on real traffic before switching over.
- Monitor input feature distributions for drift, because the world changes and a model trained last quarter may see different data today.
- Track the prediction distribution and the downstream business metric, not just request latency, since a model can keep serving fast while quietly getting worse.
- Have a rollback plan and a retraining cadence, and know what would trigger an emergency retrain.
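For the drift bullet, one widely used summary is the Population Stability Index (PSI), which compares how a feature's values are distributed across bins in a reference sample (for example, the training data) and in live traffic. This is a minimal sketch, not the only option: it uses equal-width bins over the reference range for simplicity (quantile bins are also common), and the `psi` name and sample data are illustrative.

```python
import math


def psi(reference, live, bins=10, eps=1e-6):
    """Population Stability Index of one numeric feature: reference sample vs live sample."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid dividing by zero for a constant feature

    def bin_shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clip out-of-range live values to edge bins
        # eps keeps log() finite when a bin is empty in one of the samples
        return [max(c / len(values), eps) for c in counts]

    ref_shares, live_shares = bin_shares(reference), bin_shares(live)
    return sum(
        (live_p - ref_p) * math.log(live_p / ref_p)
        for ref_p, live_p in zip(ref_shares, live_shares)
    )


reference = [i / 100 for i in range(100)]      # e.g. a feature as seen at training time
same = list(reference)
shifted = [0.5 + i / 200 for i in range(100)]  # live values crowd into the upper half

print(psi(reference, same))     # 0.0
print(psi(reference, shifted))  # far above common alert thresholds
```

Often-quoted rules of thumb put the alert line somewhere around 0.2 to 0.25, but these are conventions rather than a standard; a real system would tune per-feature thresholds and tie a sustained breach to the retraining or rollback trigger.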
If you can describe how you would detect that a live model has degraded, and what you would do about it, you stand out from candidates who only discuss offline accuracy.
Common ways candidates lose the loop
- Jumping to a complex model before stating the metric or proposing a baseline.
- Ignoring data and labels, which are usually the hard part, and over-indexing on architecture.
- Forgetting training and serving skew, then being unable to explain why the live model underperforms.
- Neglecting the coding round and assuming ML knowledge alone carries the offer.
How to practise
Run three or four full ML system design prompts out loud on a timer: a recommender, a fraud system, a search ranker, and a content moderation pipeline cover most of the patterns. Separately, drill standard coding problems and implement a few small ML pieces like gradient descent and a basic evaluation metric from scratch. After each design run, check that you stated the metric, proposed a baseline, named the training and serving skew, and described monitoring. That balance of theory and engineering is exactly what the role demands.
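For the "basic evaluation metric" drill, ROC AUC works well because it also gives you something to say about complexity. Below is a sketch of two hypothetical helpers, assuming 0/1 labels: a direct O(P*N) count over positive-negative pairs, and an O(n log n) rank-based version (the Mann-Whitney U formulation). They should agree, ties included.

```python
def roc_auc_pairwise(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined without both classes")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_auc_ranked(y_true, scores):
    """Same quantity via average ranks: sort once instead of comparing every pair."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined without both classes")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank shared by the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


y_val = [1, 0, 0, 1, 0]
s_val = [0.9, 0.8, 0.3, 0.4, 0.1]
print(roc_auc_pairwise(y_val, s_val), roc_auc_ranked(y_val, s_val))
```

Walking from the pairwise version to the sorted one is a natural way to state an approach, then improve its complexity, the same habit the coding round rewards.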