AI engineer machine learning interview questions

Three machine learning questions for AI engineer candidates. Expect prompts such as “Design a grounded RAG answer flow” and “How does the bias-variance lens apply when you build on a foundation model?”, each with a worked answer outline and the follow-ups interviewers push on.


As asked

Design a RAG feature that answers customer support questions using internal documentation. How do you keep answers grounded and know when the model should refuse?

Sample answer outline

A good answer separates retrieval quality from generation quality. Chunk documents around semantic boundaries, attach metadata such as product, version, and freshness, and retrieve with hybrid search plus reranking for high precision. The generation prompt should require citations to retrieved passages and should refuse when evidence is missing or contradictory. Evaluation needs golden questions, adversarial no-answer cases, citation accuracy checks, and human review for high-impact categories. Candidates often focus on vector databases while ignoring stale docs, permissions, and the model's tendency to fill gaps.
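The grounding and refusal rules in the outline can be sketched as a small guard around the generation step. This is a minimal illustration, not a reference implementation: the `Passage` type, the `min_score` threshold, the `[n]` citation format, and the refusal wording are all assumptions made for the example, and the model call itself is left out.

```python
import re
from dataclasses import dataclass

# Hypothetical refusal text; a real product would choose its own wording.
REFUSAL = "I could not find this in the documentation."


@dataclass(frozen=True)
class Passage:
    doc_id: str
    text: str
    score: float  # assumed reranker relevance in [0, 1]


def select_evidence(passages: list[Passage], min_score: float = 0.5) -> list[Passage]:
    """Keep only passages the reranker considers relevant enough."""
    return [p for p in passages if p.score >= min_score]


def build_prompt(question: str, evidence: list[Passage]) -> str:
    """Number each passage so the model can cite it as [1], [2], ..."""
    context = "\n".join(
        f"[{i}] ({p.doc_id}) {p.text}" for i, p in enumerate(evidence, start=1)
    )
    return (
        "Answer using only the passages below and cite them as [n]. "
        f"If they do not contain the answer, reply exactly: {REFUSAL}\n\n"
        f"{context}\n\nQuestion: {question}"
    )


def guard_answer(answer: str, evidence: list[Passage]) -> str:
    """Return the answer only if it cites passages that were actually supplied."""
    if not evidence:
        return REFUSAL  # nothing relevant was retrieved
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    valid = set(range(1, len(evidence) + 1))
    if not cited or not cited <= valid:
        return REFUSAL  # uncited claim, or a citation to a passage that does not exist
    return answer
```

A citation check like this only confirms that the answer points at retrieved passages. Whether the cited passage supports the claim still needs the citation accuracy checks and human review named in the outline.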

Expect these follow-ups

How do you prevent a user from retrieving documents they cannot access?
What metric tells you retrieval is the bottleneck?
How do you handle two documents that disagree?
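One way to approach the first two follow-ups is sketched below: filter chunks by access before they reach ranking or the prompt, and measure recall@k against a labelled golden set to see whether retrieval is the weak stage. The chunk layout (`id`, `allowed_groups`) and the function names are hypothetical, chosen only for this example.

```python
def filter_by_access(chunks: list[dict], user_groups: set[str]) -> list[dict]:
    """Drop chunks the user may not read, before ranking or prompting.

    Each chunk is assumed to carry an 'allowed_groups' set in its metadata.
    """
    return [c for c in chunks if c["allowed_groups"] & user_groups]


def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the labelled relevant documents found in the top k results."""
    if not relevant_ids:
        raise ValueError("recall is undefined without labelled relevant documents")
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)
```

If recall@k is low on the golden questions, the generator never sees the right evidence, so prompt changes are unlikely to help. If recall@k is high and answers are still wrong, the problem is more likely in generation.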

Tags: rag, grounding, evaluation

As asked

You are not training a model from scratch; you are building a feature on top of a foundation model with prompting, retrieval, and maybe light fine-tuning. How does bias-variance thinking still apply to the choices you make, and how do you tell whether your system is underfitting or overfitting the task?

Sample answer outline

Translate the concept into the applied-AI setting. Underfitting shows up as a system that is too generic: weak prompts, no retrieval, and a model that gives plausible but off-target answers because it lacks the task context. Overfitting shows up as a system tuned so tightly to a handful of examples that it fails on real, varied inputs: brittle few-shot prompts, a fine-tune on a tiny biased dataset, or retrieval that only works for the queries you tested. The applied controls are the modern analogues of regularisation and capacity: richer context and retrieval reduce bias, while a held-out evaluation set, diverse examples, and resisting over-tuning to a demo reduce variance. The AI-engineer signal is measuring this with an evaluation harness on representative inputs rather than eyeballing a few prompts, and knowing when a prompt change is genuinely better versus fitted to your test cases.
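The underfitting and overfitting diagnosis above can be made concrete by scoring the same system on two sets: the examples used while tuning prompts, and a held-out set that was never looked at during tuning. The sketch below assumes per-example scores in [0, 1]; the `low` and `gap` thresholds are illustrative values, not recommendations.

```python
from statistics import mean


def diagnose(
    dev_scores: list[float],
    heldout_scores: list[float],
    low: float = 0.6,   # hypothetical "too generic" threshold
    gap: float = 0.15,  # hypothetical tolerated dev/held-out gap
) -> str:
    """Classify a system from its scores on tuning examples vs held-out examples."""
    dev = mean(dev_scores)
    held = mean(heldout_scores)
    if dev - held > gap:
        # Good on the examples you tuned against, worse on unseen inputs.
        return "overfitting"
    if dev < low and held < low:
        # Weak everywhere: the system lacks task context.
        return "underfitting"
    return "ok"
```

The same comparison helps judge a single prompt change: an improvement that appears only on the tuning examples and not on the held-out set is probably fitted to the test cases.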

Expect these follow-ups

How do you build an evaluation set that reveals overfitting to your examples?
When does adding retrieval reduce bias versus just adding noise?
Why can a fine-tune on a small dataset make a model worse on real inputs?

Tags: fundamentals, applied-ai, evaluation, llm

As asked

You are upgrading the model behind a customer-facing assistant. What regression suite do you run before and after rollout?

Sample answer outline

Use a frozen eval set covering common tasks, edge cases, safety cases, no-answer cases, and historically bad production examples. Score with a mix of deterministic validators, human review for a stratified sample, and calibrated LLM judges for rubric checks. Compare not only aggregate score but slices by language, customer tier, task type, and retrieval source. Run shadow traffic before canarying, then watch online metrics such as escalation rate, thumbs-down rate, cost, latency, and refusal rate. The trap is declaring victory on average quality while a critical slice regresses.
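The slice comparison described above can be sketched as follows. Results are assumed to be `(slice_name, passed)` pairs produced by whatever scoring the suite uses; the `tolerance` value and function names are hypothetical.

```python
from collections import defaultdict


def pass_rates(results: list[tuple[str, bool]]) -> dict[str, float]:
    """Pass rate per slice from (slice_name, passed) pairs."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [passed, total]
    for slice_name, passed in results:
        counts[slice_name][0] += int(passed)
        counts[slice_name][1] += 1
    return {name: p / n for name, (p, n) in counts.items()}


def regressed_slices(
    baseline: list[tuple[str, bool]],
    candidate: list[tuple[str, bool]],
    tolerance: float = 0.02,  # hypothetical allowed drop per slice
) -> list[str]:
    """Slices where the candidate is worse than the baseline by more than tolerance.

    A slice missing from the candidate run is treated as a pass rate of 0.0.
    """
    base = pass_rates(baseline)
    cand = pass_rates(candidate)
    return sorted(s for s in base if cand.get(s, 0.0) < base[s] - tolerance)


# Aggregate pass rate improves from 10/14 to 12/14, yet the "de" slice regresses.
baseline = [("en", True)] * 6 + [("en", False)] * 4 + [("de", True)] * 4
candidate = [("en", True)] * 10 + [("de", True)] * 2 + [("de", False)] * 2
print(regressed_slices(baseline, candidate))  # ['de']
```

The example data shows the trap named in the outline: the average goes up while one slice drops from 100% to 50%.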

Expect these follow-ups

How do you choose examples for the frozen eval set?
When should a single failing example block release?
How do you stop the suite becoming stale?
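For the second follow-up, one possible policy is to mark a small set of cases as blocking (for example safety cases or historically severe production failures) and to gate the rest on a pass-rate threshold. The sketch below assumes that policy; the `blocking` flag, the `min_pass_rate` value, and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseResult:
    case_id: str
    passed: bool
    blocking: bool  # hypothetical flag: True for cases that must never fail


def release_decision(
    results: list[CaseResult], min_pass_rate: float = 0.95
) -> tuple[bool, list[str]]:
    """Return (ok_to_release, ids of failed blocking cases)."""
    blockers = [r.case_id for r in results if r.blocking and not r.passed]
    if blockers:
        # Any single failure in a blocking case stops the release.
        return False, blockers
    soft = [r for r in results if not r.blocking]
    rate = sum(r.passed for r in soft) / len(soft) if soft else 1.0
    return rate >= min_pass_rate, []
```

Keeping the blocking set small and explicit makes the decision auditable: a blocked release names the exact cases that failed.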

Tags: evals, regression, release

Questions

Design a grounded RAG answer flow (Machine learning, medium, very common)

How does the bias-variance lens apply when you build on a foundation model? (Machine learning, medium, common)

Build a regression suite for LLM releases (Machine learning, medium, common)
