Agent development is becoming a job-shaped skill
"Agent SDK developer" is still more skill phrase than stable job title, but the work is real. Companies are hiring engineers to build agentic workflows: tools, planners, memory, computer-use interfaces, approval gates, evals and integrations with business systems.
Open roles at OpenAI and Anthropic cover agents, tool use and infrastructure, and public reports on agents point the same way. See OpenAI careers, Anthropic jobs, Anthropic's 2026 State of AI Agents Report, and background on coding agents such as Codex.
In interviews, the question is not "can you call a model API?" It is "can you build a system that gives a model useful tools without letting it cause uncontrolled damage?"
Know the basic agent loop
A simple agent loop has:
- Goal or task.
- Model reasoning step.
- Tool selection.
- Tool execution.
- Observation.
- Repeat or finish.
In production, add:
- Permissions.
- Human approval.
- Timeouts.
- Budget limits.
- Audit logs.
- Evals.
- Rollback.
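Here is a minimal sketch of the loop with three of those production controls: a step budget, a per-tool timeout and an audit log. The `decide` function stands in for the model. It, `runAgentLoop`, `withTimeout` and every type here are hypothetical names, not part of any agent SDK.

```typescript
// Hypothetical types for this sketch; not from any SDK.
type LoopToolCall = { tool: string; input: Record<string, unknown> };
type ModelStep =
  | { kind: "call"; call: LoopToolCall }
  | { kind: "finish"; answer: string };
type LoopTool = (input: Record<string, unknown>) => Promise<string>;
type AuditEntry = {
  step: number;
  tool: string;
  input: Record<string, unknown>;
  observation: string;
};
type LoopResult = {
  status: "finished" | "budget_exhausted";
  answer?: string;
  log: AuditEntry[];
};

// Reject if the promise does not settle within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

async function runAgentLoop(
  goal: string,
  decide: (goal: string, log: AuditEntry[]) => Promise<ModelStep>, // stands in for the model
  tools: Map<string, LoopTool>,
  maxSteps: number, // budget: hard cap on model steps
  toolTimeoutMs: number,
): Promise<LoopResult> {
  const log: AuditEntry[] = [];
  for (let step = 1; step <= maxSteps; step++) {
    const next = await decide(goal, log);
    if (next.kind === "finish") {
      return { status: "finished", answer: next.answer, log };
    }
    const tool = tools.get(next.call.tool);
    let observation: string;
    if (!tool) {
      observation = `error: unknown tool ${next.call.tool}`;
    } else {
      try {
        observation = await withTimeout(tool(next.call.input), toolTimeoutMs);
      } catch (err) {
        // Errors become observations so the model can react to them.
        observation = `error: ${err instanceof Error ? err.message : String(err)}`;
      }
    }
    // Audit log: record every action and its observation.
    log.push({ step, tool: next.call.tool, input: next.call.input, observation });
  }
  // The model never finished within budget: stop instead of looping forever.
  return { status: "budget_exhausted", log };
}
```

The timeout only stops the loop from waiting; it does not cancel the tool's own work, so slow tools with side effects still need their own cancellation.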
Minimal TypeScript shape:
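```typescript
// A tool call as emitted by the model: which tool, with what arguments.
type ToolCall = {
  name: string;
  input: Record<string, unknown>;
};

// Structured result fed back to the model as an observation.
type ToolResult = {
  ok: boolean;
  output: string;
};

type Tool = {
  name: string;
  description: string;
  run: (input: Record<string, unknown>) => Promise<ToolResult>;
};

// Look up the requested tool by name and run it. An unknown name is
// reported back as a failed result rather than thrown.
export async function runToolCall(
  tools: Tool[],
  call: ToolCall,
): Promise<ToolResult> {
  const tool = tools.find((candidate) => candidate.name === call.name);
  if (!tool) {
    return { ok: false, output: `Unknown tool: ${call.name}` };
  }
  return tool.run(call.input);
}
```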
An interview will push beyond this. What if the tool deletes data? What if the model loops forever? What if the observation contains prompt injection? What if the user did not authorise the action?
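Two of those questions, destructive actions and missing authorisation, are best answered before the tool ever runs. The sketch below assumes each tool is tagged with a risk level and a list of allowed roles, and that a hypothetical `approve` callback reaches a human. `GuardedTool`, `Caller` and `callWithGate` are illustrative names, not a real SDK.

```typescript
// Hypothetical names for this sketch.
type Risk = "read" | "reversible" | "irreversible";

type GuardedTool = {
  toolName: string;
  risk: Risk;
  allowedRoles: string[];
  run: (input: Record<string, unknown>) => Promise<string>;
};

// Comes from the authenticated session, never from the model's output.
type Caller = { userId: string; roles: string[] };

// Asks a human to approve; a real system would pause the task here.
type Approver = (
  userId: string,
  toolName: string,
  input: Record<string, unknown>,
) => Promise<boolean>;

type GateOutcome =
  | { ok: true; output: string }
  | { ok: false; reason: "forbidden" | "rejected_by_human" };

async function callWithGate(
  tool: GuardedTool,
  caller: Caller,
  input: Record<string, unknown>,
  approve: Approver,
): Promise<GateOutcome> {
  // Permission check: the caller must hold at least one allowed role.
  const permitted = tool.allowedRoles.some((role) => caller.roles.includes(role));
  if (!permitted) return { ok: false, reason: "forbidden" };
  // Irreversible actions always need an explicit human yes.
  if (tool.risk === "irreversible" && !(await approve(caller.userId, tool.toolName, input))) {
    return { ok: false, reason: "rejected_by_human" };
  }
  return { ok: true, output: await tool.run(input) };
}
```

Because the roles come from the auth context, the model cannot talk its way past the check, however the request is phrased.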
Design tools like APIs, not magic powers
Good agent tools are narrow, typed and auditable. Bad tools are broad wrappers around arbitrary system access.
Better:
- searchTickets({ query, limit })
- draftReply({ ticketId, tone })
- createRefundRequest({ orderId, amount, reason })
Riskier:
- runSql({ query })
- executeShell({ command })
- sendEmail({ to, subject, body }) without approval
Tool design should include:
- Input schema.
- Auth context.
- Permission check.
- Rate limit.
- Dry-run mode for risky actions.
- Human approval for irreversible actions.
- Structured result.
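```typescript
import { z } from "zod";

// Schema for the refund tool's input. The amount is an integer number
// of pence, so fractional values are rejected.
const RefundRequestInput = z.object({
  orderId: z.string().min(1),
  amountPence: z.number().int().positive(),
  reason: z.string().min(10),
});

// Throws a ZodError if the model's arguments do not match the schema,
// so a malformed call never reaches the refunds system.
export function parseRefundRequest(input: unknown) {
  return RefundRequestInput.parse(input);
}
```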
If you cannot validate tool input, you do not have a production-grade agent.
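The same list asks for a dry-run mode and a structured result. The sketch below assumes a hypothetical `RefundStore` backend and a `dryRun` flag set by the orchestrator rather than by the model. It repeats a simple amount check instead of using the Zod schema so that it stays dependency-free.

```typescript
// Hypothetical types for this sketch.
type RefundInput = { orderId: string; amountPence: number; reason: string };

// Structured result: the model sees exactly which branch happened.
type RefundResult =
  | { ok: true; dryRun: true; wouldCreate: RefundInput }
  | { ok: true; dryRun: false; requestId: string; state: "pending" }
  | { ok: false; error: string };

// Stand-in for the real refunds backend.
type RefundStore = { createPending: (input: RefundInput) => Promise<string> };

async function createRefundRequest(
  input: RefundInput,
  store: RefundStore,
  dryRun: boolean,
): Promise<RefundResult> {
  // Validate in both modes, so a dry run catches the same errors.
  if (!Number.isInteger(input.amountPence) || input.amountPence <= 0) {
    return { ok: false, error: "amountPence must be a positive integer" };
  }
  if (dryRun) {
    // Report what would happen without touching the store.
    return { ok: true, dryRun: true, wouldCreate: input };
  }
  // Creates a pending request for a human to action; no money moves here.
  const requestId = await store.createPending(input);
  return { ok: true, dryRun: false, requestId, state: "pending" };
}
```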
Expect eval and safety questions
Agent interviews often test failure thinking:
- How do you stop an agent from taking destructive actions?
- How do you evaluate a multi-step task?
- What should be logged?
- How do you handle prompt injection in retrieved content?
- How do you recover from partial completion?
- How do you cap cost and runtime?
Answer with controls:
- Allowlist tools.
- Use scoped credentials.
- Require confirmation for external side effects.
- Record every action and observation.
- Use task budgets.
- Run evals with adversarial cases.
- Make operations idempotent.
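Idempotency is also the usual answer to partial completion. If a retried step carries the same key, it returns the stored result instead of, for example, issuing a second refund. The sketch below is in-memory. `IdempotentExecutor` and the `task:action:target` key format are hypothetical, and a production version would keep keys in durable storage.

```typescript
// Hypothetical in-memory executor; production would use a durable
// table with a unique constraint on the key.
class IdempotentExecutor {
  private results = new Map<string, string>();
  private inFlight = new Map<string, Promise<string>>();

  async run(key: string, action: () => Promise<string>): Promise<string> {
    const done = this.results.get(key);
    if (done !== undefined) return done; // already completed: replay the result
    const pending = this.inFlight.get(key);
    if (pending) return pending; // concurrent retry: share the same attempt
    const attempt = action().then(
      (result) => {
        this.results.set(key, result);
        this.inFlight.delete(key);
        return result;
      },
      (err) => {
        // Failed attempts are not recorded, so a later retry runs again.
        this.inFlight.delete(key);
        throw err;
      },
    );
    this.inFlight.set(key, attempt);
    return attempt;
  }
}
```

Deriving the key from the task ID and the target (for example `task-42:refund:order-7`) makes a resumed task reuse the same key.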
AI safety and governance are relevant even for ordinary business agents. OpenAI's safety page and frontier governance framework are broader than a normal product interview, but they point the same way: as capability grows, controls need to become more structured, not less.
Prepare a concrete agent project
A strong portfolio agent is not a toy that can do anything. It is a narrow workflow that does one job well.
Good examples:
- Triage support tickets and draft replies with approval.
- Review pull requests for one class of issue and leave draft comments.
- Convert meeting notes into tracked tasks with human confirmation.
- Search internal docs and create a cited answer.
- Reconcile failed imports and propose fixes.
Include:
- Tool list.
- Permissions model.
- Human approval point.
- Eval cases.
- Logs or traces.
- Known failure modes.
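Eval cases for a narrow agent can be small and scripted. Each case checks the final answer for success and the tools the agent tried to call for safety. The `AgentRun` shape and the substring check below are simplifying assumptions, and a real suite would usually use richer graders.

```typescript
// Hypothetical shapes for this sketch.
type AgentRun = { answer: string; toolsCalled: string[] };

type EvalCase = {
  id: string;
  task: string;
  expectContains: string; // success: the answer must contain this text
  forbiddenTools: string[]; // safety: none of these may be called
};

type EvalScore = { id: string; success: boolean; safe: boolean };

async function runEvals(
  cases: EvalCase[],
  agent: (task: string) => Promise<AgentRun>,
): Promise<{ scores: EvalScore[]; successRate: number; safetyRate: number }> {
  const scores: EvalScore[] = [];
  for (const c of cases) {
    const run = await agent(c.task);
    scores.push({
      id: c.id,
      success: run.answer.includes(c.expectContains),
      safe: !run.toolsCalled.some((t) => c.forbiddenTools.includes(t)),
    });
  }
  // Fraction of cases that pass a given check.
  const rate = (pick: (s: EvalScore) => boolean) =>
    scores.length === 0 ? 0 : scores.filter(pick).length / scores.length;
  return { scores, successRate: rate((s) => s.success), safetyRate: rate((s) => s.safe) };
}
```

An adversarial case is simply a task whose text contains an injected instruction, scored against the same forbidden-tool list.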
README section:
```markdown
## Safety model

- The agent can read tickets and draft replies.
- It cannot send replies without human approval.
- Refunds are created as pending requests, not executed directly.
- Every tool call is logged with task ID and user ID.
```
That is the kind of detail interviewers remember.
How to talk through a system-design round
When the interviewer opens the design portion, resist the urge to start with the model. Start with the boundary. Sketch what the agent is allowed to touch, who the principal is on each call, and where a human sits in the loop. A clear answer names three layers: the orchestration layer that runs the loop and enforces budgets, the tool layer that exposes narrow typed capabilities, and the policy layer that decides whether a given call is permitted for this user in this context. Most weak answers collapse all three into one prompt, which is exactly the design that fails review.
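The three layers can be kept apart in code, so that no prompt can widen permissions. In the sketch below, `Capability`, `Principal`, `policyDecide` and `Orchestrator` are hypothetical names. The policy rules (same tenant, scope granted, approval for external side effects) are examples, not a standard.

```typescript
// Tool layer: narrow capabilities that know nothing about the caller.
type Capability = {
  id: string;
  sideEffect: "none" | "external";
  run: (input: Record<string, unknown>) => Promise<string>;
};

// Policy layer: decides per principal and context, outside the prompt.
type Principal = { userId: string; tenantId: string; scopes: string[] };
type Decision = "allow" | "deny" | "needs_approval";

function policyDecide(principal: Principal, cap: Capability, context: { tenantId: string }): Decision {
  if (principal.tenantId !== context.tenantId) return "deny"; // no cross-tenant access
  if (!principal.scopes.includes(cap.id)) return "deny"; // scope not granted
  return cap.sideEffect === "external" ? "needs_approval" : "allow";
}

// Orchestration layer: runs steps, consults policy, enforces the budget.
class Orchestrator {
  private spent = 0;

  constructor(
    private readonly policy: typeof policyDecide,
    private readonly maxCalls: number,
  ) {}

  async step(
    principal: Principal,
    cap: Capability,
    input: Record<string, unknown>,
    context: { tenantId: string },
  ): Promise<string> {
    if (this.spent >= this.maxCalls) return "denied: budget exhausted";
    const decision = this.policy(principal, cap, context);
    if (decision !== "allow") return `blocked: ${decision}`;
    this.spent++; // only executed calls count against the budget
    return cap.run(input);
  }
}
```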
Then walk the unhappy paths out loud. Describe what happens when a tool times out, when the model emits malformed arguments, when a retrieved document tries to hijack the instructions, and when a long task is interrupted halfway. Strong candidates treat these as first-class states with explicit handling: retries with backoff, schema validation that rejects bad calls before execution, content sandboxing that strips instructions from untrusted data, and a checkpoint so a resumed task does not repeat side effects. If you can also say how you would measure regressions, with a small suite of scripted tasks scored on success and safety, you signal that you build agents you can actually operate rather than demo once and abandon.
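Two of those unhappy-path controls, retries with backoff and checkpointing, fit in a few lines. In the sketch below, `retryWithBackoff`, `resumeTask` and the `save` callback are hypothetical, and the attempt count and base delay are arbitrary.

```typescript
// Retry a failing call with exponential backoff between attempts.
async function retryWithBackoff<T>(fn: () => Promise<T>, attempts: number, baseDelayMs: number): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        const delay = baseDelayMs * 2 ** i; // 1x, 2x, 4x, ...
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}

// Checkpoint: ids of steps whose side effects have already happened.
type Checkpoint = { taskId: string; completedSteps: Set<string> };
type TaskStep = { stepId: string; run: () => Promise<void> };

// Resume a task: skip recorded steps, and save after each success.
async function resumeTask(
  steps: TaskStep[],
  checkpoint: Checkpoint,
  save: (cp: Checkpoint) => Promise<void>,
): Promise<void> {
  for (const step of steps) {
    if (checkpoint.completedSteps.has(step.stepId)) continue;
    await retryWithBackoff(step.run, 3, 10);
    checkpoint.completedSteps.add(step.stepId);
    await save(checkpoint);
  }
}
```

A crash between a step's side effect and the checkpoint save would still repeat that step on resume, which is why checkpoints are usually paired with idempotent operations.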
Continue your prep
Agent roles sit closest to AI engineering: