As asked
What telemetry would you add to an LLM application so on-call can debug latency spikes, cost overruns, and quality regressions?
Sample answer outline
Capture request count, error rate, provider status, queue time, time to first token, total latency, input tokens, output tokens, cache hit rate, and estimated cost by route, tenant, model, and prompt version. Quality telemetry should include eval score distributions, user feedback, refusal rate, tool-call failure rate, and structured-output validation failures. Use traces to connect retrieval, model call, tool calls, and post-processing into one request path. Watch cardinality carefully because raw prompt text and user ids do not belong in metric labels. Candidates often provide infrastructure metrics but miss product-level quality signals.
Expect these follow-ups
- Which labels would you forbid on metrics?
- How do you correlate a prompt change with a quality regression?
- What alert would page someone versus create a ticket?