As asked
Your LLM spend triples overnight after a product release, but traffic only grew 10 percent. How do you investigate and contain it?
Sample answer outline
Segment cost by route, tenant, model, prompt version, and input versus output tokens, comparing the windows before and after the release to isolate the regression. Because traffic grew only 10 percent, most of the increase must come from higher cost per request rather than request volume. Common causes include prompt expansion, retrieval returning too many chunks, a tool-calling loop, lost cache hits, or a fallback to a more expensive model. Immediate containment options are budget caps, a prompt rollback, max-token limits, restoring the cache, or turning off the offending feature flag. The permanent fix is cost tests in CI, per-route budgets, and dashboards that show token deltas on each deploy. Strong answers do not simply lower max tokens when doing so would break task quality.
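To make the segmentation step concrete, here is a minimal sketch of comparing cost per dimension before and after a release, plus a per-route budget check of the kind a CI cost test might run. Everything in it is an assumption for illustration: the record fields (`route`, `tenant`, `model`, `prompt_version`, `input_tokens`, `output_tokens`), the model names, the prices in `PRICE_PER_1K`, and the budget values are hypothetical and do not reflect any real vendor's rates or logging schema.

```python
from collections import defaultdict

# Hypothetical prices in dollars per 1,000 tokens; placeholders, not real rates.
PRICE_PER_1K = {
    "small": {"input": 0.001, "output": 0.002},
    "large": {"input": 0.01, "output": 0.03},
}


def request_cost(rec):
    """Cost of one request from its token counts and its model's price."""
    price = PRICE_PER_1K[rec["model"]]
    return (rec["input_tokens"] * price["input"]
            + rec["output_tokens"] * price["output"]) / 1000


def cost_by(records, dimension):
    """Total cost grouped by one dimension, e.g. 'route' or 'prompt_version'."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[dimension]] += request_cost(rec)
    return dict(totals)


def cost_deltas(before, after, dimension):
    """Per-segment cost change (after minus before), largest increase first."""
    b = cost_by(before, dimension)
    a = cost_by(after, dimension)
    deltas = {k: a.get(k, 0.0) - b.get(k, 0.0) for k in set(a) | set(b)}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)


def routes_over_budget(records, budgets):
    """Routes whose total cost exceeds their budget (routes without a budget are skipped)."""
    totals = cost_by(records, "route")
    return sorted(r for r, cost in totals.items()
                  if r in budgets and cost > budgets[r])


# Hypothetical usage records for the windows before and after the release.
before = [
    {"route": "/chat", "tenant": "acme", "model": "small",
     "prompt_version": "v1", "input_tokens": 1000, "output_tokens": 500},
    {"route": "/search", "tenant": "globex", "model": "small",
     "prompt_version": "v1", "input_tokens": 2000, "output_tokens": 200},
]
after = [
    {"route": "/chat", "tenant": "acme", "model": "small",
     "prompt_version": "v1", "input_tokens": 1000, "output_tokens": 500},
    # Longer prompt and a more expensive model on one route.
    {"route": "/search", "tenant": "globex", "model": "large",
     "prompt_version": "v2", "input_tokens": 8000, "output_tokens": 200},
]

for dim in ("route", "model", "prompt_version"):
    print(dim, cost_deltas(before, after, dim))

print("over budget:", routes_over_budget(after, {"/chat": 0.01, "/search": 0.01}))
```

Running the same delta over several dimensions is the point: a regression that shows up on one route, one prompt version, and one model at once points at a specific change, while an increase spread evenly across tenants points away from abuse by a single customer.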
Expect these follow-ups
- What cost guardrail would you enforce before launch?
- How do you distinguish abusive usage from a product bug?
- Who should be allowed to override a budget cap?