As asked
You need an LLM to extract invoice fields into JSON for downstream automation. How do you make the output reliable enough for production?
Sample answer outline
Define a strict schema first, including nullable fields, enums, currency handling, and confidence or evidence fields. Use model-native structured output or function calling where available, then validate every response with a real JSON parser and schema validator before accepting it. Add deterministic post-processing only for well-defined normalisation, such as date formats or currency codes, not for fixing arbitrary model text. The pipeline should route low-confidence, invalid, or high-value invoices to human review and should store the original document and model version for audit. Strong candidates discuss evaluation by field-level precision and recall rather than a single aggregate accuracy score.
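A candidate might sketch the validate-then-route step roughly as follows. This is a minimal illustration using only the Python standard library, not a reference implementation: the field names, the currency list, and both thresholds are assumptions made up for the example, and a production pipeline would more likely lean on a schema library and the provider's structured-output mode.

```python
import json
from datetime import date
from decimal import Decimal, InvalidOperation
from typing import List, Optional, Tuple

# All names and thresholds here are hypothetical, chosen for illustration.
REQUIRED_KEYS = {"invoice_number", "total", "currency", "due_date", "confidence"}
ALLOWED_CURRENCIES = {"USD", "EUR", "GBP"}  # enum field
MIN_CONFIDENCE = 0.85                       # below this, a human reviews
HIGH_VALUE = Decimal("10000")               # at or above this, a human reviews


def validate_invoice(raw: str) -> Tuple[Optional[dict], List[str]]:
    """Parse one model response. Returns (record, errors); record is None on failure."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["top-level value must be a JSON object"]
    if set(data) != REQUIRED_KEYS:
        return None, [f"missing or unexpected keys: {sorted(set(data) ^ REQUIRED_KEYS)}"]

    errors: List[str] = []
    record: dict = {}

    number = data["invoice_number"]
    if isinstance(number, str) and number.strip():
        record["invoice_number"] = number.strip()
    else:
        errors.append("invoice_number must be a non-empty string")

    # Amounts are expected as strings so they parse exactly, never as floats.
    total = None
    if isinstance(data["total"], str):
        try:
            total = Decimal(data["total"])
        except InvalidOperation:
            total = None
    if total is not None and total.is_finite() and total >= 0:
        record["total"] = total
    else:
        errors.append("total must be a non-negative decimal string")

    currency = data["currency"]
    if isinstance(currency, str) and currency in ALLOWED_CURRENCIES:
        record["currency"] = currency
    else:
        errors.append("currency must be one of the allowed codes")

    # Nullable field: null means "not present on the invoice", not an error.
    due = data["due_date"]
    if due is None:
        record["due_date"] = None
    elif isinstance(due, str):
        try:
            record["due_date"] = date.fromisoformat(due)
        except ValueError:
            errors.append("due_date must be an ISO date (YYYY-MM-DD) or null")
    else:
        errors.append("due_date must be a string or null")

    conf = data["confidence"]
    if isinstance(conf, (int, float)) and not isinstance(conf, bool) and 0 <= conf <= 1:
        record["confidence"] = float(conf)
    else:
        errors.append("confidence must be a number between 0 and 1")

    return (record, []) if not errors else (None, errors)


def route(raw: str) -> str:
    """Decide whether a response is accepted automatically or sent to a person."""
    record, _errors = validate_invoice(raw)
    if record is None:
        return "human_review"  # invalid output is never repaired silently
    if record["confidence"] < MIN_CONFIDENCE or record["total"] >= HIGH_VALUE:
        return "human_review"
    return "auto_accept"
```

The point of the sketch is that a response is either accepted exactly as validated or handed to a person. Nothing in it tries to salvage text that failed to parse.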
Expect these follow-ups
- How do you handle a field that is visible in the invoice but ambiguous?
- What changes if the invoice may be handwritten?
- How do you version the schema without breaking downstream consumers?