As asked
A team adds user_id and request_id labels to Prometheus metrics and the metrics backend starts falling over. How do you fix it without losing useful debugging signal?
Sample answer outline
Explain that every unique combination of metric name and label values is a separate time series, so the series count grows with the product of distinct values across labels. Unbounded labels such as user_id, request_id, email, or raw URL path parameters therefore multiply storage, memory, and query cost without limit. request_id is the worst case, since every request creates a new series. Remove the high-cardinality labels from metrics and move per-request details to logs or traces, linked by trace_id. Keep bounded labels such as service, route template, status class, region, and dependency. Add instrumentation review, cardinality budgets, and alerts on series growth so the problem is caught before ingestion fails. Strong candidates preserve the debugging use case rather than just saying 'delete the label'.
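The following is a minimal Python sketch of two ideas in the outline, assuming a hypothetical label allowlist:

- **Series arithmetic.** The worst-case series count is the product of distinct values per label, so one unbounded label multiplies everything.
- **Label split.** Bounded, normalized labels go to metrics. Per-request fields such as user_id and request_id go to logs or traces instead of being discarded.

The names `split_labels`, `route_template`, `status_class`, `series_count`, `ALLOWED_LABELS`, and the ID-matching regex are illustrative assumptions, not part of any Prometheus client library. In a real service the route template should come from the web framework's router rather than a regex.

```python
import re
from math import prod
from typing import Dict, Tuple

# Hypothetical allowlist of bounded labels passed through unchanged.
# "route" and "status_class" are derived below from path and status.
ALLOWED_LABELS = {"service", "region", "dependency"}

# Assumed heuristic: a path segment is an ID if it is all digits or a canonical UUID.
_ID_SEGMENT = re.compile(
    r"^(\d+|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})$"
)


def route_template(path: str) -> str:
    """Replace ID-like path segments with ':id', e.g. '/users/42' -> '/users/:id'."""
    segments = path.strip("/").split("/")
    return "/" + "/".join(":id" if _ID_SEGMENT.match(s) else s for s in segments)


def status_class(code: int) -> str:
    """Collapse an HTTP status code into its class, e.g. 404 -> '4xx'."""
    return f"{code // 100}xx"


def split_labels(raw: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split raw request attributes into bounded metric labels and per-request log fields.

    Anything not on the allowlist (user_id, request_id, trace_id, ...) is routed
    to the log/trace side rather than dropped, so the debugging signal survives.
    """
    metric_labels: Dict[str, str] = {}
    log_fields: Dict[str, str] = {}
    for key, value in raw.items():
        if key == "path":
            metric_labels["route"] = route_template(value)
            log_fields["path"] = value  # keep the concrete path for debugging
        elif key == "status":
            metric_labels["status_class"] = status_class(int(value))
            log_fields["status"] = value
        elif key in ALLOWED_LABELS:
            metric_labels[key] = value
        else:
            log_fields[key] = value
    return metric_labels, log_fields


def series_count(distinct_values: Dict[str, int], series_per_label_set: int = 1) -> int:
    """Worst-case series for one metric: product of distinct values per label.

    For a classic histogram, pass series_per_label_set = number of buckets
    (including +Inf) + 2, to account for the _sum and _count series.
    """
    return prod(distinct_values.values()) * series_per_label_set
```

As a usage example, with 30 routes, 5 status classes, and 3 regions, `series_count` gives an upper bound of 450 series. Adding a user_id label with 100,000 distinct users raises the bound to 45,000,000, which is the failure mode in the question. The same arithmetic is one way to express a per-metric cardinality budget during instrumentation review.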
Expect these follow-ups
- Which labels would you allow on an HTTP server duration histogram?
- How do exemplars help connect metrics to traces?
- What query patterns become dangerous after cardinality spikes?