As asked
You are onboarding a backend engineer who has never used Datadog before. Explain the difference between metrics, distributed traces, and logs, and give a concrete example of when you would reach for each one to debug a latency problem in a microservices API.
Sample answer outline
A strong answer explains three things. Metrics are numeric aggregations (counts, rates, percentiles) suited to alerting on trends. Traces capture the causal chain of a single request across service boundaries, which shows which hop is slow. Logs carry the raw event detail needed to understand why a specific request failed. The debugging example should use the three in sequence: a p99 latency metric fires an alert, a trace from the slow window pinpoints the slow downstream call, and that service's logs for the same request reveal the specific error.
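The metric, then trace, then log sequence can be sketched in plain Python to make the hand-offs concrete. This is a toy model under stated assumptions, not Datadog's data model or API: `Span`, `LogLine`, `p99`, `slowest_hop`, `logs_for`, the 500 ms alert threshold, and the sample services and log messages are all hypothetical. The key idea is that each step narrows the search, and the trace ID is the correlation key linking a trace to its log lines (in Datadog, log/trace correlation typically relies on trace IDs injected into log records).

```python
import math
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical in-memory stand-ins for the three signal types.
# Illustrative only; not Datadog's data model or API.

@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for the root span of a trace
    service: str
    duration_ms: float

@dataclass
class LogLine:
    trace_id: str  # correlation key linking this log line to a trace
    service: str
    message: str

def p99(latencies_ms: List[float]) -> float:
    """Step 1 (metric): nearest-rank 99th percentile over a window of requests."""
    ordered = sorted(latencies_ms)
    if not ordered:
        raise ValueError("no samples in window")
    rank = math.ceil(99 * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]

def slowest_hop(spans: List[Span], trace_id: str) -> Span:
    """Step 2 (trace): the span with the most self time, i.e. its own duration
    minus the time spent in its direct children. Assumes children run sequentially."""
    in_trace = [s for s in spans if s.trace_id == trace_id]

    def self_time(span: Span) -> float:
        child_time = sum(c.duration_ms for c in in_trace if c.parent_id == span.span_id)
        return span.duration_ms - child_time

    return max(in_trace, key=self_time)

def logs_for(logs: List[LogLine], trace_id: str, service: str) -> List[str]:
    """Step 3 (logs): raw events one service emitted for one request."""
    return [l.message for l in logs if l.trace_id == trace_id and l.service == service]

if __name__ == "__main__":
    latencies = [40.0] * 98 + [900.0] * 2
    if p99(latencies) > 500.0:  # hypothetical alert threshold
        spans = [
            Span("t-42", "a", None, "api-gateway", 900.0),
            Span("t-42", "b", "a", "orders", 880.0),
            Span("t-42", "c", "b", "inventory", 850.0),
            Span("t-42", "d", "b", "pricing", 10.0),
        ]
        logs = [
            LogLine("t-42", "orders", "calling inventory"),
            LogLine("t-42", "inventory", "db connection pool exhausted; waited 830ms"),
            LogLine("t-7", "inventory", "ok"),
        ]
        hop = slowest_hop(spans, "t-42")
        print(hop.service)  # inventory
        print(logs_for(logs, "t-42", hop.service))
```

Self time is used rather than raw duration because a parent span's duration always includes its children, so the root would otherwise always look slowest. When downstream calls run in parallel their durations overlap, and a real tool would need a critical-path analysis instead of this simple subtraction.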
Expect these follow-ups
- How does Datadog's APM connect traces to the underlying infrastructure metrics?
- What is cardinality and why does it matter when choosing what to put in a metric tag?
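For the cardinality follow-up, a reasonable answer is that every distinct combination of tag values on a metric is a separate time series (Datadog counts custom metrics this way), so the worst-case series count is the product of the number of distinct values per tag. The sketch below uses hypothetical tag names and counts to show why an unbounded tag such as a per-customer ID is usually a poor choice for a metric tag and belongs in traces or logs instead.

```python
import math

def max_series(distinct_values_per_tag: dict) -> int:
    """Worst-case number of time series for one metric: the product of the
    number of distinct values each tag can take (1 if the metric has no tags)."""
    return math.prod(distinct_values_per_tag.values())

# Hypothetical tag cardinalities for a request-latency metric.
bounded = {"endpoint": 20, "status_code": 5}
print(max_series(bounded))  # 100

# Adding a high-cardinality tag multiplies the count.
unbounded = {**bounded, "customer_id": 10_000}
print(max_series(unbounded))  # 1000000
```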