As asked
Describe a production incident where a cloud infrastructure failure caused customer impact. What was your role, what did you do, and what did you change afterward?
Sample answer outline
Strong answers follow STAR: the situation with specific blast radius (how many customers, how long), the actions taken during the incident (detection, mitigation, communication), the root cause found in the postmortem, and concrete process or architecture changes that reduced recurrence probability. Interviewers look for ownership, technical accuracy, and learning orientation rather than blame.
Expect these follow-ups
- What would you do differently if you faced the same incident today?
- How did you communicate the incident to non-technical stakeholders during the outage?