As asked
Tell me about a significant BGP or routing incident you were responsible for resolving. Walk me through how you were alerted, what your initial diagnosis looked like, the steps you took to restore service, and what you changed afterward to prevent recurrence.
Sample answer outline
A strong answer follows the STAR format (Situation, Task, Action, Result). It describes how the alert arrived (monitoring or a customer report), the initial triage (checking show bgp summary output, looking at withdrawn prefixes and route table changes), isolation of the root cause (for example a misconfigured route filter, a flapping peer, or a hardware failure), the remediation steps taken under pressure, and post-incident actions (hardening route filters, tuning BFD, updating the runbook). The candidate should show ownership, clear communication with stakeholders during the outage, and genuine learning from the incident.
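For interviewers who want a concrete picture of the "withdrawn prefixes, route table changes" part of triage, the following is a minimal sketch of comparing two route-table snapshots. It assumes the prefixes have already been collected as plain CIDR strings; the function name diff_prefixes and the sample prefixes are hypothetical, and a real candidate may describe doing this by eye or with vendor tooling instead.

```python
import ipaddress


def diff_prefixes(before, after):
    """Compare two snapshots of advertised prefixes (CIDR strings).

    Returns (withdrawn, announced): prefixes present only in `before`
    and prefixes present only in `after`, each sorted for readability.
    """
    before_set = {ipaddress.ip_network(p) for p in before}
    after_set = {ipaddress.ip_network(p) for p in after}

    def sort_key(net):
        # Sort by IP version first so IPv4 and IPv6 networks are never
        # compared directly, then by address and prefix length.
        return (net.version, int(net.network_address), net.prefixlen)

    withdrawn = sorted(before_set - after_set, key=sort_key)
    announced = sorted(after_set - before_set, key=sort_key)
    return [str(n) for n in withdrawn], [str(n) for n in announced]


if __name__ == "__main__":
    # Hypothetical snapshots taken before and during the incident.
    before = ["10.0.0.0/8", "192.0.2.0/24", "198.51.100.0/24"]
    during = ["10.0.0.0/8", "203.0.113.0/24"]
    withdrawn, announced = diff_prefixes(before, during)
    print("withdrawn:", withdrawn)
    print("announced:", announced)
```

A candidate who describes this kind of before-and-after comparison, however they performed it, is showing that their diagnosis was driven by evidence rather than guesswork.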
Expect these follow-ups
- How did you communicate the situation to non-technical stakeholders during the incident?
- What monitoring gap allowed the issue to persist as long as it did before you caught it?