Question 1

Tell me about a significant BGP or routing incident you were responsible for resolving. Walk me through how you were alerted, what your initial diagnosis looked like, the steps you took to restore service, and what you changed afterward to prevent recurrence.

Accepted Answer

A strong answer follows the STAR structure (Situation, Task, Action, Result): it describes the alert (monitoring, customer report), initial triage (show bgp summary, checking withdrawn prefixes and route table changes), isolation of the root cause (misconfigured route filter, flapping peer, hardware failure), the remediation steps taken under pressure, and post-incident actions (route filtering hardening, BFD tuning, runbook update). The candidate should show ownership, clear communication with stakeholders during the outage, and genuine learning from the incident.

Question 2

A monitoring alert fires: traffic to 203.0.113.0/24 is being dropped. You check the routing table on the edge router and the route is present with a valid next-hop. What are the next steps in your diagnostic process?

Accepted Answer

A route in the table does not prove the forwarding path works. Check, in order: (1) is the next hop reachable? Ping it and confirm the ARP/ND entry resolves; (2) is an ACL or firewall rule on the interface dropping the traffic in the data plane? Review the access lists applied to the outbound interface (show ip access-lists) and watch their hit counters; (3) is the FIB (hardware forwarding table) consistent with the RIB? Use show ip cef or the platform's equivalent forwarding-table command; (4) run traceroute and note where it stops; (5) check interface counters for hardware drops and errors. A route that is present in the RIB but missing or stale in the FIB is a classic platform bug.
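Step (3) can be sketched in a few lines of Python. This is a minimal illustration, not a platform tool: it assumes the routing table and forwarding table have already been parsed into dictionaries of prefix to next hop, and the function name and sample addresses are hypothetical.

```python
def rib_fib_mismatches(rib, fib):
    """Return prefixes whose FIB entry is missing or disagrees with the RIB.

    rib and fib are hypothetical dicts mapping prefix -> next hop, for example
    parsed from the platform's routing-table and forwarding-table output.
    """
    problems = {}
    for prefix, rib_next_hop in rib.items():
        fib_next_hop = fib.get(prefix)
        if fib_next_hop is None:
            problems[prefix] = "missing from FIB"
        elif fib_next_hop != rib_next_hop:
            problems[prefix] = (
                f"RIB next hop {rib_next_hop}, FIB next hop {fib_next_hop}"
            )
    return problems


# Illustrative data only: the dropped prefix is in the RIB but not in the FIB.
rib = {"203.0.113.0/24": "192.0.2.1", "198.51.100.0/24": "192.0.2.5"}
fib = {"198.51.100.0/24": "192.0.2.5"}
print(rib_fib_mismatches(rib, fib))
```

Any prefix the function reports is a candidate for a forwarding-plane fault rather than a routing-protocol fault.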

Question 3

You pushed a firewall ACL change during a maintenance window and 2 minutes later production alerts start firing. The change window is still open. What is your decision process for whether to roll back immediately versus continue diagnosing?

Accepted Answer

If the change correlates in time with the incident and the impact is significant, roll back first and diagnose second. The burden of proof is on the change: even if you do not yet understand why the rule is causing the problem, revert it to restore service. The exception is when the rollback itself is risky (e.g., it would drop a VPN that other teams depend on); in that case, weigh the rollback risk against the ongoing impact before acting. Rollback procedure: take a pre-change config snapshot and have a one-command revert ready. After the rollback, confirm that the alerts clear (allow 2 to 3 minutes), then reproduce and diagnose the issue in a staging environment. Document everything in the change ticket.
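The decision rule above can be written down as a small function. This is only a sketch of the reasoning: the function name, the return labels, and the 15-minute correlation window are assumptions chosen for illustration, not values from any standard.

```python
def rollback_decision(minutes_since_change, impact_is_significant,
                      rollback_is_risky, correlation_window_minutes=15):
    """Return 'rollback', 'escalate', or 'diagnose' for a change under suspicion.

    correlation_window_minutes is an assumed threshold for "the alerts started
    soon enough after the change to blame the change".
    """
    correlated = minutes_since_change <= correlation_window_minutes
    if correlated and impact_is_significant:
        # A risky rollback needs a second opinion before anyone touches it.
        return "escalate" if rollback_is_risky else "rollback"
    return "diagnose"


# The scenario in the question: alerts fire 2 minutes after the ACL push.
print(rollback_decision(2, impact_is_significant=True, rollback_is_risky=False))
```

Writing the rule down before the change window opens removes the temptation to keep debugging in production while users are affected.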

Question 4

Describe a situation where you identified that the network was approaching a capacity limit before it became a user-impacting problem. What data led you to that conclusion, how did you build the case to get resources approved, and what did you implement?

Accepted Answer

Strong answer shows proactive monitoring: trend analysis on interface utilization (many organizations treat sustained utilization approaching 70% as the trigger for a capacity review), traffic growth projections, and identification of the bottleneck. Building the business case: translate technical capacity metrics into cost-of-outage risk and revenue impact. Implementation: phased upgrade, new link provisioning, or traffic engineering to rebalance load. Candidate should show quantitative thinking and the ability to influence procurement decisions.
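The trend analysis can be as simple as a straight-line fit over weekly utilization samples. The sketch below assumes linear growth and a 70% review threshold, both of which are simplifications; the function name and the sample figures are hypothetical.

```python
def weeks_until_threshold(weekly_utilization, threshold=70.0):
    """Estimate weeks until a least-squares trend line reaches the threshold.

    weekly_utilization is a list of utilization percentages, oldest first.
    Returns 0.0 if the trend is already at the threshold, or None if the
    trend is flat or falling.
    """
    n = len(weekly_utilization)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(weekly_utilization) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, weekly_utilization))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    latest_fit = intercept + slope * (n - 1)
    if latest_fit >= threshold:
        return 0.0
    if slope <= 0:
        return None
    return (threshold - latest_fit) / slope


# Illustrative samples: utilization growing 2 points per week from 50%.
print(weeks_until_threshold([50, 52, 54, 56, 58]))
```

A number of weeks, set against the provisioning lead time for a new circuit, is usually a more persuasive input to a business case than a raw utilization graph.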

Question 5

Tell me about a time when a development team or business unit pushed back strongly on a firewall or security policy you needed to enforce. How did you handle the disagreement, and what was the outcome?

Accepted Answer

Strong answer shows empathy for the business need while maintaining security posture: the candidate listened to what the team actually needed to accomplish (not just what they asked for), proposed a lower-risk alternative (e.g., a separate segment, or an application-layer proxy instead of a broad rule), documented the risk if the exception was granted, and escalated appropriately when needed. Shows the ability to say no constructively rather than becoming a blocker.

Question 6

Tell me about a manual network task that you automated. What was the process before automation, what did you build, how did you test it safely, and what was the impact?

Accepted Answer

Strong answers describe a specific painful repetitive task (VLAN provisioning, ACL updates, device onboarding), the tool or script built (Python + Netmiko, Ansible, Terraform for cloud), testing approach (dry-run mode, staging lab, diff review before commit), and measurable outcome (time reduction, error rate drop, number of changes automated per week). Candidate should acknowledge what went wrong in early iterations and how they iterated.
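The dry-run and diff-review testing approach might look like the sketch below. It uses only the standard library's difflib; the function names and the push callback are hypothetical stand-ins for whatever actually delivers configuration to a device (Netmiko, Ansible, and so on).

```python
import difflib


def config_diff(running, candidate):
    """Return a unified diff between the running and candidate configs."""
    return list(
        difflib.unified_diff(
            running.splitlines(),
            candidate.splitlines(),
            fromfile="running",
            tofile="candidate",
            lineterm="",
        )
    )


def apply_change(running, candidate, dry_run=True, push=None):
    """Show the diff and push the candidate only when dry_run is False.

    push is a hypothetical callable that delivers the config to the device.
    """
    diff = config_diff(running, candidate)
    if dry_run or not diff:
        return diff  # nothing is sent to the device
    if push is None:
        raise ValueError("a push callable is required when dry_run is False")
    push(candidate)
    return diff


running = "vlan 10\n name users\n"
candidate = "vlan 10\n name users\nvlan 20\n name voice\n"
for line in apply_change(running, candidate):
    print(line)
```

Defaulting to dry_run=True means a forgotten flag produces a diff to review rather than a change in production.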

Question 7

Walk me through the most technically complex network problem you have debugged while on call. What made it hard, what tools and techniques did you use to isolate it, and how long did it take from first alert to resolution?

Accepted Answer

Strong answers describe a genuinely complex multi-layer issue (e.g., intermittent packet loss only under specific traffic patterns, an interaction between QoS misconfiguration and a hardware buffer behavior, or asymmetric routing exposed only under load). Candidate shows structured debugging methodology: divide and conquer, binary search of the path, packet captures at multiple points, correlation with change window. Key is showing how they stayed calm, looped in the right people at the right time, and updated stakeholders throughout.

Question 8

Tell me about a large network migration you led that required coordinating with multiple teams. What was the migration (MPLS to SD-WAN, IPv4 to IPv6, data center move, etc.), how did you plan for minimal downtime, and what went wrong that you had to adapt to?

Accepted Answer

Strong answers cover planning (dependency mapping, rollback plan, change windows, communication plan with application teams), execution (phased cutover, traffic monitoring during migration, go/no-go criteria), and an honest account of what went wrong and how they adapted. Candidate should show ability to keep stakeholders informed and make real-time decisions under pressure during a cutover window.

Question 9

Tell me about a time you helped a junior engineer grow technically. What was their gap, what approach did you take, and how did you know they had improved?

Accepted Answer

Strong answers describe a specific skill gap (understanding of BGP, inability to read packet captures, poor troubleshooting methodology) and a concrete approach (pair debugging sessions, directing them to relevant RFCs, shadowing on-call rotations, code review of their automation scripts). Candidate shows patience, ability to explain concepts at different levels, and a genuine interest in others' growth. Measurable outcome: the junior engineer handled a specific class of incidents independently.

Question 10

Tell me about a security incident where network-level evidence or controls were central to detection or containment. What were the indicators, what network actions did you take, and how did you coordinate with security teams?

Accepted Answer

Strong answers describe concrete network indicators (unusual NetFlow patterns, unexpected DNS queries, east-west scanning detected by firewall logs, BGP hijack detected by prefix monitoring) and network-level response actions (null-routing source IPs, isolating a VLAN, blocking at the egress firewall, coordinating with upstream ISP for BGP blackhole). Candidate shows ability to act quickly within their authority and escalate appropriately, and understands the difference between containment and remediation.

Question 11

Walk me through a network change that required coordination across more than three teams (e.g., security, application, storage, server, change management). How did you manage dependencies and communication, and what would you do differently?

Accepted Answer

Strong answers show structured coordination: identifying all stakeholders early, creating a shared change plan with clear dependencies and owners, running a dry-run or pre-check call, defining explicit go/no-go criteria, and having a rollback plan all teams understood. Candidate should be honest about what went wrong (a team was not notified, a dependency was missed) and what they would do differently. Shows that technical excellence alone is not enough for large changes.

Question 12

A server team reports intermittent packet loss on a new server they just installed. The port shows 1Gbps connected. You see CRC errors and late collisions incrementing on the switch port. What is your diagnosis and fix?

Accepted Answer

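Late collisions and CRC errors together strongly indicate a duplex mismatch: one side is running full-duplex and the other half-duplex. The full-duplex side does not defer when the other is transmitting, causing collisions after the first 64 bytes of a frame (late collisions) and corrupted frames that show up as CRC errors. Confirm with 'show interface' on the switch port and the NIC settings on the server: one of them will report half-duplex. Fix: configure both ends identically. The usual cause is one end hardcoded while the other auto-negotiates, fails to detect duplex, and falls back to half-duplex, so never hardcode only one side. For gigabit copper (1000BASE-T), auto-negotiation is part of the standard, so the preferred fix is auto on both the switch port and the NIC; if negotiation still fails, check the cable and the NIC driver. Clear the counters and verify that they stop incrementing after the fix.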

Question 13

A newly provisioned BGP session with a transit ISP is stuck in Active state after 30 minutes. The ISP confirms their side is configured. Walk through your step-by-step diagnosis starting from the most likely cause.

Accepted Answer

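Active state means the TCP session to the peer is not being established: the router is attempting (or listening for) a connection on TCP 179, but the three-way handshake never completes. Diagnosis order: (1) confirm that the configured neighbor IP and local source IP are correct and that the neighbor is reachable (ping from the peering source address), (2) verify that TCP packets are actually reaching the ISP (tcpdump on the peering interface, looking for SYN and SYN-ACK), (3) check for a local ACL or firewall blocking TCP 179 inbound or outbound, (4) verify that the session is sourced from the address the ISP expects (update-source config), (5) check TTL security if GTSM is configured (packets must be sent with TTL 255 for directly connected eBGP), (6) confirm that the AS numbers configured on each side match what the other side expects; an AS mismatch usually lets TCP connect and then tears the session down at the OPEN exchange, so the state cycles rather than sitting in Active. The most common cause on a new session is an ACL or firewall blocking port 179.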
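Where the router or an adjacent host can run Python, the TCP reachability check can be sketched with the standard socket module. The function name and return labels are hypothetical. Note that a BGP speaker normally accepts connections only from configured neighbor addresses, so a probe from any other source address is expected to be refused.

```python
import socket


def probe_tcp(peer_ip, port=179, source_ip=None, timeout=3.0):
    """Attempt a TCP handshake and classify the result.

    Returns 'open', 'refused', 'timeout', or 'error: <detail>'.
    source_ip, if given, binds the probe to the peering source address.
    """
    source = (source_ip, 0) if source_ip else None
    try:
        with socket.create_connection(
            (peer_ip, port), timeout=timeout, source_address=source
        ):
            return "open"
    except ConnectionRefusedError:
        return "refused"  # a RST came back: reachable, but nothing accepted
    except socket.timeout:
        return "timeout"  # no reply at all: suspect filtering or routing
    except OSError as exc:
        return f"error: {exc}"
```

Calling probe_tcp with the ISP's neighbor address and the local peering address as source_ip separates the two common cases: a timeout points toward an ACL, firewall, or routing problem, while a refusal suggests the far end is reachable but not accepting a session from that source.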

Question 14

Your monitoring shows that latency from your data center to a cloud provider's region jumped from 5ms to 80ms 20 minutes ago. No other regions are affected. Walk through your systematic investigation.

Accepted Answer

Start by checking whether a BGP route change moved the traffic onto a longer path: run traceroute and compare the hop-by-hop path with the baseline. Check 'show bgp' for the affected prefix to see whether the next hop or AS_PATH changed. An 80 ms round trip is consistent with a long detour, such as US West to US East or US to Europe. Check the ISP's looking glass to see how your prefix looks from their side, since the return path may have changed even if the outbound path did not. Look for a link failure that forced re-routing over a longer path, and check the cloud provider's status page. If the path is an MPLS TE tunnel, check whether the LSP re-routed to a secondary path with higher latency. Engage the transit ISP if the problem is in their network.
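Comparing the current traceroute with the baseline is easy to automate. The sketch below assumes each traceroute has already been reduced to an ordered list of hop addresses; the function name and the sample addresses are hypothetical.

```python
from itertools import zip_longest


def first_divergence(baseline_hops, current_hops):
    """Return (hop_number, baseline_hop, current_hop) at the first difference.

    Hop numbers start at 1. A hop missing from the shorter path is reported
    as None. Returns None when the two paths are identical.
    """
    pairs = zip_longest(baseline_hops, current_hops)
    for hop_number, (old, new) in enumerate(pairs, start=1):
        if old != new:
            return hop_number, old, new
    return None


# Illustrative paths: the second hop changed and the path grew by one hop.
baseline = ["10.0.0.1", "198.51.100.1", "203.0.113.9"]
current = ["10.0.0.1", "198.51.100.77", "192.0.2.33", "203.0.113.9"]
print(first_divergence(baseline, current))
```

The hop where the paths first diverge indicates which device or provider made the routing decision that changed.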

Question 15

Users intermittently cannot complete TCP connections to a service in your data center. You suspect asymmetric routing is causing stateful firewall drops on return traffic. How do you confirm asymmetric routing is the cause and what are your options to fix it?

Accepted Answer

Confirm with simultaneous packet captures on the ingress and egress interfaces: if the SYN arrives on interface A but the SYN-ACK leaves on interface B, the firewall on the return path never saw the SYN and drops the SYN-ACK as having no matching session. Run traceroute from both sides to identify where the paths diverge. Fixes: (1) make routing symmetric by adjusting metrics or pinning ECMP hashing, (2) enable session-state synchronization across the firewall pair if the two paths traverse different firewall instances, (3) move to stateless packet filtering for this specific flow, (4) use SNAT on the firewall so that return traffic is addressed back to the same device. The root cause is usually unequal routing metrics or ECMP without session awareness.
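Once both captures are exported, matching SYNs to SYN-ACKs is mechanical. The sketch below assumes the captures have been flattened into simple records; the Packet tuple, the flag strings "S" and "SA", and the interface names are all hypothetical conventions for this example.

```python
from collections import namedtuple

# Hypothetical flattened capture record; flags is "S" for SYN, "SA" for SYN-ACK.
Packet = namedtuple("Packet", "interface src dst sport dport flags")


def asymmetric_flows(packets):
    """Find flows whose SYN and SYN-ACK were seen on different interfaces.

    Returns a list of (flow, syn_interface, synack_interface), where flow is
    (client_ip, client_port, server_ip, server_port).
    """
    syn_interface = {}
    findings = []
    for pkt in packets:
        if pkt.flags == "S":
            syn_interface[(pkt.src, pkt.sport, pkt.dst, pkt.dport)] = pkt.interface
        elif pkt.flags == "SA":
            # The SYN-ACK travels in the reverse direction, so swap the endpoints.
            flow = (pkt.dst, pkt.dport, pkt.src, pkt.sport)
            seen_on = syn_interface.get(flow)
            if seen_on is not None and seen_on != pkt.interface:
                findings.append((flow, seen_on, pkt.interface))
    return findings


packets = [
    Packet("eth1", "198.51.100.10", "203.0.113.5", 40000, 443, "S"),
    Packet("eth2", "203.0.113.5", "198.51.100.10", 443, 40000, "SA"),
    Packet("eth1", "198.51.100.11", "203.0.113.5", 40001, 443, "S"),
    Packet("eth1", "203.0.113.5", "198.51.100.11", 443, 40001, "SA"),
]
print(asymmetric_flows(packets))
```

Flows that appear in the result are the ones a stateful device on only one of the two paths would drop.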

Question 16

Your monitoring shows a WAN interface going up and down every 3 to 4 minutes. The circuit has been stable for two years. What is your systematic approach from Layer 1 upward, and how do you engage the carrier?

Accepted Answer

Layer 1 first: check the interface's physical error counters (CRC, input errors, and framing errors indicate signal-quality problems) and inspect the cabling for visible damage or bend-radius violations. If the circuit is optical, check the SFP's optical levels (Rx power too low or too high), clean the connectors on both ends with proper tools, and swap the SFP to rule out hardware. Layer 2: check for LCP negotiation issues (if PPP) and compare keepalive timers, since a keepalive mismatch between the two sides causes flaps. Log the exact down/up times so you can correlate them with weather, power events, or traffic patterns. Carrier engagement: open a ticket with the circuit ID and the event timestamps, and ask the carrier to check signal levels and alarms on their side.
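Logging the flap times pays off when the intervals are analyzed: a strictly regular period points toward a timer (keepalive, negotiation retry) rather than a random physical fault. The sketch below assumes ISO 8601 timestamps taken from syslog; the function names, the 20% tolerance, and the sample times are hypothetical.

```python
from datetime import datetime
from statistics import mean, pstdev


def flap_intervals(down_timestamps):
    """Return the seconds between consecutive link-down events."""
    times = sorted(datetime.fromisoformat(ts) for ts in down_timestamps)
    return [
        (later - earlier).total_seconds()
        for earlier, later in zip(times, times[1:])
    ]


def looks_periodic(intervals, tolerance=0.2):
    """True when the intervals vary by no more than tolerance around their mean.

    tolerance is an assumed cutoff on the coefficient of variation.
    """
    if len(intervals) < 2:
        return False
    average = mean(intervals)
    return average > 0 and pstdev(intervals) / average <= tolerance


# Illustrative link-down times, roughly every three and a half minutes.
downs = [
    "2024-05-01T10:00:00",
    "2024-05-01T10:03:30",
    "2024-05-01T10:07:00",
    "2024-05-01T10:10:20",
]
intervals = flap_intervals(downs)
print(intervals, looks_periodic(intervals))
```

The same interval list, attached to the carrier ticket, gives their engineers something concrete to match against alarms on their side.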

Question 17

Tell me about a situation where a network vendor recommended a specific solution or product, and you disagreed with their recommendation. How did you evaluate it, how did you make your case, and what happened?

Accepted Answer

Strong answers show technical independence: candidate evaluated the recommendation against actual requirements (TCO, vendor lock-in, feature gaps, support quality), did their own testing or benchmarking, and presented data-backed alternatives. Shows ability to maintain professional vendor relationships while not being technically pushed around. Candidate should also acknowledge when the vendor was ultimately right and what they learned.

Question 18

Many network teams have poor documentation. Tell me about a situation where you identified a documentation gap that caused real problems and what you did about it.

Accepted Answer

Strong answers tie the documentation gap to a concrete impact: a new engineer built the wrong thing because the IP plan was out of date, an incident took longer because the runbook was missing, a vendor couldn't do maintenance because the topology diagram was wrong. Candidate describes what they built (wiki runbooks, automated diagram generation via network discovery, IP address management tool like NetBox) and crucially how they got the team to maintain it (integration with change management, lightweight update processes).

Question 19

Most networks accumulate technical debt over time: end-of-life hardware, inconsistent configs, undocumented workarounds. Tell me about a situation where you tackled a meaningful piece of network technical debt. How did you justify the work, execute it, and measure success?

Accepted Answer

Strong answers tie the debt to a concrete operational cost: increased incident rate from old gear, engineers spending hours on workarounds, vulnerability exposure from EOL software. Candidate describes the business case (risk framing, not just engineering preference), phased execution plan, and measurable outcomes (reduced MTTR, fewer incidents, config standardization score). Shows ability to get organizational buy-in for improvement work that does not add new features.

Question 20

Users report they are being directed to a fake login page when visiting your company's banking partner website. DNS is returning an IP that does not belong to the bank. What steps do you take immediately and over the next 24 hours?

Accepted Answer

Immediate: flush the resolver cache and re-query the authoritative servers; check whether DNSSEC validation is enabled and whether the forged record fails validation (it should, provided the bank's zone is signed); isolate whether the bad response originates from your resolver, an upstream forwarder, or a poisoned cache. If DNSSEC validation were working, the resolver should not have served the record. If it is not enabled, the resolver is exposed to Kaminsky-style cache poisoning. Block the malicious IP at the firewall, and alert the security team and incident response. Over the next 24 hours: verify that DNSSEC validation is enabled and working on the resolver, consider switching to DNS over TLS/HTTPS for upstream queries, and audit the resolver's ACLs to ensure it is not open to the internet.
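Checking whether an answer belongs to the bank can be automated once the bank's legitimate address ranges are known. The sketch below uses the standard ipaddress module; the function name is hypothetical, and the ranges shown are documentation prefixes standing in for values that would have to come from the bank or from registry data.

```python
import ipaddress


def unexpected_answers(answers, expected_networks):
    """Return the answers that fall outside every expected network.

    answers is a list of IP address strings returned by the resolver;
    expected_networks is a list of CIDR strings known to belong to the owner.
    """
    networks = [ipaddress.ip_network(cidr) for cidr in expected_networks]
    return [
        answer
        for answer in answers
        if not any(ipaddress.ip_address(answer) in net for net in networks)
    ]


# Illustrative values only: one answer sits outside the expected range.
print(unexpected_answers(["203.0.113.10", "198.51.100.66"], ["203.0.113.0/24"]))
```

Running the same check against several resolvers (internal, upstream forwarder, a public resolver) helps isolate which layer is serving the forged record.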

Questions

Tell me about a BGP or routing incident you owned end-to-end (Behavioural, medium, Very common)
What do you do when a route is in the table but traffic is being dropped? (Behavioural, medium, Very common)
A firewall rule change broke production. How do you respond? (Behavioural, medium, Very common)
Describe a capacity planning project you drove (Behavioural, medium, Common)
Tell me about a time you resolved a conflict over a firewall policy change (Behavioural, medium, Common)
Tell me about a network automation initiative you built or led (Behavioural, medium, Common)
Describe the most complex on-call network issue you have debugged (Behavioural, hard, Common)
Tell me about leading a complex network migration (Behavioural, medium, Common)
Describe a time you mentored a junior network engineer (Behavioural, easy, Common)
Tell me about a network security incident you helped contain (Behavioural, medium, Common)
Describe coordinating a network change that required many teams (Behavioural, medium, Common)
How do you diagnose a duplex mismatch causing intermittent errors? (Behavioural, easy, Common)
A BGP peer is stuck in Active state. Walk through your diagnosis (Behavioural, medium, Common)
Latency to a cloud region jumped from 5ms to 80ms. What do you do? (Behavioural, medium, Common)
How do you detect and fix asymmetric routing causing stateful firewall drops? (Behavioural, hard, Common)
A WAN interface is flapping every few minutes. What do you check first? (Behavioural, easy, Common)
Tell me about a time you pushed back on a vendor's recommendation (Behavioural, medium, Occasional)
Describe how you built or improved network documentation in a team (Behavioural, easy, Occasional)
Tell me about reducing significant network technical debt (Behavioural, medium, Occasional)
Your DNS resolver is returning wrong IPs for a known-good domain (Behavioural, hard, Occasional)