As asked
A new Deployment rollout is stuck at 3 out of 10 pods Ready. Walk me through how you would debug it in production.
Sample answer outline
Start with kubectl rollout status and kubectl describe deployment, then read the events: ImagePullBackOff, CrashLoopBackOff, and FailedScheduling each point in a different direction. Check the pod events and pod logs for the failing replicas. Common causes: a readiness probe that fails because the app needs longer to start, resource requests too high for the available nodes, or a reference to a ConfigMap or Secret that does not exist. A PodDisruptionBudget is an unlikely culprit: it limits evictions (for example during a node drain), not the pod deletions a Deployment rollout performs. Resist the urge to delete pods until you know the cause, because deleting a pod discards the events and previous-container logs you need for the diagnosis.
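The mapping from symptom to first check described above can be sketched as a small shell helper. This is an illustrative sketch, not a standard tool: the function names next_step and triage_pods are hypothetical, the suggested checks are one reasonable ordering, and triage_pods assumes the default column layout of kubectl get pods (NAME, READY, STATUS, ...).

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a pod status or event reason to a first check.
next_step() {
  case "$1" in
    ImagePullBackOff|ErrImagePull)
      echo "check the image name, tag, and imagePullSecrets" ;;
    CrashLoopBackOff)
      echo "read the crashed container's logs with kubectl logs --previous" ;;
    Pending|FailedScheduling)
      echo "compare resource requests with node capacity in kubectl describe pod" ;;
    CreateContainerConfigError)
      echo "verify that referenced ConfigMaps and Secrets exist" ;;
    Running)
      echo "container runs but is not Ready: inspect the readiness probe" ;;
    *)
      echo "unrecognized status: read the pod events" ;;
  esac
}

# Hypothetical wrapper: print a suggested first check for each pod that is not Ready.
# Assumes the default `kubectl get pods` columns: NAME READY STATUS RESTARTS AGE.
triage_pods() {
  local ns=$1 selector=$2
  kubectl get pods -n "$ns" -l "$selector" --no-headers |
    while read -r name ready status _rest; do
      # Skip pods whose READY column is n/n (all containers ready).
      [ "${ready%/*}" = "${ready#*/}" ] && continue
      printf '%s\t%s\t%s\n' "$name" "$status" "$(next_step "$status")"
    done
}
```

With the names used elsewhere in this card, the call would be triage_pods prod app=api. The output is only a starting point; the pod events remain the authoritative source.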
Reference implementation (bash)
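# Triage sequence
# Is the rollout progressing, or has it stalled?
kubectl rollout status deployment/api -n prod
# Deployment conditions, rollout strategy, and ReplicaSet scaling events
kubectl describe deployment/api -n prod
# Which replicas are not Ready, and what status do they report?
kubectl get pods -n prod -l app=api
# Scheduling, image pull, and probe failures appear under Events
kubectl describe pod <pending-pod> -n prod
# Logs from the container instance that crashed, not the restarted one
kubectl logs <crashlooping-pod> -n prod --previous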
kubectl get events -n prod --sort-by=.lastTimestamp | tail -20
Expect these follow-ups
- What if the readiness probe passes but the service is still returning 503s?
- How do you set sane defaults for readiness and liveness probes?
- When would you use a startup probe?
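For the startup-probe follow-up, one way to frame an answer is the startup budget: the kubelet holds off liveness and readiness checks until the startup probe succeeds, and restarts the container once the probe has failed failureThreshold times. A rough sketch of that arithmetic follows. The function name startup_budget_seconds is hypothetical, and the calculation is an approximation that ignores timeoutSeconds.

```bash
#!/usr/bin/env bash
# Approximate worst-case startup time a startup probe allows:
#   initialDelaySeconds + periodSeconds * failureThreshold
# Hypothetical helper; ignores timeoutSeconds for simplicity.
startup_budget_seconds() {
  local initial_delay=$1 period=$2 failure_threshold=$3
  echo $(( initial_delay + period * failure_threshold ))
}

# Example: no initial delay, probe every 10s, tolerate 30 failures.
startup_budget_seconds 0 10 30
```

If the slowest observed startup exceeds what the readiness and liveness settings tolerate, a startup probe with a budget above that figure is a reasonable candidate answer.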