As asked
Your team runs mixed workloads on an OCI Kubernetes Engine cluster: low-latency API pods that must never be co-located with batch jobs, and batch pods that should prefer spot nodes. Walk me through how you would configure Kubernetes scheduling to enforce these constraints, and what happens when the scheduler cannot satisfy them.
Sample answer outline
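A strong answer covers taints on the API node pool, with matching tolerations only on API pods, so that batch pods are repelled from API nodes; required node affinity (or a nodeSelector) to pin API pods to that pool, since a toleration permits placement but does not force it; pod anti-affinity to spread API replicas across failure domains; PriorityClasses to protect latency-critical pods from preemption, and PodDisruptionBudgets to limit voluntary evictions such as node drains; and preferred node affinity on OCI node pool labels to steer batch jobs to spot-instance pools. The candidate should explain what happens when a pod cannot be placed: an unmet required rule leaves the pod Pending with a FailedScheduling event, while an unmet preferred rule only lowers a node's score, so the pod still lands on another feasible node. The candidate should also show how to debug a Pending pod with kubectl describe pod.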
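One way to make the outline concrete is to sketch the two pod spec fragments and a toy feasibility check. The Python below is a minimal sketch, not the real scheduler: the taint dedicated=api:NoSchedule, the node labels pool and capacity, the app=api pod label, and the api-critical PriorityClass are all hypothetical names chosen for illustration, and the check models only hard taints and the "In" operator of required node affinity. The field names in the spec fragments follow the Kubernetes Pod API.

```python
import json

# Hypothetical taint placed on the API node pool (assumption, not an OCI default).
API_TAINT = {"key": "dedicated", "value": "api", "effect": "NoSchedule"}


def api_pod_spec():
    """Spec fragment for API pods: tolerate the taint AND require the API pool."""
    selector = {"matchLabels": {"app": "api"}}  # hypothetical pod label
    return {
        "priorityClassName": "api-critical",  # hypothetical PriorityClass name
        "tolerations": [{"key": "dedicated", "operator": "Equal",
                         "value": "api", "effect": "NoSchedule"}],
        "affinity": {
            "nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [{"matchExpressions": [
                        {"key": "pool", "operator": "In", "values": ["api"]}]}]}},
            # Preferred, so replicas can still schedule when zones run out.
            "podAntiAffinity": {
                "preferredDuringSchedulingIgnoredDuringExecution": [{
                    "weight": 100,
                    "podAffinityTerm": {
                        "labelSelector": selector,
                        "topologyKey": "topology.kubernetes.io/zone"}}]},
        },
    }


def batch_pod_spec():
    """Spec fragment for batch pods: no toleration, soft preference for spot."""
    return {"affinity": {"nodeAffinity": {
        "preferredDuringSchedulingIgnoredDuringExecution": [{
            "weight": 100,
            "preference": {"matchExpressions": [
                {"key": "capacity", "operator": "In", "values": ["spot"]}]}}]}}}


def tolerates(spec, taint):
    """True if any toleration in the spec matches the given taint."""
    for tol in spec.get("tolerations", []):
        if tol.get("operator", "Equal") == "Exists":
            key_ok, value_ok = tol.get("key") in (None, taint["key"]), True
        else:
            key_ok = tol.get("key") == taint["key"]
            value_ok = tol.get("value") == taint["value"]
        effect_ok = tol.get("effect") in (None, taint["effect"])
        if key_ok and value_ok and effect_ok:
            return True
    return False


def required_affinity_ok(spec, labels):
    """Terms are ORed, expressions inside a term are ANDed; only 'In' is modeled."""
    required = (spec.get("affinity", {}).get("nodeAffinity", {})
                .get("requiredDuringSchedulingIgnoredDuringExecution"))
    if required is None:
        return True
    return any(
        all(labels.get(e["key"]) in e["values"] for e in term["matchExpressions"])
        for term in required["nodeSelectorTerms"])


def feasible(spec, node):
    """Toy filter step: every taint tolerated and required affinity satisfied."""
    return (all(tolerates(spec, t) for t in node["taints"])
            and required_affinity_ok(spec, node["labels"]))


api_node = {"labels": {"pool": "api"}, "taints": [API_TAINT]}
spot_node = {"labels": {"pool": "batch", "capacity": "spot"}, "taints": []}

if __name__ == "__main__":
    # kubectl accepts JSON manifests, so these fragments can be embedded as-is.
    print(json.dumps(api_pod_spec(), indent=2))
```

In this model the API spec is feasible only on api_node and the batch spec only on spot_node. The batch preference is deliberately soft: if no spot node exists, the batch pod can still land on any untainted on-demand node instead of sitting in Pending.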
Expect these follow-ups
- How do you ensure batch pods do not starve on a cluster where spot nodes are frequently reclaimed?
- What are the risks of required versus preferred affinity rules, and how do you decide which to use?