The gap between passing the CKA exam and being ready for a senior Kubernetes administrator interview is wider than most people expect. The exam tests whether you can execute known tasks under time pressure. The interview tests whether you understand why things work the way they do, and more importantly, whether you’ve diagnosed real failures in production clusters.
These questions come from patterns I’ve seen across technical interviews for platform engineering and DevOps roles. I can’t guarantee any specific company uses them, but the competency areas they cover are fairly standard.
Cluster architecture questions
“Walk me through what happens when a pod is scheduled.”
This is a classic opener with real depth. A shallow answer covers the scheduler and the kubelet. A deeper answer includes the API server’s authentication, authorization, and admission chain; how the scheduler watches for unbound pods via informers and writes a binding; what the kubelet does once it sees a pod bound to its node; how the container runtime (containerd or CRI-O) actually creates the containers; and how the pod’s status changes along the way, with every transition persisted in etcd through the API server. Most candidates stop at the kubelet. The ones who describe the full flow have usually debugged a scheduling issue that taught them this.
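One way to internalize the flow is to ask where a given pod currently sits in it. Here is a minimal sketch, assuming you already have the pod as a dict parsed from kubectl get pod <name> -o json. The pod_stage function and its stage labels are my own illustration, not anything Kubernetes defines:

```python
def pod_stage(pod: dict) -> str:
    """Roughly locate a pod in the create -> schedule -> start pipeline."""
    spec = pod.get("spec", {})
    status = pod.get("status", {})

    # The scheduler binds a pod by setting spec.nodeName.
    if not spec.get("nodeName"):
        for cond in status.get("conditions", []):
            if cond.get("type") == "PodScheduled" and cond.get("reason") == "Unschedulable":
                return "unschedulable: " + cond.get("message", "no message")
        return "waiting for scheduler"

    # Once bound, the kubelet on that node reports per-container state.
    container_statuses = status.get("containerStatuses", [])
    if not container_statuses:
        return "bound to node, kubelet has not reported containers yet"

    for cs in container_statuses:
        waiting = cs.get("state", {}).get("waiting")
        if waiting:
            return "container waiting: " + waiting.get("reason", "unknown")

    return "phase: " + status.get("phase", "Unknown")
```

Each return value maps to a different component to look at next: the scheduler, the kubelet, or the container runtime.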
“How do you approach etcd backup and recovery?”
etcd holds all of your cluster’s state, so losing it without a usable backup means rebuilding the cluster. Interviewers want to hear that you have a real backup strategy, that you know a one-off etcdctl snapshot save is not a replacement for a proper backup infrastructure (scheduled snapshots, off-node storage, verification), and that you’ve thought about what a restore actually looks like under pressure. The candidates who’ve done a restore drill answer this differently from those who’ve only read about it.
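To make the "snapshot is not a strategy" point concrete, here is a small sketch of the two pieces a scheduled job would need: building the snapshot command and deciding which old snapshots to prune. The certificate paths are the kubeadm defaults and are an assumption about your cluster; snapshot_command and snapshots_to_prune are hypothetical helper names:

```python
from pathlib import Path


def snapshot_command(dest, endpoint="https://127.0.0.1:2379",
                     pki_dir="/etc/kubernetes/pki/etcd"):
    """Build the argv for `etcdctl snapshot save` (kubeadm-style cert paths assumed)."""
    return [
        "etcdctl",
        "--endpoints", endpoint,
        "--cacert", f"{pki_dir}/ca.crt",
        "--cert", f"{pki_dir}/server.crt",
        "--key", f"{pki_dir}/server.key",
        "snapshot", "save", dest,
    ]


def snapshots_to_prune(snapshot_dir, keep):
    """Return every *.db file except the `keep` newest, oldest first."""
    files = sorted(Path(snapshot_dir).glob("*.db"), key=lambda p: p.stat().st_mtime)
    return files[:-keep] if keep > 0 else files
```

You would pass the list from snapshot_command to subprocess.run on a control plane node, then copy the file somewhere that survives losing that node. None of this replaces an actual restore drill.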
“What causes a control plane node to stop being healthy and how do you debug it?”
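This is where real operational experience shows. The answer involves checking kube-apiserver, kube-controller-manager, and kube-scheduler logs, verifying etcd health, checking certificate expiration (a surprisingly common cause of mysterious failures), and knowing where to look on a kubeadm cluster versus a managed control plane. On kubeadm, the control plane components run as static pods, so the systemd units worth inspecting are the kubelet and the container runtime; on a managed control plane, you typically can’t reach those hosts at all and rely on the provider’s logs and status pages.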
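Since certificate expiry comes up so often, here is a minimal sketch of the check, assuming you feed it the line printed by openssl x509 -enddate -noout -in /etc/kubernetes/pki/apiserver.crt (that path is the kubeadm default and may differ on your cluster). The days_until_expiry helper is my own name for it:

```python
import ssl
import time


def days_until_expiry(not_after, now_epoch=None):
    """Days left before a certificate expires.

    `not_after` is the openssl end date, with or without the 'notAfter='
    prefix, for example 'notAfter=Jan  5 09:34:43 2026 GMT'.
    """
    date_part = not_after.strip().split("=", 1)[-1]
    if now_epoch is None:
        now_epoch = time.time()
    return (ssl.cert_time_to_seconds(date_part) - now_epoch) / 86400
```

A negative result means the certificate has already expired. On kubeadm clusters, kubeadm certs check-expiration reports the same information for every control plane certificate at once.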
Networking questions
Networking is where Kubernetes interviews often separate strong candidates from exceptional ones, because the mental model required is non-trivial and the failure modes are subtle.
“Explain how a request from outside the cluster reaches a pod.”
A full answer involves: the external load balancer or NodePort, the kube-proxy rules (iptables or IPVS) that translate a Service address into a pod address, how Services abstract pod IPs through endpoints, where cluster DNS fits in (CoreDNS in current clusters, still exposed through a Service named kube-dns), and finally how the packet reaches the pod’s network interface through the CNI plugin. A weak answer says “through an Ingress” and stops. The question is specifically about the network path, not just the Kubernetes resource model.
“You have a pod that can’t reach another pod in a different namespace. Where do you start?”
This is a diagnostic question with several reasonable entry points. Check NetworkPolicy objects in both namespaces, verify DNS resolution from inside the pod (cross-namespace lookups need the service.namespace form, not the bare service name), check whether the CNI plugin is functioning correctly, and look at node-level firewall rules or cloud security groups. The sequence matters. Good candidates have a mental triage order; weaker ones list things without prioritization.
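The NetworkPolicy step is where cross-namespace traffic most often goes wrong, so it helps to model the rule evaluation. This is a simplified sketch of ingress evaluation, assuming policies are dicts shaped like kubectl get networkpolicy -o json items. It only handles matchLabels selectors and ignores ports, ipBlock, and matchExpressions; the function names are mine:

```python
def labels_match(selector, labels):
    """matchLabels-only check; an empty or missing selector matches everything."""
    wanted = (selector or {}).get("matchLabels", {})
    return all(labels.get(key) == value for key, value in wanted.items())


def peer_allows(peer, policy_ns, src_ns, src_ns_labels, src_pod_labels):
    ns_sel = peer.get("namespaceSelector")
    pod_sel = peer.get("podSelector")
    if ns_sel is None and pod_sel is None:
        return False  # ipBlock peers are out of scope for this sketch
    # Without a namespaceSelector, a podSelector only matches pods in the
    # policy's own namespace. This is the usual cross-namespace surprise.
    ns_ok = (src_ns == policy_ns) if ns_sel is None else labels_match(ns_sel, src_ns_labels)
    pod_ok = pod_sel is None or labels_match(pod_sel, src_pod_labels)
    return ns_ok and pod_ok


def ingress_allowed(policies, dst_ns, dst_pod_labels, src_ns, src_ns_labels, src_pod_labels):
    selecting = [
        p for p in policies
        if p["metadata"]["namespace"] == dst_ns
        and "Ingress" in p["spec"].get("policyTypes", ["Ingress"])
        and labels_match(p["spec"].get("podSelector"), dst_pod_labels)
    ]
    if not selecting:
        return True  # no policy selects the pod, so it is not isolated
    for policy in selecting:
        for rule in policy["spec"].get("ingress", []):
            peers = rule.get("from")
            if not peers:
                return True  # a rule with no 'from' allows all sources
            if any(peer_allows(peer, dst_ns, src_ns, src_ns_labels, src_pod_labels)
                   for peer in peers):
                return True
    return False
```

The case worth rehearsing: a policy whose from clause has only a podSelector will reject a correctly labeled pod from another namespace, and adding a namespaceSelector to the same peer entry fixes it.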
“What’s the difference between a ClusterIP, NodePort, LoadBalancer, and ExternalName service? When would you actually use each?”
This gets asked in almost every K8s interview I’ve seen or heard about. The “when would you use each” part is where answers diverge. Most candidates can describe what they are; fewer can give a concrete scenario for ExternalName or explain when NodePort is appropriate versus LoadBalancer in a cost-sensitive environment.
RBAC and security questions
“How do you design RBAC for a multi-team cluster?”
A good answer covers namespace-scoped Roles versus ClusterRoles, service account design, how you audit what permissions exist, and how you handle privilege escalation risks. The interesting version of this question involves a scenario where a development team wants cluster-admin for “just their namespace.” Bound through a ClusterRoleBinding, cluster-admin applies to the whole cluster; what the team usually needs is a ClusterRole such as admin bound with a RoleBinding inside their own namespace. Candidates who’ve navigated this with real users give better answers than those who’ve only designed RBAC from scratch in isolation.
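The scoping rule behind that scenario is easy to model. Below is a toy evaluator, deliberately simplified: the dict shapes are my own and not the real API objects, it ignores apiGroups, resourceNames, and namespaced Roles, and team-admin is a hypothetical ClusterRole name. It only shows how the binding type decides the scope:

```python
def rule_allows(rule, verb, resource):
    verbs, resources = rule["verbs"], rule["resources"]
    return ("*" in verbs or verb in verbs) and ("*" in resources or resource in resources)


def can_i(user, verb, resource, namespace,
          cluster_roles, role_bindings, cluster_role_bindings):
    """Toy RBAC check: the binding kind, not the role kind, sets the scope."""
    # A ClusterRoleBinding grants the role's rules in every namespace.
    for binding in cluster_role_bindings:
        if user in binding["subjects"] and any(
                rule_allows(r, verb, resource) for r in cluster_roles[binding["roleRef"]]):
            return True
    # A RoleBinding grants them only in its own namespace, even when it
    # references a ClusterRole.
    for binding in role_bindings:
        if (binding["namespace"] == namespace and user in binding["subjects"] and any(
                rule_allows(r, verb, resource) for r in cluster_roles[binding["roleRef"]])):
            return True
    return False
```

In a real cluster, kubectl auth can-i with --as and -n answers the same question against the actual authorizer.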
“What Pod Security Standards policies do you apply by default and why?”
PodSecurityPolicy was deprecated in 1.21 and removed in 1.25. If a candidate starts describing PSP as their current approach without acknowledging the transition to PSA (Pod Security Admission), that’s a signal they haven’t kept up with the ecosystem. The question also probes whether you apply restricted, baseline, or privileged profiles by namespace, and what your reasoning is.
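Pod Security Admission is driven by namespace labels of the form pod-security.kubernetes.io/<mode>: <level>. A small sketch of building them follows; the psa_labels helper is hypothetical, and enforcing baseline while warning and auditing on restricted is one common starting point, not a recommendation from any official source:

```python
LEVELS = ("privileged", "baseline", "restricted")


def psa_labels(enforce, audit=None, warn=None):
    """Build Pod Security Admission namespace labels for the given levels."""
    labels = {}
    for mode, level in (("enforce", enforce), ("audit", audit), ("warn", warn)):
        if level is None:
            continue
        if level not in LEVELS:
            raise ValueError(f"unknown Pod Security level: {level!r}")
        labels[f"pod-security.kubernetes.io/{mode}"] = level
    return labels
```

Calling psa_labels("baseline", audit="restricted", warn="restricted") gives a namespace that blocks clearly unsafe pods today while surfacing everything that would fail under restricted, which is a reasonable way to explain a migration path in an interview.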
Troubleshooting questions
Troubleshooting questions are where experienced K8s admins separate themselves most clearly. These questions don’t have single right answers; they have reasoning patterns that interviewers are listening for.
“A deployment is stuck in rollout. Walk me through how you diagnose it.”
Start with kubectl rollout status and kubectl describe deployment, then check pod events, ReplicaSet status, resource quotas and limits, and image pull errors. The sequence and the specific commands matter. Candidates who know what information each command surfaces, and what they’re looking for in the output, pass this one. Those who describe the process abstractly often don’t.
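To show what "knowing what you’re looking for in the output" means, here is a sketch that scans the same data those commands print. It assumes a Deployment and its pods as dicts from kubectl get ... -o json; rollout_findings is my own helper, and the list of waiting reasons it flags is a common subset, not an exhaustive one:

```python
def rollout_findings(deployment, pods):
    """Collect the usual signals behind a stuck rollout."""
    findings = []
    spec = deployment.get("spec", {})
    status = deployment.get("status", {})

    desired = spec.get("replicas", 1)
    updated = status.get("updatedReplicas", 0)
    available = status.get("availableReplicas", 0)
    if updated < desired or available < desired:
        findings.append(f"replicas: desired={desired} updated={updated} available={available}")

    for cond in status.get("conditions", []):
        if cond.get("type") == "Progressing" and cond.get("reason") == "ProgressDeadlineExceeded":
            findings.append("rollout exceeded progressDeadlineSeconds")
        if cond.get("type") == "ReplicaFailure" and cond.get("status") == "True":
            # Typically quota or admission rejections when creating pods.
            findings.append("replica creation failing: " + cond.get("message", ""))

    for pod in pods:
        name = pod.get("metadata", {}).get("name", "?")
        for cs in pod.get("status", {}).get("containerStatuses", []):
            reason = cs.get("state", {}).get("waiting", {}).get("reason")
            if reason in ("ErrImagePull", "ImagePullBackOff", "CrashLoopBackOff"):
                findings.append(f"pod {name}: {reason}")
    return findings
```

An empty list does not mean the rollout is healthy; it means the cause is somewhere this sketch does not look, such as failing readiness probes or unschedulable pods.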
“Nodes in your cluster are intermittently NotReady. How do you approach this?”
This one has many possible causes: kubelet crashes, network partitions between the node and the control plane, resource pressure triggering evictions, kernel bugs, or cloud provider API issues if you’re on managed nodes. Good candidates describe a triage path rather than jumping to a single hypothesis. The Kubernetes issue tracker on GitHub is a genuinely useful resource here, because intermittent NotReady problems often have known causes documented there.
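A triage path usually starts from the node’s own conditions. Here is a minimal sketch, assuming a node dict from kubectl get node <name> -o json; node_triage and the wording of its hints are mine:

```python
PRESSURE_CONDITIONS = ("MemoryPressure", "DiskPressure", "PIDPressure", "NetworkUnavailable")


def node_triage(node):
    """Turn node conditions into first-pass hints about where to look."""
    hints = []
    for cond in node.get("status", {}).get("conditions", []):
        ctype, cstatus = cond.get("type"), cond.get("status")
        if ctype == "Ready" and cstatus == "Unknown":
            # The control plane stopped receiving kubelet heartbeats.
            hints.append("Ready=Unknown: check the kubelet process and node-to-API-server network")
        elif ctype == "Ready" and cstatus == "False":
            hints.append("Ready=False: kubelet reports a problem: " + cond.get("message", ""))
        elif ctype in PRESSURE_CONDITIONS and cstatus == "True":
            hints.append(f"{ctype}=True")
    return hints
```

For intermittent cases, the lastTransitionTime on each condition is worth comparing against kubelet logs and cloud provider events from the same window.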
What interviewers are actually measuring
The Stack Overflow 2024 Developer Survey lists Kubernetes among the tools professional developers report using most. That context matters: the interview bar reflects real operational complexity.
What experienced interviewers are listening for is a combination of conceptual accuracy and operational scar tissue. You can get the first from documentation. The second comes from running clusters and debugging real failures. If you haven’t done that yet, the most honest preparation is to say clearly “I haven’t seen this failure mode in production, but my approach would be…” rather than improvising a scenario you haven’t actually lived.
Craqly’s live interview assistance can surface relevant Kubernetes context (YAML snippets, kubectl command sequences, architecture diagrams) in real time during a technical screen, which can be useful when an interviewer goes deep on a specific subsystem you haven’t touched recently. Worth knowing it exists if you’re prepping for a K8s platform role.
Which part of the Kubernetes interview are you finding hardest to prepare for?