DevOps Engineering Interviews: Infrastructure & Automation Expertise 2026

A hiring manager at a mid-sized fintech company told me recently that she can tell within the first three questions whether a DevOps candidate has actually operated systems under pressure or just read the documentation. The giveaway is always the same: candidates who’ve run real systems talk about failure modes first. Candidates who’ve only studied talk about happy paths.

This list is organized by domain. I’ve included notes on expected answer depth where the question is commonly answered too shallowly.

Git and version control (questions 1-9)

  1. What is the difference between git merge and git rebase? When would you use each?
  2. Explain Git Flow versus trunk-based development. What are the operational trade-offs?
  3. How do you handle a hotfix when main is several commits ahead of production?
  4. What does a meaningful commit message look like, and why does it matter for DevOps specifically?
  5. How would you find which commit introduced a regression across 200 commits?
  6. Explain git cherry-pick and when it becomes a problem in a team environment.
  7. What is git reflog and when have you needed it?
  8. How do you enforce branch protection policies at the org level in GitHub?
  9. What is the GitOps model and how does it differ from traditional pipeline-triggered deployments?

Note on depth: Question 9 is frequently answered at a surface level (“GitOps means Git is the source of truth”). Strong answers explain the control loop: an agent running in the cluster watches the repo, compares the desired state declared there to the actual cluster state, and reconciles any difference. That is a pull model, as opposed to a pipeline pushing changes into the cluster when a build finishes. Tools like Flux or Argo CD implement this. That’s the answer interviewers are listening for.
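If it helps to make the control loop concrete, here is a minimal Python sketch of one reconcile pass. It is a toy model, not how Flux or Argo CD are implemented: the "repo" and the "cluster" are plain dictionaries, and the names `Action`, `reconcile`, and `apply_actions` are hypothetical, invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    verb: str  # "create", "update", or "delete"
    name: str  # resource name (hypothetical, for illustration)


def reconcile(desired: dict, actual: dict) -> list:
    """Compare desired state (from Git) with actual state and list the changes needed."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(Action("create", name))
        elif actual[name] != spec:
            actions.append(Action("update", name))
    # Anything running that Git no longer declares gets pruned.
    for name in actual:
        if name not in desired:
            actions.append(Action("delete", name))
    return actions


def apply_actions(actions: list, desired: dict, actual: dict) -> dict:
    """Apply the actions to a copy of the actual state and return the result."""
    result = dict(actual)
    for action in actions:
        if action.verb == "delete":
            result.pop(action.name)
        else:
            result[action.name] = desired[action.name]
    return result


if __name__ == "__main__":
    # Assumed example data: resource name -> image tag.
    desired = {"web": "shop/web:v2", "worker": "shop/worker:v1"}
    actual = {"web": "shop/web:v1", "cron": "shop/cron:v1"}
    plan = reconcile(desired, actual)
    for action in plan:
        print(action.verb, action.name)
    actual = apply_actions(plan, desired, actual)
    # A second pass finds nothing to do once the states match.
    print(reconcile(desired, actual))
```

The point worth saying out loud in an interview is the last line: a real agent runs this comparison continuously, so manual changes to the cluster get reverted on the next pass rather than lingering until the next deploy.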

Jenkins and CI/CD (questions 10-19)

  10. How do you structure a multi-stage Jenkins pipeline? What goes in each stage?
  11. How does Jenkins handle build agents? What’s the difference between static and dynamic agents?
  12. How do you manage secrets in a Jenkins pipeline without hardcoding them?
  13. What is a Jenkinsfile and why does storing it in the repo matter?
  14. How would you set up a pipeline that deploys to staging automatically but requires manual approval for production?
  15. What does “pipeline as code” mean in practice, and what problems does it solve?
  16. How do you handle flaky tests in CI without just retrying them blindly?
  17. What metrics do you track to assess the health of a CI system?
  18. How do you roll back a failed deployment in a Jenkins-managed pipeline?
  19. What’s the difference between blue-green and canary deployments, and which would you use for a stateful service?

Docker and containerization (questions 20-29)

  20. What is the difference between a Docker image and a Docker container?
  21. How does layer caching work in Docker builds, and how do you optimize for it?
  22. What is a multi-stage build and why does it reduce image size?
  23. How do you run a container as a non-root user, and why should you?
  24. What is the difference between CMD and ENTRYPOINT in a Dockerfile?
  25. How do you scan a Docker image for vulnerabilities before pushing to a registry?
  26. What is Docker Compose used for, and when is it not the right tool?
  27. How do you handle container logging at scale?
  28. What’s the difference between bind mounts and volumes in Docker?
  29. How do you reduce the attack surface of a container image?

Kubernetes (questions 30-41)

  30. Explain the difference between a Deployment and a StatefulSet. Give a concrete use case for each.
  31. A pod is stuck in CrashLoopBackOff. Walk me through your diagnostic process.
  32. What is a Kubernetes Service? Explain the difference between ClusterIP, NodePort, and LoadBalancer.
  33. How does Kubernetes RBAC work? What’s the difference between a Role and a ClusterRole?
  34. What is a PersistentVolumeClaim and how does dynamic provisioning work?
  35. How does the Horizontal Pod Autoscaler work, and what metrics does it support natively?
  36. How do you do a zero-downtime rolling update in Kubernetes?
  37. What is a liveness probe vs. a readiness probe? How do you configure each?
  38. How do you manage configuration across environments in Kubernetes (ConfigMaps, Secrets, external tools)?
  39. What is a NetworkPolicy and how do you use it to isolate namespaces?
  40. How does Kubernetes handle node failures?
  41. Explain the Kubernetes scheduler. How does it decide where to place a pod?

Note on depth: Question 31, the CrashLoopBackOff question, is a genuine signal. The full answer involves: checking events with kubectl describe pod, reading logs with kubectl logs --previous, checking resource limits, verifying the image tag, and checking init containers. Candidates who jump straight to “check the logs” without the full sequence usually haven’t debugged production K8s.
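The sequence above can be written down as an ordered checklist of commands. The Python sketch below only builds and prints the `kubectl` invocations; it does not run them. The function name `crashloop_diagnostics`, the pod name `api-7d9f`, and the namespace `payments` are hypothetical, and the jsonpath queries are one reasonable way to surface limits, image tags, and init container state, not the only one.

```python
import shlex


def crashloop_diagnostics(pod: str, namespace: str = "default") -> list:
    """Return kubectl commands, in order, for triaging a pod in CrashLoopBackOff."""
    base = ["kubectl", "-n", namespace]
    return [
        # 1. Events, last state, and exit code of the crashed container.
        base + ["describe", "pod", pod],
        # 2. Logs from the previous (crashed) container instance.
        base + ["logs", pod, "--previous"],
        # 3. Resource requests and limits (look for OOM kills).
        base + ["get", "pod", pod, "-o", "jsonpath={.spec.containers[*].resources}"],
        # 4. The image and tag actually deployed.
        base + ["get", "pod", pod, "-o", "jsonpath={.spec.containers[*].image}"],
        # 5. State of any init containers.
        base + ["get", "pod", pod, "-o", "jsonpath={.status.initContainerStatuses[*].state}"],
    ]


if __name__ == "__main__":
    # Hypothetical pod and namespace, for illustration only.
    for command in crashloop_diagnostics("api-7d9f", "payments"):
        print(shlex.join(command))
```

The order matters more than the exact flags: events and the previous container's logs usually narrow the problem before you start reading specs.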

Terraform and infrastructure as code (questions 42-50)

  42. What is Terraform state and why does it need to be stored remotely in team environments?
  43. What is state locking and how does it prevent race conditions?
  44. How do you detect and remediate infrastructure drift?
  45. What is the difference between terraform plan and terraform apply? What do you check in the plan output?
  46. How do you structure Terraform modules for reusability?
  47. What are Terraform workspaces, and when are they the wrong solution for environment separation?
  48. How do you test Terraform code? What tools exist for this?
  49. What is the difference between Terraform and Pulumi? What factors would lead you to choose one over the other?
  50. How do you handle sensitive values (API keys, credentials) in Terraform configurations?

The Stack Overflow Developer Survey 2024 puts Terraform as the most widely used infrastructure-as-code tool among professional developers. Expect it in every DevOps interview loop at a company with more than a handful of engineers.

One last thing: the BLS occupational outlook for DevOps-adjacent roles projects continued demand through 2032. The technical bar isn’t going down. If you’re interviewing now, assume the person across from you has seen hundreds of candidates answer these exact questions, and they know when an answer is rehearsed versus when it came from shipping real systems.

What’s on your weak list?
