DBA Interview Questions 2026: 35 Production Database Scenarios & Architecture Decisions

A DBA candidate I know once walked into a senior role interview at a healthcare company, answered every normalization and indexing question correctly, then got eliminated because she couldn’t answer this: “Walk me through how you’d restore from a backup in the first 20 minutes of a production outage, with three people calling you.” The technical knowledge wasn’t the problem. The operational composure was.

That’s the pattern I’ve noticed across DBA interviews. The companies that do them well are testing both the knowledge and the judgment. The questions below cover both, across five main areas: SQL and query optimization, performance tuning, backup and recovery, security, and high availability.

SQL optimization and indexing

These questions tend to come early. They’re used as a baseline filter before interviewers go deeper.

Common ones:

  • You have a query that’s doing a full table scan on a 200-million-row table. How do you start diagnosing it?
  • What’s the difference between a clustered and non-clustered index, and when would you choose each?
  • Your application team added a new feature and now a formerly-fast query is timing out. You suspect it’s a join issue. Walk me through your investigation.
  • What is index cardinality, and why does it affect query performance?

On the full table scan question, the expected approach starts with EXPLAIN (or EXPLAIN ANALYZE in PostgreSQL). You’re looking for sequential scans on large tables, high estimated row counts, and whether index conditions are being applied. The fix might be an index, a statistics refresh, or a query rewrite. Experienced interviewers want to hear you rule out causes systematically, not jump to “add an index.”

On clustered vs non-clustered: a clustered index determines the physical sort order of the table (there can only be one per table). Non-clustered indexes are separate structures that point back to the row. Use the clustered index on your most frequent range-scan column, often a date or auto-incrementing primary key. Everything else gets non-clustered if it gets indexed at all.

Performance tuning and monitoring

The questions here get more situational and judgment-heavy. Interviewers are watching whether you can isolate causes or whether you go straight to the most dramatic fix.

  • Database CPU has been spiking to 90% every night at 2 AM. Where do you look first?
  • What metrics do you track daily as part of routine monitoring?
  • A developer asks you to review a query before it ships. What are you looking for?
  • How do you identify lock contention, and what are the common causes?

The 2 AM spike question almost always traces back to a scheduled job, a batch process, or an autovacuum/autostats update kicking off. The answer they want: you check your scheduled jobs first, correlate with slow query logs from that window, look at wait stats, and rule out external cron jobs or ETL processes before you start tweaking server config.

For developer query review, you’re looking at: is this doing a full scan where it shouldn’t be, are there implicit type conversions that break index usage, are there functions on indexed columns in the WHERE clause, and is the result set being pulled entirely into the application when a server-side aggregation would do the same thing in less data transfer.

Backup and recovery: where judgment matters most

According to the BLS Occupational Outlook for Database Administrators, DBAs are responsible for “ensuring data remains available, which often requires creating backup copies and recovery plans.” That language undersells how consequential this area is in actual interviews.

The questions that come up:

  • Explain your backup strategy for a production database that can’t afford more than 1 hour of data loss.
  • What’s the difference between RTO and RPO, and how do you design a backup plan around both?
  • You’ve just been told there’s a production data corruption event from 3 hours ago. Walk me through your response.
  • How do you test that your backups actually work?

The RTO vs RPO distinction matters a lot here. RTO (Recovery Time Objective) is how long the business can tolerate the database being down. RPO (Recovery Point Objective) is how much data loss is acceptable. A company that processes payments can tolerate maybe 15 minutes of RPO. An internal analytics database might be fine with 24 hours. Your backup strategy has to be designed around those specific numbers, not a generic “we do nightly backups.”

On backup testing: saying “we run monthly restores to a staging environment and verify row counts” is a real answer. Saying “we assume backups work because the backup job hasn’t errored” is the answer that ends careers. Interviewers who’ve dealt with actual disasters will push hard on this.

Security and access control

Security questions in DBA interviews have gotten harder over the past few years. Data breach costs have pushed companies to treat database security as a first-class concern rather than an afterthought.

  • How do you implement least-privilege access for a new application connecting to your database?
  • A developer asks for direct SELECT access to the production users table. How do you handle it?
  • What is SQL injection, and where does the DBA’s responsibility end and the developer’s begin?
  • How would you detect if someone is running unusual queries against production data?

On least privilege: the expected answer is to create a database user specific to the application with only the permissions it needs (usually INSERT, UPDATE, SELECT on specific tables, no DROP or ALTER, no access to system tables). Service accounts should have no interactive login capability.

On SQL injection: the DBA’s job is parameterized queries at the database driver level, stored procedures that validate inputs, and audit logging. The developer owns the application layer. Where they overlap is stored procedure design and connection pooling configuration. That nuance in your answer signals you’ve thought about real team dynamics, not just the textbook definition.

High availability, replication, and the hard questions at the end

The Stack Overflow Developer Survey 2024 shows PostgreSQL as the most popular database among professional developers, narrowly ahead of MySQL. High availability architecture questions tend to be database-specific, so it’s worth thinking about which stack you’re interviewing for.

Questions that come up at the end of onsite rounds:

  • Describe the difference between synchronous and asynchronous replication and when you’d use each.
  • Your primary database server fails at 3 PM on a Tuesday. Walk me through failover.
  • What does “split-brain” mean in a replication setup, and how do you prevent it?

On synchronous vs asynchronous replication: synchronous means the primary waits for the replica to confirm the write before acknowledging success to the application. Zero data loss, but added latency. Asynchronous means the primary acknowledges immediately and the replica catches up. Better performance, potential for data loss in a crash. For financial systems, synchronous. For read-replica setups supporting analytics, asynchronous is usually fine.

Split-brain happens when a network partition causes two nodes to both believe they’re the primary and accept writes independently. You prevent it with quorum-based consensus (at least 3 nodes so a majority is always required) or with a dedicated arbitrator. If you’re not familiar with Raft consensus or Paxos at a high level, that’s worth reading before a senior DBA interview.

The practical part of all of this: interviewers at good companies want to see how you’d behave in a real incident. The questions above aren’t pop quizzes. They’re trying to understand whether you’d be calm and systematic at 3 AM when something is on fire. The candidates who get hired are the ones who answer as if the incident is already real.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top