AWS Solutions Architect Interview 2026: 60+ Service Design Questions & Tradeoffs

I’ve sat through probably 40 AWS interviews over the past few years, both as a candidate and watching others prep. The questions that trip people up aren’t usually the tricky ones. They’re the basics that candidates assume they know cold but haven’t thought about in months.

This guide covers the areas that actually come up. Not all of them will show up in a single loop, but if you can answer everything here, you’re in reasonable shape for most cloud engineering or solutions architect roles.

What AWS interviews actually test

Most AWS interviews aren’t trying to trick you. They’re checking whether you’ve actually used these services or just read about them. The difference shows fast.

Interviewers at AWS-heavy shops generally care about four things: your reasoning about trade-offs between services, whether you understand cost implications, how you handle failure scenarios, and whether you’ve ever debugged something at scale. If your answers are purely theoretical, that gap becomes obvious.

According to the Stack Overflow Developer Survey 2024, AWS remains the most-used cloud platform among professional developers at 48%, well ahead of Azure at 26% and GCP at 23%. That means the job market for AWS expertise is wide, but so is the competition.

Compute and EC2 questions you should know cold

These come up in almost every loop. Treat them as table stakes.

  • What’s the difference between a stopped EC2 instance and a terminated one? A stopped instance keeps its EBS volumes and can be started again (anything on instance store is lost). A terminated instance is gone for good, and its root EBS volume is deleted by default. Candidates mix these up constantly.
  • When would you choose a Spot instance over On-Demand? Spot makes sense for interruptible workloads: batch processing, CI/CD runners, ML training jobs. Never for stateful workloads that can’t tolerate a 2-minute termination notice.
  • What’s the difference between vertical and horizontal scaling on EC2? Vertical means resizing the instance (requires a stop). Horizontal means adding more instances behind a load balancer. The follow-up is usually: “Which would you reach for first and why?”
  • What are placement groups, and when do you use them? Three types: cluster (low latency, same AZ), spread (failure isolation, different hardware), partition (distributed systems like Cassandra). Most people only remember cluster.
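To make the 2-minute Spot notice concrete, here is a minimal sketch of the decision a worker has to make once it sees the notice. It assumes the notice has already been fetched from the instance metadata path `/latest/meta-data/spot/instance-action` and parsed into a dict with `action` and `time` fields. The function names and the `checkpoint_cost_s` parameter are my own, purely for illustration.

```python
from datetime import datetime, timezone


def seconds_until_interruption(instance_action: dict, now: datetime) -> float:
    """Return the seconds left before the interruption time in the notice."""
    # The notice carries a UTC timestamp such as "2025-01-01T12:02:00Z".
    deadline = datetime.strptime(instance_action["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return max(0.0, (deadline - now).total_seconds())


def can_checkpoint(instance_action: dict, now: datetime, checkpoint_cost_s: float) -> bool:
    """True if a checkpoint taking checkpoint_cost_s seconds still fits in the window."""
    return seconds_until_interruption(instance_action, now) >= checkpoint_cost_s


# Hypothetical notice and clock values, for illustration only.
notice = {"action": "terminate", "time": "2025-01-01T12:02:00Z"}
now = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(seconds_until_interruption(notice, now))  # 120.0
print(can_checkpoint(notice, now, checkpoint_cost_s=300))  # False
```

If the checkpoint doesn't fit in the window, the workload probably isn't a good Spot candidate without more frequent periodic checkpoints.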

One I’ve seen catch people: “If an EC2 instance runs out of memory, what happens?” The answer is the OS starts killing processes (OOM killer on Linux), which most candidates frame as an AWS problem. It isn’t. The interviewer is checking if you understand the boundary between EC2 and the OS.

Storage, S3, and databases

S3 questions are usually about consistency, lifecycle policies, and cost. RDS questions are about replication and failover. Both topics have a few traps.

On S3: since late 2020, S3 has offered strong read-after-write consistency for all objects. Before that, overwrite PUTs had eventual consistency. I still see candidates describe the old behavior as current. Interviewers notice.

Common S3 questions:

  • What’s the difference between S3 Standard, Intelligent-Tiering, and Glacier? The answer should include use cases and rough cost trade-offs, not just definitions.
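  • How do you enforce server-side encryption on all objects in a bucket? Since January 2023, S3 encrypts new objects with SSE-S3 by default, so in practice this question is about enforcing a specific method such as SSE-KMS. The classic answer is a bucket policy that denies any PutObject request without the x-amz-server-side-encryption header.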
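  • What happens if two clients write to the same S3 key simultaneously? Last write wins. By default there’s no conflict detection, although S3 added opt-in conditional writes (If-None-Match and If-Match headers) in 2024. This matters for any application treating S3 as a shared mutable store.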
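The deny-without-header bucket policy mentioned above can be sketched as a plain Python dict. This is an illustration, not a drop-in policy: the bucket name is a placeholder, the helper name is mine, and you would still need to attach the JSON to the bucket yourself (console, CLI, or SDK).

```python
import json


def deny_unencrypted_puts_policy(bucket_name: str) -> dict:
    """Build a bucket policy that rejects PutObject calls lacking the SSE header."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyPutWithoutSSEHeader",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
                # "Null": "true" matches requests where the header is absent.
                "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
            }
        ],
    }


# "example-bucket" is a placeholder name.
print(json.dumps(deny_unencrypted_puts_policy("example-bucket"), indent=2))
```

A good follow-up to have ready: this only checks that the header is present. To require a particular algorithm you would add a second deny statement comparing the header value.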

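On databases: Aurora vs RDS is a common pivot question. Aurora offers up to 15 read replicas (standard RDS was historically capped at 5, though the limit has since been raised for some engines, so check the current quota), typically low replica lag (often under 10ms, because replicas read from the same shared storage volume), and automatic storage scaling. The trade-off is cost and Aurora-specific behavior in failover scenarios. If you’ve actually run Aurora in production, say so and describe what you observed.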

Networking and VPC

This is where a lot of candidates struggle, especially if they’ve only worked in environments where someone else handled the networking.

Questions you’ll almost certainly face:

  • What’s the difference between a public subnet and a private subnet? A public subnet has a route to an Internet Gateway. A private subnet doesn’t. Instances in a private subnet need a NAT Gateway to reach the internet.
  • When would you use VPC peering vs. Transit Gateway? Peering works for point-to-point. Transit Gateway scales to hundreds of VPCs and supports transitive routing. Cost and operational complexity differ significantly.
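  • What’s a security group vs. a network ACL? Security groups are stateful, attached to instances (strictly, to their network interfaces), and support allow rules only. NACLs are stateless, attached to subnets, and support both allow and deny rules. The stateful distinction matters: with SGs, return traffic is automatically allowed; with NACLs, you need explicit rules in both directions.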
  • What is a VPC endpoint and why would you use one? It lets resources in a private subnet communicate with AWS services like S3 or DynamoDB without traffic leaving the AWS network. Useful for security and avoiding NAT Gateway data transfer costs.

I’ll be honest: I didn’t fully understand NACL evaluation order (rules evaluated in ascending number order, first match wins) until I had to debug a misconfigured one at 11pm. There’s no shame in saying you’ve learned most of this from incidents.
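If the evaluation order feels abstract, a toy simulation helps. This is a simplified sketch of "ascending rule number, first match wins" and nothing more: it ignores protocol and direction, and the `NaclRule` class and `evaluate_nacl` function are names I made up for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class NaclRule:
    number: int      # lower numbers are evaluated first
    action: str      # "allow" or "deny"
    cidr: str
    port_from: int
    port_to: int


def evaluate_nacl(rules, source_ip: str, port: int) -> str:
    """Return the action of the first matching rule, in ascending rule-number order."""
    for rule in sorted(rules, key=lambda r: r.number):
        in_cidr = ip_address(source_ip) in ip_network(rule.cidr)
        in_ports = rule.port_from <= port <= rule.port_to
        if in_cidr and in_ports:
            return rule.action
    return "deny"  # nothing matched: the implicit catch-all deny applies


rules = [
    NaclRule(200, "allow", "0.0.0.0/0", 443, 443),
    NaclRule(100, "deny", "203.0.113.0/24", 0, 65535),
]
print(evaluate_nacl(rules, "203.0.113.7", 443))   # deny: rule 100 matches first
print(evaluate_nacl(rules, "198.51.100.1", 443))  # allow: falls through to rule 200
print(evaluate_nacl(rules, "198.51.100.1", 22))   # deny: no rule matches
```

The classic misconfiguration is the reverse of this example: a broad allow at a low number that shadows a more specific deny further down.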

IAM and security

IAM questions are mostly about least privilege and trust policies. Interviewers want to know if you default to overly permissive setups or actually think about scope.

  • What’s the difference between an IAM role and an IAM user? Users have long-term credentials. Roles are assumed temporarily and issue short-term credentials via STS. EC2 instances, Lambda functions, and cross-account access all use roles.
  • What’s a resource-based policy vs. an identity-based policy? Identity-based attaches to an IAM entity. Resource-based attaches to the resource itself (S3 bucket policy, KMS key policy). Both can grant or deny access, and they’re evaluated together.
  • What is a permissions boundary? A limit you can set on an IAM entity that defines the maximum permissions it can have, regardless of what its policies grant. Used mainly for delegation scenarios where you want to give someone the ability to create IAM roles without letting them escalate their own permissions.

The AWS IAM policy evaluation logic documentation is worth reading once in full before an interview. The order of evaluation (explicit deny always wins, then Organizations SCPs, then resource-based policies, then identity-based policies, then permissions boundaries, then session policies) is a surprisingly common interview topic at senior levels.
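One way to internalize how these layers combine is a deliberately simplified model. The sketch below assumes a single account, exact-match action strings (no wildcards), and no resource-based or session policies. Under those assumptions, an action is allowed only if nothing explicitly denies it and every layer allows it. `PolicyLayer` and `is_allowed` are illustrative names, not AWS APIs.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyLayer:
    allow: set = field(default_factory=set)
    deny: set = field(default_factory=set)


def is_allowed(action: str, scp: PolicyLayer, boundary: PolicyLayer, identity: PolicyLayer) -> bool:
    """Simplified check: an explicit deny wins, otherwise every layer must allow."""
    layers = [scp, boundary, identity]
    if any(action in layer.deny for layer in layers):
        return False
    return all(action in layer.allow for layer in layers)


# Hypothetical setup: the identity policy grants more than the boundary permits.
scp = PolicyLayer(allow={"s3:GetObject", "iam:CreateUser"})
boundary = PolicyLayer(allow={"s3:GetObject"})
identity = PolicyLayer(allow={"s3:GetObject", "iam:CreateUser"})

print(is_allowed("s3:GetObject", scp, boundary, identity))    # True
print(is_allowed("iam:CreateUser", scp, boundary, identity))  # False: capped by the boundary
```

Real evaluation has more branches (resource-based policies in particular can change the outcome), so treat this as a mental model for the boundary question, not a policy simulator.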

System design questions with an AWS flavor

These usually start with “design a system that does X” and expect you to reason about AWS services as building blocks. A few that come up regularly:

  • Design a URL shortener. They want to hear about DynamoDB or RDS for the key-value store, CloudFront for caching, and Lambda or EC2 for the redirect service. Cost matters: mention it.
  • Design a real-time notifications system. SQS vs. SNS trade-offs, fan-out patterns, WebSockets via API Gateway, Lambda for processing. This is a test of whether you understand push vs. pull.
  • How would you architect a multi-region application? Route 53 health checks, active-active vs. active-passive failover, data replication lag between regions, and the cost of running duplicate infrastructure. Most candidates forget to address cost.
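For the URL shortener, it helps to have the core mechanics in your head before you start naming services. Here is a minimal sketch, assuming short codes are the base62 encoding of an incrementing ID. The in-memory dict is a stand-in for whatever key-value store you pick (a DynamoDB table, say), and the `Shortener` class is hypothetical.

```python
import string

# 0-9, a-z, A-Z: 62 URL-safe characters.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode_base62(n: int) -> str:
    """Encode a non-negative integer as a base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


class Shortener:
    """Toy shortener; the dict stands in for a real key-value table."""

    def __init__(self):
        self._table = {}
        self._next_id = 1

    def shorten(self, url: str) -> str:
        code = encode_base62(self._next_id)
        self._next_id += 1
        self._table[code] = url
        return code

    def resolve(self, code: str):
        return self._table.get(code)  # None if the code is unknown


s = Shortener()
code = s.shorten("https://example.com/some/long/path")
print(code, s.resolve(code))
```

The interview follow-ups tend to attack the counter: a single incrementing ID is a coordination point, so be ready to talk about ID ranges per worker or random codes with a collision check.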

One opinion I’ll stand behind even though some people disagree: the STAR method (or any rigid framework) is worse than just explaining your actual reasoning. Interviewers who evaluate AWS design questions care about whether your thought process tracks, not whether you hit predetermined talking points.

If you want to practice articulating your reasoning under pressure, Craqly’s AI interview copilot can run you through architecture prompts and give you real-time feedback without the awkward lag of asking a friend to quiz you.

A few questions people consistently miss

These come up less often but tend to separate candidates who’ve read the docs from candidates who’ve actually shipped things:

  • What happens to in-flight SQS messages if your Lambda function times out? They return to the queue after the visibility timeout expires and will be processed again. If you have side effects, you need idempotent handling.
  • What’s the default limit on Lambda concurrent executions per region? 1,000. It’s a soft limit and can be raised, but new accounts hitting this cap in production is a real thing that happens.
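  • What’s the maximum size of a single SQS message? 256 KB was the long-standing answer; AWS announced an increase to 1 MiB in 2025, so confirm the current quota before quoting a number. For larger payloads, the pattern is to put the payload in S3 and pass a reference in the message.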
  • How does CloudFront cache invalidation work, and how much does it cost? The first 1,000 invalidation paths per month are free, then $0.005 per path. Wildcard invalidations count as one path. Most candidates have no idea there’s a cost.
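The idempotent handling mentioned in the SQS question can be sketched like this. It assumes records shaped like the SQS events Lambda receives (each with `messageId` and `body`), and uses an in-memory set as a stand-in for a durable store of processed IDs. `handle_batch` and `apply_side_effect` are illustrative names.

```python
def handle_batch(records, processed_ids, apply_side_effect):
    """Apply the side effect once per messageId, skipping redeliveries."""
    handled = []
    for record in records:
        message_id = record["messageId"]
        if message_id in processed_ids:
            continue  # already processed: this is a redelivery
        apply_side_effect(record["body"])
        # In production this write would go to a durable store, not a local set.
        processed_ids.add(message_id)
        handled.append(message_id)
    return handled


# Hypothetical usage: the same message is delivered twice.
sent = []
seen = set()
batch = [{"messageId": "m-1", "body": "charge order 42"}]
handle_batch(batch, seen, sent.append)
handle_batch(batch, seen, sent.append)  # redelivered after a timeout
print(sent)  # ['charge order 42']
```

This sketch still has a gap: if the function dies between the side effect and recording the ID, the message is processed twice. Closing it usually means making the side effect itself idempotent or recording the ID atomically with the effect, which is exactly the kind of detail worth raising in the interview.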

47 AWS services are mentioned by name in the AWS Certified Solutions Architect Associate exam guide. Nobody knows all of them deeply. What interviewers are really testing is whether you know the ones your team would actually use and whether you’re honest about the gaps.
