AWS Interview Questions and Answers in 2025


In this guide, we’ll cover the essential AWS fundamentals that candidates are asked about in almost every interview. These are the building blocks of AWS and form the core knowledge needed to excel in interviews and real-world scenarios. From understanding key services like EC2, IAM, and S3 to grasping the nuances of AWS Regions, Availability Zones, and Edge Locations, this is where your cloud journey begins.

We’ll also touch on the shared responsibility model and how to manage AWS resources securely and efficiently—key concepts that will empower you to navigate the cloud with confidence.

Section 1: The Fundamentals – Core AWS Knowledge

These are the basics. If you don’t know these cold, don’t bother showing up.

1. What is AWS?

It’s Amazon’s cloud computing platform. You rent their computers, storage, and a ton of other services instead of buying and managing your own data centers. It’s Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS) all rolled into one.


2. Explain the difference between a Region, Availability Zone (AZ), and Edge Location.

Region: A physical geographic location in the world, like us-east-1 (Northern Virginia) or eu-west-2 (London). Regions are isolated from each other.

Availability Zone (AZ): One or more discrete data centers within a Region, each with independent power and networking. Each Region has multiple AZs (usually three or more) that are physically separate but connected by low-latency links. If one AZ goes down, the others in the Region should still be running. You build for high availability by deploying across multiple AZs.

Edge Location: These are points of presence used by AWS services like CloudFront (their CDN) and Route 53. They cache content closer to users to reduce latency. There are way more Edge Locations than Regions.

3. What is IAM? Why is it important?

IAM stands for Identity and Access Management. It’s how you control who can do what in your AWS account. It’s critically important for security. You use it to create users, groups, and roles, and attach policies (JSON documents) that define permissions (e.g., this user can only read from S3, this EC2 instance can write to a DynamoDB table). Never use your root account for daily work; always create IAM users with the least privilege necessary.
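
As a minimal sketch with boto3 (the user and bucket names are hypothetical), here’s how you might create a user and attach a least-privilege inline policy:

```python
import json
import boto3

iam = boto3.client("iam")

# Create a dedicated IAM user instead of using the root account.
iam.create_user(UserName="report-reader")  # hypothetical user name

# Least-privilege policy: read-only access to a single bucket.
read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-reports-bucket",    # hypothetical bucket
                "arn:aws:s3:::example-reports-bucket/*",
            ],
        }
    ],
}

iam.put_user_policy(
    UserName="report-reader",
    PolicyName="ReadReportsOnly",
    PolicyDocument=json.dumps(read_only_policy),
)
```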

4. What are the main ways to access and manage AWS resources?

AWS Management Console: The web-based UI. Good for visibility and one-off tasks.

AWS Command Line Interface (CLI): A command-line tool for managing services. Essential for automation and scripting.

Software Development Kits (SDKs): For managing resources programmatically from your code (e.g., Python’s boto3).
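
For example, a minimal boto3 sketch (assuming credentials are already configured via the CLI or environment variables) that lists your S3 buckets:

```python
import boto3

s3 = boto3.client("s3")

# List every bucket in the account and print its name.
response = s3.list_buckets()
for bucket in response["Buckets"]:
    print(bucket["Name"])
```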

5. What is shared responsibility in the context of AWS security?

AWS is responsible for the security of the cloud (the hardware, software, networking, and facilities that run AWS services). You are responsible for security in the cloud. This means you manage your data, who has access to it (IAM), operating system patches on EC2, firewall rules (Security Groups), and client/server-side encryption. The specifics vary by service. For EC2, you manage more; for S3, you manage less.


Section 2: Compute – EC2 & Lambda

This is where your applications run. Expect deep dives here.

6. What’s the difference between EC2 and Lambda? When would you use one over the other?

EC2 (Elastic Compute Cloud): A virtual server in the cloud. You provision it, you manage the OS, you install your software. It runs 24/7 until you stop it. Use EC2 for traditional applications, long-running processes, or when you need full control over the environment.

Lambda: A serverless compute service. You upload your code (a function), and AWS runs it in response to an event (like an API call or a file upload to S3). You don’t manage any servers. It scales automatically and you only pay for the compute time you consume, down to the millisecond. Use Lambda for event-driven applications, microservices, and tasks that are short-lived and stateless.
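
As a minimal sketch, a Python Lambda handler for an S3 upload event (the field names follow the standard S3 event structure; there are no servers to manage):

```python
def lambda_handler(event, context):
    # Each record corresponds to one uploaded object.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        print(f"New object uploaded: s3://{bucket}/{key}")
    return {"statusCode": 200}
```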

7. What is an AMI (Amazon Machine Image)?

An AMI is a template for an EC2 instance. It contains the operating system, an application server, and any required software. You launch instances from an AMI. You can use public AMIs provided by AWS or create your own custom AMIs to pre-configure your instances, which is faster and more consistent for deployments.
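
For example, launching an instance from a custom AMI with boto3 (the AMI ID below is a placeholder):

```python
import boto3

ec2 = boto3.client("ec2")

# Launch a single instance from a pre-baked AMI.
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # hypothetical custom AMI ID
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
)
```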

8. What are the different EC2 instance purchasing options?

On-Demand: Pay a fixed rate by the hour (or second) with no commitment. Good for unpredictable workloads.

Reserved Instances (RIs): You commit to using a specific instance type in a specific region for a 1- or 3-year term for a significant discount. Good for predictable, steady-state workloads.

Savings Plans: A more flexible pricing model. You commit to a certain amount of compute spend ($/hour) for a 1- or 3-year term. Compute Savings Plans apply automatically to EC2, Fargate, and Lambda usage regardless of instance family, size, or Region. They’re generally preferred over RIs now.

Spot Instances: Request unused EC2 capacity at up to a 90% discount (there’s no bidding anymore; you pay the current Spot price, optionally capped by a maximum you set). The catch is that AWS can reclaim the instance with a two-minute warning when it needs the capacity back. Ideal for fault-tolerant, stateless, or batch-processing workloads; a request sketch follows this list.
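
As a rough sketch (the AMI ID and price cap are placeholders), a Spot Instance is requested through the normal run_instances call:

```python
import boto3

ec2 = boto3.client("ec2")

# Request Spot capacity; AWS may reclaim it with a two-minute warning.
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="c5.large",
    MinCount=1,
    MaxCount=1,
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {
            "MaxPrice": "0.05",                        # optional cap in USD/hour
            "InstanceInterruptionBehavior": "terminate",
        },
    },
)
```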

9. What is Auto Scaling?

Auto Scaling automatically adjusts the number of EC2 instances in your deployment based on demand. You create an Auto Scaling Group (ASG) and define policies, like “add two instances when CPU utilization is over 70% for 5 minutes” and “remove one instance when it drops below 30%.” It’s key for building scalable and resilient applications.
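
As an illustration (the ASG name is hypothetical), a target-tracking policy with boto3 might look like this:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Keep average CPU around 70% by adding or removing instances automatically.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",          # hypothetical ASG
    PolicyName="target-70-percent-cpu",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 70.0,
    },
)
```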

10. What is the difference between vertical and horizontal scaling?

Vertical Scaling: Increasing the resources of a single instance (e.g., moving from a t2.micro to a t2.large). You can do this by stopping the instance, changing its type, and restarting it. There’s a limit to how much you can scale vertically.

Horizontal Scaling (Scaling Out): Adding more instances to your resource pool. This is what Auto Scaling does. It’s generally the preferred way to scale in the cloud for high availability and near-infinite scalability.
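
For reference, the vertical path described above can be scripted with boto3 (the instance ID is a placeholder; note the brief downtime while the instance is stopped):

```python
import boto3

ec2 = boto3.client("ec2")
instance_id = "i-0123456789abcdef0"   # placeholder instance ID

# Stop the instance and wait until it is fully stopped.
ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

# Change the instance type (vertical scaling), then start it again.
ec2.modify_instance_attribute(InstanceId=instance_id, InstanceType={"Value": "t3.large"})
ec2.start_instances(InstanceIds=[instance_id])
```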

11. What is a “cold start” in Lambda?

When a Lambda function is invoked for the first time or after a period of inactivity, AWS has to provision a container and initialize your code. This initial setup causes a delay known as a cold start. For latency-sensitive applications, you can use Provisioned Concurrency to keep a certain number of function instances warm and ready to respond instantly, but it costs more.
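
A hedged sketch of enabling Provisioned Concurrency with boto3 (the function name and alias are hypothetical):

```python
import boto3

lambda_client = boto3.client("lambda")

# Keep five execution environments initialized and ready to serve requests.
lambda_client.put_provisioned_concurrency_config(
    FunctionName="checkout-handler",       # hypothetical function
    Qualifier="prod",                      # alias or published version
    ProvisionedConcurrentExecutions=5,
)
```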


Section 3: Storage – S3 & EBS

Data has to live somewhere. Know the difference between object and block storage.

12. What is S3 (Simple Storage Service)? What are its main features?

S3 is a highly durable and scalable object storage service. You store files (objects) in containers called buckets.

Features: Versioning, server-side encryption, lifecycle policies (e.g., move data to a cheaper storage class after 30 days), static website hosting. It’s not a file system for an OS.
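
As a small sketch of a lifecycle policy with boto3 (the bucket name is hypothetical): transition logs to Standard-IA after 30 days and to Glacier after 90.

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-log-bucket",   # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)
```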

13. What is the difference between S3 and EBS (Elastic Block Store)?

S3: Object storage. Accessed via API calls over the internet. It’s for storing files, backups, and static assets. It’s independent of any EC2 instance.

EBS: Block storage. It’s a virtual hard drive that you attach to a single EC2 instance in the same AZ. It provides the low-latency block-level storage needed to run an operating system and databases.

14. What are the different S3 storage classes?

S3 Standard: Default. For frequently accessed data. High availability and performance.

S3 Intelligent-Tiering: Automatically moves data between a frequent access tier and an infrequent access tier to save costs.

S3 Standard-Infrequent Access (S3 Standard-IA): For data that is accessed less frequently but requires rapid access when needed. Cheaper storage price, but you pay a retrieval fee.

S3 Glacier Instant Retrieval, Flexible Retrieval, Deep Archive: Progressively cheaper and slower options for long-term archival. Deep Archive is the cheapest, but retrieval can take hours.
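
For example, you can pick the storage class per object at upload time (bucket, key, and content are hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Write an object directly into the Standard-IA storage class.
s3.put_object(
    Bucket="example-archive-bucket",   # hypothetical bucket
    Key="reports/2024-q4.csv",
    Body=b"col1,col2\n1,2\n",          # placeholder content
    StorageClass="STANDARD_IA",        # or "GLACIER", "DEEP_ARCHIVE", etc.
)
```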

15. How would you secure data in an S3 bucket?

Block Public Access: This is enabled by default and is the first line of defense.

IAM Policies & Bucket Policies: Use these to grant granular access to specific users, roles, or services.

Encryption: Enable server-side encryption (SSE-S3, SSE-KMS, or SSE-C) to encrypt data at rest. Enforce encryption in transit by using HTTPS.

Access Logs: Log all requests made to your bucket for auditing.

VPC Endpoints: Allow EC2 instances in a private subnet to access S3 without going over the public internet.
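
A hedged boto3 sketch of the first two measures (the bucket name is hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Block every form of public access on the bucket.
s3.put_public_access_block(
    Bucket="example-private-bucket",   # hypothetical bucket
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)

# Turn on default server-side encryption (SSE-S3 here; SSE-KMS is the stricter option).
s3.put_bucket_encryption(
    Bucket="example-private-bucket",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
        ]
    },
)
```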


Section 4: Networking – VPC

Networking is complex but fundamental. A lot of interviews will expose weakness here.

16. What is a VPC (Virtual Private Cloud)?

A VPC is your own logically isolated section of the AWS cloud. You define your own private IP address range, create subnets, configure route tables, and set up network gateways. It’s the networking foundation for most of what you do in AWS.


17. What’s the difference between a Security Group and a Network ACL (NACL)?

This is a classic.

Security Group: Acts as a stateful firewall for an EC2 instance. You only define allow rules. If you allow inbound traffic on a certain port, the outbound return traffic is automatically allowed, regardless of outbound rules.

NACL: Acts as a stateless firewall for a subnet. You define both allow and deny rules. Because it’s stateless, you must explicitly define rules for both inbound and outbound traffic. If you allow inbound traffic on a port, you must also create a corresponding outbound rule for the response traffic.
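
For example, adding a stateful allow rule to a Security Group with boto3 (the group ID is a placeholder):

```python
import boto3

ec2 = boto3.client("ec2")

# Stateful allow rule: inbound HTTPS from anywhere; the return traffic
# is permitted automatically, so no matching outbound rule is needed.
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",   # placeholder security group ID
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": 443,
            "ToPort": 443,
            "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": "HTTPS from anywhere"}],
        }
    ],
)
```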

18. What is the difference between a public and a private subnet?

Public Subnet: Has a route table entry that directs traffic to an Internet Gateway (IGW). Instances in a public subnet can have a public IP address and can communicate directly with the internet.

Private Subnet: Does not have a route to an IGW. Instances here cannot be reached from the internet. To allow them to access the internet for things like software updates, you use a NAT Gateway.
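
What makes a subnet public is the route itself; a minimal boto3 sketch (the IDs are placeholders):

```python
import boto3

ec2 = boto3.client("ec2")

# A default route to an Internet Gateway in the subnet's route table
# is what turns that subnet into a "public" subnet.
ec2.create_route(
    RouteTableId="rtb-0123456789abcdef0",   # the public subnet's route table
    DestinationCidrBlock="0.0.0.0/0",
    GatewayId="igw-0123456789abcdef0",      # Internet Gateway attached to the VPC
)
```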

19. What is a NAT Gateway?

A NAT (Network Address Translation) Gateway is a managed AWS service that allows instances in a private subnet to connect to the internet or other AWS services, but prevents the internet from initiating a connection with those instances. It’s used for outbound-only internet access.
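
A rough boto3 sketch (all IDs are placeholders): create the NAT Gateway in a public subnet, then point the private subnet’s default route at it.

```python
import boto3

ec2 = boto3.client("ec2")

# The NAT Gateway lives in a public subnet and uses a pre-allocated Elastic IP.
nat = ec2.create_nat_gateway(
    SubnetId="subnet-0123456789abcdef0",        # a public subnet
    AllocationId="eipalloc-0123456789abcdef0",  # Elastic IP allocation
)
nat_id = nat["NatGateway"]["NatGatewayId"]

# Private subnet gets outbound-only internet access via the NAT Gateway.
ec2.create_route(
    RouteTableId="rtb-0fedcba9876543210",       # the private subnet's route table
    DestinationCidrBlock="0.0.0.0/0",
    NatGatewayId=nat_id,
)
```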

20. What is VPC Peering? When would you use it?

VPC Peering is a direct network connection between two VPCs that enables you to route traffic between them using private IP addresses. It’s not transitive (if VPC A is peered with B, and B is peered with C, A cannot talk to C). For connecting many VPCs, a Transit Gateway is the better, more scalable solution.


Section 5: Databases

Know the difference between SQL and NoSQL and which AWS service fits which use case.

21. What is RDS (Relational Database Service)?

RDS is a managed service for relational databases like MySQL, PostgreSQL, Oracle, and SQL Server. AWS handles the administrative tasks like provisioning, patching, backup, and recovery. You’re still responsible for schema design and query optimization.
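
As a hedged illustration (the identifier and credentials are placeholders), provisioning a Multi-AZ PostgreSQL instance with boto3 looks roughly like this:

```python
import boto3

rds = boto3.client("rds")

# Multi-AZ PostgreSQL instance; AWS handles patching, backups, and failover.
rds.create_db_instance(
    DBInstanceIdentifier="app-db",
    Engine="postgres",
    DBInstanceClass="db.t3.medium",
    AllocatedStorage=100,
    MasterUsername="appadmin",
    MasterUserPassword="change-me-please",   # use Secrets Manager in practice
    MultiAZ=True,
    BackupRetentionPeriod=7,
)
```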

22. What is DynamoDB?

DynamoDB is a fully managed, serverless NoSQL key-value and document database. It delivers single-digit millisecond performance at any scale. It’s designed for applications that need consistent, fast performance as they grow.
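
A minimal sketch against a hypothetical UserProfiles table (partition key user_id):

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("UserProfiles")   # hypothetical table

# Write an item, then fetch it back by its key.
table.put_item(Item={"user_id": "u-123", "name": "Ada", "plan": "pro"})

response = table.get_item(Key={"user_id": "u-123"})
print(response.get("Item"))
```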

23. Compare RDS and DynamoDB. When would you choose one over the other?

RDS (SQL): Use when your data is structured and you require complex queries, joins, and ACID transactions. Good for traditional applications like e-commerce sites or CRM systems.

DynamoDB (NoSQL): Use when you have unstructured or semi-structured data, need massive scale with low latency, and have simpler query patterns (usually looking up items by a key). Think IoT applications, user profiles, or gaming leaderboards.

24. What is ElastiCache?

ElastiCache is a managed in-memory caching service. It supports Redis and Memcached. You use it to put a cache in front of your database (like RDS or DynamoDB) or as a standalone in-memory data store to reduce latency and ease the load on your backend systems.
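
A common pattern here is cache-aside; a minimal sketch using the redis-py client (the endpoint and load_user_from_db() are hypothetical):

```python
import json
import redis    # assumes the redis-py client is installed

# Cache-aside: check ElastiCache (Redis) first, fall back to the database,
# then populate the cache with a short TTL.
cache = redis.Redis(host="my-cache.abc123.use1.cache.amazonaws.com", port=6379)

def get_user(user_id):
    cached = cache.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    user = load_user_from_db(user_id)                        # hypothetical DB call
    cache.setex(f"user:{user_id}", 300, json.dumps(user))    # cache for 5 minutes
    return user
```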


Section 6: Scenario-Based Questions

This is where they test if you can actually apply knowledge.

25. Scenario: Your AWS bill has suddenly doubled. How do you investigate?

AWS Cost Explorer: First, go here. Use it to visualize and analyze costs. Filter by service, region, and tags to pinpoint what changed.

AWS Budgets: Check if any budget alerts were triggered. If not, set them up immediately for the future.

CloudTrail: Look at CloudTrail logs to see what API calls were made. Did someone spin up a bunch of expensive instances?

Trusted Advisor: Check the cost optimization recommendations. It identifies things like idle EC2 instances or underutilized EBS volumes.

Common Culprits: Look for data transfer costs (often a hidden killer), unreleased Elastic IPs, or a misconfigured Auto Scaling group that scaled up and never scaled down.
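
The Cost Explorer step can also be scripted; a sketch via the API (the dates are examples):

```python
import boto3

ce = boto3.client("ce")

# Daily unblended cost for a two-week window, grouped by service,
# to spot which service caused the spike.
response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2025-01-01", "End": "2025-01-15"},   # example dates
    Granularity="DAILY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)

for day in response["ResultsByTime"]:
    for group in day["Groups"]:
        print(day["TimePeriod"]["Start"], group["Keys"][0],
              group["Metrics"]["UnblendedCost"]["Amount"])
```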

26. Scenario: You need to design a highly available and scalable web application. Outline the architecture.

This is a classic system design question.

DNS: Use Route 53 for DNS routing.

CDN: Use CloudFront to cache static content at the edge and reduce latency.

Load Balancing: Use an Application Load Balancer (ALB) to distribute traffic across multiple EC2 instances.

Compute: Place the EC2 instances in an Auto Scaling Group that spans multiple Availability Zones.

Database: Use a multi-AZ RDS instance for the database. The multi-AZ setup provides a standby replica in a different AZ for automatic failover. For read-heavy workloads, add one or more Read Replicas.

State: For session state, use ElastiCache (Redis) so your web servers can be stateless.

Storage: Store user uploads and other static assets in S3.

27. Scenario: A developer complains they can’t connect to their EC2 instance. How do you troubleshoot?

Security Group: This is the #1 cause. Check the inbound rules on the instance’s Security Group. Is port 22 (for SSH) or 3389 (for RDP) open to their IP address?

NACL: Check the NACL associated with the instance’s subnet. Are there any deny rules blocking the traffic?

Route Table: Is the subnet a public subnet with a route to an Internet Gateway? If it’s a private subnet, are they trying to connect from the internet (which won’t work) or via a Bastion Host?

Public IP: Does the instance have a public IP address?

Instance State: Is the instance actually running? Check the EC2 console.

OS Level: Is the SSH daemon running on the instance? (This is harder to check if you can’t connect, but it’s a possibility).
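
Several of these checks can be scripted; a rough boto3 triage sketch (the instance ID is a placeholder):

```python
import boto3

ec2 = boto3.client("ec2")

# Confirm the instance is running, has a public IP, and allows inbound SSH.
instance = ec2.describe_instances(
    InstanceIds=["i-0123456789abcdef0"]   # placeholder instance ID
)["Reservations"][0]["Instances"][0]

print("State:", instance["State"]["Name"])
print("Public IP:", instance.get("PublicIpAddress", "none"))

sg_ids = [sg["GroupId"] for sg in instance["SecurityGroups"]]
for sg in ec2.describe_security_groups(GroupIds=sg_ids)["SecurityGroups"]:
    for rule in sg["IpPermissions"]:
        if rule.get("FromPort") == 22:
            print("SSH rule allows:", rule.get("IpRanges"))
```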

28. Scenario: How would you migrate a large on-premises database to AWS with minimal downtime?

Use the AWS Database Migration Service (DMS).

First, use the Schema Conversion Tool (SCT) if you are migrating between different database engines (e.g., Oracle to PostgreSQL).

Then, set up a DMS replication instance.

DMS will do an initial full load of the data from the on-prem database to the target RDS instance.

While the full load is happening, DMS captures ongoing changes from the source database (Change Data Capture – CDC).

Once the full load is complete, DMS applies the cached changes to the target database to bring it in sync.

You can keep the replication going until you are ready to cut over. At the cutover time, you stop the application, ensure the target database is fully synced, and then point the application to the new AWS database. This minimizes the downtime to just a few minutes.
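
A hedged sketch of creating the DMS task with boto3, assuming the source/target endpoints and replication instance already exist (all ARNs are placeholders):

```python
import json
import boto3

dms = boto3.client("dms")

# Include every table in every schema in the migration.
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all",
            "object-locator": {"schema-name": "%", "table-name": "%"},
            "rule-action": "include",
        }
    ]
}

# Full load of existing data plus ongoing change data capture (CDC).
dms.create_replication_task(
    ReplicationTaskIdentifier="onprem-to-rds",
    SourceEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:SOURCE",
    TargetEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:TARGET",
    ReplicationInstanceArn="arn:aws:dms:us-east-1:123456789012:rep:INSTANCE",
    MigrationType="full-load-and-cdc",
    TableMappings=json.dumps(table_mappings),
)
```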


Section 7: The Behavioral Gauntlet (Leadership Principles)

If you’re interviewing at Amazon/AWS, half the battle is the behavioral questions mapped to their Leadership Principles (LPs). You MUST use the STAR method (Situation, Task, Action, Result). Be prepared with 2-3 detailed stories for each LP.

29. (Customer Obsession) Tell me about a time you went above and beyond for a customer.

Situation: A customer was experiencing intermittent performance issues with our API.

Task: My task was to identify the root cause and resolve it.

Action: I didn’t just look at our logs. I set up a call with their engineering team, got a deep understanding of their usage patterns, and then used that information to replicate their exact workload. I discovered a bottleneck in how we handled concurrent connections under a specific, rare condition. I developed a patch, tested it against their use case, and deployed it.

Result: The customer’s performance issues were completely resolved. They were impressed with the proactive engagement, which strengthened our relationship.

30. (Ownership) Describe a time you took ownership of a problem that was outside your direct responsibility.

Situation: The CI/CD pipeline for a sister team was consistently failing, blocking their releases. They were swamped, and the on-call engineer was struggling to diagnose it.

Task: Although I wasn’t on that team, our service depended on theirs, so their problem was becoming my problem. I decided to help.

Action: I spent a few hours digging through their build logs and deployment scripts. I found that a recently updated dependency was incompatible with the version of Python running in their build environment. I created a pull request to update their environment, explained the issue clearly, and walked their team lead through the fix.

Result: The pipeline was fixed, and they were able to resume deployments. It also led to us creating a shared repository of best practices for dependency management to prevent similar issues.

31. (Invent and Simplify) Tell me about a time you simplified a complex process.

Situation: Our team’s manual process for onboarding a new microservice involved over 20 steps, including creating IAM roles, setting up logging, and configuring deployment pipelines. It was slow and error-prone.

Task: I wanted to automate this process to make it faster and more reliable.

Action: I wrote a series of scripts using the AWS CLI and later converted them into a CloudFormation template. The template took a few parameters, like the service name and GitHub repo, and then automatically provisioned all the necessary infrastructure and boilerplate configuration.

Result: The onboarding process went from taking half a day of manual work to running in under 10 minutes. It eliminated human error and became the standard for all new services in our department.

32. (Dive Deep) Describe a situation where you had to dig deep to get to the root of a technical problem.

Situation: We had a memory leak in a production Java application running on EC2. The heap usage would slowly climb over several days until the application crashed and had to be restarted.

Task: Find and fix the memory leak.

Action: Standard monitoring didn’t reveal the cause. I had to get a heap dump from a production instance just before it crashed. I then used a memory analyzer tool (like Eclipse MAT) to analyze the multi-gigabyte dump file. By tracing object references, I found that a third-party caching library was not properly evicting expired entries, causing them to accumulate indefinitely. I contacted the library vendor with my findings and, in the short term, implemented a workaround that manually cleared the cache periodically.

Result: The memory leak was plugged, and the application became stable. The vendor later released a patch based on my report.

33. (Bias for Action) Tell me about a time you had to make a quick decision with limited information.

Situation: During a major release, we detected a critical bug in production that was impacting a small subset of users. We didn’t have a full root cause analysis yet.

Task: I had to decide whether to roll back the entire release, which would impact all users, or leave it in production while we worked on a hotfix.

Action: I quickly assessed the blast radius of the bug and the potential data corruption risk. The risk was low, and the impact was contained. I made the call to not roll back, but instead to assemble a war room to develop and deploy a targeted hotfix immediately. I communicated this decision and the reasoning to stakeholders.

Result: We were able to deploy a hotfix within an hour, resolving the issue for the affected users without disrupting the service for everyone else by performing a full rollback.

34. (Frugality) Tell me about a time you saved your company money.

Situation: Our development and staging environments were running 24/7 on On-Demand EC2 instances, costing thousands of dollars a month, even though they were only used during business hours.

Task: My goal was to reduce the cost of our non-production environments without impacting developer productivity.

Action: I wrote a simple Lambda function, triggered by a CloudWatch scheduled event, that would automatically stop all tagged “dev” and “staging” instances every evening and start them again in the morning.

Result: This simple automation cut the costs for those environments by over 60%, saving the company tens of thousands of dollars annually with zero impact on the development team’s workflow.

35. (Have Backbone; Disagree and Commit) Describe a time you disagreed with your manager or team.

Situation: My manager proposed a technical solution for a new feature that involved using a proprietary, expensive third-party service.

Task: I believed we could build the same functionality in-house using open-source tools and existing AWS services, which would be more cost-effective and give us more control. I needed to convince him.

Action: I didn’t just state my opinion. I did my homework. I built a small proof-of-concept over a couple of days to demonstrate the feasibility of the open-source approach. I also prepared a detailed cost analysis comparing the two options over three years. I presented my findings to my manager and the team, focusing on the data.

Result: After reviewing the data and the POC, my manager agreed that my approach was better. The team committed to the new direction. Even if he had decided to stick with his original plan, I would have fully committed to making it successful after making my case.
