02a-AWS
AWS (Amazon Web Services):
AWS Cloud Computing:
Six advantages of cloud computing:
Problems solved by the cloud:
AWS Cloud Best Practices - Design Principles:
Well-Architected Framework (6 Pillars):
AWS Customer Carbon Footprint Tool: track, measure, review, and forecast the carbon emissions generated from your AWS usage. Helps you meet your own sustainability goals.
AWS Cloud Adoption Framework (AWS CAF) helps you build and then execute a comprehensive plan for your digital transformation through innovative use of AWS.
AWS CAF - Transformation Domains:
AWS Right Sizing is the process of matching instance types and sizes to your workload performance and capacity requirements at the lowest possible cost.
AWS IQ: quickly find professional help for your AWS projects. Engage and pay AWS Certified third-party experts for on-demand project work. Video-conferencing, contract management, secure collaboration, integrated billing.
AWS re:Post: AWS-managed Q&A service.
AWS Managed Services (AMS) provides infrastructure and application support on AWS. Offers a team of AWS experts who manage and operate your infrastructure for security, reliability, and availability.
AWS Regions: a Region is a physical location around the world where AWS clusters data centers. Each group of logical data centers is called an Availability Zone (AZ). Each AWS Region consists of a minimum of three isolated, physically separate AZs within a geographic area.
How to choose an AWS Region:
AWS Availability Zones (AZ): is one or more discrete data centers with redundant power, networking, and connectivity in an AWS Region. AZs give customers the ability to operate production applications and databases that are more highly available, fault tolerant, and scalable than would be possible from a single data center.
AWS Local Zone location is an extension of an AWS Region where you can run your latency sensitive applications using AWS services such as Amazon Elastic Compute Cloud, Amazon Virtual Private Cloud, Amazon Elastic Block Store, Amazon File Storage, and Amazon Elastic Load Balancing in geographic proximity to end-users.
AWS Edge Locations (Point of Presence): is a site that Amazon CloudFront uses to store cached copies of your content closer to your customers for faster delivery.
AWS Wavelength Zones are infrastructure deployments embedded within telecommunications providers’ data centers at the edge of 5G networks.
AWS Outposts are “server racks” that offer the same AWS infrastructure, services, APIs, and tools to build your own applications on-premises just as in the cloud. AWS will set up and manage “Outposts racks” within your on-premises infrastructure. The customer is responsible for the physical security of the Outposts racks.
Service: AWS offers a broad set of global cloud-based products including compute, storage, database, analytics, networking, machine learning and AI, mobile, developer tools, IoT, security, enterprise applications, and much more.
Tools to access AWS Services:
IAM (Identity and Access Management) enables you to securely control access to Amazon Web Services services and resources for your users.
Users are people within your organization, and can be grouped. Users don’t have to belong to a group, and a user can belong to multiple groups. Groups only contain users, not other groups.
The root user has complete access to all AWS services and resources. The root account is created by default.
Actions that can be performed only by the root user:
Policies are JSON documents that define the permissions of users and groups.
IAM Roles for services: a set of permissions assigned to an AWS service (EC2, Lambda, CloudFormation) so that it can perform actions on your behalf.
IAM Policy is a JSON document that defines permissions.
┌─────────────────────────────────────────────────────────────┐
│                         IAM POLICY                          │
├─────────────────────────────────────────────────────────────┤
│ Version    (Required)   - Policy language version           │
│ Id         (Optional)   - Policy identifier                 │
│ Statement  (Required)   - Array of permission blocks        │
│ ├── Sid        (Optional)   - Statement ID                  │
│ ├── Effect     (Required)   - “Allow” or “Deny”             │
│ ├── Principal  (Required*)  - Who the policy applies to     │
│ ├── Action     (Required)   - What actions are permitted    │
│ ├── Resource   (Required)   - Which resources are affected  │
│ └── Condition  (Optional)   - When the policy applies       │
└─────────────────────────────────────────────────────────────┘
* Principal is required in resource-based policies only; identity-based policies do not use it.
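As a rough illustration of the structure above, the sketch below builds a policy as a Python dictionary and checks that the required elements are present. The bucket name, the `Sid`, and the helper `check_policy_shape` are hypothetical, and the check only covers the element names listed in the table (it is not how IAM itself validates policies).

```python
import json

# Element names taken from the structure table above.
REQUIRED_TOP_LEVEL = {"Version", "Statement"}
REQUIRED_PER_STATEMENT = {"Effect", "Action", "Resource"}


def check_policy_shape(policy: dict) -> list:
    """Return a list of problems found; an empty list means the shape looks fine."""
    problems = []
    for key in sorted(REQUIRED_TOP_LEVEL - set(policy)):
        problems.append(f"missing top-level element: {key}")
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement object instead of a list
        statements = [statements]
    for i, stmt in enumerate(statements):
        for key in sorted(REQUIRED_PER_STATEMENT - set(stmt)):
            problems.append(f"statement {i}: missing {key}")
        if stmt.get("Effect") not in ("Allow", "Deny"):
            problems.append(f"statement {i}: Effect must be Allow or Deny")
    return problems


# Hypothetical identity-based policy: read-only access to one example bucket.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadExampleBucket",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-bucket/*",
        }
    ],
}

print(json.dumps(example_policy, indent=2))
print(check_policy_shape(example_policy))
```

The example omits `Principal` because it is written as an identity-based policy; a bucket policy with the same statement would need one.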
Policy Types:
Best IAM practices:
IAM Credentials Report (account-level): a report that lists all your account’s users and the status of their various credentials.
IAM Access Advisor (user-level): shows the service permissions granted to a user and when those services were last accessed. Can be used to revise policies.
AWS Resource Access Manager (AWS RAM) helps you securely share your resources across AWS accounts, within your organization or organizational units (OUs) and with IAM roles and users for supported resource types (Aurora, VPC Subnets, Transit Gateway, Route53, EC2 Dedicated Hosts, License Manager Configurations, etc). Avoid resource duplication.
AWS Service Catalog: self-service portal to launch a set of authorized products pre-defined by admins.
AWS STS (Security Token Service): enables you to create temporary, limited-privilege credentials to access your AWS resources.
| STS API | Use Case |
|---|---|
| AssumeRole | Cross-account access, or same-account role assumption |
| AssumeRoleWithSAML | Users logged in with SAML (corporate IdP) |
| AssumeRoleWithWebIdentity | Users logged in with IdP (Facebook, Google, OIDC); prefer Cognito instead |
| GetSessionToken | MFA for root or IAM user |
| GetFederationToken | Temporary credentials for federated user |
Session Policies: Optional policy passed when calling AssumeRole — further restricts the role’s permissions for that session only.
Amazon Cognito:
| Component | Purpose |
|---|---|
| User Pools | User directory for sign-up/sign-in, returns JWT tokens |
| Identity Pools | Exchange tokens for temporary AWS credentials (access AWS services) |
User → Cognito User Pool → JWT Token → Cognito Identity Pool → AWS Credentials → AWS Services
⚠️ Exam trap: User Pools = authentication (who are you?), Identity Pools = authorization (AWS access)
IAM Access Analyzer:
AWS Directory Services:
| Service | Users Stored | On-Prem Connection | Use Case |
|---|---|---|---|
| AWS Managed Microsoft AD | In AWS | Two-way trust | Full AD features, MFA, trust with on-prem |
| AD Connector | On-prem only | Proxy (no trust) | Keep users on-prem, redirect auth |
| Simple AD | In AWS | ❌ Cannot | Basic AD, standalone, no on-prem |
┌─────────────────────────────────────────────────────────────────────────────┐
│ AWS Directory Services │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. AWS Managed Microsoft AD (two-way trust) │
│ │
│ ┌──────────┐ trust ┌──────────────────┐ │
│ auth │ │◄────────────────►│ AWS Managed AD │ auth │
│ ◄────┤ On-Prem │ │ [MS] ├────► │
│ │ AD │ └──────────────────┘ │
│ └──────────┘ │
│ │
│ 2. AD Connector (proxy only - NO users stored in AWS) │
│ │
│ ┌──────────┐ proxy ┌──────────────────┐ │
│ │ │◄────────────────►│ AD Connector │ auth │
│ │ On-Prem │ │ [⚡] ├────► │
│ │ AD │ └──────────────────┘ │
│ └──────────┘ │
│ │
│ 3. Simple AD (standalone - NO on-prem connection) │
│ │
│ ┌──────────────────┐ │
│ ❌ │ Simple AD │ auth │
│ (no on-prem) │ [DB] ├────► │
│ └──────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
⚠️ Exam trap: AD Connector is just a proxy. It does NOT store users, it only redirects authentication to on-prem AD.
AWS Organizations (Global service):
┌─────────────────────────────────────────────────────────────────────┐
│ Root Organizational Unit (OU) │
│ ┌────────────────┐ │
│ │ Management │ ← Full admin power, SCPs do NOT apply here │
│ │ Account │ │
│ └────────────────┘ │
│ │
│ ┌──────────────────────┐ ┌──────────────────────────────────┐ │
│ │ OU (Dev) │ │ OU (Prod) │ │
│ │ ┌────┐ ┌────┐ │ │ ┌────┐ ┌────┐ │ │
│ │ │Acct│ │Acct│ │ │ │Acct│ │Acct│ │ │
│ │ └────┘ └────┘ │ │ └────┘ └────┘ │ │
│ │ Member Accounts │ │ ┌────────────┐ ┌──────────────┐ │ │
│ └──────────────────────┘ │ │ OU (HR) │ │ OU (Finance) │ │ │
│ │ │ ┌──┐ ┌──┐ │ │ ┌──┐ ┌──┐ │ │ │
│ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │
│ │ │ └──┘ └──┘ │ │ └──┘ └──┘ │ │ │
│ │ └────────────┘ └──────────────┘ │ │
│ └──────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
Consolidated Billing Benefits:
Multi-Account Strategies:
Service Control Policies (SCP):
⚠️ Exam trap: SCPs don’t affect Management Account — if question asks “restrict ALL accounts”, Management Account is still unrestricted!
⚠️ Exam trap: Service-linked roles are NOT affected by SCPs — they always work!
What SCPs CANNOT do:
AWS Organizations – Tag Policies:
IAM Conditions - restrict API calls based on:
| Condition Key | Purpose | Example |
|---|---|---|
| aws:SourceIp | Restrict by client IP | Only allow from corporate IP range |
| aws:RequestedRegion | Restrict by region | Only allow eu-west-1 API calls |
| ec2:ResourceTag | Restrict based on tags | Only manage EC2 with tag “Env=Dev” |
| aws:MultiFactorAuthPresent | Force MFA | Require MFA for sensitive actions |
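To make the first two condition keys in the table concrete, here is a small sketch of what the checks amount to. It only imitates the comparison logic (an IP-in-range test for `aws:SourceIp`, a string match for `aws:RequestedRegion`); the function names and the CIDR `203.0.113.0/24` are illustrative assumptions, not part of any AWS SDK.

```python
import ipaddress


def source_ip_allowed(request_ip: str, allowed_cidrs: list) -> bool:
    """Imitate an IpAddress condition on aws:SourceIp: true if the caller IP is in any range."""
    ip = ipaddress.ip_address(request_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


def region_allowed(requested_region: str, allowed_regions: set) -> bool:
    """Imitate a StringEquals condition on aws:RequestedRegion."""
    return requested_region in allowed_regions


# Documentation range standing in for a corporate CIDR (assumption).
CORPORATE_RANGES = ["203.0.113.0/24"]

print(source_ip_allowed("203.0.113.25", CORPORATE_RANGES))  # inside the range
print(source_ip_allowed("198.51.100.7", CORPORATE_RANGES))  # outside the range
print(region_allowed("eu-west-1", {"eu-west-1"}))
```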
⚠️ Exam trap: Fake condition keys! These are real:
aws:RequestedRegion, aws:SourceIp, aws:SourceVpc, aws:SourceVpce
These DON’T EXIST: aws:SourceRegion, aws:Region, ec2:SourceRegion
S3 Bucket Policies vs IAM Policies:
| Aspect | IAM Policy | S3 Bucket Policy |
|---|---|---|
| Attached to | User/Group/Role | S3 Bucket |
| Cross-account | Requires role assumption | Direct access via Principal |
| Use case | User-centric permissions | Resource-centric, public access, cross-account |
S3 Access Decision Logic:
IAM Policy ALLOWS + S3 Bucket Policy ALLOWS → ACCESS ✅
IAM Policy ALLOWS + S3 Bucket Policy (silent) → ACCESS ✅
IAM Policy (silent) + S3 Bucket Policy ALLOWS → ACCESS ✅ (if same account)
IAM Policy DENIES OR S3 Bucket Policy DENIES → DENIED ❌
Cross-account: BOTH must explicitly Allow
Common S3 Policy Conditions:
| Condition | Purpose |
|---|---|
| aws:SourceIp | Restrict by IP range |
| aws:SourceVpce | Restrict to specific VPC endpoint |
| aws:SourceVpc | Restrict to specific VPC |
| s3:x-amz-acl | Control ACL settings |
| s3:x-amz-server-side-encryption | Require encryption |
| aws:SecureTransport | Require HTTPS (deny HTTP) |
Example - Require HTTPS:
{
  "Effect": "Deny",
  "Principal": "*",
  "Action": "s3:*",
  "Resource": "arn:aws:s3:::bucket/*",
  "Condition": {
    "Bool": { "aws:SecureTransport": "false" }
  }
}
⚠️ Exam trap: In a bucket policy, "Principal": "*" and "Principal": {"AWS": "*"} are equivalent. Both match everyone, including anonymous callers; neither form limits access to authenticated AWS users.
⚠️ Exam trap: Cross-account S3 access — bucket policy must explicitly allow the external principal AND the external account needs IAM permissions.
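The S3 access decision rows above, together with the cross-account rule, can be sketched as a small function. This is a simplified model under stated assumptions: each policy is reduced to one of three outcomes (`"allow"`, `"deny"`, `"silent"`), and other layers such as SCPs or permission boundaries are ignored. The function name `s3_access` is hypothetical.

```python
def s3_access(iam: str, bucket: str, same_account: bool) -> bool:
    """Each argument is 'allow', 'deny', or 'silent' (no matching statement)."""
    # An explicit deny in either policy always wins.
    if "deny" in (iam, bucket):
        return False
    if same_account:
        # Same account: one explicit allow from either policy is enough.
        return "allow" in (iam, bucket)
    # Cross-account: both the IAM policy and the bucket policy must allow.
    return iam == "allow" and bucket == "allow"


print(s3_access("allow", "silent", same_account=True))   # same account, IAM allow only
print(s3_access("silent", "allow", same_account=False))  # cross-account, bucket allow only
```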
⚠️ Exam trap: S3 ARN patterns matter!
arn:aws:s3:::bucket → Bucket-level actions (ListBucket, GetBucketLocation)
arn:aws:s3:::bucket/* → Object-level actions (GetObject, PutObject, DeleteObject)
Missing /* = Access Denied for object operations!
IAM Roles vs Resource-Based Policies (Cross-Account Access):
Two ways to access S3 in another account:
Option 1: Role as Proxy (AssumeRole)
┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│     User     │─────►│     Role     │─────►│  Amazon S3   │
│  Account A   │      │  Account B   │      │  Account B   │
└──────────────┘      └──────────────┘      └──────────────┘
                      (become this role,
                       lose Account A perms)
Option 2: Resource-Based Policy (S3 Bucket Policy)
┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│     User     │─────►│  S3 Bucket   │─────►│  Amazon S3   │
│  Account A   │      │    Policy    │      │  Account B   │
└──────────────┘      └──────────────┘      └──────────────┘
                      (grants access to
                       Account A user directly,
                       keeps Account A perms)
| Aspect | Assume Role | Resource-Based Policy |
|---|---|---|
| Permissions | Give up original, take role’s | Keep original + gain resource access |
| Use case | Need full different identity | Need BOTH source and target access |
Example: User in Account A needs to scan DynamoDB in Account A AND dump to S3 in Account B → Use resource-based policy on S3 (keeps DynamoDB permissions)
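The DynamoDB-to-S3 example can be modelled with plain sets to show why the two options differ. This is a deliberately simplified sketch: permissions are bare action strings, accounts and resources are not tracked, and the function names are hypothetical. In a real cross-account setup the caller's identity policy would also have to allow the S3 action.

```python
def permissions_after_assume_role(original: set, role: set) -> set:
    # Assuming a role swaps identities: only the role's permissions remain.
    return set(role)


def permissions_with_resource_policy(original: set, granted: set) -> set:
    # A resource-based policy adds access while the caller keeps its own identity.
    return set(original) | set(granted)


user_in_account_a = {"dynamodb:Scan"}          # what the user can do in Account A
needed = {"dynamodb:Scan", "s3:PutObject"}      # scan in A, then write to S3 in B
role_in_account_b = {"s3:PutObject"}            # hypothetical role in Account B
bucket_policy_grant = {"s3:PutObject"}          # hypothetical bucket policy grant

via_role = permissions_after_assume_role(user_in_account_a, role_in_account_b)
via_bucket_policy = permissions_with_resource_policy(user_in_account_a, bucket_policy_grant)

print(needed <= via_role)            # the scan permission is gone after assuming the role
print(needed <= via_bucket_policy)   # both actions remain available
```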
EventBridge Target Permissions:
| Target | Policy Type | Why |
|---|---|---|
| Lambda | Resource-based | Lambda can define “who invokes me” |
| SNS | Resource-based | SNS can define “who publishes to me” |
| SQS | Resource-based | SQS can define “who sends to me” |
| S3 | Resource-based | S3 can define “who writes to me” |
| API Gateway | Resource-based | API GW can define “who calls me” |
| Kinesis | IAM Role | No invoke policy — need role |
| EC2 Auto Scaling | IAM Role | No invoke policy — need role |
| ECS Task | IAM Role | No invoke policy — need role |
| SSM Run Command | IAM Role | No invoke policy — need role |
Memory trick: “Can the target say WHO is allowed to invoke it?”
Memory hook: “SLSS + API GW” = Resource-based (SNS, Lambda, SQS, S3, API Gateway)
Memory hook: “KEES” = IAM Role needed (Kinesis, EC2 Auto Scaling, ECS, SSM)
⚠️ Exam trap: Lambda = resource-based, Kinesis = IAM role. Don’t mix them up!
IAM Permission Boundaries:
┌───────────────────────────────────────────────────────────┐
│ │
│ ┌─────────────┐ ┌─────────────────┐ │
│ │Organizations│ │ Permissions │ │
│ │ SCP │ │ Boundary │ │
│ │ ✓ │ ✓ │ ✓ │ │
│ │ ┌───┴─────────┴───┐ │ │
│ │ │ │ │ │
│ └─────────┤ Effective ├─────────────┘ │
│ │ Permissions │ │
│ ┌─────────┤ ✓ ├─────────────┐ │
│ │ │ │ │ │
│ │ └───┬─────────────┘ │ │
│ │ Identity │ ✓ │ │
│ │ Policy │ │ │
│ │ ✓ │ │ │
│ └─────────────┘ │ │
│ │
└───────────────────────────────────────────────────────────┘
Use Cases:
⚠️ Exam trap: Permission Boundaries do NOT apply to groups! Only users and roles.
IAM Policy Evaluation Logic (order matters):
┌─────────────────┐
│ Start: DENY │
└────────┬────────┘
▼
┌─────────────────────────┐
│ Explicit Deny? │──── YES ────► DENY ❌
└─────────────┬───────────┘
│ NO
▼
┌─────────────────────────┐
│ In Org with SCP? │──── NO ─────► Skip to Resource-Based
└─────────────┬───────────┘
│ YES
▼
┌─────────────────────────┐
│ SCP Allows? │──── NO ─────► DENY ❌ (implicit)
└─────────────┬───────────┘
│ YES
▼
┌─────────────────────────┐
│ Resource-Based │──── ALLOW + same account ──► ALLOW ✅
│ Policy Allows? │
└─────────────┬───────────┘
│ (continue if cross-account or no resource policy)
▼
┌─────────────────────────┐
│ Identity Policy │──── NO ─────► DENY ❌ (implicit)
│ Allows? │
└─────────────┬───────────┘
│ YES
▼
┌─────────────────────────┐
│ Permission Boundary │──── NO ─────► DENY ❌ (implicit)
│ Allows? (if exists) │
└─────────────┬───────────┘
│ YES
▼
┌─────────────────────────┐
│ Session Policy │──── NO ─────► DENY ❌ (implicit)
│ Allows? (if exists) │
└─────────────┬───────────┘
│ YES
▼
┌─────────────────┐
│ ALLOW ✅ │
└─────────────────┘
Key Rules:
AWS IAM Identity Center (successor to AWS SSO):
Active Directory Integration:
Option 1: AWS Managed Microsoft AD (out-of-box integration)
┌──────────────────┐ ┌─────────────────────┐
│ IAM Identity │───connect───►│ AWS Managed │
│ Center │ │ Microsoft AD │
└──────────────────┘ └─────────────────────┘
Option 2: Self-Managed AD (two approaches)
┌──────────────────┐ ┌─────────────────┐ two-way trust ┌────────────┐
│ IAM Identity │─────│ AWS Managed │◄──────────────────►│ On-Prem AD │
│ Center │ │ Microsoft AD │ └────────────┘
└────────┬─────────┘ └─────────────────┘
│
│ ┌─────────────────┐ proxy ┌────────────┐
└───────────────│ AD Connector │◄────────────────────►│ On-Prem AD │
└─────────────────┘ └────────────┘
Permission Sets: Collection of IAM Policies assigned to users/groups for AWS access
ABAC: Fine-grained permissions based on user attributes (cost center, title, locale)
AWS Control Tower - multi-account governance:
Guardrails (ongoing governance):
| Type | Implementation | Example |
|---|---|---|
| Preventive | SCPs | Restrict regions across all accounts |
| Detective | AWS Config | Identify untagged resources |
Detective Guardrail Flow:
┌─────────────────────────────────────────────────────────────────────┐
│ AWS Control Tower │
│ ┌─────────────┐ │
│ │ Guardrail │ trigger │
│ │ (Detective) ├───────────►┌─────┐ notify ┌───────┐ │
│ │ │(NON_COMPLIANT)│ SNS │────────►│ Admin │ │
│ │ AWS Config │ └──┬──┘ └───────┘ │
│ └──────┬──────┘ │ │
│ │ monitor │ invoke │
│ ▼ ▼ │
│ ┌────────────────┐ ┌──────────┐ │
│ │ Member Accounts│◄──────│ Lambda │ remediate (add tags) │
│ └────────────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────────────────┘
| Trap | Reality |
|---|---|
| “IAM is regional” | ❌ IAM is GLOBAL — no region selection |
| “SCPs restrict Management Account” | ❌ Management Account has full power always |
| “SCPs affect service-linked roles” | ❌ Service-linked roles are NOT affected |
| “Permission Boundaries work on groups” | ❌ Only users and roles |
| “AssumeRole keeps original permissions” | ❌ You give up original, take role’s |
| “Resource-based policy requires role assumption” | ❌ No role needed — keep original permissions |
| “Cognito User Pools give AWS access” | ❌ User Pools = auth only; Identity Pools = AWS credentials |
| “S3 GetObject on bucket ARN” | ❌ Need bucket/* for object actions, bucket alone = Access Denied |
| “aws:SourceRegion condition key” | ❌ Doesn’t exist; use aws:RequestedRegion |
| “EventBridge → Kinesis uses resource policy” | ❌ EventBridge needs an IAM Role to put records on a Kinesis stream |
| Scenario | → Solution |
|---|---|
| Restrict entire AWS account | SCP (at OU or Account level) |
| Restrict specific user/role (not whole account) | Permission Boundary |
| Cross-account, keep original permissions | Resource-based policy |
| Cross-account, need different identity | Assume Role |
| Centralized login for multiple AWS accounts | IAM Identity Center |
| External users (millions, mobile/web) | Cognito |
| Temporary AWS credentials | STS |
| Corporate IdP integration (SAML) | IAM Identity Center or AssumeRoleWithSAML |
| Social login (Google, Facebook) | Cognito Identity Pools |
| Share resources across accounts | AWS RAM |
| Find externally shared resources | IAM Access Analyzer |
| Multi-account governance with guardrails | AWS Control Tower |
| Standardize tags across Organization | Tag Policies |
Everything starts DENIED. You must explicitly Allow. This applies to:
Why? Security principle — if you forget something, it’s denied (safe default).
Derive: If SCP only allows EC2 → everything else is denied. No need to memorize “deny lists.”
No matter how many Allows exist, one Deny = blocked.
Why? Prevents privilege escalation — you can always restrict, never override a restriction.
Derive: To block an action, just add Deny anywhere in the chain. Order doesn’t matter for Deny.
Effective permissions = what ALL layers allow together.
SCP ∩ Permission Boundary ∩ Identity Policy = Effective Permissions
Why? Each layer is a guardrail: you can only narrow, never expand beyond any layer.
Derive: If SCP allows S3+EC2, but Identity Policy only allows S3 → only S3 works.
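The intersection rule can be written down directly with Python sets. This is a minimal sketch that treats each layer as a set of allowed service names, as the example above does; real policies work on actions, resources, and conditions, and resource-based policies are left out. The function name `effective_permissions` is hypothetical.

```python
def effective_permissions(identity, scp=None, boundary=None, session=None,
                          explicit_denies=frozenset()):
    """Intersect every layer that is present, then remove explicit denies."""
    allowed = set(identity)
    for layer in (scp, boundary, session):
        if layer is not None:          # a missing layer does not restrict anything
            allowed &= set(layer)
    return allowed - set(explicit_denies)  # one explicit deny always wins


# SCP allows S3 + EC2, identity policy only allows S3.
print(effective_permissions(identity={"s3"}, scp={"s3", "ec2"}))

# Adding a permission boundary that only allows EC2 leaves nothing.
print(effective_permissions(identity={"s3"}, scp={"s3", "ec2"}, boundary={"ec2"}))
```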
SCPs never apply to Management Account. It always has full power.
Why? Someone must be able to fix things if SCPs lock everyone out.
Derive: “Restrict ALL accounts” questions — Management Account is the exception.
Why? Different granularity needs different tools.
Derive: “Prevent developers from escalating privileges” → Permission Boundary (user-level, not account-level).
Why? Sometimes you need both source AND target access (e.g., DynamoDB scan → S3 dump).
Derive: “Access DynamoDB in Account A AND S3 in Account B” → Resource-based policy on S3.
STS provides temporary, auto-expiring credentials.
Why? Reduces blast radius of compromise — credentials expire.
Derive: Cross-account access, federation, MFA → all use STS behind the scenes.
Why? Separation of concerns — different systems handle different problems.
Derive: “Millions of mobile users need S3 access” → User Pool (auth) + Identity Pool (AWS creds).
SCPs don’t affect them. They’re created BY AWS FOR AWS services.
Why? AWS services need guaranteed permissions to function.
Derive: “SCP blocks everything but service still works” → It’s using a service-linked role.
No region selection. Users, roles, policies exist everywhere.
Why? Identity should be consistent — you’re YOU regardless of region.
Derive: “Create IAM user in us-east-1” → Trick question, IAM has no region.
Need cross-account access?
│
├─► Need BOTH source + target permissions?
│   │
│   └─► YES → Resource-based policy
│
└─► Need different identity/permissions?
    │
    └─► YES → Assume Role
What to restrict?
│
├─► Entire account(s)?
│   │
│   └─► SCP (attach to OU or Account)
│
├─► Specific user/role?
│   │
│   └─► Permission Boundary
│
└─► Specific actions with conditions?
    │
    └─► IAM Policy with Conditions
Who needs access?
│
├─► Internal employees to AWS accounts?
│   │
│   └─► IAM Identity Center
│
├─► External users (millions, mobile/web)?
│   │
│   └─► Cognito
│
└─► Corporate IdP (SAML)?
    │
    ├─► To AWS Console → IAM Identity Center
    └─► Programmatic → AssumeRoleWithSAML
| What | Cannot |
|---|---|
| SCPs | Affect Management Account |
| SCPs | Affect service-linked roles |
| Permission Boundaries | Apply to groups |
| Cognito User Pools | Give AWS credentials directly |
| Simple AD | Join with on-premises AD |
| AD Connector | Store users (it’s just a proxy) |
Keywords: all accounts, prevent, organization-wide, block service
Answer: SCP at Root OU level
Why: SCPs cascade down OUs. Root OU = all member accounts (not Management).
Keywords: delegate, self-service, prevent escalation, limit what they can create
Answer: Permission Boundary
Why: Boundary limits max permissions of created entities.
Keywords: scan + dump, read from A write to B, both accounts
Answer: Resource-based policy (on target resource)
Why: AssumeRole would lose access to source account.
Keywords: mobile, web app, millions, external users, S3/DynamoDB access
Answer: Cognito User Pools + Identity Pools
Why: User Pools authenticate, Identity Pools give temporary AWS credentials.
Keywords: SSO, single sign-on, multiple accounts, employees, SAML, Active Directory
Answer: IAM Identity Center
Why: Built for this. Integrates with AD, manages permission sets across accounts.
Keywords: detect, compliance, untagged, non-compliant, monitor
Answer: Control Tower Detective Guardrail (uses AWS Config)
Why: Detective = monitoring (not blocking). Uses Config rules.
Keywords: prevent, block, region restriction, all accounts
Answer: SCP with aws:RequestedRegion condition (or Control Tower Preventive Guardrail)
Why: Preventive = blocking. SCPs stop the action.
Keywords: share, subnets, cross-account, Transit Gateway, avoid duplication
Answer: AWS RAM (Resource Access Manager)
Why: RAM shares resources without duplication.
Keywords: identify, find, external access, shared externally, audit
Answer: IAM Access Analyzer
Why: Analyzes policies to find external principal access.
Keywords: temporary, cross-account, assume, programmatic
Answer: STS AssumeRole
Why: Returns temporary credentials for the target role.
Keywords: MFA, multi-factor, sensitive, delete, critical
Answer: IAM Policy Condition: aws:MultiFactorAuthPresent
Why: Condition key checks MFA status.
Keywords: standardize, tags, enforce, format, organization-wide
Answer: Tag Policies
Why: Define allowed tag keys/values, prevent non-compliant tags.
Keywords: on-premises, Active Directory, Identity Center, trust
Answer: Two-way trust with AWS Managed Microsoft AD, OR AD Connector (proxy)
Why: Can’t connect directly. Need an AWS AD service in between.
| Aspect | SCP | Permission Boundary | IAM Policy |
|---|---|---|---|
| Scope | Account/OU | User/Role | User/Group/Role |
| Applies to groups? | N/A (account level) | ❌ NO | ✅ YES |
| Affects root user? | ✅ YES | ❌ NO (root has no boundary) | ❌ NO |
| Affects Management Account? | ❌ NO | ✅ YES | ✅ YES |
| Default | Implicit Deny | Implicit Deny | Implicit Deny |
| Service | Users Stored | On-Prem Connection | Use Case |
|---|---|---|---|
| AWS Managed Microsoft AD | In AWS | Two-way trust | Need AD features in AWS |
| AD Connector | On-prem only | Proxy | Keep users on-prem |
| Simple AD | In AWS | ❌ Cannot | Basic AD, no on-prem |
| API | When to Use |
|---|---|
| AssumeRole | Cross-account, same-account role switch |
| AssumeRoleWithSAML | Corporate IdP (SAML) login |
| AssumeRoleWithWebIdentity | Social login (prefer Cognito) |
| GetSessionToken | MFA for IAM user |
| GetFederationToken | Custom federation |
| Question Contains | → Instant Answer |
|---|---|
| “restrict ALL member accounts” | SCP at Root OU |
| “Management Account” + “restrict” | ❌ Can’t — SCPs don’t apply |
| “service-linked role” + “blocked” | ❌ Can’t — SCPs don’t affect |
| “prevent privilege escalation” | Permission Boundary |
| “groups” + “permission boundary” | ❌ Not supported |
| “both accounts” / “source and target” | Resource-based policy |
| “give up permissions” | Assume Role |
| “millions of users” / “mobile app” | Cognito |
| “User Pools” + “AWS access” | ❌ Need Identity Pools too |
| “SSO multiple accounts” | IAM Identity Center |
| “on-premises AD” + “Identity Center” | AWS Managed AD + trust, or AD Connector |
| “Simple AD” + “on-prem” | ❌ Cannot connect |
| “share resources” / “avoid duplication” | AWS RAM |
| “find external access” | IAM Access Analyzer |
| “temporary credentials” | STS |
| “MFA required” | aws:MultiFactorAuthPresent condition |
| “region restriction” | aws:RequestedRegion condition or SCP |
| “tag standardization” | Tag Policies |
| “detect” + “compliance” | Detective Guardrail (AWS Config) |
| “prevent” + “organization-wide” | Preventive Guardrail (SCP) |
| “Control Tower” + “remediate” | Lambda (triggered by SNS from Config) |
| “landing zone” | Control Tower |
| “SAML” + “programmatic” | AssumeRoleWithSAML |
| “social login” | Cognito Identity Pools |
| “IAM” + “regional” | ❌ Trick — IAM is GLOBAL |
□ Is it about restricting accounts?
→ Yes = Think SCP
→ But Management Account? = SCP won't work
□ Is it about restricting a specific user/role?
→ Yes = Think Permission Boundary
→ Is it a group? = Permission Boundary won't work
□ Is it cross-account access?
→ Need both source + target access? = Resource-based policy
→ Need different identity? = Assume Role
□ Is it millions of external users?
→ Yes = Cognito (not IAM users)
→ Need AWS credentials? = Identity Pools required
□ Is it corporate employees to AWS?
→ Yes = IAM Identity Center
□ Is it about compliance/detection?
→ Yes = Detective Guardrail / AWS Config
□ Is it about blocking/prevention?
→ Yes = Preventive Guardrail / SCP
VPC (Virtual Private Cloud) is a service that lets you launch AWS resources in a logically isolated virtual network that you define.
Private IPv4 ranges (RFC 1918):
10.0.0.0/8 (10.0.0.0 – 10.255.255.255)
172.16.0.0/12 (172.16.0.0 – 172.31.255.255)
192.168.0.0/16 (192.168.0.0 – 192.168.255.255)
CIDR (Classless Inter-Domain Routing): method for allocating IP addresses. Used in Security Groups, NACLs, and all AWS networking.
| CIDR | IPs | Use Case |
|---|---|---|
| /32 | 1 | Single IP (e.g., SSH from your IP) |
| /28 | 16 | Smallest VPC/subnet |
| /27 | 32 | Small subnet |
| /26 | 64 | Medium subnet |
| /24 | 256 | Common subnet size |
| /16 | 65,536 | Largest VPC |
| /0 | All | 0.0.0.0/0 = open to internet |
⚠️ Exam trap: “Need 29 IPs for EC2” → /27 (32 IPs) is NOT enough! AWS reserves 5 IPs per subnet (first 4 + last 1) → 32 - 5 = 27 < 29. Use /26 (64 - 5 = 59 ✓)
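The arithmetic in this trap can be checked with the standard `ipaddress` module. The sketch below assumes the five reserved addresses per subnet stated above and the /28 to /16 size limits from the table; the function names and the example CIDRs are illustrative.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first four addresses plus the last one


def usable_ips(cidr: str) -> int:
    """Addresses left for your resources after the AWS-reserved ones."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def smallest_prefix_for(hosts: int) -> int:
    """Longest prefix (smallest subnet) between /28 and /16 that fits `hosts`."""
    for prefix in range(28, 15, -1):  # /28, /27, ..., /16
        if 2 ** (32 - prefix) - AWS_RESERVED_PER_SUBNET >= hosts:
            return prefix
    raise ValueError("does not fit in a single /16")


print(usable_ips("10.0.0.0/27"))   # 32 - 5
print(usable_ips("10.0.0.0/26"))   # 64 - 5
print(smallest_prefix_for(29))     # prefix length needed for 29 instances
```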
AWS Reserved IPs per subnet (5):
.0: Network Address
.1: VPC Router
.2: Amazon DNS
.3: Future use
.255: Broadcast (not supported, but reserved)
IPv6 in VPC:
⚠️ Exam trap: “Can’t launch EC2 in subnet” → NOT because of IPv6 (space is huge). It’s because the subnet has no available IPv4 addresses → solution: add a secondary IPv4 CIDR to the VPC and create a new subnet in it (an existing subnet’s CIDR cannot be changed).
Subnets partition your network inside VPC.
Internet gateway helps VPC instances connect with the internet (public subnets have a route to IGW).
NAT allows instances in private subnets to access the internet while remaining private.
Private Subnet EC2 ──► NAT (Public Subnet) ──► IGW ──► Internet
10.0.0.20              EIP: 12.34.56.78                   ▲
                       (translates src IP)                │
                                                          │
                       Response comes back to NAT ◄───────┘
                       NAT forwards to 10.0.0.20
NAT Instance (self-managed, outdated but still on exam):
NAT Gateway (AWS-managed):
NAT Gateway HA: Resilient within single AZ only → create one NATGW per AZ for fault tolerance. No cross-AZ failover needed.
NAT Gateway vs NAT Instance:
| Feature | NAT Gateway | NAT Instance |
|---|---|---|
| Availability | HA within AZ (create in each AZ) | Manual failover (ASG + script) |
| Bandwidth | Up to 100 Gbps | Depends on instance type |
| Maintenance | AWS-managed | You manage (patching, OS) |
| Cost | Per hour + data transferred | Per hour + instance type + network |
| Public IPv4 | ✅ | ✅ |
| Private IPv4 | ✅ | ✅ |
| Security Groups | ❌ No | ✅ Yes |
| Bastion Host | ❌ No | ✅ Yes |
| Port Forwarding | ❌ No | ✅ Yes (iptables) |
⚠️ Exam trap: “NAT + Security Groups” → NAT Instance (NAT Gateway has NO SGs). “NAT + Bastion Host” → also NAT Instance.
⚠️ Exam trap: “Private instances need internet, managed, HA” → NAT Gateway. NAT Instance is legacy — only pick it if question says “existing NAT Instance” or needs SG/Bastion.
Bastion Host = EC2 instance in public subnet used to SSH into private instances.
Users ──SSH (port 22)──► Bastion Host (Public Subnet) ──SSH──► EC2 (Private Subnet)
                         BastionHost-SG                        LinuxInstance-SG
                         Inbound: port 22                      Inbound: port 22
                         from corp CIDR                        from BastionHost-SG
⚠️ Exam trap: “SSH into private EC2” → Bastion Host (or SSM Session Manager for no-SSH approach). NOT NAT Gateway: NAT is for outbound internet only.
Network ACL (NACL) — firewall at subnet level. Can have ALLOW and DENY rules. Rules only include IP addresses. Automatically applies to all instances in the subnet. STATELESS: Return traffic must be explicitly allowed. Checks packets both ways.
Security Groups — firewall at instance (ENI) level. Can have only ALLOW rules. Rules include IP addresses and other security groups. STATEFUL: Return traffic is automatically allowed.
Inbound port 22 allowed: Outside ──SSH──► Your EC2 ✅ (response auto-allowed out)
Outbound port 22 allowed: Your EC2 ──SSH──► Outside ✅ (response auto-allowed in)
⚠️ Exam trap: “Default NACL” → allows all traffic. “Custom NACL” → denies all by default. Don’t confuse them!
| Feature | Security Group | NACL |
|---|---|---|
| Level | Instance (ENI) | Subnet |
| Rules | Allow only | Allow AND Deny |
| State | Stateful (return auto-allowed) | Stateless (must allow both directions) |
| Rule Evaluation | All rules evaluated together | Rules processed in order (lowest # first, first match wins) |
| Association | Manually assigned to instance | Automatically applies to all instances in subnet |
| Default | Deny all inbound, allow all outbound | Allow ALL traffic |
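The "lowest number first, first match wins" row is easy to get wrong, so here is a small model of it. It is a sketch under simplifying assumptions: each rule matches one source CIDR and one port, and the class `NaclRule` and function `nacl_allows` are hypothetical names, not an AWS API.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int   # rule number; lower numbers are evaluated first
    cidr: str     # source range the rule matches
    port: int     # single destination port, to keep the sketch small
    allow: bool   # True = ALLOW rule, False = DENY rule


def nacl_allows(rules: list, src_ip: str, port: int) -> bool:
    ip = ipaddress.ip_address(src_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if port == rule.port and ip in ipaddress.ip_network(rule.cidr):
            return rule.allow  # first match wins, later rules are ignored
    return False  # nothing matched: the final catch-all rule denies


deny_first = [
    NaclRule(100, "10.0.0.5/32", 22, allow=False),
    NaclRule(200, "10.0.0.0/16", 22, allow=True),
]
allow_first = [
    NaclRule(100, "10.0.0.0/16", 22, allow=True),
    NaclRule(200, "10.0.0.5/32", 22, allow=False),
]

print(nacl_allows(deny_first, "10.0.0.5", 22))   # deny rule is hit first
print(nacl_allows(allow_first, "10.0.0.5", 22))  # broad allow is hit first
```

The same two rules give opposite results depending on their numbers, which is why a specific DENY needs a lower number than the broader ALLOW it is meant to override.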
Ephemeral Ports:
Client (11.22.33.44)                        Web Server (55.66.77.88)
──► Src Port: 50105, Dest Port: 443 ──►     (fixed port 443)
◄── Dest Port: 50105, Src Port: 443 ◄──     (response to ephemeral port)
NACL with Ephemeral Ports, Example (Web → DB):
Web Subnet (Public)                      DB Subnet (Private)
┌──────────────────┐                     ┌─────────────────┐
│    EC2 (Web)     │                     │ RDS (port 3306) │
│                  │                     │                 │
└────────┬─────────┘                     └────────┬────────┘
         │                                        │
     Web-NACL                                 DB-NACL
┌───────────────┐                        ┌───────────────┐
│ OUTBOUND:     │    ── request ──►      │ INBOUND:      │
│ port 3306     │                        │ port 3306     │
│ to DB CIDR    │                        │ from Web CIDR │
│               │                        │               │
│ INBOUND:      │    ◄── response ──     │ OUTBOUND:     │
│ port 1024-    │                        │ port 1024-    │
│ 65535         │                        │ 65535         │
│ from DB CIDR  │                        │ to Web CIDR   │
└───────────────┘                        └───────────────┘
4 NACL rules needed (because stateless = each direction, each NACL):
| NACL | Direction | Port | CIDR | Why |
|---|---|---|---|---|
| Web-NACL | Outbound | 3306 | DB Subnet CIDR | Web initiates DB connection |
| Web-NACL | Inbound | 1024-65535 | DB Subnet CIDR | DB response on ephemeral port |
| DB-NACL | Inbound | 3306 | Web Subnet CIDR | Accept DB connection from Web |
| DB-NACL | Outbound | 1024-65535 | Web Subnet CIDR | Send response on ephemeral port |
⚠️ Exam trap: With SGs you’d only need 2 rules (allow 3306 each side) — stateful handles the rest. With NACLs you need 4 rules — don’t forget the ephemeral port rules for return traffic!
⚠️ Exam trap: “NACL blocking return traffic” → you forgot to allow ephemeral ports outbound (server side) or inbound (client side). SGs don’t have this problem (stateful).
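The four-rule requirement can be checked with a toy model of stateless filtering. This is a sketch under simplifying assumptions: rules are `(direction, port_from, port_to)` tuples, CIDRs are ignored, and the helper names `allowed` and `round_trip` are made up for illustration.

```python
def allowed(rules, direction, port):
    """Stateless check: a packet passes only if a rule matches its own direction and port."""
    return any(d == direction and lo <= port <= hi for d, lo, hi in rules)

# (direction, port_from, port_to), mirroring the four rules in the table
web_nacl = [("out", 3306, 3306), ("in", 1024, 65535)]
db_nacl = [("in", 3306, 3306), ("out", 1024, 65535)]

def round_trip(web, db, client_port):
    """Return (request_passes, response_passes) for a Web -> DB connection."""
    request = allowed(web, "out", 3306) and allowed(db, "in", 3306)
    # The response is addressed to the client's ephemeral port
    response = allowed(db, "out", client_port) and allowed(web, "in", client_port)
    return request, response

ok = round_trip(web_nacl, db_nacl, 50105)
# DB-NACL without its outbound ephemeral rule: the request arrives, the reply is dropped
broken = round_trip(web_nacl, [("in", 3306, 3306)], 50105)
```

With Security Groups the second half of `round_trip` would not exist, because return traffic for an allowed connection is permitted automatically.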
VPC Flow Logs capture information about IP traffic going into your interfaces:
Flow Log Syntax:
version account-id interface-id srcaddr dstaddr srcport dstport protocol packets bytes start end action log-status
2 123456789010 eni-1235b8ca srcIP dstIP 20641 22 6 20 4249 ... ACCEPT OK
2 123456789010 eni-1235b8ca srcIP dstIP 49761 3389 6 20 4249 ... REJECT OK
Key fields:
Action: ACCEPT or REJECT (due to SG or NACL)
Troubleshoot SG vs NACL using Flow Logs (ACTION field):
| Scenario | Inbound | Outbound | Blocked by |
|---|---|---|---|
| Incoming blocked | REJECT | — | NACL or SG |
| Incoming allowed, response blocked | ACCEPT | REJECT | NACL (SG is stateful → would auto-allow) |
| Outgoing blocked | — | REJECT | NACL or SG |
| Outgoing allowed, response blocked | REJECT | ACCEPT | NACL |
Memory trick: “ACCEPT then REJECT” = always NACL (stateless blocks return traffic). SG would never block return traffic (stateful).
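A short sketch of reading a flow log record and applying the memory trick. It assumes the default space-delimited format in which `protocol` follows `dstport` (as in the sample records, where `6` is TCP). The IP addresses and timestamps in the sample line are invented, and `parse_flow_log` and `blocked_by` are hypothetical helper names.

```python
FIELDS = ("version account-id interface-id srcaddr dstaddr srcport dstport "
          "protocol packets bytes start end action log-status").split()

def parse_flow_log(line):
    """Split one space-delimited flow log record into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))

def blocked_by(first_action, return_action=None):
    """first_action: action on the initiating direction; return_action: action on the reply."""
    if first_action == "REJECT":
        return "NACL or SG"
    if return_action == "REJECT":
        return "NACL"  # a stateful SG would have allowed the reply automatically
    return "not blocked"

# Hypothetical record: addresses and timestamps are made up for illustration
record = parse_flow_log(
    "2 123456789010 eni-1235b8ca 203.0.113.12 10.0.1.5 49761 3389 6 20 4249 "
    "1418530010 1418530070 REJECT OK"
)
```

Here `record["action"]` is `REJECT` on the initiating direction, so either layer could be responsible; only an ACCEPT followed by a REJECT on the reply narrows it down to the NACL.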
Flow Logs Architectures:
VPC Peering — privately connects two VPCs over the AWS network so that they behave as if they were in the same network.
VPC-A ◄──Peering──► VPC-B ◄──Peering──► VPC-C
│ │
└──────────Peering (A↔C needed!)─────────┘
(B does NOT relay traffic)
⚠️ Exam trap: “VPC A peers with B, B peers with C, can A talk to C?” → NO! Not transitive. Need separate A↔C peering. If you need many VPCs connected → use Transit Gateway instead.
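Non-transitivity is easy to state as code. In this minimal sketch, peering connections are modelled as unordered pairs of VPC names taken from the diagram; the function `can_talk` is hypothetical and only checks for a direct connection.

```python
peerings = {frozenset({"VPC-A", "VPC-B"}), frozenset({"VPC-B", "VPC-C"})}

def can_talk(vpc1, vpc2, peerings):
    """Peering is not transitive: only a direct peering connection counts."""
    return frozenset({vpc1, vpc2}) in peerings

a_to_b = can_talk("VPC-A", "VPC-B", peerings)
a_to_c = can_talk("VPC-A", "VPC-C", peerings)  # B does not relay traffic
a_to_c_fixed = can_talk("VPC-A", "VPC-C", peerings | {frozenset({"VPC-A", "VPC-C"})})
```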
VPC Endpoints connect to AWS services using private network instead of public internet.
Two types:
| Feature | Interface Endpoint | Gateway Endpoint |
|---|---|---|
| How | Provisions an ENI (private IP) | Target in Route Table |
| Security Group | ✅ Must attach | ❌ No |
| Services | Most AWS services | S3 and DynamoDB only |
| Cost | $ per hour + $ per GB | Free |
| Access from on-prem | ✅ (via VPN/DX) | ❌ No |
| Powered by | AWS PrivateLink | Route Table entry |
Option 1 (costly): Lambda (VPC) ──► NAT GW ──► IGW ──► DynamoDB (public)
Option 2 (free/better): Lambda (VPC) ──► Gateway Endpoint ──► DynamoDB (private)
⚠️ Exam trap: “S3 or DynamoDB access from VPC” → Gateway Endpoint (free, preferred on exam). Interface Endpoint only when access needed from on-premises (VPN/Direct Connect), different VPC, or different region.
⚠️ Exam trap: “Lambda in VPC can’t reach DynamoDB” → either add NAT GW + IGW, or (better) use VPC Gateway Endpoint for DynamoDB.
⚠️ Exam trap - “VPC resources access SQS/SNS/KMS privately (no internet)”:
AWS PrivateLink — expose a service in your VPC to other VPCs privately.
Site-to-Site VPN — encrypted connection between on-premises and AWS over the public internet.
Components:
Setup:
AWS VPN CloudHub:
⚠️ Exam trap: “Ping EC2 from on-premises doesn’t work” → check ICMP allowed in SG inbound + Route Propagation enabled.
Direct Connect — dedicated private physical connection from on-premises to AWS.
Virtual Interfaces (VIFs):
Corporate DC ──► Customer Router ──► DX Endpoint ──► VPG ──► VPC
(DX Location)
VLAN 1 (Private VIF) ──► EC2
VLAN 2 (Public VIF) ──► S3, Glacier
Connection Types:
| Type | Speed | Details |
|---|---|---|
| Dedicated | 1 / 10 / 100 Gbps | Physical port dedicated to you. Request via AWS first |
| Hosted | 50 Mbps – 10 Gbps | Via AWS Direct Connect Partners. Capacity on demand |
Direct Connect Gateway:
Resiliency:
Backup:
⚠️ Exam trap: “Private, dedicated, consistent connection” → Direct Connect. “Encrypted over internet” → Site-to-Site VPN. DX is NOT encrypted by default (add VPN on top for encryption).
⚠️ Exam trap: “Improve connection within days/1 week” → NOT Direct Connect (takes > 1 month). Use Site-to-Site VPN for quick setup. DX is only the answer when time is not a constraint.
AWS Client VPN — connect end-devices (laptops) to AWS or on-premises via OpenVPN over the internet. Access EC2 using private IP.
Transit Gateway — transitive peering hub for thousands of VPCs and on-premises (hub-and-spoke / star topology).
┌──► VPC-A
│
Corporate DC ──► Transit Gateway ──► VPC-B
(VPN/DX) │
├──► VPC-C
│
└──► VPC-D
Without TGW: complex mesh of VPC peering + VPN connections (N² connections)
With TGW: single hub, all spokes connect to it (N connections)
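To put numbers on the mesh-versus-hub comparison: "N²" describes the order of growth, and the exact count for a full mesh of N VPCs is N(N-1)/2 peering connections. The sketch below compares that with one Transit Gateway attachment per VPC; the function names are illustrative only.

```python
def full_mesh_peerings(n):
    """Peering connections needed so that every pair of n VPCs is directly connected."""
    return n * (n - 1) // 2

def hub_attachments(n):
    """Transit Gateway attachments needed for n VPCs (one per spoke)."""
    return n

counts = {n: (full_mesh_peerings(n), hub_attachments(n)) for n in (3, 10, 50)}
```

For 10 VPCs that is 45 peering connections against 10 attachments, and the gap widens quickly as VPCs are added.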
ECMP (Equal-Cost Multi-Path Routing):
| Setup | Throughput |
|---|---|
| VPN → VGW (1 connection, 2 tunnels) | 1.25 Gbps |
| VPN → TGW (1 connection, ECMP) | 2.5 Gbps (both tunnels used) |
| 2× VPN → TGW (ECMP) | 5.0 Gbps |
| 3× VPN → TGW (ECMP) | 7.5 Gbps |
Share DX across accounts:
⚠️ Exam trap: “Connect many VPCs + on-premises, simplify topology” → Transit Gateway. NOT VPC Peering (not transitive, mesh complexity).
⚠️ Exam trap: “Increase VPN bandwidth to AWS” → multiple VPN connections + Transit Gateway with ECMP. VGW limited to 1.25 Gbps.
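The ECMP table follows from simple arithmetic: about 1.25 Gbps per tunnel, two tunnels per VPN connection, and both tunnels usable only when the connection terminates on a Transit Gateway with ECMP. The sketch below reproduces the table under those assumptions; `vpn_throughput_gbps` is a hypothetical helper and the results are theoretical maxima.

```python
TUNNEL_GBPS = 1.25  # per-tunnel figure used in the table above

def vpn_throughput_gbps(connections, use_ecmp):
    """Approximate aggregate VPN bandwidth.

    Without ECMP (VGW) only one tunnel carries traffic; with ECMP (TGW)
    both tunnels of every connection are used.
    """
    tunnels_in_use = 2 * connections if use_ecmp else 1
    return TUNNEL_GBPS * tunnels_in_use

table = [vpn_throughput_gbps(1, False)] + [vpn_throughput_gbps(n, True) for n in (1, 2, 3)]
```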
Traffic Mirroring — capture and inspect network traffic in your VPC.
Source A (ENI) ──┐
├──► Traffic Mirroring ──► NLB ──► ASG (Security Appliances)
Source B (ENI) ──┘ (filter optional)
Egress-only IGW — like a NAT Gateway, but for IPv6.
IPv4 outbound: Private EC2 ──► NAT Gateway ──► IGW ──► Internet
IPv6 outbound: EC2 ──► Egress-only IGW ──► Internet (no inbound initiated)
⚠️ Exam trap: “IPv6 instances need outbound internet but block inbound” → Egress-only IGW (NOT NAT Gateway — NAT is for IPv4 only).
Core principle: Ingress is free, egress costs money. Keep traffic inside AWS to minimize costs.
EC2 Data Transfer Costs (per GB):
| Traffic Path | Cost |
|---|---|
| Traffic in to EC2 (ingress) | Free |
| Same AZ, private IP | Free |
| Same AZ, public/Elastic IP | $0.02 |
| Cross-AZ, private IP | $0.01 |
| Cross-region | $0.02 |
Cost optimization tips:
⚠️ Exam trap: “Lowest egress cost” with Direct Connect available
NAT Gateway vs VPC Gateway Endpoint (for S3):
| Path | Cost |
|---|---|
| EC2 → NAT GW → IGW → S3 | $0.045/hr + $0.045/GB + $0.09/GB cross-region |
| EC2 → Gateway Endpoint → S3 | Free (endpoint) + $0.01/GB same-region |
Subnet 1: EC2 ──► NAT GW ──► IGW ──► S3 (costly: ~$0.09/GB)
Subnet 2: EC2 ──► VPC Gateway Endpoint ──► S3 (free endpoint, ~$0.01/GB)
⚠️ Exam trap: “Reduce cost of S3 access from VPC” → Gateway Endpoint (free, no NAT GW charges). Route table entry with pl-id for Amazon S3 → vpce-id.
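A rough monthly comparison for same-region S3 access, using only the per-hour and per-GB figures from the table above. Treat this as an illustration: the 730-hour month and the 1,000 GB volume are assumptions, the function names are made up, and real prices vary by region and change over time.

```python
NAT_HOURLY = 0.045      # $/hour, from the table above
NAT_PER_GB = 0.045      # $/GB processed, from the table above
ENDPOINT_PER_GB = 0.01  # $/GB same-region, from the table above

def monthly_cost_nat(gb, hours=730):
    """NAT Gateway: hourly charge plus per-GB processing (same-region S3)."""
    return round(NAT_HOURLY * hours + NAT_PER_GB * gb, 2)

def monthly_cost_gateway_endpoint(gb):
    """Gateway Endpoint: no hourly charge, only the per-GB figure from the table."""
    return round(ENDPOINT_PER_GB * gb, 2)

nat = monthly_cost_nat(1000)
endpoint = monthly_cost_gateway_endpoint(1000)
```

Under these assumptions the NAT Gateway path comes to roughly $78 per month against $10 through the Gateway Endpoint.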
S3 Data Transfer Pricing (USA):
| Path | Cost/GB |
|---|---|
| S3 ingress (upload) | Free |
| S3 → Internet | $0.09 |
| S3 Transfer Acceleration | +$0.04 to $0.08 on top |
| S3 → CloudFront | Free |
| CloudFront → Internet | $0.085 (slightly cheaper than S3 direct) |
| S3 Cross-Region Replication | $0.02 |
⚠️ Exam trap: “Deliver S3 content to users cheaply” → CloudFront ($0.085/GB vs $0.09/GB direct) + caching + request pricing about 7x cheaper than S3.
AWS Network Firewall — protect your entire VPC, Layer 3 to Layer 7.
Fine-Grained Controls:
*.mycorp.com outbound
⚠️ Exam trap: “Sophisticated VPC-wide network protection, Layer 3-7, inspect all traffic directions” → AWS Network Firewall. NOT just NACLs/SGs (those are basic). NOT WAF (WAF is Layer 7 HTTP only).
Traffic doesn’t flow just because resources exist — Route Tables are the backbone. IGW, NAT, VPC Peering, Endpoints — none work without correct route table entries. If connectivity fails, check routes first.
Key insight: Most “can’t connect” troubleshooting answers involve Route Tables, SGs, or NACLs.
A subnet is “public” only because its route table has 0.0.0.0/0 → igw-id. There’s no checkbox. No route to IGW = private subnet, regardless of what you call it.
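The "no checkbox" rule can be written as a predicate over a route table. In this sketch a route table is a list of `(destination, target)` pairs; the target IDs are invented, and `is_public_subnet` is a hypothetical helper, not an AWS API call.

```python
def is_public_subnet(routes):
    """routes: iterable of (destination_cidr, target_id) pairs from the subnet's route table."""
    return any(dest == "0.0.0.0/0" and target.startswith("igw-") for dest, target in routes)

# Hypothetical route tables; the IDs are placeholders
public_rt = [("10.0.0.0/16", "local"), ("0.0.0.0/0", "igw-0abc")]
private_rt = [("10.0.0.0/16", "local"), ("0.0.0.0/0", "nat-0def")]

public = is_public_subnet(public_rt)
private = is_public_subnet(private_rt)
```

The second table still has a default route, but it points at a NAT Gateway, so the subnet stays private.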
Derivation: If the exam says “ACCEPT then REJECT in flow logs” → always NACL. SGs never block return traffic.
Private instances can’t reach internet directly. They need a “translator” in a public subnet:
Instead of routing through IGW to reach AWS services, use VPC Endpoints:
Derivation: “Reduce cost of S3 access” or “private access to S3” → Gateway Endpoint.
Three options, each with trade-offs:
VPC Peering is point-to-point. A↔B and B↔C does NOT mean A↔C. For hub connectivity → Transit Gateway.
TGW solves three problems: (1) transitive routing, (2) VPN bandwidth scaling (ECMP), (3) sharing DX across accounts. If question mentions “many VPCs” or “simplify network” → TGW.
Layer 3-4: NACLs (subnet) → Security Groups (ENI)
Layer 7: WAF (HTTP/HTTPS only)
Layer 3-7: AWS Network Firewall (entire VPC, all directions)
Cross-acct: AWS Firewall Manager (centralize rules)
All AWS networking pricing follows this: data IN = free, data OUT = costs. Minimize egress by keeping processing inside AWS and using private IPs.
Need to connect to AWS?
│
├─ From on-premises SITE?
│ ├─ Need it NOW (days)? ──► Site-to-Site VPN
│ ├─ Need dedicated/private/consistent? ──► Direct Connect
│ ├─ Need encryption on DX? ──► VPN on top of DX
│ ├─ Multiple sites to connect? ──► VPN CloudHub
│ └─ Need DX to multiple regions? ──► DX Gateway
│
├─ From individual LAPTOP?
│ └─► AWS Client VPN (OpenVPN)
│
├─ VPC to VPC?
│ ├─ Just 2 VPCs? ──► VPC Peering
│ ├─ Many VPCs (hub-and-spoke)? ──► Transit Gateway
│ └─ Expose specific service? ──► PrivateLink (NLB + ENI)
│
└─ VPC to AWS Service (S3, DynamoDB, etc.)?
├─ S3 or DynamoDB? ──► Gateway Endpoint (free)
├─ Other service? ──► Interface Endpoint
└─ Need on-prem access too? ──► Interface Endpoint
Need to control traffic?
│
├─ At instance/ENI level? ──► Security Group (ALLOW only, stateful)
├─ At subnet level? ──► NACL (ALLOW + DENY, stateless)
├─ Block specific IPs (Layer 3)? ──► NACL (has DENY rules)
├─ Block HTTP patterns/SQL injection? ──► WAF (Layer 7)
├─ VPC-wide, all directions, L3-L7? ──► AWS Network Firewall
└─ Centralize across accounts? ──► AWS Firewall Manager
| You CANNOT… | Why |
|---|---|
| Disable IPv4 in VPC | VPC requires IPv4; IPv6 is optional dual-stack |
| Use NAT Gateway as Bastion | NAT GW doesn’t support SSH — use NAT Instance |
| Attach >1 IGW per VPC | 1:1 mapping only |
| Use Gateway Endpoint from on-prem | Gateway Endpoint = route table only; use Interface Endpoint |
| Make VPC Peering transitive | Need separate peering per pair, or use TGW |
| Encrypt DX natively | Add VPN on top for encryption |
| Set up DX in under 1 month | Lead time >1 month; use VPN for quick setup |
| Have VPC CIDR larger than /16 | Max VPC size = /16 (65,536 IPs) |
| Attach SG to NAT Gateway | NAT GW has no SGs — only NAT Instance does |
| Use VGW for >1.25 Gbps VPN | Need TGW + ECMP for higher throughput |
| Keywords | Answer | Why |
|---|---|---|
| private subnet, internet, managed, scales | NAT Gateway | AWS-managed, auto-scales to 100 Gbps, no SG/patching needed |
| SSH, private subnet, access, developers | Bastion Host (or SSM Session Manager) | Bastion in public subnet acts as SSH jump box. SG: port 22 from corporate public CIDR |
| flow logs, allowed then blocked, return traffic | NACL is blocking (not SG) | SGs are stateful — they never block return traffic. Only NACLs (stateless) do this |
| launch failure, subnet, no capacity | No available IPv4 addresses → add new CIDR | IPv6 space is huge; the bottleneck is always IPv4 |
| VPC peering, transitive, multiple VPCs | NO — VPC Peering is not transitive | Need A↔C peering, or use Transit Gateway |
| many VPCs, hub-and-spoke, simplify, on-premises | Transit Gateway | Single hub, N connections instead of N² mesh |
| VPN throughput, scale bandwidth, more than 1.25 | Transit Gateway with ECMP + multiple VPN connections | VGW uses only 1 tunnel (1.25 Gbps); TGW uses both (2.5 Gbps) and stacks connections |
| dedicated, private, consistent, not internet | Direct Connect | Physical private connection, doesn’t traverse internet |
| quickly, fast setup, days, immediately | Site-to-Site VPN (NOT Direct Connect) | DX takes >1 month to establish |
| Direct Connect fails, backup, cheap | Site-to-Site VPN as backup | Second DX is expensive; VPN is cheap and quick |
| S3, DynamoDB, private access, no internet, reduce cost | VPC Gateway Endpoint (free) | Free, route table entry, no NAT GW charges |
| on-premises, AWS service, private access, VPN, Direct Connect | VPC Interface Endpoint (not Gateway) | Gateway Endpoints can’t be accessed from on-prem |
| multiple sites, hub-and-spoke, VPN, backup | AWS VPN CloudHub | Multiple VPN connections on same VGW, over public internet |
| expose, service, private, cross-VPC, cross-account | AWS PrivateLink (NLB + ENI) | No peering, no IGW, no routes needed |
| IPv6, outbound, prevent inbound, internet | Egress-only Internet Gateway | NAT is IPv4 only; Egress-only IGW is the IPv6 equivalent |
| capture, IP traffic, information, logs, metadata | VPC Flow Logs | Flow Logs = metadata (IPs, ports, action). Traffic Mirroring = full packet capture |
| inspect, deep packet, content, security appliance | VPC Traffic Mirroring | Copies actual packets to ENI/NLB for analysis |
| 500 Mbps, DX, connection | Hosted connection | Dedicated = 1/10/100 Gbps only. Anything in between = Hosted |
| sophisticated, entire VPC, Layer 3-7, all directions | AWS Network Firewall | NACLs/SGs are basic, WAF is HTTP-only. Network Firewall covers L3-L7 in all directions |
| Direct Connect, multiple regions, VPCs | Direct Connect Gateway | One DX → DX Gateway → VPCs across regions |
On-Premises Connectivity Comparison:
| Feature | Site-to-Site VPN | Direct Connect | Client VPN |
|---|---|---|---|
| Speed | Up to 1.25 Gbps (VGW) or more (TGW+ECMP) | 50 Mbps – 100 Gbps | N/A |
| Path | Public internet | Private physical line | Public internet |
| Encrypted | ✅ Yes (IPsec) | ❌ No (add VPN on top) | ✅ Yes (OpenVPN) |
| Setup time | Minutes/hours | >1 month | Minutes |
| Cost | Low | High | Low |
| Use case | Quick setup, backup for DX | Large bandwidth, consistent | Individual users |
| AWS side | VGW or TGW | VGW + DX Location | Client VPN Endpoint |
| On-prem side | CGW | Customer Router | OpenVPN client |
VPC Endpoint Comparison:
| Feature | Gateway Endpoint | Interface Endpoint |
|---|---|---|
| Services | S3, DynamoDB | Everything else |
| Cost | Free | $/hr + $/GB |
| How | Route Table entry | ENI (private IP) |
| SG | No | Yes |
| On-prem access | ❌ | ✅ |
Security Layers:
| Layer | Tool | Scope | Rules |
|---|---|---|---|
| Instance/ENI | Security Group | Per ENI | Allow only, stateful |
| Subnet | NACL | Per subnet | Allow + Deny, stateless |
| HTTP/HTTPS | WAF | CloudFront/ALB/API GW | Web ACL rules |
| Entire VPC (L3-L7) | Network Firewall | Per VPC | Allow/Drop/Alert |
| Cross-account | Firewall Manager | Organization | Centralized management |
Key Numbers:
| What | Value |
|---|---|
| Max VPCs per region | 5 (soft limit) |
| Max CIDRs per VPC | 5 |
| VPC CIDR range | /28 (16 IPs) – /16 (65,536 IPs) |
| Reserved IPs per subnet | 5 |
| VPN throughput (VGW) | 1.25 Gbps |
| VPN throughput (TGW, 1 conn) | 2.5 Gbps (ECMP) |
| NAT Gateway bandwidth | Up to 100 Gbps |
| DX Dedicated speeds | 1 / 10 / 100 Gbps |
| DX Hosted speeds | 50 Mbps – 10 Gbps |
| DX setup time | >1 month |
| Ephemeral ports (Linux) | 32768 – 60999 |
| Ephemeral ports (Windows) | 49152 – 65535 |
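The CIDR and reserved-address figures combine into a quick capacity calculation. This sketch uses Python's standard `ipaddress` module; the `usable_ips` helper is hypothetical and applies the /16 to /28 bounds and the five reserved addresses per subnet (the first four and the last one).

```python
import ipaddress

RESERVED_PER_SUBNET = 5  # AWS reserves the first four addresses and the last one

def usable_ips(cidr):
    """Addresses left for resources in a subnet after the AWS-reserved ones."""
    network = ipaddress.ip_network(cidr)
    if not 16 <= network.prefixlen <= 28:
        raise ValueError("prefix must be between /16 and /28")
    return network.num_addresses - RESERVED_PER_SUBNET

sizes = {cidr: usable_ips(cidr) for cidr in ("10.0.0.0/28", "10.0.0.0/24", "10.0.0.0/16")}
```

A /28 therefore leaves only 11 addresses, which is why small subnets run out of IPv4 space quickly.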
| Question Contains | → Instant Answer |
|---|---|
| “Private subnet internet IPv4, managed” | NAT Gateway |
| “NAT + Security Groups” | NAT Instance |
| “NAT + Bastion Host” | NAT Instance |
| “SSH into private EC2” | Bastion Host (or SSM) |
| “Bastion Host SG, which port/CIDR?” | Port 22, company public CIDR |
| “Default NACL behavior” | Allow ALL traffic |
| “Custom NACL behavior” | Deny ALL traffic |
| “ACCEPT then REJECT in flow logs” | NACL blocking (not SG) |
| “Return traffic blocked” | NACL (stateless) |
| “Ephemeral ports” | NACL outbound/inbound rules needed |
| “Top-10 IP addresses in flow logs” | CloudWatch Contributor Insights |
| “Analyze flow logs with SQL” | S3 + Athena |
| “VPC Peering transitive?” | NO — need TGW |
| “Route tables updated one side only” | Update BOTH VPCs |
| “S3/DynamoDB private access from VPC” | Gateway Endpoint (free) |
| “AWS service access from on-prem” | Interface Endpoint |
| “Lambda can’t reach DynamoDB” | VPC Gateway Endpoint |
| “Expose service cross-VPC privately” | PrivateLink (NLB + ENI) |
| “Ping EC2 from on-prem fails” | ICMP in SG + Route Propagation |
| “Multiple on-prem sites, VPN backup” | VPN CloudHub |
| “Private, dedicated, consistent connection” | Direct Connect |
| “Encrypted connection over internet” | Site-to-Site VPN |
| “Improve connection in days/1 week” | VPN (NOT DX — >1 month) |
| “DX backup, cost-effective” | Site-to-Site VPN |
| “500 Mbps DX connection” | Hosted (Dedicated = 1/10/100 only) |
| “DX to multiple regions” | DX Gateway |
| “Share DX across accounts” | TGW + DX GW + Transit VIF + RAM |
| “VPN bandwidth >1.25 Gbps” | TGW + ECMP |
| “Many VPCs + on-prem, simplify” | Transit Gateway |
| “IP Multicast” | Transit Gateway |
| “IPv6 outbound, block inbound” | Egress-only IGW |
| “Can’t launch EC2 in subnet” | IPv4 exhausted → new CIDR |
| “Reduce S3 access cost from VPC” | Gateway Endpoint (free) |
| “Capture IP traffic metadata” | VPC Flow Logs |
| “Deep packet inspection” | VPC Traffic Mirroring |
| “VPC-wide L3-L7 protection” | AWS Network Firewall |
| “Centralize firewall rules cross-account” | AWS Firewall Manager |
| “ALB → EC2 SG, most secure” | Reference ALB’s SG (not CIDR) |
□ Is it on-premises → AWS?
→ Yes: VPN, DX, or Client VPN
□ Need it fast (days)?
→ Yes = Site-to-Site VPN
→ No (can wait months) = Direct Connect
□ Individual user (laptop)?
→ Yes = Client VPN
□ Multiple on-prem sites?
→ Yes = VPN CloudHub
→ No: VPC-to-VPC or VPC-to-service
□ Is it VPC → VPC?
→ 2 VPCs = VPC Peering
→ Many VPCs = Transit Gateway
→ Expose single service = PrivateLink
□ Is it VPC → AWS Service?
→ S3 or DynamoDB = Gateway Endpoint
→ Anything else = Interface Endpoint
→ Needs on-prem access = Interface Endpoint
□ What layer?
→ L3-L4 per instance = Security Group
→ L3-L4 per subnet = NACL
→ L7 HTTP only = WAF
→ L3-L7 entire VPC = Network Firewall
□ Need DENY rules?
→ Yes = NACL (SGs only have ALLOW)
□ Stateful or stateless matters?
→ "Return traffic blocked" = NACL (stateless)
→ "Ephemeral ports needed" = NACL
□ Private IP or Public IP?
→ Private = cheaper (free same-AZ, $0.01 cross-AZ)
→ Public = $0.02 even same-AZ
□ S3 access path?
→ NAT GW → IGW = expensive
→ Gateway Endpoint = free
□ Content delivery?
→ S3 direct = $0.09/GB
→ CloudFront = $0.085/GB + caching
┌──────────┐ example.com? ┌─────────────┐
│ Client │ ───────────────→ │ Route 53 │
│ │ ←─────────────── │ │
└────┬─────┘ 54.22.33.44 └─────────────┘
│
│ 54.22.33.44
▼
┌─────────────────────────────────────┐
│ AWS Cloud │
│ ┌──────────────────────┐ │
│ │ EC2 Instance │ │
│ │ Public IP: │ │
│ │ 54.22.33.44 │ │
│ └──────────────────────┘ │
└─────────────────────────────────────┘
Amazon Route 53 is a managed DNS (Domain Name System) service. DNS is a collection of rules and records that helps clients find out how to reach a server from its domain name.
| Feature | Details |
|---|---|
| Type | Highly available, scalable, fully managed Authoritative DNS |
| Authoritative | You (customer) can update DNS records |
| Domain Registrar | Yes — can register domains directly |
| Health Checks | Monitor health of your resources |
| SLA | 100% availability (only AWS service with this!) |
| Scope | Global service (not regional) |
| Why “53”? | Traditional DNS port number |
⚠️ Exam trap: Route 53 is a global service — no region selection needed!
DNS Terminologies:
http://api.www.example.com.
│ │ │ │ │
│ │ │ │ └── Root (.)
│ │ │ └──── TLD (.com, .gov, .org)
│ │ └────────── SLD (example.com)
│ └────────────────── Sub Domain (www)
└────────────────────── Sub Domain (api)
└────────────────────────────┘
FQDN (Fully Qualified Domain Name)
| Term | Description |
|---|---|
| Domain Registrar | Amazon Route 53, GoDaddy, etc. |
| DNS Records | A, AAAA, CNAME, NS, etc. |
| Zone File | Contains DNS records |
| Name Server | Resolves DNS queries (Authoritative or Non-Authoritative) |
| TLD | .com, .us, .gov, .org |
| SLD | amazon.com, google.com |
DNS Resolution Flow:
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ Browser │───→│ OS │───→│ ISP │───→│ Root │───→│ TLD │───→│ Name │
│ Cache │ │ Cache │ │ Cache │ │ Server │ │ Server │ │ Server │
└─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └────┬────┘
│
┌─────────────────────────────────┘
▼
IP Address
(cached on way back)
Route 53 – Records:
Each record contains:
| Field | Description |
|---|---|
| Domain/subdomain Name | e.g., example.com |
| Record Type | A, AAAA, CNAME, NS, etc. |
| Value | e.g., 12.34.56.78 |
| Routing Policy | How Route 53 responds to queries |
| TTL | Time record is cached at DNS Resolvers |
| Record Type | Must Know | Description |
|---|---|---|
| A | ✅ | Maps hostname to IPv4 |
| AAAA | ✅ | Maps hostname to IPv6 |
| CNAME | ✅ | Maps hostname → another hostname (target must have A/AAAA) |
| NS | ✅ | Name Servers for the Hosted Zone — controls traffic routing |
| CAA, DS, MX, PTR, SOA, TXT, SPF, SRV | Advanced | Less common record types |
⚠️ Exam trap: CNAME cannot be used for Zone Apex (example.com) — only for subdomains (www.example.com). Use Alias for apex!
Route 53 – Hosted Zones:
A container for records that define how to route traffic to a domain and its subdomains.
| Type | Access | Example | Use Case |
|---|---|---|---|
| Public | Internet | example.com → 54.22.33.44 | S3, CloudFront, EC2 (Public IP), ALB |
| Private | Within VPC(s) | api.example.internal → 10.0.0.10 | Internal EC2, RDS, microservices |
PUBLIC HOSTED ZONE PRIVATE HOSTED ZONE
────────────────── ───────────────────
┌────────┐ example.com? ┌─────────────────────────────────┐
│ Client │ ──────────────┐ │ VPC │
└────────┘ │ │ ┌─────────────────────────┐ │
▲ ▼ │ │ Private Hosted Zone │ │
│ ┌────────────┐ │ └───────────┬─────────────┘ │
│ │ Public │ │ │ │
└────────────│ Hosted │ │ ┌──────────┴──────────┐ │
54.22.33.44 │ Zone │ │ ▼ ▼ │
└─────┬──────┘ │ api.example db.example │
│ │ .internal? .internal? │
▼ │ │ │ │
┌───────────────┐ │ ▼ ▼ │
│ S3, CloudFront│ │ 10.0.0.10 10.0.0.35 │
│ EC2, ALB │ │ (EC2) (RDS) │
└───────────────┘ └─────────────────────────────────┘
Route 53 – TTL (Time To Live):
myapp.example.com?
┌────────┐ ─────────────────────→ ┌──────────┐
│ Client │ ←───────────────────── │ Route 53 │
└───┬────┘ └──────────┘
│ A 12.34.56.78 (TTL)
│
│ Client caches result for TTL duration
│
│ HTTP Request
└──────────────────────────→ ┌────────────┐
←─────────────────────────── │ Web Server │
HTTP Response └────────────┘
| TTL | Traffic to Route 53 | Record Freshness | Cost | Use Case |
|---|---|---|---|---|
| High (24 hr) | Less | Possibly outdated | Lower | Stable records |
| Low (60 sec) | More | Always fresh | Higher $$ | Before migrations/changes |
⚠️ Exam trap: Changed DNS record but users still go to old IP? → TTL caching! Clients cache until TTL expires.
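The caching behaviour behind this trap can be simulated with a toy resolver cache. This is a sketch under stated assumptions: `DnsCache` is a hypothetical class, time is passed in as plain seconds, and the second IP address is made up. It shows a client still receiving the old answer until the TTL has elapsed.

```python
class DnsCache:
    """Toy resolver cache: an answer is reused until its TTL has elapsed."""

    def __init__(self):
        self._entries = {}  # name -> (value, expires_at)

    def resolve(self, name, now, lookup, ttl):
        cached = self._entries.get(name)
        if cached and now < cached[1]:
            return cached[0]  # still fresh: the DNS service is not queried
        value = lookup(name)
        self._entries[name] = (value, now + ttl)
        return value

records = {"myapp.example.com": "12.34.56.78"}
cache = DnsCache()
first = cache.resolve("myapp.example.com", now=0, lookup=records.get, ttl=300)
records["myapp.example.com"] = "98.76.54.32"  # record changed at the DNS service
stale = cache.resolve("myapp.example.com", now=120, lookup=records.get, ttl=300)
fresh = cache.resolve("myapp.example.com", now=301, lookup=records.get, ttl=300)
```

Lowering the TTL ahead of a planned change shortens the window in which `stale` answers are served, at the cost of more queries.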
Route 53 – CNAME vs Alias:
AWS resources expose ugly hostnames (e.g., lb1-1234.us-east-2.elb.amazonaws.com) — you want myapp.mydomain.com
| Feature | CNAME | Alias |
|---|---|---|
| Points to | Any hostname | AWS resources only |
| Zone Apex (root domain) | ❌ NO | ✅ YES |
| Cost | Standard DNS charges | Free |
| Health Check | ❌ | ✅ Native |
| TTL | You set it | Auto-managed by Route 53 |
| Record Type | CNAME | A or AAAA |
⚠️ Exam trap: Need to point mydomain.com (root) to an ALB? → Alias (CNAME won’t work!)
Route 53 – Alias Records:
AWS extension to DNS that maps a hostname to an AWS resource. Automatically recognizes IP changes on the target.
┌───────────────────────────────────────────────┐
│ Route 53 Alias Record │
│ ┌────────────────────────────────────────┐ │
│ │ Record: example.com │ │
│ │ Type: A │ │
│ │ Value: MyALB-123456789.us-east-1... │ │
│ └────────────────────────────────────────┘ │
└───────────┬───────────────────────────────────┘
│ AWS-Managed (IP changes tracked)
▼
┌──────────────────┐
│ Application │
│ Load Balancer │
│ (MyALB-1234...) │
└──────────────────┘
| Characteristic | Detail |
|---|---|
| Works at Zone Apex | ✅ YES (example.com) |
| Cost | Free (unlike CNAME) |
| Health Checks | ✅ Native support |
| TTL | ❌ Not settable (auto-managed) |
| Record Type | A or AAAA only |
| Auto IP tracking | ✅ YES (AWS manages) |
⚠️ Exam trap: Alias records — you cannot set TTL (Route 53 manages it automatically)
Targets:
⚠️ Exam trap: Cannot use Alias for EC2 DNS names — use regular A record or CNAME instead!
Route 53 – Routing Policies:
DNS responds to queries (does NOT route traffic like a load balancer).
Route 53 policies:
| Policy | Use Case | Key Feature |
|---|---|---|
| Simple | Single resource | Randomly chosen if multiple values |
| Weighted | Load balancing | Control traffic % distribution |
| Failover | Active-passive HA | Primary + standby resource |
| Latency | Multi-region | Routes to lowest latency region |
| Geolocation | Location-based | Route by user geography |
| Geoproximity | Resource location bias | Route by resource location + bias |
| Multivalue | Multiple IPs | Up to 8 random healthy records |
| IP-based | Client IP routing | Route by CIDR blocks |
⚠️ Exam trap: “Routing” in Route 53 ≠ Load Balancer routing. DNS responds to queries; it doesn’t route actual traffic!
Routing Policy – Simple:
SINGLE VALUE MULTIPLE VALUES
───────────── ──────────────────
foo.example.com foo.example.com
│ │
│ A 11.22.33.44 │ A 11.22.33.44
┌─────────────────┐ ┌─────────────────────┐
│ Client │ │ Client chooses │
│ Gets 1 value │ │ a random value │
└─────────────────┘ └─────────────────────┘
│ A 55.66.77.88
│ A 99.11.22.33
⚠️ Exam trap: Simple policy with multiple values ≠ load balancing! No health checks, no failover.
Routing Policy – Weighted:
traffic % = weight for record / sum of all weights
⚠️ Exam trap: Weighted ≠ round-robin! It’s percentage-based distribution, not sequential rotation.
⚠️ Exam trap: Weight = 0 stops all traffic to that resource (useful for maintenance)
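The weight formula as a worked example. The record names and weights below are invented, and `traffic_share` is a hypothetical helper. The case where every weight is 0 is deliberately not modelled here.

```python
def traffic_share(weights):
    """Map each record to its share of responses: weight / sum of all weights."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be non-zero for this sketch")
    return {name: weight / total for name, weight in weights.items()}

# Hypothetical records and weights
shares = traffic_share({"blue": 70, "green": 20, "canary": 10, "maintenance": 0})
```

The weights do not need to add up to 100; `{"a": 3, "b": 1}` gives the same 75/25 split as `{"a": 75, "b": 25}`.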
Routing Policy – Latency-based:
⚠️ Exam trap: Latency ≠ Geography! German user may be directed to US if that has lowest latency.
⚠️ Exam trap: “Best user experience” / “minimize response time” → Latency, not Geolocation!
Route 53 – Health Checks:
Multi-region failover architecture:
┌───────────┐
│ Route 53 │
│ DNS Record│
└─────┬─────┘
│
┌─────────────┴─────────────┐
▼ ▼
❤ Health Check ❤ Health Check
│ │
┌────────┴────────┐ ┌────────┴────────┐
│ us-east-1 │ │ eu-west-1 │
│ ┌───────────┐ │ │ ┌───────────┐ │
│ │ ALB │ │ │ │ ALB │ │
│ └─────┬─────┘ │ │ └─────┬─────┘ │
│ ▼ │ │ ▼ │
│ ┌───────────┐ │ │ ┌───────────┐ │
│ │ ASG │ │ │ │ ASG │ │
│ └───────────┘ │ │ └───────────┘ │
└─────────────────┘ └─────────────────┘
How endpoint monitoring works:
❤ Health Checker ❤ Health Checker ❤ Health Checker
(us-east-1) (us-west-1) (sa-east-1)
│ │ │
└──────────────────┼──────────────────┘
│ HTTP request to /health
▼ 200 code
┌─────────────────┐
│ eu-west-1 │
│ ┌───────────┐ │
│ │ ALB │──┼── Must allow Route 53
│ └─────┬─────┘ │ Health Checker IPs!
│ ▼ │
│ ┌───────────┐ │
│ │ EC2/ASG │ │
│ └───────────┘ │
└─────────────────┘
IP ranges: https://ip-ranges.amazonaws.com/ip-ranges.json
| Setting | Value |
|---|---|
| Global health checkers | ~15 |
| Threshold (healthy/unhealthy) | 3 (default) |
| Interval | 30 sec (10 sec = higher cost) |
| Protocols | HTTP, HTTPS, TCP |
| Healthy if | >18% checkers report healthy |
| Pass codes | 2xx and 3xx only |
| Text match | First 5120 bytes of response |
⚠️ Exam trap: Must configure firewall/security group to allow Route 53 Health Checker IPs!
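The ">18% of checkers report healthy" rule from the table can be sketched as a simple fraction test. The function name and the list-of-booleans input are assumptions made for illustration. The real service also applies the consecutive-check threshold per checker, which this sketch leaves out.

```python
def endpoint_is_healthy(checker_reports, threshold=0.18):
    """checker_reports: list of booleans, one per health checker (True = healthy)."""
    if not checker_reports:
        raise ValueError("no checker reports")
    healthy_fraction = sum(checker_reports) / len(checker_reports)
    return healthy_fraction > threshold

three_of_fifteen = endpoint_is_healthy([True] * 3 + [False] * 12)  # 20% healthy
two_of_fifteen = endpoint_is_healthy([True] * 2 + [False] * 13)    # about 13% healthy
```

With about 15 checkers, three healthy reports are enough to keep the endpoint marked healthy, while two are not.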
| Health Check Type | What It Monitors | Use Case |
|---|---|---|
| Endpoint | Application, server, AWS resource | Direct resource monitoring |
| Calculated | Other health checks | Aggregate multiple checks |
| CloudWatch Alarm | CW Alarms (DynamoDB throttles, RDS, custom) | Private resources |
⚠️ Exam trap: Only 3 health check types! No direct SQS, SNS, or other service monitoring — use CloudWatch Alarm instead.
⚠️ Exam trap: Private resources → use CloudWatch Alarm health checks (HTTP checks can’t reach them!)
Health Checks for Private Resources:
┌─────────────────────────────────┐
│ VPC │
┌─────────────────┐ │ ┌───────────────────────────┐ │
│ Health Checker │ │ │ Private subnet │ │
│ (us-east-1) │ │ │ ┌─────────────┐ │ │
└────────┬────────┘ │ │ │ EC2 (T2) │ │ │
│ │ │ └──────┬──────┘ │ │
│ ✖ Can't reach! │ │ │ monitor │ │
│ │ │ ▼ │ │
│ monitor │ │ ┌─────────────┐ │ │
└────────────────────┼──┼───→│ CloudWatch │ │ │
│ │ │ Alarm │ │ │
│ │ └─────────────┘ │ │
│ └───────────────────────────┘ │
└─────────────────────────────────┘
Routing Policy – Geolocation:
⚠️ Exam trap: Geolocation ≠ Latency! Geolocation = user’s geography; Latency = network performance.
⚠️ Exam trap: “Legal requirement” / “restrict access by country” → Geolocation (not Latency!)
Routing Policy – Geoproximity:
⚠️ Exam trap: Geoproximity requires Traffic Flow (paid feature). Geolocation does NOT!
Routing Policy – IP-based:
User B User A
(200.5.4.100) (203.0.113.56)
│ │
└────────┬─────────┘
▼
┌─────────┐
│Route 53 │
└────┬────┘
│
┌─────────┴─────────┐
│ CIDR Collection │
├───────────────────┤
│ location-1: 203.0.113.0/24 │
│ location-2: 200.5.4.0/24 │
└─────────┬─────────┘
│
┌─────────┴─────────┐
│ Records │
├───────────────────┤
│ example.com → 1.2.3.4 (location-1) │
│ example.com → 5.6.7.8 (location-2) │
└─────────┬─────────┘
│
┌───────┴───────┐
▼ ▼
EC2 (5.6.7.8) EC2 (1.2.3.4)
User B → User A →
Routing Policy – Multi-Value:
⚠️ Exam trap: Multi-Value is NOT a substitute for ELB! It’s client-side selection, not load balancing.
Domain Registrar vs. DNS Service:
| Concept | Description |
|---|---|
| Domain Registrar | Where you buy/register domain (GoDaddy, Amazon Registrar, etc.) — annual fee |
| DNS Service | Where you manage DNS records (can be different from registrar!) |
⚠️ Exam trap: Update NS records at the registrar (GoDaddy), not in Route 53! And use Public Hosted Zone for internet-facing domains.
Route 53 – Hybrid DNS:
Route 53 Resolver automatically answers DNS queries for:
Hybrid DNS = Resolving DNS queries between VPC (Route 53 Resolver) and your networks (other DNS Resolvers)
| Network Type | Connection |
|---|---|
| VPC / Peered VPC | Native |
| On-premises | Direct Connect or AWS VPN |
Route 53 – Resolver Endpoints:
Inbound Endpoint — On-premises DNS resolvers can query Route 53 Resolver for AWS resources
┌─────────────────────────────────────────┐
│ us-east-1 │
On-Premises Data Center │ ┌───────────────────────────────────┐ │
┌──────────────────────┐ │ │ VPC │ │
│ │ │ │ Private Hosted Zone │ │
│ ┌────────────────┐ │ │ │ (aws.private) │ │
│ │ DNS Resolvers │ │ │ │ ┌─────────────────────────────┐ │ │
│ │(onpremise. │ │ │ │ │ Private Subnet │ │ │
│ │ private) │──┼── DNS Query: app.aws.private? ──────────────→│ │ │
│ └────────────────┘ │ │ │ │ ┌────────────┐ ┌────────┐ │ │ │
│ ▲ │ │ │ │ │ EC2 │ │Resolver│ │ │ │
│ │ │ │ │ │ │(app.aws. │←─│Inbound │ │ │ │
│ ┌──────┴───────┐ │ │ │ │ │ private) │ │Endpoint│ │ │ │
│ │ Server │ │ │ │ │ └────────────┘ └───┬────┘ │ │ │
│ │ (web.onprem │ │◀═══VPN or DX═══════════════════════════╝ │ │ │
│ │ .private) │ │ │ │ └─────────────────────────────┘ │ │
│ └──────────────┘ │ │ └──────────────┬────────────────────┘ │
└──────────────────────┘ │ │ lookup │
│ ▼ │
│ Route 53 Resolver │
└─────────────────────────────────────────┘
| Endpoint | Direction | Use Case |
|---|---|---|
| Inbound | On-prem → AWS | On-prem resolves AWS Private Hosted Zone records |
| Outbound | AWS → On-prem | AWS resources resolve on-premises DNS records |
⚠️ Exam trap: Inbound = queries coming IN to AWS. Outbound = queries going OUT from AWS. Think from AWS perspective!
Outbound Endpoint — Route 53 Resolver forwards DNS queries to on-premises DNS Resolvers
┌─────────────────────────────────────────┐
│ us-east-1 │
On-Premises Data Center │ ┌───────────────────────────────────┐ │
┌──────────────────────┐ │ │ VPC │ │
│ │ │ │ Private Hosted Zone │ │
│ ┌────────────────┐ │ │ │ (aws.private) │ │
│ │ DNS Resolvers │ │ │ │ ┌─────────────────────────────┐ │ │
│ │(onpremise. │←─┼── DNS Query: web.onpremise.private? ────────│ │ │
│ │ private) │ │ │ │ │ ┌────────────┐ ┌────────┐ │ │ │
│ └────────────────┘ │ │ │ │ │ EC2 │─→│Resolver│ │ │ │
│ │ │ │ │ │ │(app.aws. │ │Outbound│ │ │ │
│ ▼ │ │ │ │ │ private) │ │Endpoint│─┼──┼──┘
│ ┌──────────────┐ │ │ │ │ └────────────┘ └────────┘ │ │
│ │ Server │ │◀═══VPN or DX═════════════════════════════════╝ │
│ │ (web.onprem │ │ │ │ └─────────────────────────────┘ │
│ │ .private) │ │ │ └──────────────┬────────────────────┘
│ └──────────────┘ │ │ │ │
└──────────────────────┘ │ ▼ │
│ Route 53 Resolver │
└──────────────────────────────────────┘
Resolver Rules = define how DNS queries are forwarded from Outbound Endpoints
| Rule Type | Description |
|---|---|
| Conditional Forwarding | Forward queries for specific domains to target DNS servers |
| System | Default rules (auto-created for Private Hosted Zones, VPC DNS) |
| Recursive | Forward all unmatched queries to Route 53 Resolver |
Resolver Rules Example:
Query: db.corp.local Query: api.example.com
│ │
▼ ▼
┌──────────────────────────────────────┐
│ Resolver Rules │
│ ┌────────────────────────────────┐ │
│ │ *.corp.local → 10.0.0.53 │──┼──▶ On-prem DNS
│ │ *.example.com → System Rule │──┼──▶ Route 53
│ │ * (default) → Recursive │──┼──▶ Public DNS
│ └────────────────────────────────┘ │
└──────────────────────────────────────┘
⚠️ Exam trap: “Share DNS resolution across accounts” → Resolver Rules + AWS RAM
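The rule lookup in the example above can be sketched in a few lines. This is a simplified model, assuming the most specific matching domain wins; the `RULES` table and the `match_rule` helper are hypothetical names for illustration, not part of any AWS SDK.

```python
from typing import Optional

# Hypothetical rule table mirroring the example: domain suffix -> action.
# "." stands for the catch-all (default) rule.
RULES = {
    "corp.local": "forward to on-prem DNS 10.0.0.53",
    "example.com": "system rule (Route 53)",
    ".": "recursive (public DNS)",
}


def match_rule(query: str, rules: dict[str, str]) -> Optional[str]:
    """Return the action of the most specific rule whose domain matches the query."""
    labels = query.rstrip(".").lower().split(".")
    # Try the longest suffix first: db.corp.local, then corp.local, then local.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in rules:
            return rules[suffix]
    # Nothing matched: fall back to the catch-all rule, if one exists.
    return rules.get(".")
```

Calling `match_rule("db.corp.local", RULES)` selects the forwarding rule, while a name that matches no domain rule falls through to the recursive default.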
DNSSEC = DNS Security Extensions — protects against DNS spoofing/cache poisoning
| Feature | Details |
|---|---|
| Purpose | Cryptographically sign DNS records to verify authenticity |
| Route 53 support | ✅ DNSSEC signing for public hosted zones |
| How it works | Uses KMS to manage keys (KSK), Route 53 manages ZSK |
| Chain of trust | Root → TLD → Your domain (DS records link them) |
Setup steps:
⚠️ Exam trap: “Prevent DNS spoofing” or “verify DNS response authenticity” → DNSSEC
⚠️ Exam trap: DNSSEC KMS key must be in us-east-1 (like CloudFront certificates)
Route 53 responds to DNS queries — it returns IP addresses. It does NOT route actual network traffic.
CNAME has limitations (can’t use at Zone Apex, costs money). AWS invented Alias to solve this:
Rule: If pointing to AWS resource → use Alias. If pointing to non-AWS → use CNAME (or A record).
example.com (no subdomain) = Zone Apex = Root Domain
| Record Type | Zone Apex? | Example |
|---|---|---|
| CNAME | ❌ NO | Cannot use for example.com |
| Alias | ✅ YES | Can use for example.com |
| A Record | ✅ YES | Can use for example.com |
DNS standard forbids CNAME at apex. AWS Alias bypasses this limitation.
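To make the Alias vs CNAME rule concrete, here is a minimal sketch that builds one change entry in the shape accepted by the Route 53 `ChangeResourceRecordSets` API. The `build_record_change` helper, the TTL of 300 and any zone IDs passed in are illustrative assumptions; nothing here calls AWS.

```python
from typing import Optional


def build_record_change(name: str, zone_name: str, target_dns: str,
                        alias_hosted_zone_id: Optional[str] = None) -> dict:
    """Build an UPSERT entry: Alias if a target zone ID is given, else CNAME."""
    is_apex = name.rstrip(".") == zone_name.rstrip(".")
    if alias_hosted_zone_id is not None:
        # Alias records are A/AAAA records with an AliasTarget and no TTL field.
        record = {
            "Name": name,
            "Type": "A",
            "AliasTarget": {
                "HostedZoneId": alias_hosted_zone_id,
                "DNSName": target_dns,
                "EvaluateTargetHealth": True,
            },
        }
    else:
        if is_apex:
            raise ValueError("CNAME is not allowed at the zone apex; use an Alias record")
        record = {
            "Name": name,
            "Type": "CNAME",
            "TTL": 300,  # assumed value for illustration
            "ResourceRecords": [{"Value": target_dns}],
        }
    return {"Action": "UPSERT", "ResourceRecordSet": record}
```

The helper refuses a CNAME at the apex and omits `TTL` on Alias records, which mirrors the two restrictions these notes keep returning to.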
Health checks are the foundation of high availability in Route 53:
TTL determines how long clients cache DNS responses:
Two commonly confused policies:
German user might be routed to US-East if that has lower latency than EU-West.
When another AWS service or another account needs access:
Think from AWS’s perspective:
What are you pointing to?
│
┌─────────────────────┼─────────────────────┐
▼ ▼ ▼
IPv4 Address AWS Resource Another Hostname
│ │ │
▼ ▼ ▼
A Record Alias A/AAAA Is it Zone Apex?
(FREE!) │
┌───────┴───────┐
▼ ▼
Yes No
│ │
▼ ▼
Alias CNAME ok
(required)
What's the requirement?
│
┌────────────┬────────────┼────────────┬────────────┬────────────┐
▼ ▼ ▼ ▼ ▼ ▼
Single Traffic % Best User User's Failover Client IP
Resource Control Experience Country HA Setup Routing
│ │ │ │ │ │
▼ ▼ ▼ ▼ ▼ ▼
Simple Weighted Latency Geolocation Failover IP-based
| If question mentions… | Answer is… |
|---|---|
| “Zone Apex” / “root domain” + AWS resource | Alias record |
| “example.com” (not www.) + ALB/CloudFront | Alias record |
| “free DNS queries” | Alias record |
| “minimize response time” / “best user experience” | Latency routing |
| “legal requirement” / “restrict by country” | Geolocation routing |
| “content localization by region” | Geolocation routing |
| “A/B testing” / “canary deployment” | Weighted routing |
| “traffic percentage” / “gradual migration” | Weighted routing |
| “active-passive” / “disaster recovery” | Failover routing |
| “primary and secondary” | Failover routing |
| “shift traffic between locations” / “bias” | Geoproximity routing |
| “on-premises resolves AWS domains” | Inbound Resolver Endpoint |
| “AWS resolves on-premises domains” | Outbound Resolver Endpoint |
| “DNS spoofing” / “verify authenticity” | DNSSEC |
| “private resource health check” | CloudWatch Alarm health check |
| “share DNS rules across accounts” | Resolver Rules + AWS RAM |
| “users still see old IP after change” | TTL caching issue |
| Statement | Why It’s Wrong |
|---|---|
| CNAME for Zone Apex | CNAME cannot be used at root domain |
| Alias for EC2 DNS name | Alias doesn’t support EC2 DNS — use A/CNAME |
| Alias with custom TTL | Alias TTL is auto-managed, cannot be set |
| Health check for private EC2 | Health checkers can’t reach private subnets |
| Simple policy with health check | Simple routing doesn’t support health checks |
| Geolocation for best performance | Geolocation = geography, not network performance |
| Multi-Value replaces ELB | Multi-Value is DNS-level, not true load balancing |
| Route 53 routes traffic | Route 53 answers DNS queries, doesn’t route traffic |
| Cannot… | Instead… |
|---|---|
| Use CNAME at Zone Apex | Use Alias |
| Set TTL on Alias records | TTL is auto-managed |
| Create Alias to EC2 DNS name | Use A record or CNAME |
| Health check private resources directly | Use CloudWatch Alarm |
| Use Geoproximity without Traffic Flow | Traffic Flow is required |
| Have health checks with Simple policy | Use Weighted/Failover/Multi-Value |
Keywords: example.com (no www), Zone Apex, ALB, CloudFront, root domain
Answer: Alias record (A type)
Why: CNAME cannot be used at Zone Apex. Alias can.
Keywords: lowest latency, best performance, fastest response, multi-region app
Answer: Latency-based routing
Why: Routes to AWS region with best network performance, regardless of geography.
Keywords: country restrictions, content localization, legal requirement, GDPR
Answer: Geolocation routing
Why: Routes based on user’s physical location, not network performance.
Keywords: A/B testing, percentage of traffic, gradual rollout, 10% to new version
Answer: Weighted routing
Why: Control exact percentage of traffic to each resource.
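The percentage each record receives is its weight divided by the sum of all weights. A small sketch of that arithmetic follows; `traffic_shares` is a hypothetical helper, and the all-zero-weights case is deliberately not modeled.

```python
def traffic_shares(weights: dict[str, int]) -> dict[str, float]:
    """Return the fraction of DNS responses each weighted record should receive."""
    total = sum(weights.values())
    if total <= 0:
        # This sketch does not model the special case where every weight is 0.
        raise ValueError("at least one weight must be greater than zero")
    return {name: weight / total for name, weight in weights.items()}
```

For a canary rollout, weights of 90 and 10 send roughly 90% of resolutions to the old version and 10% to the new one.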
Keywords: primary and secondary, failover, standby, DR site
Answer: Failover routing policy + Health checks
Why: Automatically switches to secondary when primary fails health check.
Keywords: hybrid cloud, on-premises DNS, resolve Private Hosted Zone from datacenter
Answer: Inbound Resolver Endpoint
Why: Allows on-prem DNS servers to query Route 53 for AWS resources.
Keywords: EC2 needs to reach on-prem by hostname, resolve corp.local from VPC
Answer: Outbound Resolver Endpoint + Forwarding Rules
Why: Forwards DNS queries from VPC to on-premises DNS servers.
Keywords: DNS not updating, old IP, change not propagating
Answer: TTL caching issue
Solution: Wait for TTL to expire, or lower TTL before making changes.
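Why clients keep seeing the old IP can be illustrated with a toy caching resolver. This is a simplified model with an explicit clock; `CachingResolver` and the record values are made up for the example.

```python
class CachingResolver:
    """Toy resolver that caches each answer until its TTL expires."""

    def __init__(self, authoritative: dict[str, tuple[str, int]]):
        self.authoritative = authoritative  # name -> (ip, ttl in seconds)
        self._cache: dict[str, tuple[str, float]] = {}  # name -> (ip, expires_at)

    def resolve(self, name: str, now: float) -> str:
        cached = self._cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # still within TTL: serve the cached (possibly stale) answer
        ip, ttl = self.authoritative[name]
        self._cache[name] = (ip, now + ttl)
        return ip
```

If the record changes at time 10 with a 300-second TTL, a client that resolved at time 0 keeps the old address until time 300. Lowering the TTL ahead of a planned change shortens that window.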
Keywords: private subnet, internal EC2, RDS health, can’t reach from internet
Answer: CloudWatch Alarm-based health check
Why: Route 53 health checkers are public — can’t reach private resources.
Keywords: DNS security, cache poisoning, MITM, verify DNS response
Answer: DNSSEC
Remember: KMS key must be in us-east-1.
Keywords: multi-account, centralized DNS, share resolver rules
Answer: Resolver Rules + AWS RAM
Why: Resolver Rules can be shared across accounts via Resource Access Manager.
Keywords: GoDaddy, third-party registrar, use Route 53
Answer: Create Public Hosted Zone → Update NS records at the registrar
Why: NS records tell the internet where to find your DNS. Update at registrar, not Route 53.
| Policy | Health Check? | Use Case | Key Feature |
|---|---|---|---|
| Simple | ❌ No | Single resource | Returns all values, client picks |
| Weighted | ✅ Yes | A/B testing, migration | Traffic % control |
| Failover | ✅ Yes (required) | DR, active-passive | Primary + secondary |
| Latency | ✅ Yes | Multi-region apps | Best network performance |
| Geolocation | ✅ Yes | Country restrictions | User’s physical location |
| Geoproximity | ✅ Yes | Shift traffic by location | Bias values (-99 to +99) |
| Multi-Value | ✅ Yes | Multiple healthy IPs | Up to 8 healthy records |
| IP-based | ✅ Yes | Route by client CIDR | Client IP → location mapping |
| Record | Maps To | Zone Apex? | AWS Extension? |
|---|---|---|---|
| A | IPv4 | ✅ Yes | No |
| AAAA | IPv6 | ✅ Yes | No |
| CNAME | Hostname | ❌ No | No |
| Alias | AWS Resource | ✅ Yes | ✅ Yes (AWS-only) |
| NS | Name Servers | ✅ Yes | No |
| ✅ Can Alias To | ❌ Cannot Alias To |
|---|---|
| ALB, NLB, Classic LB | EC2 DNS name |
| CloudFront Distribution | Non-AWS resources |
| API Gateway | RDS endpoint |
| Elastic Beanstalk | Other CNAMEs |
| S3 Website Endpoint | |
| VPC Interface Endpoint | |
| Global Accelerator | |
| Another Route 53 record | |
| Type | Monitors | Use Case |
|---|---|---|
| Endpoint | HTTP/HTTPS/TCP to public IP | Public resources |
| Calculated | Other health checks (AND/OR) | Aggregate multiple checks |
| CloudWatch Alarm | CloudWatch metric state | Private resources |
| Item | Value |
|---|---|
| Hosted Zone cost | $0.50/month |
| Health check interval | 30 sec (10 sec = extra cost) |
| Health checkers globally | ~15 |
| Healthy threshold | 3 consecutive |
| % checkers for healthy | >18% |
| Multi-Value max records | 8 |
| Weighted max value | Any number (relative) |
| Geoproximity bias range | -99 to +99 |
| TTL recommendation before changes | Low (60 sec) |
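The ">18% of checkers" rule from the table can be expressed as a one-line calculation. A minimal sketch, assuming each checker reports a simple healthy/unhealthy boolean; `endpoint_is_healthy` is a hypothetical name.

```python
def endpoint_is_healthy(checker_reports: list[bool], threshold: float = 0.18) -> bool:
    """Treat the endpoint as healthy when more than `threshold` of checkers say so."""
    if not checker_reports:
        raise ValueError("need at least one checker report")
    healthy_fraction = sum(checker_reports) / len(checker_reports)
    return healthy_fraction > threshold
```

With about 15 checkers, 3 healthy reports (20%) clear the threshold, while 2 (about 13%) do not.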
| Question Contains | → Instant Answer |
|---|---|
| “Zone Apex” / “root domain” + AWS | Alias record |
| “example.com to ALB” | Alias record |
| “free DNS queries to AWS” | Alias record |
| “CNAME at root” | ❌ Not possible → use Alias |
| “lowest latency” / “best performance” | Latency routing |
| “country restriction” / “legal” | Geolocation routing |
| “localization by region” | Geolocation routing |
| “A/B test” / “canary” | Weighted routing |
| “percentage of traffic” | Weighted routing |
| “active-passive” / “DR” | Failover routing |
| “primary/secondary” | Failover routing |
| “shift traffic” / “bias” | Geoproximity routing |
| “private resource health” | CloudWatch Alarm health check |
| “on-prem → AWS DNS” | Inbound Resolver Endpoint |
| “AWS → on-prem DNS” | Outbound Resolver Endpoint |
| “DNS spoofing” / “DNSSEC” | DNSSEC (KMS key in us-east-1) |
| “share DNS across accounts” | Resolver Rules + AWS RAM |
| “old IP still showing” | TTL caching |
| “GoDaddy + Route 53” | Update NS at registrar |
| “100% availability SLA” | Route 53 (only AWS service!) |
When stuck between options, eliminate systematically:
□ Is it Zone Apex (root domain)?
→ Yes = eliminate CNAME, must use Alias or A
→ No = CNAME is acceptable
□ Do they need health checks?
→ Yes = eliminate Simple routing
→ Failover REQUIRES health checks
□ Is it about USER LOCATION?
→ Physical location = Geolocation
→ Network performance = Latency
□ Is the resource PRIVATE?
→ Yes = eliminate direct HTTP health check
→ Use CloudWatch Alarm instead
□ Is it pointing to AWS resource?
→ Yes = prefer Alias (free, auto-tracking)
→ No = use CNAME or A record
□ Do they need traffic PERCENTAGE control?
→ Yes = Weighted routing
→ Just failover = Failover routing
□ Is it HYBRID (on-prem + AWS)?
→ On-prem queries AWS = Inbound Endpoint
→ AWS queries on-prem = Outbound Endpoint
□ Is it about DNS SECURITY?
→ Spoofing/authenticity = DNSSEC
→ KMS key must be in us-east-1
Understanding which services are global vs regional is critical for:
| Service | Why Global | Key Implication |
|---|---|---|
| IAM | Identity is account-wide | Users, roles, policies work everywhere |
| Route 53 | DNS is global | Hosted zones accessible from any region |
| CloudFront | CDN with edge locations | Certs must be in us-east-1 |
| WAF (for CloudFront) | Attached to global CF | WAF rules in us-east-1 |
| Global Accelerator | Anycast IPs, global routing | Entry point is global |
| AWS Organizations | Multi-account management | SCPs apply across all regions |
| Artifact | Compliance documents | Account-level access |
| Service | Regional Scope | Cross-Region Options |
|---|---|---|
| EC2 | Instances in one region | AMI copy, snapshots |
| S3 | Bucket in one region | Cross-Region Replication (CRR) |
| RDS | DB in one region | Read Replicas, snapshots |
| Lambda | Functions in one region | Deploy to each region |
| API Gateway | API in one region | Edge-Optimized uses CF |
| DynamoDB | Table in one region | Global Tables (multi-region) |
| Aurora | Cluster in one region | Global Database |
| KMS | Keys in one region | Multi-Region Keys (mrk-) |
| Secrets Manager | Secrets in one region | Multi-region replication |
| CloudHSM | HSM in one region | No cross-region option! |
| ELB | Load balancer in one region | Use Global Accelerator for global |
| VPC | Network in one region | VPC Peering, Transit Gateway |
| Scenario | ACM Certificate Region |
|---|---|
| CloudFront distribution | us-east-1 (always) |
| Edge-Optimized API Gateway | us-east-1 (uses CloudFront) |
| Regional API Gateway | Same region as API |
| ALB/NLB | Same region as load balancer |
Memory trick: “Where does TLS terminate?”
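The certificate table reduces to one question, so it can be written as a small lookup. A sketch under the assumptions of the table above; the service labels and the `acm_certificate_region` helper are invented for illustration.

```python
# Services whose TLS terminates at CloudFront edge locations (hypothetical labels).
EDGE_TERMINATED = {"cloudfront", "api-gateway-edge"}
# Services whose TLS terminates inside the resource's own region.
REGION_TERMINATED = {"api-gateway-regional", "alb", "nlb"}


def acm_certificate_region(service: str, resource_region: str) -> str:
    """Return the region where the ACM certificate must be created."""
    if service in EDGE_TERMINATED:
        return "us-east-1"  # edge-terminated TLS always uses us-east-1 certificates
    if service in REGION_TERMINATED:
        return resource_region
    raise ValueError(f"unknown service label: {service}")
```

For example, an ALB in eu-west-1 needs its certificate in eu-west-1, while a CloudFront distribution in front of that same ALB needs one in us-east-1.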
| Service | Global? | Gotcha |
|---|---|---|
| S3 bucket names | Globally unique | But bucket lives in ONE region |
| Lambda@Edge | Runs at edge | Must be authored in us-east-1 |
| WAF for ALB | Regional | WAF for CloudFront = global (us-east-1) |
| Need | Solution |
|---|---|
| Global static content | S3 + CloudFront |
| Global API | API Gateway (Edge-Optimized) or Global Accelerator + ALB |
| Global database (NoSQL) | DynamoDB Global Tables |
| Global database (SQL) | Aurora Global Database |
| Global encryption keys | KMS Multi-Region Keys |
| Global secrets | Secrets Manager replication |
| Global fixed IPs | Global Accelerator |
⚠️ Exam trap: “CloudHSM multi-region” → IMPOSSIBLE. CloudHSM is single-region only, no replication.
⚠️ Exam trap: “Same KMS key in two regions” → Possible with Multi-Region Keys (mrk- prefix). Regular keys are regional.
⚠️ Exam trap: “Lambda@Edge in eu-west-1” → Wrong. Lambda@Edge must be created in us-east-1, CloudFront replicates it.
Amazon CloudFront is a Content Delivery Network (CDN) that improves read performance by caching content at edge locations.
⚠️ Exam trap: CloudFront SSL/TLS certificates must be in us-east-1 (even if origin is in another region)
CloudFront Origins:
| Origin Type | Use Case | Notes |
|---|---|---|
| S3 Bucket | Distribute files, cache at edge | Secured with OAC (Origin Access Control) |
| VPC Origin | Private apps in VPC subnets | ALB / NLB / EC2 — no public exposure needed |
| Custom Origin (HTTP) | Any public HTTP backend | S3 static website, custom servers |
⚠️ Exam trap: “Restrict S3 access to CloudFront only” → OAC + S3 Bucket Policy
CloudFront with VPC Origin (Private Resources):
┌─────────────────────────────────────┐
│ VPC │
│ ┌─────────────────────────────┐ │
Users ──▶ CloudFront ──▶ VPC Origin │ │ Private Subnet │ │
(Edge) │ │ ├─▶ ALB │ │
│ │ ├─▶ NLB │ │
│ │ └─▶ EC2 │ │
│ └─────────────────────────────┘ │
└─────────────────────────────────────┘
CloudFront vs S3 Cross-Region Replication:
| Feature | CloudFront | S3 CRR |
|---|---|---|
| Scope | Global edge network | Per-region setup |
| Updates | Cached with TTL | Near real-time |
| Access | Read/Write (upload via CF) | Read-only |
| Best for | Static content, global availability | Dynamic content, low-latency in few regions |
CloudFront Origin Groups (Failover):
┌─────────────────────┐
│ Origin Group │
│ │
Users ──▶ CloudFront ───▶│ Primary: S3 (us-east-1)
│ │ │
│ ▼ (on error) │
│ Secondary: S3 (eu-west-1)
│ │
└─────────────────────┘
⚠️ Exam trap: “CloudFront high availability” or “origin failover” → Origin Groups
Invalidate all files (/*) or a specific path (/images/*)
Cache Invalidation Flow:
Admin ──▶ Invalidate /images/* ──▶ CloudFront ──▶ Edge Locations
│
┌──────────┴──────────┐
▼ ▼
[Cache] [Cache]
index.html ✓ index.html ✓
/images/ ✗ /images/ ✗
(invalidated) (invalidated)
Behaviors = rules that define how CloudFront handles requests for different paths
Default behavior (/*) plus additional behaviors for specific paths (/api/*, /images/*)
| Setting | Options |
|---|---|
| Path Pattern | /api/*, /images/*, *.jpg, etc. |
| Origin | Which origin to route to |
| Cache Policy | TTL, headers/cookies to cache by |
| Viewer Protocol | HTTP only, HTTPS only, Redirect HTTP→HTTPS |
| Allowed Methods | GET/HEAD, GET/HEAD/OPTIONS, ALL |
| Edge Functions | CloudFront Functions, Lambda@Edge |
CloudFront Behaviors Example:
Request Path Behavior Origin
─────────────────────────────────────────────────────
/api/* ──▶ API Behavior ──▶ ALB (no cache)
/images/* ──▶ Images Behavior ──▶ S3 (long TTL)
/static/* ──▶ Static Behavior ──▶ S3 (long TTL)
/* (everything) ──▶ Default Behavior ──▶ ALB (short TTL)
⚠️ Exam traps:
| Feature | Signed URL | Signed Cookie |
|---|---|---|
| Access scope | 1 file per URL | Multiple files (entire path) |
| Use case | Individual file download | Video streaming, multi-file access |
| URL change | Yes (unique per file) | No (cookie sent with all requests) |
Signed URL vs S3 Pre-Signed URL:
| Feature | CloudFront Signed URL | S3 Pre-Signed URL |
|---|---|---|
| Access via | CloudFront edge (cached) | Direct to S3 |
| Use when | CloudFront in front of S3 | Direct S3 access needed |
| Features | Caching, filtering by IP/path/date | Simple, S3-only |
⚠️ Exam trap: “Private content via CloudFront” → Signed URL/Cookie
Both run code at edge locations, but different scale/capabilities:
| Feature | CloudFront Functions | Lambda@Edge |
|---|---|---|
| Language | JavaScript only | Node.js, Python |
| Execution time | < 1 ms | Up to 5 sec (viewer triggers) / 30 sec (origin triggers) |
| Max memory | 2 MB | 128-3008 MB |
| Scale | Millions req/sec | Thousands req/sec |
| Triggers | Viewer Request/Response only | Viewer + Origin Request/Response |
| Network/File access | ❌ | ✅ |
| Cost | 1/6th of Lambda@Edge | Higher |
CloudFront Request Flow:
CloudFront CloudFront
Functions Functions
│ │
User ──▶ Viewer Request ▼ ──▶ Cache ──▶ Origin Request ──▶ Origin (S3/ALB)
│ │
│ Lambda@Edge Lambda@Edge
│ │ │
Viewer Response ◀───┘ ◀── Origin Response ◀──────┘
Use Cases:
| Use Case | Best Choice |
|---|---|
| URL rewrites, header manipulation | CloudFront Functions |
| A/B testing (simple) | CloudFront Functions |
| Authentication (JWT validation) | CloudFront Functions |
| Complex auth (DB lookup) | Lambda@Edge |
| Image resizing | Lambda@Edge |
| Call external APIs | Lambda@Edge |
⚠️ Exam traps:
⚠️ Exam trap: “Block/allow by country” → Geo Restriction
| Price Class | Regions Included | Cost |
|---|---|---|
| All | All regions | Best performance, highest cost |
| 200 | Most regions (excludes South America, Australia/NZ) | Balanced |
| 100 | US, Mexico, Canada, Europe, Israel only | Lowest cost |
⚠️ Exam trap: “Reduce CloudFront costs” → use Price Class 100/200 (fewer edge locations)
Problem: Global users → public internet → many hops → high latency
Without Global Accelerator (Public Internet):
America ───┐
│ ┌───┬───┬───┬───┐
Europe ────┼───▶│hop│hop│hop│hop│───▶ Public ALB (India)
│ └───┴───┴───┴───┘
Australia ─┘ (latency)
Solution: Use AWS internal network via Anycast IPs
How it works:
With Global Accelerator:
Users ──▶ Anycast IP ──▶ Edge Location ──▶ AWS Private Network ──▶ ALB/NLB/EC2
(static) (nearest) (fast, optimized)
Supported Targets: Elastic IP, EC2, ALB, NLB (public or private)
Features:
| Feature | Details |
|---|---|
| Performance | Intelligent routing, lowest latency, fast regional failover |
| Health Checks | Failover < 1 min for unhealthy endpoints, great for DR |
| Security | Only 2 IPs to whitelist, DDoS protection via AWS Shield |
| Caching | No client cache issues (IPs never change) |
Endpoint Weights & Traffic Dial:
| Feature | CloudFront | Global Accelerator |
|---|---|---|
| Content | Cacheable + dynamic content | TCP/UDP applications |
| Caching | ✅ At edge | ❌ No caching (proxies packets) |
| Use cases | Images, videos, APIs, websites | Gaming (UDP), IoT (MQTT), VoIP |
| Static IPs | ❌ | ✅ 2 Anycast IPs |
| Failover | TTL-based | < 1 min (health checks) |
⚠️ Exam traps:
| Service | Scope | Routing Level | Health Checks | Use Case |
|---|---|---|---|---|
| ELB (ALB/NLB) | Single region | Layer 4/7 | ✅ Targets | Distribute traffic across instances in 1 region |
| Route 53 | Global (DNS) | DNS level | ✅ Endpoints | DNS-based routing (latency, geo, failover) |
| Global Accelerator | Global (network) | Network level | ✅ Endpoints | Fast global routing via AWS backbone |
Scenario-Based Selection:
| Scenario | Answer | Why |
|---|---|---|
| Distribute traffic in 1 region | ELB | Regional load balancing |
| Route users to nearest region via DNS | Route 53 (latency routing) | DNS resolves to closest endpoint |
| Instant failover across regions (<1 min) | Global Accelerator | Network-level, no DNS TTL delay |
| Need static IPs for global app | Global Accelerator | 2 Anycast IPs |
| Non-HTTP (gaming, IoT, VoIP) | Global Accelerator | TCP/UDP support |
| Cost-sensitive global routing | Route 53 | Cheaper, but slower failover (DNS TTL) |
Failover Speed:
Route 53: DNS TTL (30s - 5min+) before clients see change
Global Accel: < 1 minute (health check driven, no DNS caching)
⚠️ Exam traps:
⚠️ Exam trap — Blue-green deployment + DNS caching + tight deadline:
⚠️ Exam trap: “Encrypt specific form fields at edge” → Field-Level Encryption
| Scenario | CloudFront | Global Accelerator | Route 53 | ELB |
|---|---|---|---|---|
| Cache static content | ✅ | ❌ | ❌ | ❌ |
| Non-HTTP (gaming, IoT) | ❌ | ✅ | ❌ | NLB only |
| Static IPs | ❌ | ✅ | ❌ | NLB only |
| Fastest failover (<1 min) | ❌ | ✅ | ❌ (TTL) | ❌ |
| DNS-based routing | ❌ | ❌ | ✅ | ❌ |
| Single region balancing | ❌ | ❌ | ❌ | ✅ |
| Edge compute (Lambda) | ✅ | ❌ | ❌ | ❌ |
| Origin failover | ✅ (Origin Groups) | ✅ | ✅ | ❌ |
| WebSocket support | ✅ | ✅ | N/A | ✅ (ALB) |
Decision Tree:
| Question | Yes → |
|---|---|
| Need to cache content at edge? | CloudFront |
| Non-HTTP protocol (UDP, TCP raw)? | Global Accelerator |
| Need static IPs for whitelisting? | Global Accelerator (or NLB) |
| Need <1 min failover globally? | Global Accelerator |
| DNS-level routing (geo, latency)? | Route 53 |
| Load balance within 1 region only? | ELB |
| Run code at edge locations? | CloudFront (Functions/Lambda@Edge) |
Understanding which services are global vs regional is fundamental for certificate placement, data residency, and cross-region patterns.
Always Global (no region selection):
Regional with Multi-Region Options:
Regional Only (no cross-region):
Derive: “Where does TLS terminate?” = where certificate must be
Two different problems, two different solutions:
CloudFront caches. Global Accelerator proxies (no caching).
Both services use AWS’s 400+ edge locations worldwide:
Edge = closer to users = lower latency.
Global Accelerator gives you 2 static Anycast IPs → users connect to same IPs worldwide, routed to nearest edge.
CloudFront caches based on TTL (Time To Live):
Origin updates don’t propagate until TTL expires (or you invalidate).
CloudFront Behaviors let you:
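Route different paths to different origins (/api/* → ALB, /images/* → S3)
Behaviors are evaluated in the order listed, so place more specific path patterns before broader ones; the default (/*) is matched last.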
OAC ensures only CloudFront can access your S3 bucket:
OAI (Origin Access Identity) is legacy → use OAC.
Two options for running code at edge:
Simple = CloudFront Functions. Complex = Lambda@Edge.
Different services, different failover speeds:
Need instant failover? → Global Accelerator.
Which service?
│
├─► IAM, Route 53, CloudFront, Global Accelerator
│ └─► GLOBAL (no region, but CloudFront certs in us-east-1)
│
├─► DynamoDB, Aurora, KMS, Secrets Manager
│ └─► REGIONAL but has MULTI-REGION options
│
├─► CloudHSM
│ └─► REGIONAL ONLY (no cross-region!)
│
└─► EC2, ELB, VPC, Lambda, API Gateway
    └─► REGIONAL (deploy per region)
What protocol?
│
┌─────────────┴─────────────┐
▼ ▼
HTTP/HTTPS TCP/UDP (non-HTTP)
│ │
▼ ▼
Need caching? Global Accelerator
│ (gaming, IoT, VoIP)
┌──────┴──────┐
▼ ▼
Yes No
│ │
▼ ▼
CloudFront Global Accelerator
(if static IPs needed)
What's the goal?
│
┌──────────┬──────────┼──────────┬──────────┐
▼ ▼ ▼ ▼ ▼
Cache Static Fast Edge Block
Content IPs Failover Compute Country
│ │ │ │ │
▼ ▼ ▼ ▼ ▼
CloudFront Global Global CloudFront CloudFront
Accel Accel Functions/ Geo
Lambda@Edge Restriction
| If question mentions… | Answer is… |
|---|---|
| “cache at edge” | CloudFront |
| “static content” / “CDN” | CloudFront |
| “gaming” / “UDP” | Global Accelerator |
| “IoT” / “MQTT” | Global Accelerator |
| “VoIP” / “real-time” | Global Accelerator |
| “static IP” / “whitelist IP” | Global Accelerator |
| “fast failover” / “<1 min failover” | Global Accelerator |
| “origin failover” / “HA for origin” | CloudFront Origin Groups |
| “restrict S3 to CloudFront only” | OAC + S3 Bucket Policy |
| “private content” / “authenticated access” | Signed URL/Cookie |
| “block by country” | Geo Restriction |
| “different cache per path” | Behaviors |
| “redirect HTTP to HTTPS” | Viewer Protocol Policy |
| “force cache refresh” | Invalidation |
| “run code at edge” | CloudFront Functions or Lambda@Edge |
| “lightweight edge compute” | CloudFront Functions |
| “complex edge compute” / “DB lookup” | Lambda@Edge |
| “encrypt specific fields” | Field-Level Encryption |
| “reduce CloudFront costs” | Price Class 100/200 |
| “SSL certificate for CloudFront” | ACM in us-east-1 |
| Statement | Why It’s Wrong |
|---|---|
| Global Accelerator caches content | GA proxies packets, no caching |
| CloudFront for UDP/gaming | CloudFront = HTTP/HTTPS only |
| Route 53 for instant failover | Route 53 = DNS TTL delay (not instant) |
| Security Groups on CloudFront | Can’t attach SGs to CloudFront |
| OAC for non-S3 origins | OAC is S3-only; use auth headers for ALB/custom |
| CloudFront Functions for DB access | No network access — use Lambda@Edge |
| CloudFront Functions at origin triggers | Viewer triggers only — use Lambda@Edge |
| Signed URL for multiple files | Use Signed Cookie for multiple files |
| Global Accelerator for caching | No caching — use CloudFront |
| Cannot… | Instead… |
|---|---|
| Use CloudFront for UDP | Use Global Accelerator |
| Attach Security Groups to CloudFront | Use Geo Restriction, WAF, or Signed URLs |
| Use OAI (deprecated) | Use OAC (Origin Access Control) |
| Run CloudFront Functions at origin | Use Lambda@Edge for origin triggers |
| Access network in CloudFront Functions | Use Lambda@Edge |
| Use CloudFront cert from other regions | ACM certificate must be in us-east-1 |
| Get static IPs from CloudFront | Use Global Accelerator for static IPs |
Keywords: CDN, cache, static files, images, videos, global distribution
Answer: CloudFront
Why: CloudFront caches at 400+ edge locations. Global Accelerator doesn’t cache.
Keywords: UDP, TCP, gaming, real-time, MQTT, non-HTTP
Answer: Global Accelerator
Why: CloudFront = HTTP/HTTPS only. Global Accelerator supports any TCP/UDP.
Keywords: static IP, firewall whitelist, fixed IP addresses
Answer: Global Accelerator (2 Anycast IPs)
Why: CloudFront uses dynamic IPs. Global Accelerator provides 2 static Anycast IPs.
Keywords: instant failover, <1 minute, DR, disaster recovery
Answer: Global Accelerator
Why: Route 53 = DNS TTL delay. Global Accelerator = health-check driven, <1 min.
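A rough back-of-the-envelope model shows where the DNS delay comes from: time to detect the failure plus time for cached answers to expire. This is an approximation that ignores resolver quirks and client-side caching; `route53_failover_estimate` is a hypothetical helper.

```python
def route53_failover_estimate(check_interval_s: int, failure_threshold: int,
                              record_ttl_s: int) -> int:
    """Approximate worst-case seconds before clients reach the secondary."""
    detection = check_interval_s * failure_threshold  # consecutive failed checks
    propagation = record_ttl_s  # clients may hold the old answer for a full TTL
    return detection + propagation
```

With the standard 30-second interval, a threshold of 3 and a 60-second TTL, the estimate is about 150 seconds, which is why a sub-minute requirement points to Global Accelerator.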
Keywords: CloudFront HA, origin fails, backup origin
Answer: CloudFront Origin Groups
Why: Primary + secondary origin. Automatic failover on 4xx/5xx errors.
Keywords: S3 only via CloudFront, prevent direct S3 access, secure S3 origin
Answer: OAC (Origin Access Control) + S3 Bucket Policy
Why: With OAC, CloudFront signs its requests to S3. The bucket policy allows only the CloudFront service principal acting for your distribution.
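The bucket policy half of this answer can be sketched as follows. The statement shape follows the pattern AWS documents for OAC (CloudFront service principal restricted by the distribution ARN); the `oac_bucket_policy` helper and every ID passed to it are placeholders.

```python
import json


def oac_bucket_policy(bucket: str, account_id: str, distribution_id: str) -> str:
    """Return a bucket policy allowing reads only from one CloudFront distribution."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowCloudFrontServicePrincipalReadOnly",
            "Effect": "Allow",
            "Principal": {"Service": "cloudfront.amazonaws.com"},
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
            # Only requests made on behalf of this distribution are allowed.
            "Condition": {"StringEquals": {
                "AWS:SourceArn":
                    f"arn:aws:cloudfront::{account_id}:distribution/{distribution_id}"
            }},
        }],
    }
    return json.dumps(policy, indent=2)
```

Because the policy grants nothing to anonymous principals, direct requests to the bucket URL are denied while requests through the distribution succeed.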
Keywords: authenticated users, premium content, temporary access
Answer: Signed URL (1 file) or Signed Cookie (multiple files)
Why: Signed URLs/Cookies include expiration, IP restrictions, trusted signers.
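The expiration and IP restriction live in a small JSON policy that gets signed. Below is a sketch of building such a custom policy document only. The RSA signing step with your key pair is intentionally left out, and `custom_policy` is a hypothetical helper name.

```python
import json
from typing import Optional


def custom_policy(resource_url: str, expires_epoch: int,
                  source_cidr: Optional[str] = None) -> str:
    """Build the custom policy JSON for a CloudFront signed URL or cookie."""
    condition: dict = {"DateLessThan": {"AWS:EpochTime": expires_epoch}}
    if source_cidr is not None:
        # Optional restriction to a client IP range.
        condition["IpAddress"] = {"AWS:SourceIp": source_cidr}
    policy = {"Statement": [{"Resource": resource_url, "Condition": condition}]}
    # Compact separators keep the policy free of extra whitespace before signing.
    return json.dumps(policy, separators=(",", ":"))
```

A wildcard resource such as `https://d111111abcdef8.cloudfront.net/videos/*` (an example domain) is what lets one signed cookie cover many files.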
Keywords: /api/, /images/, path-based, different cache, different origin
Answer: CloudFront Behaviors
Why: Each behavior = path pattern + origin + cache policy + settings.
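Path-pattern selection can be sketched as a first-match scan over an ordered list. This assumes behaviors are checked in listed order with the default last, and uses Python's `fnmatchcase` as a stand-in for CloudFront's wildcard matching (it also treats `[...]` specially, which CloudFront does not). The `BEHAVIORS` table and `pick_behavior` are illustrative names.

```python
from fnmatch import fnmatchcase

# Ordered behaviors: (path pattern, origin and cache description).
BEHAVIORS = [
    ("/api/*", "ALB (no cache)"),
    ("/images/*", "S3 (long TTL)"),
    ("/static/*", "S3 (long TTL)"),
]
DEFAULT_BEHAVIOR = "ALB (short TTL)"


def pick_behavior(path: str, behaviors: list[tuple[str, str]], default: str) -> str:
    """Return the first behavior whose pattern matches, else the default."""
    for pattern, target in behaviors:
        if fnmatchcase(path, pattern):
            return target
    return default
```

Since the first match wins, a broad pattern listed ahead of `/api/*` would shadow it, so specific patterns belong at the top of the list.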
Keywords: geo blocking, country restriction, copyright, regional licensing
Answer: CloudFront Geo Restriction
Why: Allowlist or blocklist countries. Based on Geo-IP database.
Keywords: URL rewrite, header manipulation, JWT validation, lightweight
Answer: CloudFront Functions
Why: <1ms execution, JavaScript, millions req/sec, 1/6 cost of Lambda@Edge.
Keywords: database lookup, external API call, image resize, origin trigger
Answer: Lambda@Edge
Why: Up to 30s execution on origin triggers (5s on viewer triggers), network access, Node.js/Python, all 4 triggers.
Keywords: stale content, cache not updating, force refresh
Answer: CloudFront Invalidation
Why: Bypass TTL, force edge locations to fetch new content from origin.
Keywords: cost optimization, cheaper CDN, reduce edge locations
Answer: Price Class 100 or 200
Why: Fewer edge locations = lower cost (but potentially higher latency for excluded regions).
Keywords: credit card encryption, PII at edge, field-level security
Answer: CloudFront Field-Level Encryption
Why: Encrypts specific fields at edge → stays encrypted through entire flow.
Keywords: HTTPS, custom domain, SSL/TLS certificate
Answer: ACM certificate in us-east-1
Why: CloudFront is global but requires certificates in us-east-1 region.
| Service | Scope | Certificate/Key Location | Cross-Region Option |
|---|---|---|---|
| IAM | Global | N/A | N/A (account-wide) |
| Route 53 | Global | N/A | N/A (global DNS) |
| CloudFront | Global | us-east-1 | N/A (already global) |
| Global Accelerator | Global | N/A | N/A (already global) |
| API Gateway (Edge) | Regional* | us-east-1 | Uses CloudFront |
| API Gateway (Regional) | Regional | Same region | Deploy per region |
| Lambda | Regional | N/A | Deploy per region |
| Lambda@Edge | Global* | N/A | Author in us-east-1 |
| DynamoDB | Regional | N/A | Global Tables |
| Aurora | Regional | N/A | Global Database |
| KMS | Regional | Same region | Multi-Region Keys (mrk-) |
| CloudHSM | Regional | Same region | ❌ None! |
| Secrets Manager | Regional | N/A | Multi-region replication |
| S3 | Regional | N/A | Cross-Region Replication |
| ALB/NLB | Regional | Same region | Use Global Accelerator |
*Edge-Optimized API Gateway lives in one region but uses CloudFront for routing
| Feature | CloudFront | Global Accelerator |
|---|---|---|
| Purpose | Cache content at edge | Route traffic via AWS backbone |
| Protocols | HTTP/HTTPS only | Any TCP/UDP |
| Caching | ✅ Yes | ❌ No (proxies packets) |
| Static IPs | ❌ No | ✅ 2 Anycast IPs |
| Use cases | Websites, APIs, streaming | Gaming, IoT, VoIP |
| Failover | Origin Groups (TTL-based) | <1 min (health checks) |
| Edge compute | ✅ Functions/Lambda@Edge | ❌ No |
| DDoS protection | ✅ Shield | ✅ Shield |
| Feature | CloudFront Functions | Lambda@Edge |
|---|---|---|
| Language | JavaScript only | Node.js, Python |
| Execution time | <1 ms | Up to 5 sec (viewer triggers) / 30 sec (origin triggers) |
| Memory | 2 MB | 128-3008 MB |
| Scale | Millions req/sec | Thousands req/sec |
| Triggers | Viewer only | Viewer + Origin |
| Network access | ❌ No | ✅ Yes |
| Cost | 1/6th of Lambda@Edge | Higher |
| Feature | Signed URL | Signed Cookie | S3 Pre-Signed URL |
|---|---|---|---|
| Access scope | 1 file | Multiple files | 1 file |
| Access via | CloudFront | CloudFront | Direct S3 |
| Use case | Single download | Streaming, multi-file | Direct S3 access |
| Caching | ✅ Yes | ✅ Yes | ❌ No (S3 direct) |
| Scenario | Service |
|---|---|
| Cache static content globally | CloudFront |
| Cache + origin failover | CloudFront + Origin Groups |
| Non-HTTP (gaming, IoT) | Global Accelerator |
| Static IPs for whitelisting | Global Accelerator |
| Fastest failover (<1 min) | Global Accelerator |
| DNS-based routing | Route 53 |
| Single region load balancing | ELB (ALB/NLB) |
| Edge compute (simple) | CloudFront Functions |
| Edge compute (complex) | Lambda@Edge |
| Service | Failover Speed | Mechanism |
|---|---|---|
| Route 53 | DNS TTL (30s - 5min+) | DNS resolution |
| Global Accelerator | <1 minute | Health checks, network-level |
| CloudFront Origin Groups | Immediate on error | Origin error triggers |
| ELB | Seconds | Target health checks |
| Origin Type | Use Case | Security |
|---|---|---|
| S3 Bucket | Static files | OAC + Bucket Policy |
| S3 Website | Static website | Public bucket or signed URLs |
| ALB | Dynamic content | Security Group, custom headers |
| VPC Origin | Private resources | No public exposure needed |
| Custom HTTP | Any HTTP server | Auth headers, IP whitelist |
| Item | Value |
|---|---|
| Edge locations | 400+ globally |
| Global Accelerator static IPs | 2 Anycast IPs |
| CloudFront Functions execution | <1 ms |
| Lambda@Edge max execution | 5 sec (viewer), 30 sec (origin) |
| CloudFront Functions memory | 2 MB |
| Lambda@Edge max memory | 3008 MB |
| ACM certificate region | us-east-1 (required) |
| Global Accelerator failover | <1 minute |
| Question Contains | → Instant Answer |
|---|---|
| “global service” / “no region” | IAM, Route 53, CloudFront, Global Accelerator |
| “multi-region encryption” | KMS Multi-Region Keys (mrk-) |
| “multi-region database (NoSQL)” | DynamoDB Global Tables |
| “multi-region database (SQL)” | Aurora Global Database |
| “CloudHSM multi-region” | IMPOSSIBLE (single-region only) |
| “Lambda@Edge region” | Author in us-east-1 |
| “Edge-Optimized API cert” | us-east-1 |
| “Regional API cert” | Same region as API |
| “cache at edge” / “CDN” | CloudFront |
| “static content globally” | CloudFront |
| “gaming” / “UDP” / “IoT” | Global Accelerator |
| “VoIP” / “real-time TCP” | Global Accelerator |
| “static IP” / “whitelist” | Global Accelerator |
| “<1 min failover” | Global Accelerator |
| “origin failover” | CloudFront Origin Groups |
| “S3 only via CloudFront” | OAC + Bucket Policy |
| “OAI” | Legacy → use OAC |
| “private content” (1 file) | Signed URL |
| “private content” (many files) | Signed Cookie |
| “block by country” | Geo Restriction |
| “path-based settings” | Behaviors |
| “HTTP → HTTPS” | Viewer Protocol Policy |
| “stale cache” / “force refresh” | Invalidation |
| “simple edge code” | CloudFront Functions |
| “complex edge code” / “DB” | Lambda@Edge |
| “viewer triggers only” | CloudFront Functions (or Lambda@Edge) |
| “origin triggers” | Lambda@Edge only |
| “encrypt form fields” | Field-Level Encryption |
| “reduce CloudFront cost” | Price Class 100/200 |
| “CloudFront SSL cert” | ACM in us-east-1 |
| “no caching, just faster” | Global Accelerator |
| “DNS routing” | Route 53 (not CloudFront/GA) |
| “single region LB” | ELB (not CloudFront/GA) |
When stuck between options, eliminate systematically:
□ Is it HTTP/HTTPS?
→ Yes = CloudFront or Global Accelerator
→ No (UDP, raw TCP) = Global Accelerator only
□ Do they need CACHING?
→ Yes = CloudFront
→ No = Global Accelerator (or neither)
□ Do they need STATIC IPs?
→ Yes = Global Accelerator (or NLB)
→ No = CloudFront is fine
□ What's the FAILOVER requirement?
→ Instant (<1 min) = Global Accelerator
→ DNS-based = Route 53
→ Origin failover = CloudFront Origin Groups
□ Is it about EDGE COMPUTE?
→ Simple (headers, rewrites) = CloudFront Functions
→ Complex (network, DB) = Lambda@Edge
→ Origin triggers = Lambda@Edge only
□ Is it about PRIVATE CONTENT?
→ 1 file = Signed URL
→ Multiple files = Signed Cookie
→ Direct S3 = S3 Pre-Signed URL
□ Is it about S3 ORIGIN SECURITY?
→ Restrict to CloudFront = OAC + Bucket Policy
→ OAI mentioned = legacy, use OAC
□ Is it about COUNTRY RESTRICTION?
→ Block/allow by country = Geo Restriction
→ Not Security Groups (can't attach to CF)
□ What REGION for SSL cert?
→ CloudFront = us-east-1 (always)
→ ALB = same region as ALB
EC2 (Elastic Compute Cloud) is a virtual computer (instance) in the cloud.
EC2 consists:
EC2 configuration options:
EC2 Instance Types:
T/M → General (Typical, Moderate)
C → Compute (CPU)
R → Memory (RAM)
P/G → Accelerated (Processing/GPU)
I/D → Storage (I/O, Disk)
r6i.2xlarge
│││ └─── Size within the instance class
││└────── Additional capabilities (i = Intel)
│└─────── Generation of hardware (6th)
└──────── Family - instance class (R = Memory Optimized)
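The naming scheme above can be checked mechanically. The following is a minimal sketch that splits a type name such as `r6i.2xlarge` into the four parts from the diagram; `InstanceType` and `parse_instance_type` are hypothetical names, and the regular expression is an assumption that covers common names, not every type AWS offers.

```python
import re
from typing import NamedTuple


class InstanceType(NamedTuple):
    family: str        # instance class, e.g. "r" = Memory Optimized
    generation: int    # hardware generation, e.g. 6
    capabilities: str  # extra letters, e.g. "i" = Intel (may be empty)
    size: str          # size within the class, e.g. "2xlarge"


# family letters (lazy), generation digits, optional capability letters, ".", size
_PATTERN = re.compile(r"([a-z]+?)(\d+)([a-z-]*)\.([a-z0-9-]+)")


def parse_instance_type(name: str) -> InstanceType:
    match = _PATTERN.fullmatch(name.lower())
    if match is None:
        raise ValueError(f"unrecognized instance type: {name!r}")
    family, generation, capabilities, size = match.groups()
    return InstanceType(family, int(generation), capabilities, size)


if __name__ == "__main__":
    print(parse_instance_type("r6i.2xlarge"))
    print(parse_instance_type("t3.micro"))
```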
EC2 Placement group: control over the EC2 Instance placement strategy. Placement group strategies:
CLUSTER (same rack)      SPREAD (diff racks)      PARTITION (diff racks)
┌─────────────────┐      ┌───┐  ┌───┐  ┌───┐      ┌─────┐ ┌─────┐ ┌─────┐
│ ┌──┐┌──┐┌──┐┌──┐│      │EC2│  │EC2│  │EC2│      │Part1│ │Part2│ │Part3│
│ │  ││  ││  ││  ││      └───┘  └───┘  └───┘      │┌──┐ │ │┌──┐ │ │┌──┐ │
│ └──┘└──┘└──┘└──┘│      Rack1  Rack2  Rack3      ││  │ │ ││  │ │ ││  │ │
└─────────────────┘      (max 7 per AZ)           │└──┘ │ │└──┘ │ │└──┘ │
10 Gbps, 1 AZ                                     └─────┘ └─────┘ └─────┘
Low latency              High availability        100s instances (Kafka)
⚠️ Exam trap: Cluster = same rack (low latency, high risk); Spread = different racks (max 7/AZ); Partition = different racks (100s instances, Kafka/Cassandra)
AMI (Amazon Machine Image): a customized image of an EC2 instance, with additional software and configuration pre-installed.
⚠️ Exam trap: AMIs are region-specific. Cannot launch EC2 from AMI in another region — must copy AMI to target region first (creates new AMI ID)
EC2 Image Builder: a service that automates the creation, maintenance, validation and testing of virtual machine or container images. Can be run on a schedule.
Elastic Network Interfaces (ENI): logical component in a VPC that represents a virtual network card. Bound to a specific AZ.
ENI attributes:
NOTE: You can create ENIs independently and attach them on the fly (move them) between EC2 instances for failover.
Security Groups vs NACLs:
| Feature | Security Groups | NACLs |
|---|---|---|
| Level | Instance (ENI) | Subnet |
| State | Stateful (return traffic auto-allowed) | Stateless (must allow both directions) |
| Rules | Allow only | Allow AND Deny |
| Rule Order | All rules evaluated | Rules processed in order (lowest first) |
| Default | Deny all inbound, allow all outbound | Allow all (default NACL) |
| Association | Assigned to instances | Assigned to subnets |
┌─────────────────────────────────────────────────────────┐
│                           VPC                           │
│  ┌───────────────────────────────────────────────────┐  │
│  │                Subnet (with NACL)                 │  │
│  │  ┌─────────────────────┐ ┌─────────────────────┐  │  │
│  │  │   Security Group    │ │   Security Group    │  │  │
│  │  │  ┌───────────────┐  │ │  ┌───────────────┐  │  │  │
│  │  │  │      EC2      │  │ │  │      EC2      │  │  │  │
│  │  │  └───────────────┘  │ │  └───────────────┘  │  │  │
│  │  └─────────────────────┘ └─────────────────────┘  │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
⚠️ Exam trap: Security Groups = stateful (allow inbound → outbound auto-allowed). NACLs = stateless (must explicitly allow both directions)
EC2 Instance Lifecycle:
EC2 Hibernate:
EC2 Hibernate - Requirements & Limits:
⚠️ Exam trap: Root EBS must be encrypted for hibernate. Max 60 days, max 150GB RAM
EC2 Purchasing Options:
| Option | Description | Savings | Use Case |
|---|---|---|---|
| On-Demand | Pay by second (Linux/Win) or hour | None | Short-term, unpredictable workloads |
| Reserved | 1 or 3 year commitment | Up to 72% | Steady-state usage (databases) |
| Savings Plans | Commit to $/hour for 1-3 years | Up to 72% | Flexible across instance types |
| Spot | Use spare EC2 capacity (optional max price) | Up to 90% | Fault-tolerant, flexible workloads |
| Dedicated Hosts | Physical server for your use | - | Compliance, licensing (per-socket) |
| Dedicated Instances | Hardware dedicated to you | - | Compliance (no server control) |
| Capacity Reservations | Reserve capacity in specific AZ | None | Ensure availability, no discount |
Spot Instances:
⚠️ Exam trap: Spot = cheapest but can be terminated. Use for batch jobs, data analysis, CI/CD, NOT databases
Reserved Instances:
⚠️ Exam trap: Reserved = commit for 1-3 years. Convertible RI = less discount but more flexibility
Connect to EC2:
ssh -i /<path>/<key_pair_name>.pem <instance_user_name>@<instance_public_dns_name/IP>
Example:
ssh -i /home/kali/Downloads/aws.pem ubuntu@51.20.123.211
EC2 alone is just a VM. The entire ecosystem exists to solve its inherent problems:
Deriving answers: When you see a limitation scenario, think: “What problem is this solving?” The answer maps to the appropriate service.
Every workload has a bottleneck. Match the instance family to it:
| Bottleneck | Family | Memory Aid |
|---|---|---|
| Nothing specific | T/M | Typical, Moderate (General) |
| CPU/Processing | C | CPU = Compute |
| Memory/RAM | R | RAM = Memory |
| GPU/AI/ML | P/G | Processing/GPU |
| Disk/IOPS | I/D | I/O, Disk = Storage |
Deriving answers: “Processing large datasets in-memory” → Bottleneck is RAM → R family. “Batch processing” → Bottleneck is CPU → C family.
You can’t have both maximum performance AND maximum availability. Choose your priority:
| Priority | Strategy | Trade-off |
|---|---|---|
| Lowest latency | Cluster | All in one rack → rack fails = all fail |
| Maximum isolation | Spread | Different racks → max 7 instances per AZ |
| Partition isolation + scale | Partition | Different racks → 100s of instances |
Deriving answers:
An AMI is a complete image (OS + software + config). The trade-off: specificity vs. portability.
Deriving answers: “Launch instance in another region from existing AMI” → Must copy first (can’t use original AMI ID)
WHY stateful matters: If you allow traffic IN, the response OUT is automatic. You don’t need to think about return traffic.
WHY stateless matters: You must explicitly allow BOTH directions. More control, more work.
| Question | SG | NACL |
|---|---|---|
| “Where does it apply?” | Instance (ENI) | Subnet |
| “Can it DENY traffic?” | No (allow only) | Yes |
| “Do I need to allow return traffic?” | No (stateful) | Yes (stateless) |
| “Which is evaluated first?” | Rules are combined | Rules processed in order |
Deriving answers: “Block specific IP” → NACLs (SGs can’t deny). “Allow port 443 inbound” → Both work, but SGs don’t need outbound rule.
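The difference in rule evaluation can be sketched with two small functions. This is a simplified model, assuming inbound rules only and a single port per rule; the tuple layouts and the function names `nacl_decision` and `sg_decision` are illustrative choices, not an AWS API.

```python
from ipaddress import ip_address, ip_network


def nacl_decision(rules, source_ip, port):
    """rules: (rule_number, cidr, port_or_'*', 'allow'|'deny') tuples.

    NACL rules are processed in order, lowest number first;
    the first matching rule decides.
    """
    for _number, cidr, rule_port, action in sorted(rules, key=lambda r: r[0]):
        if ip_address(source_ip) in ip_network(cidr) and rule_port in (port, "*"):
            return action
    return "deny"  # nothing matched: the final catch-all rule denies


def sg_decision(allow_rules, source_ip, port):
    """allow_rules: (cidr, port_or_'*') tuples.

    Security Groups only contain allow rules; all rules are considered
    together and any match allows the traffic.
    """
    for cidr, rule_port in allow_rules:
        if ip_address(source_ip) in ip_network(cidr) and rule_port in (port, "*"):
            return "allow"
    return "deny"  # implicit deny, but there is no way to write an explicit one


if __name__ == "__main__":
    nacl = [(100, "203.0.113.7/32", "*", "deny"), (200, "0.0.0.0/0", 443, "allow")]
    print(nacl_decision(nacl, "203.0.113.7", 443))   # blocked IP
    print(nacl_decision(nacl, "198.51.100.1", 443))  # everyone else on 443
    print(sg_decision([("0.0.0.0/0", 443)], "203.0.113.7", 443))
```

With the Security Group alone there is no rule you can add to exclude `203.0.113.7`, which is why "block specific IP" points to a NACL.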
WHY hibernate exists: Cold boots are slow because RAM is empty. Hibernate saves RAM state.
WHY encryption is required: RAM contains sensitive data — writing it to disk unencrypted = security risk.
Deriving answers:
The fundamental trade-off: commitment = savings. More flexibility = more cost.
Most Expensive                                        Cheapest
(Most Flexible)                               (Least Flexible)
    │                                                      │
    ▼                                                      ▼
On-Demand ─→ Capacity Res ─→ Savings Plans ─→ Reserved ─→ Spot
   0%             0%           up to 72%     up to 72%  up to 90%  (savings vs On-Demand)
But Spot has a catch: AWS can take it back. Use only for work that can be interrupted.
The Mental Model:
Dedicated Hosts and Dedicated Instances exist for compliance, not speed.
| Need | Solution |
|---|---|
| Per-socket/per-core licensing | Dedicated Host (you see the physical server) |
| Regulatory: “no shared hardware” | Either works (Dedicated Instance = simpler) |
| Just want better performance | Neither (use instance optimization instead) |
Deriving answers: “Bring your own license” or “socket-based licensing” → Dedicated Host
Spot is AWS’s “leftover capacity” at a discount. The trade-off: they can take it back.
Critical rules:
Good for: Batch processing, CI/CD, data analysis, anything that can restart
Bad for: Databases, user-facing apps, anything that can’t handle interruption
| Action | EBS Root | Instance Store | RAM |
|---|---|---|---|
| Stop | ✅ Preserved | ❌ Lost | ❌ Lost |
| Terminate | ❌ Deleted (default) | ❌ Lost | ❌ Lost |
| Hibernate | ✅ Preserved + RAM saved | ❌ Lost | ✅ Saved to EBS |
Deriving answers: “Data survives restart?” → EBS only. “RAM survives?” → Hibernate only.
What's the bottleneck?
│
├─→ "Nothing specific" ─────────────→ T/M (General Purpose)
│
├─→ "CPU" / "batch" / "compute" ───→ C (Compute)
│
├─→ "RAM" / "in-memory" / "cache" ─→ R (Memory)
│
├─→ "GPU" / "ML" / "AI" ───────────→ P/G (Accelerated)
│
└─→ "IOPS" / "database" / "OLTP" ──→ I/D (Storage)

What's the priority?
│
├─→ "Lowest latency" / "10 Gbps" ──→ Cluster
│
├─→ "High availability" / "isolation" ─→ Spread (max 7/AZ)
│
└─→ "Kafka" / "Hadoop" / "Cassandra" ─→ Partition

Can it be interrupted?
│
├─→ YES ─────────────────────────────→ Spot (90% savings)
│
└─→ NO
     │
     └─→ How long do you need it?
          │
          ├─→ "Hours/days" ──────→ On-Demand
          │
          ├─→ "1-3 years" ───────→ Reserved/Savings Plans
          │
          └─→ "Guaranteed capacity, no commit" ─→ Capacity Res

| You CANNOT… | Because… |
|---|---|
| Launch EC2 from AMI in different region | AMIs are region-locked (copy first) |
| Have > 7 instances in Spread placement group (per AZ) | Spread = different rack per instance, racks limited |
| Hibernate with unencrypted root EBS | RAM data written to disk = security risk |
| Hibernate with > 150GB RAM | Storage/write time constraint |
| Hibernate for > 60 days | AWS limitation |
| Block traffic with Security Group | SGs can only ALLOW (use NACLs to deny) |
Keywords: low latency, 10 Gbps, HPC, tightly coupled Answer: Cluster placement group Why: Same rack = same network switch = lowest latency. Trade-off is single point of failure.
Keywords: high availability, fault tolerance, critical, isolated Answer: Spread placement group Why: Different racks = different failure domains. Limit: 7 instances per AZ.
Keywords: Kafka, Hadoop, Cassandra, distributed, large scale, partitions Answer: Partition placement group Why: Partition-aware applications distribute replicas across partitions. Scales to 100s.
Keywords: in-memory, real-time analytics, caching, SAP HANA Answer: Memory Optimized (R family) Why: Bottleneck is RAM. R = RAM.
Keywords: batch, transcoding, compute-intensive, scientific modeling Answer: Compute Optimized (C family) Why: Bottleneck is CPU. C = CPU.
Keywords: cost-effective, can tolerate interruption, batch, CI/CD Answer: Spot Instances Why: 90% savings, but can be interrupted with 2-min warning. OK for resilient workloads.
Keywords: database, steady, 24/7, long-term, predictable Answer: Reserved Instances Why: 72% savings for 1-3 year commitment. Databases run continuously.
Keywords: BYOL, per-socket, per-core, software license Answer: Dedicated Host Why: You need visibility into physical server (sockets/cores) for licensing.
Keywords: fast boot, preserve RAM, reduce initialization time Answer: EC2 Hibernate Why: RAM saved to EBS, no cold boot. Must have encrypted root volume.
Keywords: block IP, deny traffic, blacklist Answer: NACL (not Security Group) Why: Security Groups can only ALLOW. NACLs can DENY.
Keywords: cross-region, AMI, different region Answer: Copy AMI to target region first Why: AMIs are region-specific. Cannot use AMI ID from another region.
Keywords: compliance, dedicated, isolated hardware Answer: Dedicated Instance Why: Simpler than Dedicated Host when you don’t need socket/core visibility.
Keywords: capacity, guarantee, specific AZ, no discount needed Answer: On-Demand Capacity Reservation Why: Reserves capacity immediately. No commitment required, but no discount either.
Keywords: flexible, multiple instance types, Savings Plans Answer: Compute Savings Plans Why: Commit $/hour, use across any instance type/region. More flexible than Reserved.
| Family | Optimized For | Use Cases | Memory Aid |
|---|---|---|---|
| T, M | Balance | Web servers, small DBs | Typical, Moderate |
| C | CPU | Batch, video encoding | CPU |
| R | RAM | In-memory DBs, caching | RAM |
| P, G | GPU | ML, graphics | Processing, GPU |
| I, D | Disk IOPS | Databases, data warehouses | I/O, Disk |
| Strategy | Same Rack? | Max Instances | Use Case |
|---|---|---|---|
| Cluster | Yes | No limit | Low latency, HPC |
| Spread | No | 7 per AZ | High availability |
| Partition | No | 100s | Hadoop, Kafka, Cassandra |
| Option | Savings | Commitment | Interruption? |
|---|---|---|---|
| On-Demand | 0% | None | No |
| Reserved | 72% | 1-3 years | No |
| Savings Plans | 72% | $/hour for 1-3 years | No |
| Spot | 90% | None | YES (2-min warning) |
| Dedicated Host | Varies | Optional | No |
| Capacity Res | 0% | None | No |
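The purchasing decision in these notes boils down to a few yes/no questions. Here is a hedged sketch of that flow; the function `pick_purchasing_option` and its flags are invented for illustration, and real pricing decisions involve more factors than this.

```python
def pick_purchasing_option(interruptible: bool, long_term: bool,
                           flexible_instance_types: bool = False,
                           needs_capacity_guarantee: bool = False) -> str:
    """Follow the order of questions used in these notes."""
    if interruptible:
        # Biggest discount, but AWS can reclaim the capacity.
        return "Spot"
    if long_term:
        # 1-3 year commitment: Savings Plans if the instance mix may change.
        return "Savings Plans" if flexible_instance_types else "Reserved Instances"
    if needs_capacity_guarantee:
        # No discount and no commitment, only reserved capacity in an AZ.
        return "Capacity Reservation"
    return "On-Demand"


if __name__ == "__main__":
    print(pick_purchasing_option(interruptible=True, long_term=False))
    print(pick_purchasing_option(interruptible=False, long_term=True))
    print(pick_purchasing_option(False, False, needs_capacity_guarantee=True))
```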
| Requirement | Limit |
|---|---|
| RAM | < 150 GB |
| Root Volume | EBS, encrypted, large enough |
| Max Duration | 60 days |
| NOT Supported | Dedicated Hosts, bare metal |
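The requirements in the table can be turned into a quick eligibility check. This sketch only encodes the limits listed in these notes (EBS root, encryption, RAM below 150 GB, no Dedicated Host or bare metal); `hibernate_blockers` is a hypothetical helper, not an EC2 API call.

```python
def hibernate_blockers(root_is_ebs: bool, root_encrypted: bool, ram_gb: float,
                       dedicated_host: bool = False,
                       bare_metal: bool = False) -> list:
    """Return the reasons hibernation is unavailable (empty list = available)."""
    blockers = []
    if not root_is_ebs:
        blockers.append("root volume must be EBS")
    if not root_encrypted:
        blockers.append("root volume must be encrypted")
    if ram_gb >= 150:
        blockers.append("RAM must be below 150 GB")
    if dedicated_host:
        blockers.append("Dedicated Hosts are not supported")
    if bare_metal:
        blockers.append("bare metal instances are not supported")
    return blockers


if __name__ == "__main__":
    print(hibernate_blockers(True, True, 64))    # no blockers
    print(hibernate_blockers(True, False, 200))  # two blockers
```

Returning the list of reasons rather than a single boolean makes it easier to see which requirement a scenario violates.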
| Question Contains | → Instant Answer |
|---|---|
| “lowest latency between instances” | Cluster placement group |
| “10 Gbps bandwidth” | Cluster placement group |
| “spread across racks” | Spread or Partition |
| “max 7 instances” | Spread placement group |
| “Kafka, Hadoop, Cassandra” | Partition placement group |
| “in-memory” / “real-time analytics” | R family (Memory) |
| “batch processing” | C family (Compute) |
| “video transcoding” | C family (Compute) |
| “GPU” / “ML training” | P/G family (Accelerated) |
| “high IOPS” / “OLTP” | I/D family (Storage) |
| “90% savings” | Spot Instances |
| “2-minute warning” | Spot Instances |
| “can be interrupted” | Spot Instances |
| “steady-state” + “database” | Reserved Instances |
| “1-3 year commitment” | Reserved Instances |
| “flexible across instance types” | Savings Plans |
| “per-socket licensing” | Dedicated Host |
| “BYOL” | Dedicated Host |
| “compliance + dedicated hardware” | Dedicated Host or Instance |
| “guarantee capacity” + “no commitment” | Capacity Reservation |
| “fast boot” / “preserve RAM” | EC2 Hibernate |
| “reduce startup time” | EC2 Hibernate |
| “encrypted root volume” + “hibernate” | Required for Hibernate |
| “block IP” | NACL (not SG) |
| “stateless” | NACL |
| “stateful” | Security Group |
| “deny traffic” | NACL |
| “allow only” | Security Group |
| “cross-region AMI” | Copy AMI first |
| “AMI different region” | Copy AMI first |
| “automate AMI creation” | EC2 Image Builder |
| “ENI failover” | Move ENI to standby instance |
□ Is the workload CPU-bound?
→ Yes = C family
→ No = continue
□ Does it need lots of RAM?
→ Yes = R family
→ No = continue
□ Does it need GPU?
→ Yes = P/G family
→ No = continue
□ Does it need high disk IOPS?
→ Yes = I/D family
→ No = T/M (General Purpose)

□ Can workload tolerate interruption?
→ Yes = Consider Spot (90% savings)
→ No = continue
□ Is usage predictable for 1-3 years?
→ Yes = Reserved or Savings Plans
→ No = continue
□ Do you need flexibility across instance types?
→ Yes = Savings Plans
→ No = Reserved Instances
□ Short-term, unpredictable?
→ Yes = On-Demand

□ Need lowest possible latency?
→ Yes = Cluster
→ No = continue
□ Need maximum isolation/availability?
→ Yes = Spread (max 7/AZ)
→ No = continue
□ Running Kafka/Hadoop/Cassandra at scale?
→ Yes = Partition
→ No = No placement group needed

□ Is root volume EBS and encrypted?
→ No = Hibernate NOT available
□ Is RAM < 150GB?
→ No = Hibernate NOT available
□ Is it Dedicated Host or bare metal?
→ Yes = Hibernate NOT available
□ All above passed?
→ Hibernate available

Amazon EC2 Instance Store provides temporary block-level storage for your instance. This storage is located on disks that are physically attached to the host computer. Instance store is ideal for temporary storage of information that changes frequently, such as buffers, caches, scratch data, and other temporary content. It can also be used to store temporary data that you replicate across a fleet of instances, such as a load-balanced pool of web servers.
Amazon EBS (Elastic Block Store) volume: block-level storage that is a network drive, not a physical drive. It uses the network to communicate with the instance, so there is a bit of latency. An EBS volume has a provisioned capacity (size in GB, and IOPS) that can be increased over time; you are billed for what is provisioned, not what is used.
EBS Delete on Termination attribute: controls what happens to an EBS volume when its EC2 instance terminates. By default the root EBS volume is deleted, while any other attached EBS volume has the attribute disabled and is kept.
EBS Snapshot: a backup (snapshot) of your EBS volume at a point in time. Detaching the volume is not necessary, but recommended. Snapshots consume I/O, so avoid taking them during high traffic. Snapshots can be copied across AZs or Regions.
EBS Snapshot Archive: snapshots can be moved to the archive tier (75% cheaper, but restoring from the archive takes 24 to 72 hours).
Recycle Bin: rules that retain deleted snapshots so they can be recovered after an accidental deletion (retention from 1 day to 1 year).
Fast Snapshot Restore (FSR): eliminates latency on first use of an EBS volume created from a snapshot by pre-initializing all data blocks. Without FSR, volumes load data lazily from S3, causing a performance penalty until “warmed up”. Enabled per snapshot per AZ. Expensive, so use it for critical workloads needing immediate full performance (databases, time-sensitive apps).
EBS Snapshot - Cross-Region & Encryption Flow:
┌─────────┐   snapshot   ┌───────────┐    copy    ┌───────────┐
│   EBS   │ ───────────→ │ Snapshot  │ ─────────→ │ Snapshot  │
│ (AZ-A)  │              │ (Region A)│ (encrypt)  │ (Region B)│
└─────────┘              └─────┬─────┘            └─────┬─────┘
                               │ restore                │ restore
                               ▼                        ▼
                          ┌─────────┐              ┌─────────┐
                          │   EBS   │              │   EBS   │
                          │ (AZ-A)  │              │ (AZ-X)  │
                          └─────────┘              └─────────┘
Local EC2 Instance Store: a high-performance hardware disk (better I/O performance than network drives such as EBS volumes). Good for buffer / cache / scratch data / temporary content (risk of data loss if hardware fails). Backups and replication are your responsibility.
⚠️ Exam trap: Instance Store = ephemeral (data lost on stop/terminate). Best I/O performance but no persistence
Instance Store Limitations:
EBS Volume Types
⚠️ Exam trap: gp2 IOPS linked to size (3 IOPS/GB); gp3 IOPS independent — know the difference!
Provisioned IOPS (PIOPS) SSD: highest-performance SSD volumes for mission-critical, low-latency or high-throughput workloads (system boot volumes, database workloads). Supports EBS Multi-Attach.
- io1: 4 GiB - 16 TiB, up to 64,000 IOPS (linked to size: 50 IOPS per GiB, max IOPS reached at 1,280 GiB);
- io2 (higher durability, 99.999%): 4 GiB - 16 TiB, up to 64,000 IOPS (linked to size: 500 IOPS per GiB, max IOPS reached at 128 GiB);
- io2 Block Express (sub-ms latency): 4 GiB - 64 TiB, up to 256,000 IOPS (linked to size: 1,000 IOPS per GiB, max IOPS reached at 256 GiB).
EBS Multi-Attach: Achieve higher application availability in clustered Linux applications (ex: Teradata) by connecting the same EBS volume to multiple (up to 16) EC2 Instances at a time. Must be in the same AZ and only cluster-aware (GFS2, OCFS2, and NOT EXT4/XFS) file system is supported.
| Feature | gp2 | gp3 | io1 | io2 | io2 Block Express |
|---|---|---|---|---|---|
| Type | General Purpose SSD | General Purpose SSD | Provisioned IOPS SSD | Provisioned IOPS SSD | Provisioned IOPS SSD |
| Size | 1 GiB - 16 TiB | 1 GiB - 16 TiB | 4 GiB - 16 TiB | 4 GiB - 16 TiB | 4 GiB - 64 TiB |
| Max IOPS | 16,000 | 16,000 | 64,000* | 64,000* | 256,000 |
| Baseline IOPS | 3 IOPS/GiB (min 100) | 3,000 | Provisioned | Provisioned | Provisioned |
| IOPS:GiB Ratio | 3:1 (linked) | Independent | 50:1 | 500:1 | 1,000:1 |
| Max Throughput | 250 MiB/s | 1,000 MiB/s | 1,000 MiB/s | 1,000 MiB/s | 4,000 MiB/s |
| Durability | 99.8% - 99.9% | 99.8% - 99.9% | 99.8% - 99.9% | 99.999% | 99.999% |
| Latency | Single-digit ms | Single-digit ms | Single-digit ms | Single-digit ms | Sub-millisecond |
| Boot Volume | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| Multi-Attach* | ❌ No | ❌ No | ✅ Yes | ✅ Yes | ✅ Yes |
| Use Case | Dev/test, boot volumes | General workloads | Databases, critical apps | Databases, critical apps | Highest performance |
*64,000 IOPS on Nitro instances, 32,000 on others
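The IOPS figures in the table follow simple arithmetic that is worth working through once. Below is a small sketch using only the ratios and caps from the table; the helper names `gp2_baseline_iops` and `max_provisioned_iops` are assumptions made for this example.

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """gp2: 3 IOPS per GiB, never below 100, capped at 16,000."""
    return min(max(3 * size_gib, 100), 16_000)


# IOPS:GiB ratio and absolute cap per Provisioned IOPS type (from the table).
IOPS_PER_GIB = {"io1": 50, "io2": 500, "io2 Block Express": 1_000}
MAX_IOPS = {"io1": 64_000, "io2": 64_000, "io2 Block Express": 256_000}


def max_provisioned_iops(volume_type: str, size_gib: int) -> int:
    """Highest IOPS you could provision for a volume of this size."""
    return min(IOPS_PER_GIB[volume_type] * size_gib, MAX_IOPS[volume_type])


if __name__ == "__main__":
    for size in (20, 1_000, 10_000):
        print(f"gp2 {size} GiB ->", gp2_baseline_iops(size), "IOPS")
    print("io1 100 GiB  ->", max_provisioned_iops("io1", 100), "IOPS")
    print("io1 1280 GiB ->", max_provisioned_iops("io1", 1_280), "IOPS")
```

gp3 is left out on purpose: its IOPS are provisioned independently of size, so there is no size-based formula to compute.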
HDD Volume Types (Cannot be boot volumes)
| Feature | st1 (Throughput Optimized) | sc1 (Cold HDD) |
|---|---|---|
| Size | 125 GiB - 16 TiB | 125 GiB - 16 TiB |
| Max Throughput | 500 MiB/s | 250 MiB/s |
| Max IOPS | 500 | 250 |
| Boot Volume | ❌ No | ❌ No |
| Use Case | Big Data, Data Warehouses, Log Processing | Infrequent access, lowest cost |
| Cost | Low | Lowest |
⚠️ Exam trap: HDD (st1/sc1) cannot be boot volumes. Only SSD (gp2/gp3/io1/io2) can boot
EBS Encryption: Fully managed, transparent encryption using KMS (AES-256) with minimal latency impact. Encrypts:
Encrypt an unencrypted EBS volume: Create snapshot → Copy snapshot with encryption enabled → Create volume from encrypted snapshot → Attach to instance.
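The four steps can be sketched against a boto3-style EC2 client. This is an outline under assumptions: the client is passed in as `ec2` (for example `boto3.client("ec2")`), the device name `/dev/sdf` is a placeholder, the default KMS key is used, and error handling plus detaching the old volume are left out. The function name `encrypt_volume` is hypothetical.

```python
def encrypt_volume(ec2, volume_id: str, instance_id: str,
                   availability_zone: str, region: str,
                   device: str = "/dev/sdf") -> str:
    """Create an encrypted copy of an unencrypted volume and attach it."""
    # 1. Snapshot the unencrypted volume and wait until it is complete.
    snapshot_id = ec2.create_snapshot(VolumeId=volume_id)["SnapshotId"]
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    # 2. Copy the snapshot with encryption enabled.
    encrypted_id = ec2.copy_snapshot(
        SourceSnapshotId=snapshot_id, SourceRegion=region, Encrypted=True
    )["SnapshotId"]
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[encrypted_id])

    # 3. Create a volume from the encrypted snapshot (volume is encrypted too).
    new_volume_id = ec2.create_volume(
        SnapshotId=encrypted_id, AvailabilityZone=availability_zone
    )["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[new_volume_id])

    # 4. Attach the encrypted volume to the instance.
    ec2.attach_volume(VolumeId=new_volume_id, InstanceId=instance_id, Device=device)
    return new_volume_id
```

Passing the client in as a parameter keeps the sketch free of credentials and makes it easy to try against a stub before running it on a real account.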
Amazon EFS (Elastic File System) - managed NFS that can be mounted on many EC2 instances and on-premises (multi-AZ). Highly available, auto-scaling (petabytes, no capacity planning), expensive (~3x gp2 cost, pay-per-use). Use cases: content management, web serving, data sharing, WordPress.
⚠️ Exam trap: EFS = Linux only (POSIX). Performance Mode cannot be changed after creation; Throughput Mode can
EFS Performance & Throughput Modes:
EFS Storage Classes (lifecycle policies move files after N days):
Amazon FSx for Windows File Server fully managed, highly reliable and scalable Windows native shared file system based on SMB protocol and Windows NTFS (Integrated with Microsoft Active Directory).
Amazon FSx for Lustre (Linux cluster) a fully managed high-performance, scalable file system for High Performance Computing (HPC): machine learning, analytics, video processing and financial modeling.
Instance Store vs EBS vs EFS:
| Feature | Instance Store | EBS | EFS |
|---|---|---|---|
| Type | Block storage (local) | Block storage (network) | File storage (NFS) |
| Instances | 1 | 1 (except io1/io2 Multi-Attach) | 100s across AZs |
| AZ | Locked to instance | Locked to one AZ | Multi-AZ (regional) |
| Persistence | Ephemeral (lost on stop) | Persists independently | Persists independently |
| Performance | Best (hardware attached) | Good, network latency | Good, higher latency |
| OS | Linux & Windows | Linux & Windows | Linux only (POSIX) |
| Cost | Included with instance | Provisioned capacity | ~3x EBS, pay-per-use |
| Use case | Cache, temp data | Boot volumes, databases | Shared files, WordPress |
┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐
│  Instance   │  │     EBS     │  │           EFS           │
│    Store    │  │  (Network)  │  │     (Multi-AZ NFS)      │
├─────────────┤  ├─────────────┤  ├─────────────────────────┤
│ ┌─────────┐ │  │   ┌─────┐   │  │  ┌───┐   ┌───┐   ┌───┐  │
│ │   EC2   │ │  │   │ EC2 │   │  │  │EC2│   │EC2│   │EC2│  │
│ │ ┌─────┐ │ │  │   └──┬──┘   │  │  └─┬─┘   └─┬─┘   └─┬─┘  │
│ │ │Disk │ │ │  │      │      │  │    └───────┼───────┘    │
│ │ └─────┘ │ │  │   ┌──┴──┐   │  │        ┌───┴───┐        │
│ └─────────┘ │  │   │ EBS │   │  │        │  EFS  │        │
└─────────────┘  │   └─────┘   │  └────────┴───────┴────────┘
   Ephemeral     └─────────────┘      Shared across AZs
   Best I/O         Single AZ       Linux only, pay-per-use
AWS Storage Gateway: bridge between on-premises data and cloud data in S3; a hybrid storage service that lets on-premises systems seamlessly use the AWS Cloud. Use cases: disaster recovery, backup & restore, tiered storage.
Types of Storage Gateway:
       On-Premises                               AWS Cloud
┌─────────────────────────┐              ┌─────────────────────────┐
│                         │              │                         │
│  ┌───────────────────┐  │              │  ┌─────────────────┐    │
│  │   File Gateway    │──┼──────────────┼─→│    S3 / FSx     │    │
│  └───────────────────┘  │   NFS/SMB    │  └─────────────────┘    │
│                         │              │                         │
│  ┌───────────────────┐  │              │  ┌─────────────────┐    │
│  │  Volume Gateway   │──┼──────────────┼─→│  EBS Snapshots  │    │
│  └───────────────────┘  │    iSCSI     │  └─────────────────┘    │
│                         │              │                         │
│  ┌───────────────────┐  │              │  ┌─────────────────┐    │
│  │   Tape Gateway    │──┼──────────────┼─→│ S3 Glacier/Deep │    │
│  └───────────────────┘  │     VTL      │  └─────────────────┘    │
└─────────────────────────┘              └─────────────────────────┘
⚠️ Exam trap: File Gateway = S3/FSx (NFS/SMB); Volume Gateway = EBS snapshots (iSCSI); Tape Gateway = Glacier (VTL)
S3 (Simple Storage Service) provides object storage through a web service interface — “infinitely scaling” storage.
Amazon S3: lets you store objects (files) in ‘buckets’ (directories).
Amazon S3 offers unlimited storage space. The maximum file size for an object in Amazon S3 is 5 TB.
Use Cases: Backup/storage, Disaster Recovery, Archive, Hybrid Cloud storage, Media hosting, Data lakes & big data analytics, Static websites, Software delivery
Buckets:
⚠️ Exam trap: “Can’t create bucket” + correct IAM permissions → name already taken globally
Naming convention:
Objects have a Key, which is the full path to the object (s3://<bucket_name>/<folder_name>/<file-name>). The maximum size of an object is 5 TB (5000 GB); a single upload is limited to 5 GB, so anything larger must use “multi-part upload”.
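The size limits translate into a quick calculation of how an upload has to be split. The sketch below assumes the commonly documented multipart limits (parts between 5 MiB and 5 GiB, at most 10,000 parts), which are not stated in these notes; the function names and the 100 MiB default part size are illustrative.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3

SINGLE_UPLOAD_LIMIT = 5 * GIB  # above this, multi-part upload is required
MIN_PART_SIZE = 5 * MIB        # assumed S3 limit (last part may be smaller)
MAX_PART_SIZE = 5 * GIB        # assumed S3 limit
MAX_PARTS = 10_000             # assumed S3 limit


def needs_multipart(object_size: int) -> bool:
    return object_size > SINGLE_UPLOAD_LIMIT


def multipart_part_count(object_size: int, part_size: int = 100 * MIB) -> int:
    """Number of parts needed, or ValueError if the plan breaks a limit."""
    if not MIN_PART_SIZE <= part_size <= MAX_PART_SIZE:
        raise ValueError("part size must be between 5 MiB and 5 GiB")
    parts = max(1, math.ceil(object_size / part_size))
    if parts > MAX_PARTS:
        raise ValueError(f"{parts} parts exceeds the {MAX_PARTS} part limit")
    return parts


if __name__ == "__main__":
    print(needs_multipart(6 * GIB))
    print(multipart_part_count(1 * GIB))
    print(multipart_part_count(5 * 1024 * GIB, part_size=1 * GIB))
```

A 5 TiB object with the default 100 MiB parts would need more than 10,000 parts, so the part size has to grow with the object.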
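There are no real directories inside a bucket: keys simply contain slashes (/), and the UI tricks you into seeing folders.
S3 Consistency Model: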
⚠️ Exam trap: “Overwrite object, immediately read” → S3 always returns the latest version. Old “eventual consistency” behavior is gone. Distractors mentioning “might return previous data” or “might return new data” are wrong.
Amazon S3 Versioning protects against unintended deletes. It is enabled at the bucket level.
Amazon S3 Replication:
┌───────────┐
│ S3 Bucket │ (eu-west-1)
└─────┬─────┘
      │ asynchronous
      │ replication
      ▼
┌───────────┐
│ S3 Bucket │ (us-east-2)
└───────────┘
S3 Security:
S3 Access Scenarios:
| Scenario | Use |
|---|---|
| IAM User → S3 | IAM Policy attached to user |
| EC2 Instance → S3 | IAM Role attached to EC2 |
| Cross-Account → S3 | Bucket Policy (resource-based) |
| Public/Anonymous → S3 | Bucket Policy with Principal: "*" |
1. IAM User Access 2. EC2 Instance Access 3. Cross-Account Access
┌──────────┐ ┌──────────┐ ┌──────────┐
│IAM Policy│ │ IAM Role │ │ Bucket │
└────┬─────┘ └────┬─────┘ │ Policy │
│ │ └────┬─────┘
┌────▼─────┐ ┌────▼─────┐ ▼
│ IAM User │───────────────▶│ EC2 │─────────────▶ ┌───────────┐
└──────────┘ └──────────┘ │ S3 Bucket │
│ └───────────┘
▼ ▲
┌──────────┐ ┌─────┴─────┐
│ S3 Bucket│ 4. Public Access │ IAM User │
└──────────┘ ┌──────────┐ │Other Acct │
│ Bucket │ └───────────┘
│ Policy │
│Principal:│
│ "*" │
└────┬─────┘
▼
┌───────────────────┐
│ Anonymous Visitor │───▶ S3 Bucket
└───────────────────┘
Bucket Policy (JSON): Resources, Effect (Allow/Deny), Actions (API calls), Principal (account/user)
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "PublicRead",
    "Effect": "Allow",
    "Principal": "*",
    "Action": ["s3:GetObject"],
    "Resource": ["arn:aws:s3:::examplebucket/*"]
  }]
}
Block Public Access (prevent data leaks):
Access granted if: (IAM permissions ALLOW it OR resource policy ALLOWS it) AND no explicit DENY
⚠️ Exam trap: Bucket policy ALLOWS but user can’t access → check for explicit DENY in IAM policy (DENY always wins)
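The access rule above is a one-line boolean expression, and writing it out makes the "DENY always wins" trap easy to see. This sketch mirrors the rule exactly as stated in these notes; it is a simplification, and the function name `is_access_granted` is made up for the example.

```python
def is_access_granted(iam_allows: bool, resource_policy_allows: bool,
                      explicit_deny: bool) -> bool:
    """(IAM allows OR resource policy allows) AND no explicit deny."""
    if explicit_deny:
        # An explicit DENY in any policy overrides every ALLOW.
        return False
    return iam_allows or resource_policy_allows


if __name__ == "__main__":
    # Bucket policy allows, but an IAM policy contains an explicit deny.
    print(is_access_granted(iam_allows=False, resource_policy_allows=True,
                            explicit_deny=True))
    # Only the bucket policy allows, nothing denies.
    print(is_access_granted(False, True, False))
```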
Encryption: encrypt objects using encryption keys
S3 Static Website Hosting:
http://bucket-name.s3-website-<region>.amazonaws.com or http://bucket-name.s3-website.<region>.amazonaws.com
S3 Durability & Availability:
S3 Storage Classes:
Move between classes manually or using S3 Lifecycle configurations
⚠️ Exam traps - Storage Classes:
S3 Storage Classes Comparison:
| Class | Avail. | AZs | Min Duration | Retrieval | Use Case |
|---|---|---|---|---|---|
| Standard | 99.99% | ≥3 | None | Instant, free | Frequently accessed |
| Intelligent-Tiering | 99.9% | ≥3 | None | Instant, free | Unknown access patterns |
| Standard-IA | 99.9% | ≥3 | 30 days | Instant, per GB | Infrequent but rapid access |
| One Zone-IA | 99.5% | 1 | 30 days | Instant, per GB | Secondary backups, recreatable |
| Glacier Instant | 99.9% | ≥3 | 90 days | ms, per GB | Once/quarter access |
| Glacier Flexible | 99.99% | ≥3 | 90 days | 1-5 min / 3-5 hr / 5-12 hr | Archive, flexible retrieval |
| Glacier Deep Archive | 99.99% | ≥3 | 180 days | 12 hr / 48 hr | Long-term archive |
| Express One Zone | 99.95% | 1 | None | <10ms | AI/ML, HPC, low-latency |
Durability: 99.999999999% (11 9’s) for ALL classes
⚠️ Exam trap: Lifecycle transition timing must respect minimum storage duration
S3 Performance:
Example: bucket/folder1/sub1/file → prefix: /folder1/sub1/
⚠️ Exam traps:
S3 Batch Operations:
⚠️ Exam trap: “Encrypt existing objects” / “change encryption on all files” → S3 Batch Operations
S3 Inventory ──▶ Athena (filter) ──▶ S3 Batch Operations ──▶ Processed Objects
     │                                        ▲
     └── Objects List Report                  │
                                  User: operation + params
S3 Lifecycle Rules:
Rules can be scoped to a prefix (s3://bucket/mp3/*) or object tags (Department: Finance)
⚠️ Exam trap:
Storage Class Transitions (allowed paths):
Standard ──┬──▶ Standard-IA ──┬──▶ Intelligent-Tiering ──┬──▶ One Zone-IA
│ │ │
│ ▼ ▼
├──▶ Glacier Instant ◀────────────────────────┤
│ │
▼ ▼
Glacier Flexible ◀─────────────────────────────────┤
│ │
▼ ▼
Glacier Deep Archive ◀─────────────────────────────┘
(All classes can transition DOWN, never UP)
Lifecycle Scenarios:
| Scenario | Solution |
|---|---|
| Thumbnails recreatable, needed 60 days, then delete | One Zone-IA + expire after 60 days |
| Source images: immediate access 60 days, then 6hr retrieval OK | Standard → Glacier after 60 days |
| Recover deleted objects immediately for 30 days, then 48hr OK for 365 days | Versioning + noncurrent → Standard-IA → Glacier Deep Archive |
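
As a rough illustration of the first two scenarios, the sketch below builds a lifecycle configuration as a plain Python dict in the shape that S3 lifecycle APIs accept. The helper `lifecycle_rule`, the rule IDs, and the prefixes `thumbnails/` and `images/` are hypothetical; the thumbnails rule also assumes the objects were uploaded directly into One Zone-IA, so only the expiration is expressed here.

```python
def lifecycle_rule(rule_id, prefix, transitions=(), expire_days=None):
    """Build one lifecycle rule; transitions is an iterable of (days, storage_class)."""
    rule = {"ID": rule_id, "Filter": {"Prefix": prefix}, "Status": "Enabled"}
    if transitions:
        rule["Transitions"] = [
            {"Days": days, "StorageClass": storage_class}
            for days, storage_class in sorted(transitions)
        ]
    if expire_days is not None:
        rule["Expiration"] = {"Days": expire_days}
    return rule


lifecycle_configuration = {
    "Rules": [
        # Scenario 1: recreatable thumbnails are deleted after 60 days.
        lifecycle_rule("thumbnails", "thumbnails/", expire_days=60),
        # Scenario 2: source images move to Glacier Flexible Retrieval after 60 days.
        lifecycle_rule("source-images", "images/", transitions=[(60, "GLACIER")]),
    ]
}

print(lifecycle_configuration)
```

With boto3 installed, a dict like this could be passed as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration`; that call is not made here so the example stays runnable offline.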
S3 Analytics - Storage Class Analysis:
⚠️ Exam trap: “Optimal days to transition” / “Lifecycle recommendations” → S3 Analytics (not Inventory!)
S3 Requester Pays:
S3 Event Notifications:
S3:ObjectCreated, S3:ObjectRemoved, S3:ObjectRestore, S3:Replication
Object name filtering is possible (e.g., *.jpg)
           ┌──▶ SNS
           │
S3 Events ─┼──▶ SQS
           │
           └──▶ Lambda
S3 Event Notifications with EventBridge:
⚠️ Exam trap: “Get notified on object upload” → Event Notifications (NOT Access Logs, Analytics, or Select)
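
To make the upload-notification path concrete, here is a minimal sketch of a Lambda-style handler that reads the bucket and key out of an S3 event. The sample event is trimmed to the fields used, and the bucket and key names are made up for illustration. Object keys arrive URL-encoded in the event, which is why `unquote_plus` is applied.

```python
from urllib.parse import unquote_plus


def uploaded_objects(event):
    """Return (bucket, key) pairs for every ObjectCreated record in an S3 event."""
    results = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue  # skip removals, restores, replication events, etc.
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        results.append((bucket, key))
    return results


def handler(event, context):
    """Entry point in the shape Lambda expects; here it only logs new objects."""
    for bucket, key in uploaded_objects(event):
        print(f"New object: s3://{bucket}/{key}")


# Hypothetical, trimmed-down event for local experimentation.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "my-bucket"},
                "object": {"key": "photos/my+cat.jpg"},
            },
        }
    ]
}

handler(sample_event, None)
```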
Overview:
┌─ Organization
├─ Accounts
S3 Storage Lens ───▶├─ Regions ───▶ Aggregate ───▶ Dashboard ───▶ ┌─ Summary Insights
(Configure) └─ Buckets (Analyze) ├─ Data Protection
└─ Cost Efficiency
(Optimize)
Default Dashboard:
Metrics Categories:
| Category | Key Metrics | Use Cases |
|---|---|---|
| Summary | StorageBytes, ObjectCount | Identify fastest-growing or unused buckets |
| Cost-Optimization | NonCurrentVersionStorageBytes, IncompleteMultipartUploadStorageBytes | Find incomplete multipart uploads >7 days, transition candidates |
| Data-Protection | VersioningEnabledBucketCount, MFADeleteEnabledBucketCount, SSEKMSEnabledBucketCount | Audit data protection best practices |
| Access-Management | ObjectOwnershipBucketOwnerEnforcedBucketCount | Check Object Ownership settings |
| Event | EventNotificationEnabledBucketCount | Identify buckets with Event Notifications |
| Performance | TransferAccelerationEnabledBucketCount | Find buckets with Transfer Acceleration |
| Activity | AllRequests, GetRequests, PutRequests, BytesDownloaded | Understand storage request patterns |
| Status Code | 200OKStatusCount, 403ForbiddenErrorCount, 404NotFoundErrorCount | Monitor HTTP response distribution |
Free vs Paid:
| Feature | Free | Advanced (Paid) |
|---|---|---|
| Metrics | ~28 usage metrics | + Activity, Cost Optimization, Data Protection, Status Code |
| Retention | 14 days | 15 months |
| CloudWatch Publishing | ❌ | ✅ |
| Prefix Aggregation | ❌ | ✅ |
Encryption Methods (4 core methods plus DSSE-KMS, the dual-layer variant of SSE-KMS):
| Method | Key Management | Header | Notes |
|---|---|---|---|
| SSE-S3 | AWS-managed | "x-amz-server-side-encryption": "AES256" | Default for new buckets, AES-256 |
| SSE-KMS | AWS KMS | "x-amz-server-side-encryption": "aws:kms" | Audit via CloudTrail, KMS quota limits |
| DSSE-KMS | AWS KMS (double) | "x-amz-server-side-encryption": "aws:kms:dsse" | Two layers of encryption, compliance |
| SSE-C | Customer-managed (outside AWS) | Key in every HTTP header | HTTPS required, S3 doesn’t store key |
| Client-Side | Customer encrypts before upload | N/A | Full control, use S3 Encryption Library |
⚠️ Exam trap: “Customer manages keys” + “never store keys in AWS” → SSE-C or Client-Side
⚠️ Exam trap: “Keys in AWS OK” + “control rotation policy” → SSE-KMS
⚠️ Exam trap: “Encrypt all objects by default” → Do nothing (SSE-S3 is automatic since Jan 2023)
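
The header column in the table can be sketched as a small helper that returns the request headers for each server-side method. This is an illustration rather than a client implementation: the function name and arguments are hypothetical, and the SSE-C branch assumes a raw 32-byte key supplied by the caller.

```python
import base64
import hashlib


def sse_headers(method, kms_key_id=None, customer_key=None):
    """Return the S3 request headers that select a server-side encryption method."""
    if method == "SSE-S3":
        return {"x-amz-server-side-encryption": "AES256"}
    if method in ("SSE-KMS", "DSSE-KMS"):
        value = "aws:kms" if method == "SSE-KMS" else "aws:kms:dsse"
        headers = {"x-amz-server-side-encryption": value}
        if kms_key_id:  # optional: name a specific KMS key
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    if method == "SSE-C":
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 32-byte (256-bit) key")
        # The key and its MD5 digest travel base64-encoded with every request,
        # which is why HTTPS is mandatory for SSE-C.
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            "x-amz-server-side-encryption-customer-key": base64.b64encode(customer_key).decode(),
            "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(
                hashlib.md5(customer_key).digest()
            ).decode(),
        }
    raise ValueError(f"Unknown method: {method}")


print(sse_headers("SSE-KMS"))
```

Client-side encryption has no entry because the object is already encrypted before the request is built.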
Encryption Evaluation Order:
SSE-S3 (Server-Side Encryption with S3-Managed Keys):
User ──── HTTP(S) + Header ────▶ ┌─────────────────────────────────┐
(upload object) │ Amazon S3 │
│ Object + S3 Owned Key │
│ ↓ │
│ [Encryption] │
│ ↓ │
│ S3 Bucket (encrypted) │
└─────────────────────────────────┘
SSE-KMS Limitation:
Upload calls the GenerateDataKey API, download calls the Decrypt API
SSE-KMS (Server-Side Encryption with KMS Keys):
User ──── HTTP(S) + Header ────▶ ┌─────────────────────────────────┐
(upload object) │ Amazon S3 │
│ Object + KMS Key (API call) │
│ ↓ │
│ [Encryption] │
│ ↓ │
│ S3 Bucket (encrypted) │
└─────────────────────────────────┘
▲
┌─────────┐ │ API call
│ KMS Key │───────────────────┘
└─────────┘
(GenerateDataKey / Decrypt)
⚠️ Exam trap: “High-throughput S3” + “encryption” → SSE-S3 (not SSE-KMS!)
SSE-C (Server-Side Encryption with Customer-Provided Keys):
User ──── HTTPS ONLY ──────────▶ ┌─────────────────────────────────┐
(object + key in header) │ Amazon S3 │
│ Object + Client-Provided Key │
│ ↓ │
│ [Encryption] │
│ ↓ │
│ S3 Bucket (encrypted) │
└─────────────────────────────────┘
(S3 discards key after use)
Client-Side Encryption:
┌──────┐ ┌────────────┐ ┌──────────────┐ ┌───────────┐
│ File │ + │ Client Key │ → │ [Encryption] │ → HTTP(S) → │ S3 Bucket │
└──────┘ └────────────┘ │ (client-side)│ │(encrypted)│
└──────────────┘ └───────────┘
(Customer manages keys + encryption cycle)
Force Encryption in Transit (HTTPS):
Use the aws:SecureTransport condition in a bucket policy
Deny requests where aws:SecureTransport: false
{
  "Effect": "Deny",
  "Principal": "*",
  "Action": "s3:GetObject",
  "Resource": "arn:aws:s3:::my-bucket/*",
  "Condition": {
    "Bool": { "aws:SecureTransport": "false" }
  }
}
Origin example: https://www.example.com (port 443 implied for HTTPS)
Same origin: http://example.com/app1 & http://example.com/app2
Different origins: http://www.example.com & http://other.example.com
CORS Flow (Preflight Request):
┌────────────────┐ ┌────────────────┐
│ Web Server │ │ Web Server │
│ (Origin) │ │ (Cross-Origin) │
│ example.com │ │ other.com │
└───────┬────────┘ └───────▲────────┘
│ │
│ HTTPS Request │
▼ │
┌─────────────┐ 1. OPTIONS (Preflight) │
│ Web Browser │──────────────────────────────────▶│
│ │ Host: other.com │
│ │ Origin: example.com │
│ │◀──────────────────────────────────│
│ │ 2. Preflight Response │
│ │ Access-Control-Allow-Origin: │
│ │ https://example.com │
│ │ Access-Control-Allow-Methods: │
│ │ GET, PUT, DELETE │
│ │──────────────────────────────────▶│
└─────────────┘ 3. GET / (actual request) │
Host: other.com │
Origin: example.com │
S3 CORS:
Allow a specific origin or * (all origins)
S3 CORS Example (Static Website with Assets in Different Bucket):
┌─────────────┐ 1. GET /index.html ┌─────────────────────┐
│ Web Browser │────────────────────────────────────────▶│ S3: my-bucket-html │
│ │◀────────────────────────────────────────│ (Static Website) │
│ │ index.html │ Origin bucket │
│ │ └─────────────────────┘
│ │ 2. GET /images/coffee.jpg
│ │ Host: my-bucket-assets.s3-website...
│ │ Origin: my-bucket-html.s3-website...
│ │────────────────────────────────────────▶┌─────────────────────┐
│ │◀────────────────────────────────────────│ S3: my-bucket-assets│
│ │ Access-Control-Allow-Origin: │ (Static Website) │
└─────────────┘ my-bucket-html.s3-website... │ Cross-origin bucket │
│ ← CORS config here │
└─────────────────────┘
⚠️ Exam trap: CORS errors on S3 → configure CORS on the target bucket (the one being requested), not the origin
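
A CORS configuration for the target bucket can be sketched as a plain dict, together with a simplified check of whether a given origin and method would be allowed. The origin URL is a placeholder and `origin_allowed` is a hypothetical helper: it handles exact matches and a bare `*` only, which is a simplification of the matching S3 performs.

```python
cors_configuration = {
    "CORSRules": [
        {
            "AllowedOrigins": ["https://www.example.com"],  # placeholder origin
            "AllowedMethods": ["GET"],
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,
        }
    ]
}


def origin_allowed(config, origin, method):
    """Return True if any rule allows this origin and HTTP method (simplified)."""
    for rule in config["CORSRules"]:
        origins = rule["AllowedOrigins"]
        if method in rule["AllowedMethods"] and ("*" in origins or origin in origins):
            return True
    return False


print(origin_allowed(cors_configuration, "https://www.example.com", "GET"))
print(origin_allowed(cors_configuration, "https://other.example.com", "GET"))
```

With boto3, a dict of this shape could be supplied as `CORSConfiguration` to `put_bucket_cors` on the bucket that serves the assets.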
Requires MFA code before critical S3 operations
⚠️ Exam trap: Never set logging bucket = monitored bucket → creates infinite loop, bucket grows exponentially
⚠️ Exam trap: “Audit who accessed/tried to access S3” → S3 Access Logs + Athena
Expiration:
| Method | Default | Max |
|---|---|---|
| S3 Console | - | 720 min (12 hours) |
| AWS CLI | 3600 sec (1 hour) | 604800 sec (168 hours / 7 days) |
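
The expiry limits in the table can be sketched in code. `clamp_expiry` and `presign_download` are hypothetical helper names; the second one assumes boto3 is installed and credentials are configured, so it is defined but not called here.

```python
DEFAULT_EXPIRY_SECONDS = 3_600    # 1 hour, the CLI default from the table
MAX_EXPIRY_SECONDS = 604_800      # 7 days, the CLI maximum from the table


def clamp_expiry(seconds=DEFAULT_EXPIRY_SECONDS):
    """Keep a requested lifetime within the 1 second to 7 day window."""
    if seconds < 1:
        raise ValueError("expiry must be at least 1 second")
    return min(seconds, MAX_EXPIRY_SECONDS)


def presign_download(bucket, key, seconds=DEFAULT_EXPIRY_SECONDS):
    """Return a time-limited GET URL (requires boto3 and AWS credentials)."""
    import boto3  # third-party; imported lazily so clamp_expiry works without it

    s3 = boto3.client("s3")
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=clamp_expiry(seconds),
    )


print(clamp_expiry())            # default lifetime
print(clamp_expiry(10_000_000))  # capped at the maximum
```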
Use Cases:
S3 Access Points:
Users (Finance) ───▶ Finance Access Point ───┐
(R/W to /finance/*) │
▼
Users (Sales) ─────▶ Sales Access Point ────▶ S3 Bucket ◀── Simple
(R/W to /sales/*) │ Bucket
│ Policy
Users (Analytics) ─▶ Analytics Access Point ─┘
(R to entire bucket)
VPC Origin Access Points:
VPC Origin:
┌─────────────────────────────────────────────────────────────────────┐
│ VPC │
│ EC2 ──▶ VPC Endpoint ──▶ Access Point (VPC Origin) ──▶ S3 Bucket │
│ (Endpoint (Access Point (Bucket │
│ Policy) Policy) Policy) │
└─────────────────────────────────────────────────────────────────────┘
S3 Object Lambda:
┌─────────────────────────────────────┐
E-Commerce App ──▶ Original Object ─┤ S3 Access Point ──▶ S3 Bucket │
│ │
Analytics App ───▶ Redacted Object ─┤ Object Lambda AP ──▶ Redacting λ ───┤
│ │
Marketing App ───▶ Enriched Object ─┤ Object Lambda AP ──▶ Enriching λ ◀──┼── Customer DB
└─────────────────────────────────────┘
Use Cases:
⚠️ Exam trap: “Transform/redact data before retrieval” → S3 Object Lambda
| Feature | Glacier Vault Lock | S3 Object Lock |
|---|---|---|
| Applies to | Glacier Vaults only | Any S3 storage class |
| Requires Versioning | ❌ | ✅ |
| Lock Level | Entire vault | Per object version |
| Policy Immutable | Yes (after lock) | Depends on mode |
S3 Object Lock Modes:
| Mode | Who Can Delete? | Change Settings? | Use Case |
|---|---|---|---|
| Compliance | No one (including root) | ❌ | Regulatory requirements |
| Governance | Special permission users | ✅ | Internal policies |
Lock Reversal & Override Details:
| Lock Type | Can Shorten? | Can Remove? | Can Delete Object? | Who Can Override? |
|---|---|---|---|---|
| Compliance Retention | ❌ Never | ❌ Never | ❌ Until expires | No one — wait for expiry |
| Governance Retention | ✅ | ✅ | ✅ | Users with s3:BypassGovernanceRetention + header x-amz-bypass-governance-retention:true |
| Legal Hold | N/A | ✅ | ❌ While active | Users with s3:PutObjectLegalHold permission |
| Vault Lock | ❌ Never | ❌ Never | ❌ Per policy | No one — delete vault to remove (loses all data) |
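
The override rules in the table can be condensed into one decision function. This is a simplified model of the table, not S3's actual evaluation logic; the function name, its parameters, and the `"GOVERNANCE"` / `"COMPLIANCE"` strings are choices made for the example.

```python
from datetime import datetime, timezone


def can_permanently_delete_version(mode, retain_until, now,
                                   legal_hold=False, has_bypass_permission=False):
    """Decide whether an object version may be permanently deleted.

    mode is "GOVERNANCE", "COMPLIANCE", or None when no retention is set.
    """
    if legal_hold:
        return False  # a legal hold blocks deletion until it is removed
    if mode is None or now >= retain_until:
        return True   # no retention, or the retention period has expired
    if mode == "GOVERNANCE":
        return has_bypass_permission  # s3:BypassGovernanceRetention plus header
    return False      # COMPLIANCE: no override, wait for expiry


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
later = datetime(2030, 1, 1, tzinfo=timezone.utc)

print(can_permanently_delete_version("COMPLIANCE", later, now, has_bypass_permission=True))
print(can_permanently_delete_version("GOVERNANCE", later, now, has_bypass_permission=True))
```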
Object Lock Features:
Legal Hold can be placed or removed by users with the s3:PutObjectLegalHold permission
⚠️ Exam traps:
S3 stores objects (files) in buckets (containers). There are no real directories — just keys with slashes.
Example: s3://bucket/folder/subfolder/file.txt
Two different concepts:
All storage classes have same durability. Availability differs.
Access granted if: (IAM allows OR Resource policy allows) AND no explicit DENY
| Who’s accessing? | Use… |
|---|---|
| IAM User in same account | IAM Policy |
| EC2/Lambda | IAM Role |
| Cross-account | Bucket Policy |
| Public/Anonymous | Bucket Policy with Principal: "*" |
DENY always wins — if any policy denies, access is denied.
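
The rule "(IAM allows OR resource policy allows) AND no explicit DENY" can be sketched as a tiny evaluator over policy statements. This is a teaching model for same-account access only: each statement is assumed to hold a single `Action` and `Resource` string, wildcard matching is delegated to `fnmatchcase`, and principals and conditions are ignored.

```python
from fnmatch import fnmatchcase


def matches(statement, action, resource):
    """True if the statement's Action and Resource patterns cover the request."""
    return (fnmatchcase(action, statement["Action"])
            and fnmatchcase(resource, statement["Resource"]))


def is_access_granted(iam_statements, bucket_statements, action, resource):
    """Explicit Deny wins; otherwise any matching Allow grants access."""
    relevant = [s for s in iam_statements + bucket_statements
                if matches(s, action, resource)]
    if any(s["Effect"] == "Deny" for s in relevant):
        return False
    return any(s["Effect"] == "Allow" for s in relevant)


# Hypothetical policies for illustration.
iam = [{"Effect": "Allow", "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::my-bucket/*"}]
bucket = [{"Effect": "Deny", "Action": "s3:*",
           "Resource": "arn:aws:s3:::my-bucket/secret/*"}]

print(is_access_granted(iam, bucket, "s3:GetObject", "arn:aws:s3:::my-bucket/public/a.txt"))
print(is_access_granted(iam, bucket, "s3:GetObject", "arn:aws:s3:::my-bucket/secret/a.txt"))
```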
Two different tools for different jobs:
Lifecycle cannot encrypt. Batch Operations can.
S3 scales per prefix:
Two mechanisms:
Compliance mode = NO ONE can delete (not even root or AWS Support)
What's the access pattern?
│
┌─────────────┬───────────┼───────────┬─────────────┬─────────────┐
▼ ▼ ▼ ▼ ▼ ▼
Frequent Unknown/ Infrequent Archive Archive Lowest
Access Changing Access (instant) (flexible) Latency
│ │ │ │ │ │
▼ ▼ ▼ ▼ ▼ ▼
Standard Intelligent- Standard-IA Glacier Glacier Express
Tiering or One Zone Instant Flexible/ One Zone
Deep Archive
Who manages the keys?
│
┌─────────────────────┼─────────────────────┐
▼ ▼ ▼
AWS manages You control Keys never in AWS
(no work) (audit/rotate) │
│ │ ┌───────┴───────┐
▼ ▼ ▼ ▼
SSE-S3 SSE-KMS SSE-C Client-Side
(default) (CloudTrail) (key in (encrypt before
header) upload)
| If question mentions… | Answer is… |
|---|---|
| “unknown access pattern” | Intelligent-Tiering |
| “millisecond retrieval from archive” | Glacier Instant |
| “cheapest archive” / “rarely accessed” | Glacier Deep Archive |
| “lowest latency” / “AI/ML training” | Express One Zone |
| “recreatable data” + single AZ OK | One Zone-IA |
| “encrypt existing objects” | S3 Batch Operations |
| “transition to cheaper storage” | Lifecycle Rules |
| “delete old versions” | Lifecycle Expiration Actions |
| “delete incomplete multipart uploads” | Lifecycle Expiration Actions |
| “audit object access” | S3 Access Logs + Athena |
| “customer manages keys outside AWS” | SSE-C or Client-Side |
| “high throughput + encryption” | SSE-S3 (not KMS — quota limits) |
| “prevent deletion for X years” | Object Lock (Compliance) |
| “allow admin override” | Object Lock (Governance) |
| “cross-account access” | Bucket Policy |
| “generate temporary download link” | Pre-Signed URL |
| “different access per team/prefix” | S3 Access Points |
| “transform data before retrieval” | S3 Object Lambda |
| “read first X bytes” / “file header” | Byte-Range Fetch |
| “large file + unreliable network” | Multi-Part Upload |
| “faster uploads over long distance” | Transfer Acceleration |
| “analyze storage costs” | S3 Storage Lens |
| “lifecycle recommendations” | S3 Analytics |
| “replicate existing objects” | S3 Batch Replication |
| “CORS error” | Configure CORS on target bucket |
| Statement | Why It’s Wrong |
|---|---|
| Lifecycle Rules encrypt objects | Lifecycle = transition/delete only, not encrypt |
| SSE-KMS for high-throughput | SSE-KMS has API quota limits — use SSE-S3 |
| Replication for existing objects | Only new objects — use Batch Replication for existing |
| Access Logs for real-time alerts | Access Logs = audit, not notifications — use Event Notifications |
| CloudTrail for data access patterns | CloudTrail records API calls (object-level only when data events are enabled); for access-pattern audits use Access Logs |
| Object Lock without versioning | Versioning is required before enabling Object Lock |
| Compliance mode with admin override | Compliance = no one can override — use Governance for admin override |
| Glacier Flexible for instant access | Flexible = hours — use Glacier Instant for milliseconds |
| Standard-IA for archive | IA = infrequent access, not archive — use Glacier for archive |
| Cannot… | Instead… |
|---|---|
| Create bucket with existing name | Names are globally unique — choose different name |
| Encrypt with Lifecycle Rules | Use Batch Operations for encryption |
| Shorten Compliance retention | Wait for expiry (truly immutable) |
| Delete in Compliance mode | No one can — not even root or AWS Support |
| Replicate to bucket without versioning | Enable versioning on both buckets |
| Chain replications (A→B→C) | Set up direct replication from A to C |
| Use SSE-C without HTTPS | HTTPS is mandatory for SSE-C |
| Set Object Lock on bucket without versioning | Enable versioning first |
Keywords: unpredictable access, varies over time, don’t know access frequency
Answer: Intelligent-Tiering
Why: Auto-moves objects between tiers, no retrieval fees, small monitoring fee.
Keywords: archive, quarterly access, millisecond retrieval, compliance archive
Answer: Glacier Instant Retrieval
Why: Archive pricing + instant access. Glacier Flexible = hours, not milliseconds.
Keywords: rarely accessed, years of retention, 12+ hour retrieval OK
Answer: Glacier Deep Archive
Why: Cheapest class, 12-48 hour retrieval. Use Standard/Bulk retrieval.
Keywords: encrypt all current files, change encryption, bulk encrypt
Answer: S3 Batch Operations
Why: Lifecycle Rules can’t encrypt. CRR creates copies. Batch Operations modifies in-place.
Keywords: reduce costs, clean up, delete versions older than X days, incomplete multipart
Answer: Lifecycle Expiration Actions
Why: Transition = move to cheaper class. Expiration = delete permanently.
Keywords: audit access, security analysis, who accessed, access attempts
Answer: S3 Access Logs + Amazon Athena
Why: Access Logs capture all requests (including denied). Athena queries logs with SQL.
Keywords: customer-managed keys, keys not stored in AWS, full key control
Answer: SSE-C (if encryption in S3) or Client-Side (if encryption before upload)
Why: SSE-S3/SSE-KMS store keys in AWS. SSE-C/Client-Side = keys never stored in AWS.
Keywords: regulatory compliance, immutable, prevent deletion, WORM, SEC 17a-4
Answer: Object Lock in Compliance mode
Why: Compliance mode = truly immutable. No one (root, admin, AWS) can delete until retention expires.
Keywords: internal policy, admin can override, flexible protection
Answer: Object Lock in Governance mode
Why: Users with s3:BypassGovernanceRetention permission can override. Compliance mode has no override.
Keywords: temporary access, time-limited URL, download link for logged-in users
Answer: Pre-Signed URL
Why: User inherits permissions of URL generator. Expires after set time (max 7 days via CLI).
Keywords: multiple teams, different prefixes, simplify access management
Answer: S3 Access Points
Why: Each Access Point has own policy, simplifies per-team access vs complex bucket policy.
Keywords: redact PII, convert format, resize images, enrich data on-the-fly
Answer: S3 Object Lambda
Why: Lambda transforms during GET request. No data duplication, no extra storage.
Keywords: global users, long-distance upload, slow uploads
Answer: S3 Transfer Acceleration
Why: Uses CloudFront edge locations. Combine with Multi-Part for large files.
Keywords: large files, unstable connection, retry on failure
Answer: Multi-Part Upload
Why: Parallel upload, retry only failed parts. Required for >5GB files.
Keywords: file header, first N bytes, metadata extraction
Answer: Byte-Range Fetch
Why: Request specific byte ranges. Efficient for partial data retrieval.
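
A byte-range fetch is driven by the HTTP `Range` header, whose start and end offsets are both inclusive. The sketch below only builds the header values; the helper names are illustrative and no request is sent.

```python
def header_range(first_n_bytes):
    """Range header value for reading only the first N bytes (e.g. a file header)."""
    if first_n_bytes < 1:
        raise ValueError("need at least one byte")
    return f"bytes=0-{first_n_bytes - 1}"


def byte_ranges(total_size, chunk_size):
    """Split an object of total_size bytes into Range values for parallel GETs."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    return [
        f"bytes={start}-{min(start + chunk_size, total_size) - 1}"
        for start in range(0, total_size, chunk_size)
    ]


print(header_range(1024))
print(byte_ranges(10, 4))
```

Each value can be sent as the `Range` header of a separate GET, which also means a failed chunk can be retried on its own.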
Keywords: cross-origin, CORS error, browser blocking, different domain
Answer: Configure CORS on the target bucket (the one being requested)
Why: CORS is configured where the data is, not where the request originates.
Keywords: existing objects, current files, replicate everything
Answer: S3 Batch Replication
Why: Normal replication only copies new objects. Batch Replication handles existing.
| Class | Avail. | AZs | Min Duration | Retrieval | Use Case |
|---|---|---|---|---|---|
| Standard | 99.99% | ≥3 | - | Instant | Frequently accessed |
| Intelligent-Tiering | 99.9% | ≥3 | - | Instant | Unknown patterns |
| Standard-IA | 99.9% | ≥3 | 30 days | Instant | Infrequent, rapid access |
| One Zone-IA | 99.5% | 1 | 30 days | Instant | Recreatable data |
| Glacier Instant | 99.9% | ≥3 | 90 days | ms | Once/quarter access |
| Glacier Flexible | 99.99% | ≥3 | 90 days | 1min-12hr | Archive, flexible |
| Glacier Deep Archive | 99.99% | ≥3 | 180 days | 12-48hr | Long-term archive |
| Express One Zone | 99.95% | 1 | - | <10ms | AI/ML, lowest latency |
| Method | Keys Managed By | Keys Stored In AWS? | HTTPS Required? | Quota Limits? |
|---|---|---|---|---|
| SSE-S3 | AWS | ✅ Yes | No | ❌ No |
| SSE-KMS | Customer (via KMS) | ✅ Yes | No | ✅ Yes (API quota) |
| DSSE-KMS | Customer (via KMS) | ✅ Yes | No | ✅ Yes |
| SSE-C | Customer (external) | ❌ No | ✅ Yes (mandatory) | ❌ No |
| Client-Side | Customer (external) | ❌ No | No | ❌ No |
| Mode | Who Can Delete? | Shorten Retention? | Override? | Use Case |
|---|---|---|---|---|
| Compliance | No one | ❌ Never | ❌ Never | Regulatory (SEC, FINRA) |
| Governance | Special permission | ✅ Yes | ✅ With permission | Internal policies |
| Legal Hold | No one while active | N/A | ✅ Remove hold | Litigation, investigations |
| Metric | Limit |
|---|---|
| Requests per prefix (PUT/POST/DELETE) | 3,500/sec |
| Requests per prefix (GET/HEAD) | 5,500/sec |
| Single PUT max size | 5 GB |
| Object max size | 5 TB |
| Multi-Part Upload max parts | 10,000 |
| Multi-Part Upload min part size | 5 MB (except last) |
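
The size limits in the table interact: an object close to 5 TB cannot use 5 MB parts without exceeding 10,000 parts. The sketch below picks a workable part size. `plan_upload` and the 8 MiB preferred part size are assumptions for the example, and the table's MB/GB/TB figures are treated as binary units.

```python
MIB, GIB, TIB = 1024 ** 2, 1024 ** 3, 1024 ** 4

MIN_PART_SIZE = 5 * MIB      # minimum part size (except the last part)
MAX_PARTS = 10_000           # maximum number of parts
MAX_SINGLE_PUT = 5 * GIB     # largest object a single PUT can carry
MAX_OBJECT_SIZE = 5 * TIB    # largest object overall


def ceil_div(a, b):
    """Integer ceiling division without floating point."""
    return -(-a // b)


def plan_upload(object_size, preferred_part_size=8 * MIB):
    """Choose a part size that respects the minimum size and the part-count cap."""
    if object_size > MAX_OBJECT_SIZE:
        raise ValueError("object exceeds the maximum object size")
    part_size = max(preferred_part_size, MIN_PART_SIZE,
                    ceil_div(object_size, MAX_PARTS))
    return {
        "multipart_required": object_size > MAX_SINGLE_PUT,
        "part_size": part_size,
        "parts": max(1, ceil_div(object_size, part_size)),
    }


print(plan_upload(100 * MIB))
print(plan_upload(5 * TIB))
```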
| Method | Default | Maximum |
|---|---|---|
| S3 Console | 1-720 minutes | 12 hours |
| AWS CLI | 3600 seconds | 604800 seconds (7 days) |
| API/Header | Purpose |
|---|---|
| x-amz-server-side-encryption: AES256 | SSE-S3 |
| x-amz-server-side-encryption: aws:kms | SSE-KMS |
| x-amz-bypass-governance-retention: true | Override Governance mode |
| aws:SecureTransport | Condition for HTTPS enforcement |
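
The `aws:SecureTransport` condition key can be wrapped into a complete bucket policy document. The sketch below generates one as JSON; the function name and bucket name are placeholders, and denying `s3:*` on both the bucket and its objects is a common pattern chosen for the example rather than something these notes prescribe.

```python
import json


def deny_insecure_transport_policy(bucket_name):
    """Build a bucket policy that rejects any request not sent over HTTPS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
                # Matches requests that did NOT arrive over TLS.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


policy = deny_insecure_transport_policy("my-bucket")
print(json.dumps(policy, indent=2))
```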
| Question Contains | → Instant Answer |
|---|---|
| “unknown access pattern” | Intelligent-Tiering |
| “millisecond from archive” | Glacier Instant |
| “cheapest archive” | Glacier Deep Archive |
| “lowest latency” / “single-digit ms” | Express One Zone |
| “recreatable” + “single AZ OK” | One Zone-IA |
| “encrypt existing objects” | S3 Batch Operations |
| “transition to cheaper” | Lifecycle Transition Actions |
| “delete old versions” | Lifecycle Expiration Actions |
| “incomplete multipart” | Lifecycle Expiration Actions |
| “audit access” / “who accessed” | S3 Access Logs + Athena |
| “keys never in AWS” | SSE-C or Client-Side |
| “high throughput + encrypt” | SSE-S3 (not KMS!) |
| “prevent deletion” + “compliance” | Object Lock Compliance |
| “admin can override” | Object Lock Governance |
| “temporary link” | Pre-Signed URL |
| “per-team access” | S3 Access Points |
| “transform before GET” | S3 Object Lambda |
| “read first N bytes” | Byte-Range Fetch |
| “large file upload” | Multi-Part Upload |
| “faster long-distance” | Transfer Acceleration |
| “storage cost analysis” | S3 Storage Lens |
| “lifecycle recommendations” | S3 Analytics |
| “replicate existing” | S3 Batch Replication |
| “CORS error” | Configure CORS on target bucket |
| “cross-account” | Bucket Policy |
| “event on upload” | Event Notifications (SNS/SQS/Lambda) |
| “infinite loop” + logging | Logging bucket ≠ monitored bucket |
| “can’t create bucket” | Name already taken globally |
| “global but regional” | Bucket created in region |
| “overwrite + immediate read” | Always latest (strong consistency) |
| “eventual consistency S3” | ❌ Outdated — S3 is strongly consistent since 2020 |
When stuck between options, eliminate systematically:
□ Is it about ENCRYPTION?
→ Lifecycle Rules = ❌ Can't encrypt
→ Batch Operations = ✅ Can encrypt existing objects
□ Is it about DELETION PROTECTION?
→ Compliance mode = No one can delete/override
→ Governance mode = Admin can override with permission
→ Legal Hold = Indefinite, removable
□ Is it about ACCESS CONTROL?
→ Same account user = IAM Policy
→ EC2/Lambda = IAM Role
→ Cross-account = Bucket Policy
→ Different teams = Access Points
□ Is it about PERFORMANCE?
→ More throughput = Spread across prefixes
→ Large files = Multi-Part Upload
→ Long distance = Transfer Acceleration
→ Partial data = Byte-Range Fetch
□ Is it about STORAGE CLASS?
→ Unknown pattern = Intelligent-Tiering
→ Infrequent = Standard-IA or One Zone-IA
→ Archive (instant) = Glacier Instant
→ Archive (flexible) = Glacier Flexible
→ Archive (cheapest) = Glacier Deep Archive
→ Lowest latency = Express One Zone
□ Is it about AUDITING?
→ Who accessed objects = S3 Access Logs + Athena
→ API calls to S3 = CloudTrail
→ Storage analysis = S3 Storage Lens or Analytics
□ Is it about REPLICATION?
→ New objects = Standard Replication
→ Existing objects = Batch Replication
→ Versioning required = ✅ On both buckets
AWS Snow Family: highly secure, portable devices to collect and process data at the edge, and migrate data into and out of AWS. They address challenges like:
Data migration and Edge computing:
AWS OpsHub — GUI application to manage Snow Family devices (installed on your computer)
AWS OpsHub Management:
┌─────────────────────────────────────────────────────────┐
│ Your Computer │
│ ┌───────────────────────────────────────────────────┐ │
│ │ AWS OpsHub (GUI) │ │
│ │ ┌─────────────┬─────────────┬─────────────────┐ │ │
│ │ │ Unlock & │ Transfer │ Launch EC2 │ │ │
│ │ │ Configure │ Files │ Manage Storage │ │ │
│ │ └─────────────┴─────────────┴─────────────────┘ │ │
│ └───────────────────────────────────────────────────┘ │
│ │ │
│ │ Local connection (USB/Network) │
│ ▼ │
│ ┌───────────────┐ │
│ │ Snow Device │ │
│ │ (Snowcone / │ │
│ │ Snowball Edge)│ │
│ └───────────────┘ │
└─────────────────────────────────────────────────────────┘
Rule of thumb: If network transfer takes > 1 week → use Snowball
| Data Size | 100 Mbps | 1 Gbps | 10 Gbps |
|---|---|---|---|
| 10 TB | 12 days | 30 hours | 3 hours |
| 100 TB | 124 days | 12 days | 30 hours |
| 1 PB | 3 years | 124 days | 12 days |
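
The table's figures can be approximated with a short calculation. The sketch assumes decimal units (1 TB = 10^12 bytes) and an effective link utilization of 80%; that utilization factor is an assumption chosen because it lands close to the rounded values above, not something stated in these notes.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(data_tb, link_mbps, utilization=0.8):
    """Estimated days to push data_tb terabytes over a link of link_mbps megabits/s."""
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * utilization)
    return seconds / SECONDS_PER_DAY


def recommend(data_tb, link_mbps, threshold_days=7):
    """Apply the rule of thumb: more than a week on the wire means Snowball."""
    if transfer_days(data_tb, link_mbps) > threshold_days:
        return "Snowball"
    return "Network transfer"


for tb, mbps in [(10, 100), (10, 1_000), (100, 1_000)]:
    print(f"{tb} TB @ {mbps} Mbps: {transfer_days(tb, mbps):.1f} days -> {recommend(tb, mbps)}")
```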
Direct Upload vs Snowball:
Direct: Client ──── www (10Gbit/s) ────▶ S3 Bucket
Snowball: Client ──▶ Snowball ──▶ [ship] ──▶ AWS ──▶ S3 Bucket
(local) (import)
Edge Computing = process data where it’s created (before sending to cloud)
| Device | Use Case |
|---|---|
| Snowball Edge Storage Optimized | Large data + some compute |
| Snowball Edge Compute Optimized | Heavy processing (ML, transcoding) |
Use cases: Preprocess data, machine learning at edge, media transcoding
⚠️ Exam trap: “Large data + process while in transit” → Snowball Edge (not Snowcone)
⚠️ Exam trap: Snowball cannot import to Glacier directly
Snowball ──▶ Amazon S3 ──▶ (Lifecycle Policy) ──▶ Amazon Glacier
Amazon FSx = Launch third-party high-performance file systems on AWS (fully managed)
Lustre Deployment Options:
| Option | Replication | Performance | Use Case |
|---|---|---|---|
| Scratch | ❌ No (data lost if fails) | 6x faster (200 MBps/TiB) | Short-term processing, cost optimized |
| Persistent | ✅ Within same AZ | Standard | Long-term processing, sensitive data |
FSx Lustre Deployment Options:
Scratch File System: Persistent File System:
┌─────────────────────────────┐ ┌─────────────────────────────┐
│ Region │ │ Region │
│ ┌─────────┐ ┌─────────┐ │ │ ┌─────────┐ ┌─────────┐ │
│ │ AZ 1 │ │ AZ 2 │ │ │ │ AZ 1 │ │ AZ 2 │ │
│ │Compute │ │Compute │ │ │ │Compute │ │Compute │ │
│ └────┬────┘ └────┬────┘ │ │ └────┬────┘ └────┬────┘ │
│ └─────┬───────┘ │ │ └─────┬───────┘ │
│ ENI │ │ ENI │
│ │ │ │ │ │
│ ┌────▼────┐ │ │ ┌────▼────┐ │
│ │ FSx │──▶ S3 │ │ │ FSx │──▶ S3 │
│ │(Scratch)│ (optional)│ │ │(Persist)│ (optional)│
│ └─────────┘ │ │ └─────────┘ │
│ (No replication) │ │ (Replicated in AZ) │
└─────────────────────────────┘ └─────────────────────────────┘
FSx for NetApp ONTAP / OpenZFS - Compatible Clients:
┌─────────────────────────┐
│ FSx NetApp ONTAP │
│ (NFS, SMB, iSCSI) │
│ ─────────────────────── │
│ FSx OpenZFS │
│ (NFS v3/v4 only) │
└───────────┬─────────────┘
│
┌────────────────────────┼────────────────────────┐
▼ ▼ ▼
┌─────────────┐ ┌─────────────────────┐ ┌──────────────┐
│EC2/ECS/EKS │ │VMware/AppStream/ │ │On-premises │
│ │ │WorkSpaces │ │Server │
└─────────────┘ └─────────────────────┘ └──────────────┘
Linux/Win/Mac
FSx Comparison:
| FSx Type | Protocol | Best For | Key Feature |
|---|---|---|---|
| Windows | SMB, NTFS | Windows workloads | AD integration, Multi-AZ |
| Lustre | POSIX | HPC, ML, Linux | S3 integration, sub-ms latency |
| NetApp ONTAP | NFS, SMB, iSCSI | Multi-OS, NAS migration | Auto-scaling, cloning |
| OpenZFS | NFS | ZFS migration | 1M IOPS, <0.5ms latency, cloning |
FSx Use Case Decision Tree:
| Scenario | Answer |
|---|---|
| Windows app needs shared storage + Active Directory | FSx for Windows |
| HPC cluster needs fast shared storage + read from S3 | FSx for Lustre |
| ML training with large datasets in S3 | FSx for Lustre |
| Migrate existing Windows file server to AWS | FSx for Windows |
| Migrate NetApp/NAS to AWS | FSx for NetApp ONTAP |
| Need NFS + SMB + iSCSI on same file system | FSx for NetApp ONTAP |
| Migrate ZFS-based workloads to AWS | FSx for OpenZFS |
| Need point-in-time cloning for testing | NetApp ONTAP or OpenZFS |
| Short-term compute job, optimize cost | FSx Lustre Scratch |
| Long-term processing, data must survive failure | FSx Lustre Persistent |
⚠️ Exam traps:
Bridge between on-premises and AWS cloud storage
AWS Storage Gateway Overview:
On-Premises AWS Cloud
┌─────────────────────────────────────┐ ┌────────────────────────────────┐
│ │ │ │
│ File Shares ──NFS/SMB──▶ File GW ───┼────┼──▶ S3 (excl. Glacier) ──▶ Glacier
│ (cache) │ │ │
│ │ │ │
│ App Server ──iSCSI────▶ Volume GW ──┼────┼──▶ S3 ──▶ EBS Snapshots │
│ (cache) │ │ │
│ │ │ │
│ Backup App ──iSCSI VTL─▶ Tape GW ───┼────┼──▶ S3 (Tape Library) ──▶ Glacier
│ (cache) │ │ │
└─────────────────────────────────────┘ └────────────────────────────────┘
Encryption in Transit (Internet or Direct Connect)
| Gateway Type | Protocol | Backend | Use Case |
|---|---|---|---|
| S3 File Gateway | NFS, SMB | S3 (Standard, IA, One Zone, Intelligent) | Access S3 via file protocols, cached locally |
| FSx File Gateway | SMB | FSx for Windows | Low-latency access to FSx from on-prem |
| Volume Gateway | iSCSI | S3 + EBS snapshots | Block storage backed by S3 |
| Tape Gateway | iSCSI (VTL) | S3 + Glacier | Replace physical tapes with cloud |
S3 File Gateway:
On-Premises AWS Cloud
┌────────────────────┐ ┌─────────────────────────────────┐
│ App Server │ │ S3 Standard / IA / One Zone-IA │
│ │ │ HTTPS │ S3 Intelligent-Tiering │
│ ▼ │ │ │ │
│ S3 File Gateway ───┼──────────┼──────────▶│ │
│ (NFS or SMB) │ │ ▼ (Lifecycle Policy) │
│ (local cache) │ │ S3 Glacier │
└────────────────────┘ └─────────────────────────────────┘
Volume Gateway:
On-Premises AWS Cloud
┌────────────────────┐ ┌─────────────────────────────────┐
│ App Server │ HTTPS │ │
│ │ │ │ S3 Bucket │
│ ▼ iSCSI │ │ │ │
│ Volume Gateway ────┼──────────┼────────▶│ │
│ (local cache) │ │ ▼ │
└────────────────────┘ │ EBS Snapshots │
└─────────────────────────────────┘
Tape Gateway:
On-Premises AWS Cloud
┌────────────────────────┐ ┌─────────────────────────────────┐
│ Backup Server │ │ │
│ │ iSCSI │HTTPS │ Virtual Tapes ──▶ Archived Tapes
│ ▼ │ │ (S3) (Glacier) │
│ ┌──────────┬─────────┐ │ │ │
│ │Media │Tape │ │ │ │
│ │Changer │Drive │─┼──────┼──────────────────────────────▶ │
│ └──────────┴─────────┘ │ │ │
│ Tape Gateway │ │ │
└────────────────────────┘ └─────────────────────────────────┘
Volume Gateway Modes:
⚠️ Exam traps:
⚠️ Exam trap: TLS is NOT a supported transfer protocol (supported: SFTP, FTPS, FTP)
AWS Transfer Family:
MS Active Directory / LDAP
│ authenticate
▼
Users ──▶ Route 53 ──▶ ┌─────────────────────┐ ┌─────────────┐
(FTP (optional) │ Transfer for SFTP │ │ │
client) │ Transfer for FTPS │──────▶ Amazon S3 │
│ Transfer for FTP │ │ │
│ (VPC only) │ │ Amazon EFS │
└─────────────────────┘ └─────────────┘
│
IAM Role
DataSync: On-Premises to AWS
On-Premises AWS Region
┌────────────────────────┐ ┌─────────────────────────────────┐
│ │ │ AWS Storage Resources │
│ NFS/SMB Server │ TLS │ ┌─────────┬─────────┬────────┐ │
│ │ │ │ │S3 │S3 IA │S3 │ │
│ ▼ NFS/SMB │ │ │Standard │ │One Zone│ │
│ DataSync Agent ────────┼──────────┼─▶├─────────┼─────────┼────────┤ │
│ │ │ │S3 │S3 │S3 Deep │ │
│ (or Snowcone with │ │ │Intell. │Glacier │Archive │ │
│ agent pre-installed) │ │ ├─────────┴─────────┴────────┤ │
└────────────────────────┘ │ │ EFS │ FSx │ │
│ └───────────┴────────────────┘ │
└─────────────────────────────────┘
DataSync: AWS to AWS (no agent needed)
┌─────────────┐ ┌─────────────┐
│ Amazon S3 │ │ Amazon S3 │
├─────────────┤ ┌──────────┐ ├─────────────┤
│ Amazon EFS │◀───────▶│ DataSync │◀───────▶│ Amazon EFS │
├─────────────┤ └──────────┘ ├─────────────┤
│ Amazon FSx │ (copy data + metadata) │ Amazon FSx │
└─────────────┘ └─────────────┘
⚠️ Exam traps:
| Aspect | DataSync | Storage Gateway |
|---|---|---|
| Purpose | One-time or scheduled migration/sync | Ongoing hybrid access (bridge) |
| Direction | On-prem → AWS, AWS → AWS | On-prem ↔ AWS (bidirectional access) |
| Use case | “Move data to cloud” | “Extend on-prem storage to cloud” |
| Agent | Yes (on-prem), No (AWS-to-AWS) | VM appliance (always) |
| Caching | No local cache | Yes, local cache for low latency |
| Protocol | NFS, SMB, HDFS, S3 API | NFS, SMB, iSCSI |
⚠️ Exam trap decision:
AWS Storage Cloud Native Options:
┌─────────────────┬─────────────────┬─────────────────┐
│ Block │ File │ Object │
├─────────────────┼─────────────────┼─────────────────┤
│ Amazon EBS │ Amazon EFS │ Amazon S3 │
│ EC2 Instance │ Amazon FSx │ Amazon Glacier │
│ Store │ │ │
└─────────────────┴─────────────────┴─────────────────┘
| Service | Type | Use Case |
|---|---|---|
| S3 | Object | General object storage |
| S3 Glacier | Object | Archival |
| EBS | Block | Single EC2 instance storage |
| Instance Store | Block | Ephemeral, high IOPS |
| EFS | File (NFS) | Linux shared file system |
| FSx Windows | File (SMB) | Windows shared file system |
| FSx Lustre | File (POSIX) | HPC, ML, Linux |
| FSx NetApp ONTAP | File (multi) | Multi-OS, NAS migration |
| FSx OpenZFS | File (NFS) | ZFS migration |
| Storage Gateway | Hybrid | On-prem ↔ AWS bridge |
| Transfer Family | Hybrid | FTP/SFTP to S3/EFS |
| DataSync | Migration | Scheduled sync to AWS |
| Snow Family | Migration | Physical data transfer |
| Scenario | Answer |
|---|---|
| Large data (>1 week to transfer), limited bandwidth | Snowball Edge |
| Large data + need to process at edge | Snowball Edge Compute Optimized |
| Small data + limited connectivity + edge compute | Snowcone |
| One-time migration from on-prem NFS/SMB to S3 | DataSync (with agent) |
| Scheduled/recurring sync from on-prem to AWS | DataSync |
| Migrate S3 → EFS or S3 → FSx | DataSync (no agent) |
| On-prem apps need ongoing NFS/SMB access to S3 | S3 File Gateway |
| On-prem apps need low-latency access to FSx Windows | FSx File Gateway |
| On-prem apps need iSCSI block storage backed by S3 | Volume Gateway |
| Replace physical tape backup with cloud | Tape Gateway |
| External users upload via FTP/SFTP to S3 | Transfer Family |
| Import data to Glacier | Snowball → S3 → Lifecycle Policy |
⚠️ Key differentiators:
AWS OpsHub is software for managing Snow Family devices.
The fundamental question: How long to transfer over network?
More than 1 week → Consider Snowball (physical transfer)
100 TB at 1 Gbps = 12 days. Snowball wins.
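The one-week rule of thumb can be checked with a short calculation. This is a minimal sketch: the function names `transfer_days` and `prefer_snowball` and the 80% default utilization are assumptions made for illustration, not AWS guidance.

```python
def transfer_days(data_tb: float, link_gbps: float, utilization: float = 0.8) -> float:
    """Estimate the days needed to send data_tb terabytes over a network link.

    utilization is an assumed fraction of the line rate actually achieved.
    Decimal units are used: 1 TB = 10**12 bytes, 1 Gbps = 10**9 bits/s.
    """
    if link_gbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("link_gbps must be > 0 and utilization in (0, 1]")
    bits = data_tb * 10**12 * 8
    seconds = bits / (link_gbps * 10**9 * utilization)
    return seconds / 86_400


def prefer_snowball(data_tb: float, link_gbps: float, utilization: float = 0.8) -> bool:
    """Rule of thumb from these notes: more than 7 days over the wire means ship it."""
    return transfer_days(data_tb, link_gbps, utilization) > 7


if __name__ == "__main__":
    print(f"{transfer_days(100, 1, 1.0):.1f} days at full line rate")
    print(f"{transfer_days(100, 1):.1f} days at 80% utilization")
    print("Snowball?", prefer_snowball(100, 1))
```

The utilization parameter is what separates the theoretical figure from the commonly quoted one, so it is worth stating whenever a transfer estimate is given.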
Two fundamentally different needs:
“Move to cloud” = migration. “Extend to cloud” = hybrid access.
What protocol do your applications use?
| Protocol | AWS Service |
|---|---|
| NFS/SMB (file) | Storage Gateway, DataSync, EFS, FSx |
| iSCSI (block) | Volume Gateway, Tape Gateway |
| FTP/SFTP/FTPS | Transfer Family |
| S3 API (object) | Direct S3, DataSync |
FSx is NOT a generic file system — it’s specific file system software:
Snowball cannot import directly to Glacier.
This is a common exam trap.
Snowball Edge isn’t just for transfer — it’s for computing at the edge:
Storage Gateway provides low-latency local access with cloud backing:
DataSync keeps file permissions and metadata intact:
                Network quality?
                        │
          ┌─────────────┴─────────────┐
          ▼                           ▼
    Good/Adequate                Limited/Bad
     (< 1 week)                  (> 1 week)
          │                           │
          ▼                           ▼
  DataSync / Direct              Snow Family
                                      │
                            ┌─────────┴─────────┐
                            ▼                   ▼
                       Small data          Large data
                        (< 14 TB)         (up to 80 TB)
                            │                   │
                            ▼                   ▼
                        Snowcone          Snowball Edge

                What's the need?
                        │
      ┌─────────────────┼─────────────────┐
      ▼                 ▼                 ▼
  One-time          Scheduled          Ongoing
  Migration           Sync             Access
      │                 │                 │
      ▼                 ▼                 ▼
  DataSync          DataSync       Storage Gateway
  Snowball                                │
                            ┌─────────────┴─────────────┐
                            ▼                           ▼
                       File access                Block storage
                        (NFS/SMB)                    (iSCSI)
                            │                           │
                            ▼                           ▼
                     S3 File Gateway             Volume Gateway
                    FSx File Gateway              Tape Gateway

| If question mentions… | Answer is… |
|---|---|
| “> 1 week transfer” / “limited bandwidth” | Snowball Edge |
| “limited network + small data” | Snowcone |
| “process data at edge” / “edge computing” | Snowball Edge Compute Optimized |
| “migrate to S3/EFS/FSx” (one-time) | DataSync |
| “scheduled sync” / “weekly backup to S3” | DataSync |
| “on-prem NFS access to S3” (ongoing) | S3 File Gateway |
| “on-prem access to FSx Windows” | FSx File Gateway |
| “on-prem iSCSI block storage” | Volume Gateway |
| “replace tape backup” | Tape Gateway |
| “FTP/SFTP access to S3” | Transfer Family |
| “Windows file share + AD” | FSx for Windows |
| “HPC / ML + Linux + S3” | FSx for Lustre |
| “multi-protocol (NFS + SMB + iSCSI)” | FSx for NetApp ONTAP |
| “migrate ZFS workloads” | FSx for OpenZFS |
| “import to Glacier” | Snowball → S3 → Lifecycle |
| “short-term HPC, cost optimized” | FSx Lustre Scratch |
| “data must persist + HPC” | FSx Lustre Persistent |
| “point-in-time cloning” | FSx NetApp ONTAP or OpenZFS |
| Statement | Why It’s Wrong |
|---|---|
| Snowball imports to Glacier directly | Must go to S3 first, then Lifecycle to Glacier |
| DataSync for ongoing hybrid access | DataSync = migration/sync, not ongoing access |
| Storage Gateway for one-time migration | Storage Gateway = ongoing access, not migration tool |
| Transfer Family for internal apps | Transfer Family = FTP for external users |
| TLS as Transfer Family protocol | TLS is an encryption layer, not a file-transfer protocol: use SFTP/FTPS |
| FSx Lustre for Windows apps | Lustre = Linux/POSIX only |
| FSx for Windows without AD | Windows File Server integrates with AD |
| OpenZFS for SMB access | OpenZFS = NFS only |
| DataSync to EBS | EBS not supported — only S3, EFS, FSx |
| Snowcone for 50 TB | Snowcone max = 14 TB — use Snowball Edge |
⚠️ Exam trap — DataSync over Direct Connect (NFS → EFS):
| Cannot… | Instead… |
|---|---|
| Import Snowball to Glacier directly | Snowball → S3 → Lifecycle → Glacier |
| Use DataSync to EBS | Use EBS snapshots or block-level replication |
| Use Transfer Family with TLS protocol | Use SFTP (SSH-based) or FTPS (FTP over TLS) |
| Access FSx Lustre from Windows | Use FSx for Windows or NetApp ONTAP |
| Use Snowcone for >14 TB | Use Snowball Edge (up to 80 TB) |
| Run EC2 on Snowcone | Limited compute — use Snowball Edge Compute |
| Use FTP without VPC (Transfer Family) | FTP = VPC only; use SFTP/FTPS for public |
Keywords: petabytes, limited bandwidth, weeks to transfer, offline, remote location
Answer: Snowball Edge
Why: Physical transfer bypasses network limitations. > 1 week transfer → Snowball.
Keywords: edge computing, process before upload, ML at edge, trucks, ships, mining
Answer: Snowball Edge Compute Optimized
Why: Run EC2/Lambda locally, process data, then ship to AWS.
Keywords: small dataset, remote, portable, <14 TB
Answer: Snowcone
Why: Smallest Snow device (8-14 TB), portable, has DataSync agent pre-installed.
Keywords: migrate, one-time transfer, move to cloud, NFS to S3
Answer: DataSync (with agent)
Why: DataSync = migration tool. Preserves metadata. Scheduled or one-time.
Keywords: weekly sync, daily backup, recurring, scheduled
Answer: DataSync
Why: DataSync supports hourly/daily/weekly schedules.
Keywords: hybrid, continuous access, extend storage, NFS/SMB to S3
Answer: S3 File Gateway
Why: Storage Gateway = ongoing hybrid access with local cache.
Keywords: tape, VTL, virtual tape library, backup to cloud
Answer: Tape Gateway
Why: Presents virtual tapes via iSCSI, stores in S3/Glacier.
Keywords: FTP, SFTP, file transfer, external partners
Answer: AWS Transfer Family
Why: Managed FTP/SFTP/FTPS service to S3 or EFS.
Keywords: Windows, SMB, NTFS, Active Directory, DFS
Answer: FSx for Windows File Server
Why: Fully managed Windows file system with AD integration.
Keywords: HPC, high-performance computing, ML training, Linux, Lustre
Answer: FSx for Lustre
Why: Parallel file system, S3 integration, sub-ms latency, 100s GB/s.
Keywords: S3 integration, lazy load, HPC reads from S3
Answer: FSx for Lustre
Why: Can mount S3 as file system, lazy-load data on access.
Keywords: NFS and SMB, multi-OS, migrate NAS
Answer: FSx for NetApp ONTAP
Why: Only FSx option that supports all three protocols (NFS, SMB, and iSCSI).
Keywords: ZFS, OpenZFS, migrate ZFS
Answer: FSx for OpenZFS
Why: Managed OpenZFS, NFS protocol, snapshots, cloning.
Keywords: Snowball to Glacier, archive imported data
Answer: Snowball → S3 → S3 Lifecycle Policy → Glacier
Why: Snowball cannot import directly to Glacier.
Keywords: temporary processing, cost optimized, short-term
Answer: FSx for Lustre (Scratch)
Why: Scratch = no replication, 6x faster, cheaper. Data lost if fails.
Keywords: persistent, data durability, long-term HPC
Answer: FSx for Lustre (Persistent)
Why: Replicated within AZ, data survives failures.
| Device | Storage | Compute | Use Case |
|---|---|---|---|
| Snowcone | 8-14 TB | 2 vCPU, 4 GB | Small data, portable, DataSync agent |
| Snowball Edge Storage | 80 TB | 40 vCPU, 80 GB | Large data + some compute |
| Snowball Edge Compute | 42-80 TB | 104 vCPU, 416 GB | Heavy processing at edge |
| Snowmobile | - | - | Discontinued |
| FSx Type | Protocol | OS | Best For |
|---|---|---|---|
| Windows | SMB, NTFS | Windows | Windows apps, AD integration |
| Lustre | POSIX | Linux | HPC, ML, S3 integration |
| NetApp ONTAP | NFS, SMB, iSCSI | Multi-OS | NAS migration, multi-protocol |
| OpenZFS | NFS | Linux/Unix | ZFS migration, cloning |
| Gateway Type | Protocol | Backend | Use Case |
|---|---|---|---|
| S3 File Gateway | NFS, SMB | S3 | File access to S3 |
| FSx File Gateway | SMB | FSx Windows | Low-latency FSx access |
| Volume Gateway | iSCSI | S3 + EBS | Block storage to S3 |
| Tape Gateway | iSCSI (VTL) | S3 + Glacier | Replace physical tapes |
| Aspect | DataSync | Storage Gateway |
|---|---|---|
| Purpose | Migration / Sync | Ongoing hybrid access |
| Use case | “Move to cloud” | “Extend to cloud” |
| Caching | No | Yes (low latency) |
| Direction | One-way or scheduled | Bidirectional access |
| Agent | Yes (on-prem) | VM appliance |
| Protocol | Encryption | Access |
|---|---|---|
| SFTP | SSH-based | Public or VPC |
| FTPS | TLS-based | Public or VPC |
| FTP | None | VPC only |
⚠️ TLS is NOT a file-transfer protocol: it is the encryption layer used BY FTPS
| Scenario | Service |
|---|---|
| > 1 week transfer time | Snowball Edge |
| < 14 TB + limited network | Snowcone |
| One-time NFS/SMB → S3 migration | DataSync |
| Scheduled sync to S3/EFS/FSx | DataSync |
| S3 → EFS or S3 → FSx migration | DataSync (no agent) |
| Ongoing NFS/SMB access to S3 | S3 File Gateway |
| FTP/SFTP uploads to S3 | Transfer Family |
| Replace tape backup | Tape Gateway |
| iSCSI block storage to S3 | Volume Gateway |
| Item | Value |
|---|---|
| Snowcone storage | 8 TB HDD / 14 TB SSD |
| Snowball Edge Storage | 80 TB |
| Snowball Edge Compute | 42 TB HDD / 28 TB NVMe |
| DataSync throughput | Up to 10 Gbps per agent |
| FSx Lustre throughput | 100s GB/s |
| FSx OpenZFS IOPS | 1,000,000 IOPS |
| Volume Gateway cache | Local + S3 |
| Question Contains | → Instant Answer |
|---|---|
| “> 1 week transfer” / “bad network” | Snowball Edge |
| “small data + remote” | Snowcone |
| “edge computing” / “process at edge” | Snowball Edge Compute |
| “migrate NFS/SMB to S3” | DataSync |
| “scheduled sync to AWS” | DataSync |
| “S3 → EFS” or “S3 → FSx” | DataSync (no agent) |
| “on-prem NFS access to S3” (ongoing) | S3 File Gateway |
| “on-prem access to FSx Windows” | FSx File Gateway |
| “iSCSI block storage to cloud” | Volume Gateway |
| “replace tape backup” | Tape Gateway |
| “FTP/SFTP to S3” | Transfer Family |
| “TLS protocol” | ❌ Wrong — use SFTP/FTPS |
| “Windows file share + AD” | FSx for Windows |
| “HPC + Linux + S3” | FSx for Lustre |
| “multi-protocol (NFS+SMB+iSCSI)” | FSx for NetApp ONTAP |
| “migrate ZFS” | FSx for OpenZFS |
| “Snowball → Glacier” | S3 first → Lifecycle |
| “short-term HPC, cheap” | FSx Lustre Scratch |
| “HPC data must persist” | FSx Lustre Persistent |
| “point-in-time cloning” | FSx NetApp ONTAP or OpenZFS |
| “Snowmobile” | Discontinued — use multiple Snowball |
When stuck between options, eliminate systematically:
□ Is network transfer > 1 week?
→ Yes = Snow Family (Snowball/Snowcone)
→ No = DataSync or direct transfer
□ Is it MIGRATION or ONGOING ACCESS?
→ Migration = DataSync, Snowball
→ Ongoing = Storage Gateway
□ What PROTOCOL do apps use?
→ NFS/SMB (file) = File Gateway, DataSync, FSx
→ iSCSI (block) = Volume Gateway, Tape Gateway
→ FTP/SFTP = Transfer Family
□ Is it WINDOWS or LINUX?
→ Windows + AD = FSx for Windows
→ Linux + HPC = FSx for Lustre
→ Both = FSx for NetApp ONTAP
□ Do they need EDGE COMPUTING?
→ Yes + small = Snowcone (limited)
→ Yes + heavy = Snowball Edge Compute
□ Is data going to GLACIER?
→ Via Snowball = S3 first → Lifecycle → Glacier
→ Direct = S3 Lifecycle Policy
□ Is it SCHEDULED SYNC?
→ Yes = DataSync (hourly/daily/weekly)
→ No = One-time DataSync or Snowball
□ Do they need LOCAL CACHE?
→ Yes = Storage Gateway
→ No = DataSync or direct
□ Is it for EXTERNAL USERS?
→ FTP/SFTP = Transfer Family
→ Internal apps = Storage Gateway

   ┌─────┐ ┌─────┐ ┌─────┐
   │User │ │User │ │User │
   └──┬──┘ └──┬──┘ └──┬──┘
      │       │       │
      └───────┼───────┘
              ▼
   ┌─────────────────────┐
   │     Application     │
   └──────────┬──────────┘
              │ Read/Write
              ▼
       ┌────────────┐
       │ Amazon RDS │
       └──────┬─────┘
              │
              ▼
<────────── Storage ──────────>

RDS (Relational Database Service) is a distributed relational database service (SQL).
| Supported Engines |
|---|
| PostgreSQL, MySQL, MariaDB, Oracle, MS SQL Server, IBM DB2, Aurora |
┌───────────────────┐
│ Application │
└────────┬─┬────────┘
writes ↓ ↑ reads
┌────┴────┐
│ M │ ← Master (writes + reads)
└────┬────┘
ASYNC │ ASYNC
replication ←────┴────→ replication
┌─────┴─────┐ ┌─────┴─────┐
│ R │ │ R │ ← Read Replicas
└─────┬─────┘ └─────┬─────┘
↑ reads ↑ reads
Read Replicas: Up to 15 replicas, ASYNC replication (eventually consistent), can be cross-AZ/cross-Region.
⚠️ Exam trap: ASYNC = eventual consistency = replication lag
Read Replica Network Cost:
┌─────────────────────────────┐ ┌────────────────────────────┐
│ Same Region / Different AZ │ │ Cross-Region │
│ us-east-1a us-east-1b │ │ us-east-1a eu-west-1b │
│ ┌───┐ ASYNC ┌───┐ │ vs │ ┌───┐ ASYNC ┌───┐ │
│ │ M │ ───────→ │ R │ │ │ │ M │ ───────→ │ R │ │
│ └───┘ └───┘ │ │ └───┘ └───┘ │
│ FREE (same region) │ │ $$$ (cross-region) │
└─────────────────────────────┘ └────────────────────────────┘
⚠️ Exam trap: Same region replication = FREE, Cross-region = costs $$$
RDS Cross-Region DR Strategy:
RDS Multi-AZ (Disaster Recovery):
┌───────────────────┐
│ Application │
└────────┬─┬────────┘
writes ↓ ↑ reads
┌─────────────────────────────────────┐
│ One DNS name – automatic failover │
└──────────────────┬──────────────────┘
│
┌─────────────┴─────────────┐
▼ │
┌─────────┐ SYNC ┌────┴────┐
│ M │ ──────────────→ │ S │
└─────────┘ replication └─────────┘
Master (AZ A) Standby (AZ B)
⚠️ Exam trap: Multi-AZ = High Availability (failover), Read Replicas = Scalability (read performance)
READ REPLICA (ASYNC)            MULTI-AZ (SYNC)
┌───┐         ┌───┐             ┌───┐         ┌───┐
│ M │ ──────→ │ R │             │ M │ ──────→ │ S │
└───┘  async  └───┘             └───┘  sync   └───┘
      (lag OK)                       (no lag!)
"eventually consistent"         "always consistent"

Single-AZ → Multi-AZ Migration (zero downtime):
┌─────────┐ SYNC replication ┌─────────┐
│ M │ ──────────────────→ │ S │
└────┬────┘ └─────────┘
│ Standby DB
↓ snapshot
┌──────────┐
│    DB    │
│ snapshot │ ← restore to new AZ
└──────────┘

Use Case: Reporting without impacting production
┌────────────────┐               ┌────────────────┐
│   Production   │               │   Reporting    │
│  Application   │               │  Application   │
└──────┬─┬───────┘               └───────┬────────┘
       ↓ ↑                               ↑ reads
  writes/reads                           │
       │                                 │
  ┌────┴────┐   ASYNC replication   ┌────┴────┐
  │    M    │ ────────────────────→ │    R    │
  └─────────┘                       └─────────┘
  RDS Master                        Read Replica

RDS Storage Auto Scaling:
Why RDS over EC2-hosted DB?
| RDS Manages For You | You Still Control |
|---|---|
| OS patching | Database schema |
| Automated backups (Point in Time Restore) | Application queries |
| Monitoring dashboards | Security groups |
| Hardware provisioning | Parameter groups |
| Read replicas & Multi-AZ setup | |
| Storage scaling (EBS-backed) | |
⚠️ Exam trap: You can’t SSH into RDS instances (except RDS Custom for Oracle/SQL Server).
RDS Custom (Oracle & MS SQL Server only):
┌───────┐
│ User │
└───┬───┘
apply │ SSH / SSM
customizations │
▼
┌───────────────┐
│ EC2 Instance │
├───────────────┤
│ Amazon RDS │ Automation Mode: DISABLED
└───────────────┘

| RDS | RDS Custom |
|---|---|
| AWS manages OS + DB | Full admin access to OS + DB |
| No SSH | SSH / SSM Session Manager |
| No custom patches | Install patches, configure settings |
⚠️ Disable Automation Mode before customizing. Take snapshot first!
⚠️ Exam trap: “Full customization of Oracle/SQL Server” + “benefit from AWS services” = RDS Custom
Amazon Aurora is an enterprise-class relational database built by AWS for the cloud (proprietary technology, not open source). AWS quotes up to 5x the performance of MySQL and 3x that of PostgreSQL on RDS. Storage grows automatically. Aurora costs about 20% more than RDS but is more efficient: it reduces unnecessary input/output (I/O) operations, which helps lower database costs while keeping database resources reliable and available.
Amazon Aurora replicates six copies of your data across three Availability Zones and continuously backs up your data to Amazon S3.
| Feature | Details |
|---|---|
| Engines | PostgreSQL, MySQL (compatible drivers) |
| Performance | 5x MySQL, 3x PostgreSQL on RDS |
| Storage | Auto-grows 10GB → 128TB |
| Replicas | Up to 15, <10ms replica lag |
| Failover | Instantaneous (HA native) |
| Cost | 20% more than RDS, but more efficient |
| Durability | 6 copies across 3 AZs, continuous backup to S3 |
⚠️ Exam trap: “OLTP” + “auto-scaling storage” + “maximum replicas” = Aurora
Aurora High Availability:
AZ 1 AZ 2 AZ 3
┌───┐ ┌───┐ ┌───┐ ┌───┐ ┌───┐ ┌───┐
│ M │ │ R │ │ R │ │ R │ │ R │ │ R │
└─┬─┘ └─┬─┘ └─┬─┘ └─┬─┘ └─┬─┘ └─┬─┘
↓W ↑R ↑R ↑R ↑R ↑R
══════════════════════════════════════════
Shared Storage Volume (100s of volumes)
Replication + Self Healing + Auto Expanding
══════════════════════════════════════════
Aurora Quorum (failure tolerance):
| Scenario | Writes | Reads |
|---|---|---|
| 1 AZ down (2 copies lost) | ✅ Works (4 remaining) | ✅ Works |
| 3 copies lost | ❌ Write outage (needs 4 of 6) | ✅ Works (needs 3 of 6) |
| 4+ copies lost | ❌ Write outage | ❌ Read outage |
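Aurora's storage layer needs 4 of its 6 copies to accept a write and 3 of 6 to serve a read. A minimal sketch of that quorum rule follows; the helper name `aurora_availability` is hypothetical and only models the counting, not any AWS API.

```python
TOTAL_COPIES = 6    # Aurora keeps 2 copies in each of 3 AZs
WRITE_QUORUM = 4    # a write needs 4 of 6 copies
READ_QUORUM = 3     # a read needs 3 of 6 copies


def aurora_availability(copies_lost: int) -> dict[str, bool]:
    """Report whether writes and reads are still possible (hypothetical helper)."""
    if not 0 <= copies_lost <= TOTAL_COPIES:
        raise ValueError("copies_lost must be between 0 and 6")
    remaining = TOTAL_COPIES - copies_lost
    return {
        "writes": remaining >= WRITE_QUORUM,
        "reads": remaining >= READ_QUORUM,
    }


if __name__ == "__main__":
    for lost in range(TOTAL_COPIES + 1):
        print(lost, aurora_availability(lost))
```

Losing a whole AZ removes two copies, which still leaves the write quorum intact; losing a third copy keeps reads available but blocks writes.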
Aurora DB Cluster Endpoints:
┌──────────┐
│ Client │
└────┬─────┘
┌─────────────┴───────────────┐
▼ ▼
┌─────────────────────┐ ┌────────────────────────────┐
│ Writer Endpoint │ │ Reader Endpoint │
│ (points to master) │ │ (load balances to replicas)│
└──────────┬──────────┘ └─────────────┬──────────────┘
│ ┌────────┼────────┐
▼ ▼ ▼ ▼
┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐
│ M │←──────────│ R │ │ R │ │ R │ ← Auto Scaling
└───┬───┘ └───┬───┘ └───┬───┘ └───┬───┘
↓W ↑R ↑R ↑R
════════════════════════════════════════════════════
Shared Storage (10GB → 128TB auto-expanding)
════════════════════════════════════════════════════
⚠️ Exam trap: “Separate reads from writes” in Aurora:
Aurora Features:
Aurora Replicas Auto Scaling:
┌──────────┐
│ Client │
└────┬─────┘
┌───────────────┴───────────────┐
▼ ▼ Many Requests
┌─────────────────────┐ ┌────────────────────────────┐
│ Writer Endpoint │ │ Reader Endpoint │
└──────────┬──────────┘ └─────────────┬──────────────┘
│ ┌────────┼────────┐
▼ ▼ ▼ ▼
┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐
│ M │ │ R │ │ R │ │ R │ ← Added by
└───┬───┘ CPU↑ CPU↑ └───┬───┘ └───┬───┘ └───┬───┘ Auto Scaling
↓W ↑R ↑R ↑R
════════════════════════════════════════════════════════════
Shared Storage (10GB → 128TB auto-expanding)
════════════════════════════════════════════════════════════
Aurora Custom Endpoints:
┌──────────┐
│ Client │
└────┬─────┘
┌───────────────────────┼──────────────────────┐
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌──────────────────┐
│ Writer Endpoint │ │ Reader Endpoint │ │ Custom Endpoint │
└────────┬────────┘ └────────┬────────┘ │(Analytical Query)│
│ │ └────────┬─────────┘
▼ ▼ ▼
┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐
│ M │ │ R │ │ R │ │ R │ │ R │
└───┬───┘ └───────┘ └───────┘ └───────┘ └───────┘
↓W db.r3.large (small) db.r5.2xlarge (large)
════════════════════════════════════════════════════════════════
Shared Storage Volume
════════════════════════════════════════════════════════════════
Aurora Serverless:
┌──────────┐
│ Client │
└────┬─────┘
│
┌─────────────────────────────┐
│ Proxy Fleet │
│ (managed by Aurora) │
└──────────────┬──────────────┘
┌────┬────┼────┬────┐
▼ ▼ ▼ ▼ ▼
┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ← Auto-scales
│DB│ │DB│ │DB│ │DB│ │DB│ based on load
└──┘ └──┘ └──┘ └──┘ └──┘
════════════════════════════════════════════
Shared Storage Volume
════════════════════════════════════════════
⚠️ Exam trap: “Dev/test environment” + “unused most of time” + “minimize costs” = Aurora Serverless
Aurora Global Database:
┌──────────────────────────────────────────┐
│ us-east-1 (PRIMARY REGION) │
│ ┌─────────────┐ ┌─────────────┐ │
│ │ Application │ │ Aurora │ │
│ │ Read/Write │ ────→ │ Primary │ │
│ └─────────────┘ └──────┬──────┘ │
└───────────────────────────────┼──────────┘
│ replication
│ (<1 second)
┌───────────────────────────────┼──────────┐
│ eu-west-1 (SECONDARY REGION) │
│ ┌─────────────┐ ┌──────┴──────┐ │
│ │ Application │ │ Aurora │ │
│ │ Read Only │ ←──── │ Secondary │ │
│ └─────────────┘ └─────────────┘ │
└──────────────────────────────────────────┘

| Feature | Details |
|---|---|
| Primary Region | 1 (read/write) |
| Secondary Regions | Up to 5 (read-only) |
| Replicas per Region | Up to 16 |
| Replication Lag | <1 second |
| DR Promotion RTO | <1 minute |
⚠️ Exam trap: “Cross-region Disaster Recovery” or “replica in another region” = Aurora Global Database
Aurora Machine Learning:
┌─────────────┐
│ Application │
└──────┬──────┘
SQL query │ query results
(recommendations?) │ (red shirt, blue...)
▼
┌─────────────┐
│ Aurora │
└──────┬──────┘
data │ predictions
(user profile, │ (red shirt,
shopping...) │ blue pants...)
┌────────┴────────┐
▼ ▼
┌────────────┐ ┌─────────────┐
│ SageMaker │ │ Comprehend │
│ (any ML) │ │ (sentiment) │
└────────────┘ └─────────────┘
Babelfish for Aurora PostgreSQL:
┌─────────────────┐ ┌─────────────────┐
│ Application │ │ Application │
│ SQL Server │ │ PostgreSQL │
│ Client Driver │ │ Driver │
└────────┬────────┘ └────────┬────────┘
│ T-SQL │ PL/pgSQL
│ │
│ ┌────────────────────┐ │
│ │ Aurora PostgreSQL │ │
│ ├─────────┬──────────┤ │
└───→│Babelfish│PostgreSQL│←───┘
└─────────┴──────────┘
↑
migrate
│
┌───────────────┐
│ MS SQL │
│ Server │
└───────────────┘
RDS & Aurora Backups:
| Feature | RDS | Aurora |
|---|---|---|
| Automated Backups | 1-35 days (0 = disable retention) | 1-35 days (cannot disable) |
| Transaction Logs | Every 5 min | Continuous |
| Point-in-Time Recovery | Up to 5 min ago | Within retention window |
| Manual Snapshots (On-Demand) | Unlimited retention | Unlimited retention |
⚠️ Exam traps:
RDS & Aurora Restore Options:
Aurora Database Cloning:
CLONING (instant) SNAPSHOT/RESTORE (slow)
┌─────────────┐ ┌─────────────┐
│ Production │ │ Production │
└──────┬──────┘ └──────┬──────┘
│ shared storage │ snapshot
▼ (no copy!) ▼ (copy all!)
┌─────────────┐ ┌─────────────┐
│ Clone │ │ New DB │
└─────────────┘ └─────────────┘
Only new writes Full duplicate
use extra storage storage cost
⚠️ Exam trap: “Need production data ASAP” + “read/write tests” = Aurora Cloning (instant)
RDS & Aurora Security:
| Security Layer | Details |
|---|---|
| At-rest encryption | AWS KMS, must enable at launch time |
| In-flight encryption | TLS by default |
| IAM Authentication | IAM roles instead of username/password |
| Security Groups | Control network access |
| Audit Logs | Send to CloudWatch Logs |
⚠️ Exam traps:
⚠️ Exam trap - “End-to-end security for data-in-transit to RDS”:
Amazon RDS Proxy:
┌─────────────────────────────────────────────────┐
│ VPC │
│ ┌───────────────────────────────────────────┐ │
│ │ Lambda functions │ │
│ │ λ λ λ λ λ ... │ │
│ └───────────────────┬───────────────────────┘ │
│ │ IAM Authentication │
│ ┌───────────────────┼───────────────────────┐ │
│ │ Private subnet │ │
│ │ ▼ │ │
│ │ ┌─────────────┐ │ │
│ │ │ RDS Proxy │ ← Connection │ │
│ │ └──────┬──────┘ Pooling │ │
│ │ ▼ │ │
│ │ ┌─────────────┐ │ │
│ │ │ RDS / Aurora│ │ │
│ │ └─────────────┘ │ │
│ └───────────────────────────────────────────┘ │
└─────────────────────────────────────────────────┘

| Feature | Details |
|---|---|
| Connection Pooling | Reduces DB stress (CPU, RAM, connections) |
| Failover | Reduces RDS/Aurora failover time by up to 66% |
| Supports | RDS (MySQL, PostgreSQL, MariaDB, MS SQL), Aurora |
| Security | IAM Auth, credentials in Secrets Manager |
| Access | Never publicly accessible (VPC only) |
⚠️ Exam trap: “Many EC2s” + “slow reconnection after failover” = RDS Proxy
Two Different Ways to Connect Lambda with RDS/Aurora:
| Aspect | RDS Event Notifications | Invoke Lambda from RDS/Aurora |
|---|---|---|
| Setup | AWS Console (RDS settings) | Inside the database (SQL) |
| Access to DB Data | ❌ No (metadata only) | ✅ Yes (full data access) |
| Trigger Source | DB instance events | Data changes (triggers) |
| Use Case | DB state changes (failover, snapshot) | React to data (new row, update) |
| Engines | All RDS engines | Aurora MySQL, Aurora PostgreSQL |
RDS Event Notifications:
RDS Event Notifications Flow:
RDS Instance ──► RDS Event ──► SNS Topic ──► Lambda
(state change) Subscription (no DB data)
Invoke Lambda from RDS/Aurora:
Invoke Lambda from Aurora:
App ──► Aurora ──► Trigger/Stored Proc ──► Lambda ──► External Service
(data) (calls Lambda) (has data) (notifications, etc)
⚠️ Exam trap: “React to DB failover/snapshot events” → RDS Event Notifications (via SNS).
⚠️ Exam trap: “Process data when inserted/updated” → Invoke Lambda from Aurora (configured in DB).
Amazon ElastiCache is a managed Redis or Memcached in-memory data store that offers high performance and low latency.
⚠️ Exam trap: Using ElastiCache requires heavy application code changes
ElastiCache - DB Cache Pattern:
┌───────────────────┐
│ ElastiCache │
Cache hit │ │
←────────────│ ┌───────────┐ │
─────────────│──→│ Cache │ │
│ └───────────┘ │
┌─────────────┐ └─────────┬─────────┘
│ Application │ │ Cache miss
└──────┬──────┘ │
│ ▼
│ Read from DB ┌─────────────┐
└────────────────→│ Amazon RDS │
←─────────────────└─────────────┘
│
└──→ Write to cache
ElastiCache - User Session Store:
┌──────┐
│ User │
└──┬───┘
│
┌─────┴─────┬────────────┐
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ App │ │ App │ │ App │
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
│ Write │ Retrieve │
│ session │ session │
│ │ │
└───────────┴────────────┘
│
▼
┌─────────────┐
│ ElastiCache │
└─────────────┘
⚠️ Exam trap: “Users keep logging out” + ALB + Auto Scaling = ElastiCache for sessions
ElastiCache - Redis vs Memcached:
| Feature | Redis | Memcached |
|---|---|---|
| High Availability | Multi-AZ with Auto-Failover | ❌ No HA |
| Read Replicas | ✅ Yes (scale reads) | ❌ No |
| Persistence | ✅ AOF (durable) | ❌ Non-persistent |
| Backup/Restore | ✅ Yes | Serverless only |
| Data Structures | Sets, Sorted Sets | Simple key-value |
| Architecture | Replication | Sharding (multi-node) |
| Threading | Single-threaded | Multi-threaded |
REDIS (HA + Durability) MEMCACHED (Sharding)
┌───┐ Replication ┌───┐ ┌───┐ + ┌───┐
│ R │ ────────────→ │ R │ │ M │ shards │ M │
└───┘ └───┘ └───┘ └───┘
⚠️ Exam trap:
ElastiCache - Security:
┌───────────────────────┐
│ EC2 Security Group │
│ ┌─────┐ │
│ │ EC2 │ Client │
│ └──┬──┘ │
└──────────┼────────────┘
│ SSL encryption
│ Redis AUTH
▼
┌───────────────────────┐
│ Redis Security Group │
│ ┌─────┐ │
│ │Redis│ │
│ └─────┘ │
└───────────────────────┘

| Engine | Authentication | Notes |
|---|---|---|
| Redis | IAM Authentication | For Redis only |
| Redis | Redis AUTH | Password/token at cluster creation |
| Redis | SSL/TLS | In-flight encryption |
| Memcached | SASL-based | Advanced auth |
⚠️ Exam trap:
ElastiCache - Caching Patterns:
LAZY LOADING WRITE THROUGH
┌─────────┐ ┌─────────┐
│ App │ │ App │
└────┬────┘ └────┬────┘
│ 1. Cache hit? ←───┐ │
▼ │ │ 1. Write to DB
┌─────────┐ ┌────┴────┐ ┌────┴────┐
│ Cache │ │ Cache │ │ RDS │
└────┬────┘ └─────────┘ └────┬────┘
│ 2. Miss │ 2. Write to cache
▼ ▼
┌─────────┐ ┌─────────┐
│ RDS │ │ Cache │
└────┬────┘ └─────────┘
│ 3. Write to cache
▼
┌─────────┐
│ Cache │
└─────────┘

| Pattern | Description | Trade-off |
|---|---|---|
| Lazy Loading | Cache on read (miss → fetch → cache) | Data can become stale |
| Write Through | Cache on write (DB + cache updated together) | No stale data, more writes |
| Session Store | Store temp session data with TTL | Sessions auto-expire |
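The lazy loading and write-through patterns can be sketched in plain Python. This is an illustration under stated assumptions: `FakeDatabase` and `TtlCache` are hypothetical in-memory stand-ins for RDS and ElastiCache, and no real client library is involved.

```python
import time


class FakeDatabase:
    """Stand-in for RDS: a dict plus a counter of how often it is read."""

    def __init__(self):
        self.rows = {}
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)

    def put(self, key, value):
        self.rows[key] = value


class TtlCache:
    """Stand-in for ElastiCache: a key/value store whose entries expire."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._items = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._items[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._items[key] = (value, self.clock() + self.ttl)


def read_lazy(cache, db, key):
    """Lazy loading: check the cache, fall back to the DB, then populate."""
    value = cache.get(key)
    if value is not None:
        return value              # cache hit
    value = db.get(key)           # cache miss: read from the database
    if value is not None:
        cache.set(key, value)     # write the result to the cache
    return value


def write_through(cache, db, key, value):
    """Write through: update the database first, then the cache."""
    db.put(key, value)
    cache.set(key, value)
```

A write that bypasses `write_through` leaves the cached copy stale until its TTL expires, which is the trade-off listed for lazy loading.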
ElastiCache - Redis Use Case (Gaming Leaderboards):
┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐ ┌─────────────────────┐
ElastiCache Redis │ Real-time │
┌─────────┐ │ │ │ Leaderboard │
│ Clients │──→ ┌─────┐ ┌─────┐ ──────→│ ┌─────────────┐ │
└─────────┘ │ │Redis│ │Redis│ │ │ │ 1. Player A │ │
└─────┘ └─────┘ │ │ 2. Player B │ │
│ ┌─────┐ │ │ │ 3. Player C │ │
│Redis│ │ └─────────────┘ │
│ └─────┘ │ └─────────────────────┘
└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
⚠️ Exam trap: “Real-time leaderboard” → Redis Sorted Sets, which keep members ordered by score. Without them the ranking is computationally complex, and Memcached has no sorted sets.
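To show what a sorted set provides, here is a small in-memory model of a leaderboard. It is a sketch, not a Redis client: the `Leaderboard` class is hypothetical, it re-sorts on every query (Redis maintains the order incrementally), and the tie-breaking rule is a choice made here.

```python
class Leaderboard:
    """In-memory model of a sorted set: unique members ordered by score."""

    def __init__(self):
        self._scores = {}  # member -> score

    def add(self, member, score):
        """Set a member's score (roughly what ZADD does)."""
        self._scores[member] = score

    def increment(self, member, delta):
        """Add delta to a member's score (roughly ZINCRBY); returns the new score."""
        self._scores[member] = self._scores.get(member, 0) + delta
        return self._scores[member]

    def _ordered(self):
        # Highest score first; ties broken by member name (a choice made here).
        return sorted(self._scores.items(), key=lambda kv: (-kv[1], kv[0]))

    def top(self, n):
        """Return the n best (member, score) pairs."""
        return self._ordered()[:n]

    def rank(self, member):
        """Return the 0-based position from the top, or None if unknown."""
        for position, (name, _) in enumerate(self._ordered()):
            if name == member:
                return position
        return None


if __name__ == "__main__":
    board = Leaderboard()
    board.add("Player A", 50)
    board.add("Player B", 70)
    board.add("Player C", 60)
    print(board.top(3))
```

With a plain key-value cache the application would have to fetch every score and sort it on each request, which is why sorted sets are the usual answer for leaderboards.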
DynamoDB is a fully managed, highly available (replicated across 3 AZs) NoSQL key/value database that scales to massive workloads with single-digit millisecond latency.
DynamoDB Accelerator (DAX) is a fully managed in-memory cache for DynamoDB (up to 10x performance improvement). Like ElastiCache, but only for DynamoDB.
Amazon Redshift is a fully managed OLAP (Online Analytical Processing) data warehouse for PB-scale analytics.
Redshift Cluster Architecture:
Query (JDBC/ODBC)
│
▼
┌─────────────────────────────┐
│ Amazon Redshift Cluster │
│ ┌───────────────────────┐ │
│ │ Leader Node │ │ ← Query planning, results aggregation
│ └───────────┬───────────┘ │
│ ┌──────┼──────┐ │
│ ▼ ▼ ▼ │
│ ┌────────┐┌────────┐┌────────┐
│ │Compute ││Compute ││Compute │ ← Perform queries, send to leader
│ │ Node ││ Node ││ Node │
│ └────────┘└────────┘└────────┘
└─────────────────────────────┘
Redshift Modes:
| Mode | Description | Cost Model |
|---|---|---|
| Provisioned | Choose instance types upfront | Reserved instances for savings |
| Serverless | Auto-scales, no management | Pay per use |
Loading Data into Redshift:
| Method | Description | Best For |
|---|---|---|
| Kinesis Firehose | Stream → S3 → Redshift (COPY) | Real-time streaming |
| S3 COPY command | Bulk load from S3 | Large batch imports |
| EC2 JDBC driver | Insert via application | Small batches (less efficient) |
⚠️ Exam trap: “Load data into Redshift” → Large inserts are MUCH better. Use S3 COPY or Firehose, not row-by-row JDBC inserts.
Enhanced VPC Routing:
⚠️ Exam trap: “COPY/UNLOAD through VPC” or “Redshift traffic stays in VPC” → Enhanced VPC Routing. “Improved VPC Routing” doesn’t exist!
Redshift Spectrum:
Redshift Spectrum:
Query ──► Redshift Cluster ──► Spectrum Nodes (1000s) ──► S3 Bucket
(Leader + Compute) (query S3 directly)
Redshift vs Athena:
| Aspect | Redshift | Athena |
|---|---|---|
| Type | Data warehouse | Query service |
| Infrastructure | Cluster (Provisioned/Serverless) | Fully serverless |
| Best for | Complex joins, aggregations, dashboards | Ad-hoc queries on S3 |
| Performance | Faster (indexes, columnar) | Slower (full S3 scan) |
| Data location | Loaded into Redshift | Stays in S3 |
| Cost model | Cluster time | $5/TB scanned |
⚠️ Exam trap: “Faster joins/aggregations” or “BI dashboards on data warehouse” → Redshift. “Serverless ad-hoc S3 queries” → Athena.
Redshift Snapshots & DR:
| Snapshot Type | Frequency | Retention |
|---|---|---|
| Automated | Every 8 hours or 5 GB | 1-35 days |
| Manual | On-demand | Until you delete |
Cross-Region DR:
Cross-Region Snapshot Copy:
Region A Region B
┌─────────────┐ Auto Copy ┌──────────────┐
│ Redshift │──────────────►│ Snapshot │
│ Cluster │ │ (copied) │
└──────┬──────┘ └──────┬───────┘
│ Snapshot │ Restore
▼ ▼
┌─────────────┐ ┌──────────────┐
│ Snapshot │ │ New Cluster │
│ (original) │ │ (DR region) │
└─────────────┘ └──────────────┘
⚠️ Exam trap: “Redshift cross-region DR” → Cross-region snapshot copy. Restore snapshot in target region.
⚠️ Exam trap: “Redshift Global cluster” → Doesn’t exist! Aurora has Global Database, Redshift uses cross-region snapshot copy instead.
⚠️ Exam trap: Redshift vs Athena vs EMR:
Amazon Elastic MapReduce (EMR) = managed Hadoop clusters for big data processing.
EMR Node Types:
| Node Type | Purpose | Lifecycle |
|---|---|---|
| Master Node | Manage cluster, coordinate, health | Long-running |
| Core Node | Run tasks + store data | Long-running |
| Task Node | Run tasks only (no storage) | Usually Spot |
EMR Purchasing Options:
| Option | Use Case |
|---|---|
| On-Demand | Reliable, won’t be terminated |
| Reserved | Cost savings (min 1 year), auto-used if available |
| Spot | Cheaper, can be terminated (for Task Nodes) |
Cluster Types:
⚠️ Exam trap: “Cost-optimize EMR” → Use Spot for Task Nodes (can lose them), Reserved/On-Demand for Master/Core (need reliability).
⚠️ Exam trap: EMR vs Athena vs Redshift:
Amazon Athena is a serverless SQL query service for analyzing data stored in Amazon S3.
Athena Use Cases:
Athena Architecture:
Users ──► S3 Bucket ──► Amazon Athena ──► Amazon QuickSight
(data) (Query & Analyze) (Reporting & Dashboards)
Athena Federated Query:
Federated Query:
┌─► S3 Bucket
├─► ElastiCache
├─► DocumentDB
Amazon Athena ◄─────────┼─► DynamoDB ◄── Lambda (Data Source Connector)
├─► Redshift
├─► Aurora/RDS
├─► HBase in EMR
└─► On-Premises DB
Athena Performance Optimization:
| Optimization | Why |
|---|---|
| Columnar format (Parquet/ORC) | Scan less data → lower cost |
| Glue ETL | Convert CSV/JSON to Parquet/ORC |
| Compress data | Smaller scans (gzip, snappy, lz4, zstd) |
| Partition datasets | Query specific partitions only |
| Large files (> 128 MB) | Minimize overhead |
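Because Athena bills by data scanned, the optimizations in the table translate directly into cost. The sketch below is a rough estimate only: the $5/TB figure is the one quoted in these notes (check current pricing), decimal terabytes are an assumption, and both function names are hypothetical.

```python
PRICE_PER_TB_USD = 5.0   # figure quoted in these notes; verify against current pricing
BYTES_PER_TB = 10**12    # assumption: decimal terabytes


def athena_cost_usd(bytes_scanned: int, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Estimate the cost of one query from the bytes it scans."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * price_per_tb


def scanned_after_optimizations(
    raw_bytes: int,
    column_fraction: float = 1.0,     # share of columns read (columnar formats)
    partition_fraction: float = 1.0,  # share of partitions read
    compression_ratio: float = 1.0,   # e.g. 4.0 means 4:1 compression
) -> int:
    """Estimate bytes scanned once the optimizations are applied."""
    if not (0 < column_fraction <= 1 and 0 < partition_fraction <= 1):
        raise ValueError("fractions must be in (0, 1]")
    if compression_ratio < 1:
        raise ValueError("compression_ratio must be >= 1")
    return round(raw_bytes * column_fraction * partition_fraction / compression_ratio)


if __name__ == "__main__":
    raw = 10**12  # 1 TB of raw CSV
    optimized = scanned_after_optimizations(raw, 0.1, 0.5, 4.0)
    print(f"raw: ${athena_cost_usd(raw):.2f}, optimized: ${athena_cost_usd(optimized):.4f}")
```

The three factors multiply, so reading fewer columns, fewer partitions, and compressed data compounds the saving.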
S3 Partitioning Example:
s3://bucket/table/year=1991/month=1/day=1/data.parquet
└── partition columns as virtual columns
⚠️ Exam trap: “Analyze data in S3 using serverless SQL” → Athena. Not Redshift (requires provisioning), not EMR (requires cluster).
⚠️ Exam trap: “Reduce Athena costs” → Parquet/ORC (columnar = scan less). Glue can convert formats.
⚠️ Exam trap: “Query multiple data sources with SQL” → Athena Federated Query (uses Lambda connectors).
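The `key=value` path layout from the S3 partitioning example can be built and parsed with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the unpadded `month=1/day=1` form simply mirrors the example path.

```python
from datetime import date


def partition_uri(bucket: str, table: str, day: date, filename: str = "data.parquet") -> str:
    """Build an S3 URI with year/month/day partition segments."""
    return (
        f"s3://{bucket}/{table}/"
        f"year={day.year}/month={day.month}/day={day.day}/{filename}"
    )


def partition_values(uri: str) -> dict[str, str]:
    """Extract key=value path segments, i.e. the partition columns."""
    values = {}
    for segment in uri.split("/"):
        key, sep, value = segment.partition("=")
        if sep and key and value:
            values[key] = value
    return values


def matches(uri: str, **wanted: object) -> bool:
    """Partition pruning in miniature: keep objects whose partition values match."""
    found = partition_values(uri)
    return all(found.get(key) == str(value) for key, value in wanted.items())


if __name__ == "__main__":
    uri = partition_uri("bucket", "table", date(1991, 1, 1))
    print(uri)
    print(partition_values(uri))
```

A query filtered on `year` and `month` only has to read objects under the matching prefixes, which is how partitioning reduces the data scanned.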
Amazon QuickSight = serverless ML-powered BI service for interactive dashboards.
⚠️ Exam trap: “Column-level security” services:
QuickSight Use Cases:
QuickSight Data Sources:
| Source Type | Examples |
|---|---|
| AWS Services | RDS, Aurora, Redshift, Athena, S3, OpenSearch, Timestream |
| On-Premises | Databases via JDBC (Teradata) |
| SaaS | Salesforce, Jira |
| File Imports | XLSX, CSV, JSON, TSV, ELF/CLF (log formats) |
QuickSight Integrations:
┌─────────────────────────────────────────────────────────┐
│ Amazon QuickSight │
└────────────────────────┬────────────────────────────────┘
│
┌────────────────────┼────────────────────────────────┐
│ │ │
▼ ▼ ▼
AWS Services On-Premises/SaaS File Imports
RDS, Aurora, Teradata (JDBC), XLSX, CSV,
Redshift, Athena, Salesforce, Jira JSON, TSV,
S3, OpenSearch, Log files
Timestream
QuickSight Users & Sharing:
⚠️ Exam trap: “BI dashboards from multiple AWS sources” → QuickSight. Integrates with Athena, Redshift, RDS, S3, etc.
⚠️ Exam trap: QuickSight users/groups ≠ IAM. They are QuickSight-specific identities.
Amazon OpenSearch Service (successor to Elasticsearch) = managed search and analytics engine.
OpenSearch Ingestion Patterns:
| Source | Path | Latency |
|---|---|---|
| Kinesis Data Streams | → Firehose → Lambda (transform) → OpenSearch | Near real-time |
| Kinesis Data Streams | → Lambda → OpenSearch | Real-time |
| CloudWatch Logs | → Subscription Filter → Lambda → OpenSearch | Real-time |
| CloudWatch Logs | → Subscription Filter → Firehose → OpenSearch | Near real-time |
| DynamoDB | → DynamoDB Streams → Lambda → OpenSearch | Real-time |
DynamoDB + OpenSearch Pattern:
CRUD ──► DynamoDB ──► DynamoDB Stream ──► Lambda ──► OpenSearch
│ │
│ │
└─── API to retrieve items ◄── App ──► API to search items ───┘
⚠️ Exam trap: “Search any field” or “partial text match” or “full-text search” → OpenSearch. DynamoDB only queries by primary key or indexes.
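The Lambda step in the DynamoDB + OpenSearch pattern mostly translates stream records into index or delete operations. The sketch below shows only that translation, under assumptions: the partition key is named `id`, only `S`, `N` and `BOOL` attribute types are handled, and the output format (`op`, `id`, `doc`) is a made-up intermediate shape rather than the OpenSearch bulk API. Sending the actions to a domain is left out.

```python
def _plain(attr: dict):
    """Convert one DynamoDB-typed attribute, e.g. {"S": "x"} or {"N": "1"}, to a Python value."""
    (type_tag, value), = attr.items()
    if type_tag == "N":
        return float(value) if "." in value else int(value)
    return value  # S and BOOL pass through; other types are left untouched in this sketch


def to_search_actions(event: dict, key_name: str = "id") -> list:
    """Map DynamoDB Streams records to index/delete actions for a search index."""
    actions = []
    for record in event.get("Records", []):
        change = record["dynamodb"]
        doc_id = str(_plain(change["Keys"][key_name]))
        if record["eventName"] == "REMOVE":
            actions.append({"op": "delete", "id": doc_id})
        else:  # INSERT or MODIFY: re-index the full new image
            doc = {name: _plain(attr) for name, attr in change["NewImage"].items()}
            actions.append({"op": "index", "id": doc_id, "doc": doc})
    return actions
```

Re-indexing the whole new image on every `MODIFY` keeps the search index a simple copy of the table, and the application still reads full items from DynamoDB by key.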
⚠️ Exam trap: “Real-time” vs “Near real-time” ingestion:
Amazon DocumentDB is a proprietary, fully managed document database (NoSQL) service that supports MongoDB workloads and is highly available across 3 AZs. Storage grows automatically, and it scales to workloads with millions of requests per second.
⚠️ Exam trap: DynamoDB vs DocumentDB — “MongoDB migration” doesn’t always mean DocumentDB!
| Requirement | Answer |
|---|---|
| MongoDB compatibility + no code changes | DocumentDB |
| Serverless + Global Tables + no server management | DynamoDB |
Key decision point:
DocumentDB requires provisioned instances (not truly serverless), but preserves MongoDB compatibility.
Amazon Neptune is a fully managed graph database. It is typically used for graph data sets such as social networks, knowledge graphs (Wikipedia), recommendation engines and fraud detection. Highly available across 3 AZs, with up to 15 read replicas.
⚠️ Exam trap: Graph queries = Neptune. Classic example:
Neptune Use Cases: Social networks, recommendation engines, fraud detection, knowledge graphs
Amazon Timestream is a fully managed, fast, scalable, serverless time series database. Built-in time series analytics functions help you identify patterns in your data in near real-time.
Timestream Use Cases: IoT sensors (temperature, humidity, pressure), application metrics, DevOps monitoring, industrial telemetry
⚠️ Exam trap: “Thousands of sensors” + “readings per second” + “fast analytics” = Timestream
Amazon Keyspaces (for Apache Cassandra) is a fully managed, serverless, Cassandra-compatible database. Highly available and scalable with no servers to manage.
⚠️ Exam trap: Cassandra migration → Keyspaces (not DynamoDB!)
Amazon QLDB (Quantum Ledger Database) is a fully managed, serverless, highly available ledger database (a ledger is a book recording financial transactions). Unlike Amazon Managed Blockchain, it has no decentralization component.
Amazon Managed Blockchain is a managed blockchain service to join public blockchain networks or create your own scalable private network, without the need for a trusted, central authority. Compatible with Hyperledger Fabric and Ethereum.
AWS Glue = fully serverless managed ETL (Extract, Transform, Load) service.
Glue Components:
| Component | Purpose |
|---|---|
| Glue Data Crawler | Scans data sources, writes metadata to Data Catalog |
| Glue Data Catalog | Central metadata repository (databases, tables) |
| Glue ETL Jobs | Transform and load data |
| Glue Job Bookmarks | Prevent re-processing old data |
| Glue DataBrew | Clean/normalize data with pre-built transformations |
| Glue Studio | GUI to create, run, monitor ETL jobs |
| Glue Streaming ETL | Real-time ETL (Spark Streaming) for Kinesis, Kafka, MSK |
Glue Data Catalog Architecture:
Data Sources Glue Data Catalog Consumers
┌─────────────┐ ┌─────────────────┐
│ Amazon S3 │ │ Databases │ ┌─────────────┐
│ Amazon RDS │──► Glue ───────►│ Tables │──────────►│ Athena │
│ DynamoDB │ Crawler │ (Metadata) │ │ Redshift │
│ JDBC │ (writes └─────────────────┘ │ EMR │
└─────────────┘ metadata) ▲ └─────────────┘
│
Glue ETL Jobs
Glue ETL Pattern - Convert to Parquet:
S3 Put ──► Input S3 ──► Glue ETL ──► Output S3 ──► Athena
(CSV) (transform) (Parquet) (analyze)
│
▼
S3 Event ──► Lambda ──► Trigger Glue Job
(or EventBridge)
Common Glue Use Cases:
⚠️ Exam trap: “Convert CSV to Parquet for Athena” → Glue ETL. Glue can be triggered by S3 events via Lambda or EventBridge.
⚠️ Exam trap: “Centralized metadata catalog” or “data discovery” → Glue Data Catalog. Used by Athena, Redshift Spectrum, EMR.
⚠️ Exam trap: “Streaming ETL” → Glue Streaming ETL (Spark Streaming). Compatible with Kinesis, Kafka, MSK.
⚠️ Exam trap: “Prevent re-processing old data” or “incremental ETL” → Glue Job Bookmarks. Tracks what’s already processed, only processes new data.
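The idea behind job bookmarks, remembering how far the previous run got and processing only what is newer, can be illustrated without Glue itself. This is a conceptual sketch only: `run_incremental` is a hypothetical function, and using a single last-modified timestamp as the bookmark is an assumption made for illustration, not a description of how Glue stores its state.

```python
def run_incremental(objects: dict, bookmark: float):
    """Return (keys to process, new bookmark).

    objects maps an object key to its last-modified timestamp;
    bookmark is the newest timestamp handled by earlier runs.
    """
    new_keys = sorted(key for key, ts in objects.items() if ts > bookmark)
    new_bookmark = max([bookmark] + [objects[key] for key in new_keys])
    return new_keys, new_bookmark


objects = {"a.csv": 100.0, "b.csv": 200.0}
first_run, bookmark = run_incremental(objects, bookmark=0.0)

objects["c.csv"] = 300.0  # a new file lands after the first run
second_run, bookmark = run_incremental(objects, bookmark)
print(first_run, second_run)
```

The second run picks up only `c.csv`, which is the behavior the "incremental ETL" keyword points to.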
AWS Lake Formation = fully managed service to set up a data lake in days.
Lake Formation Features:
| Feature | Description |
|---|---|
| Source Blueprints | Pre-built connectors for S3, RDS, Aurora, on-premises DBs |
| ETL and Data Prep | Transform and prepare data |
| Data Catalog | Central metadata repository |
| Fine-grained Access Control | Row-level and Column-level security |
| Security Settings | Centralized permissions management |
Lake Formation Architecture:
Data Sources Lake Formation Consumers
┌─────────────┐ ┌────────────────────┐
│ Amazon S3 │ │ • Source Crawlers │ ┌─────────────┐
│ RDS/Aurora │──► ingest ──►│ • ETL & Data Prep │──────►│ Athena │
│ On-Premises │ │ • Data Catalog │ │ Redshift │
│ (SQL/NoSQL) │ │ • Access Control │ │ EMR/Spark │
└─────────────┘ │ (row/column) │ └─────────────┘
└─────────┬──────────┘ │
│ ▼
┌───────▼───────┐ Users
│ Data Lake │
│ (stored in S3)│
└───────────────┘
Lake Formation vs Glue:
| Aspect | Glue | Lake Formation |
|---|---|---|
| Focus | ETL + Data Catalog | Complete data lake management |
| Security | Basic IAM | Fine-grained (row/column-level) |
| Scope | ETL jobs | End-to-end data lake |
| Built on | - | AWS Glue |
⚠️ Exam trap: “Data lake” + “fine-grained access control” or “row/column-level security” → Lake Formation. Not just Glue (Glue = ETL only, no fine-grained permissions).
⚠️ Exam trap: “Centralized permissions for data lake” → Lake Formation. Manages access across Athena, Redshift, EMR in one place.
Amazon MSK (Managed Streaming for Apache Kafka) = fully managed Apache Kafka on AWS.
MSK Architecture:
Producers MSK Cluster Consumers
(Kinesis, IoT, ┌─────────────────────────┐
RDS, etc.) │ ┌──────────┐ │ ┌──────────────────┐
│ │ │ Broker 1 │◄──┐ │ │ Kinesis Data │
▼ │ └──────────┘ │ │ │ Analytics (Flink)│
┌──────────┐ │ │ replication ──►│ Glue Streaming │
│ Your │──────┼──►┌──────────┐ │ │ │ Lambda │
│ Code │ │ │ Broker 2 │◄──┤ │ │ EC2/ECS/EKS │
└──────────┘ │ └──────────┘ │ │ └──────────────────┘
│ │ │ │
│ ┌──────────┐ │ │
│ │ Broker 3 │◄┘ │
│ └──────────┘ │
└─────────────────────────┘
Kinesis Data Streams vs Amazon MSK:
| Aspect | Kinesis Data Streams | Amazon MSK |
|---|---|---|
| Message size | 1 MB limit | 1 MB default, configurable to 10 MB |
| Data structure | Shards | Kafka Topics with Partitions |
| Scaling | Shard splitting & merging | Can only add partitions |
| In-flight encryption | TLS only | PLAINTEXT or TLS |
| At-rest encryption | KMS | KMS |
| Retention | 1-365 days | Unlimited (EBS) |
MSK Consumers:
Amazon Managed Service for Apache Flink (previously: Kinesis Data Analytics for Apache Flink)
Flink Sources:
Kinesis Data Streams ──┐
├──► Amazon Managed Service ──► (destinations)
Amazon MSK ────────────┘ for Apache Flink
⚠️ Exam trap: “Kafka on AWS” or “migrate Kafka” → Amazon MSK. Kinesis is AWS-native, MSK is Kafka-compatible.
⚠️ Exam trap: “Message > 1 MB” streaming → MSK (configurable up to 10 MB). Kinesis = hard 1 MB limit.
⚠️ Exam trap: “Apache Flink” or “real-time stream analytics” → Amazon Managed Service for Apache Flink. Note: Flink does NOT read from Firehose!
⚠️ Exam trap: Kinesis vs MSK decision:
Requirements: Real-time collection → Transform → SQL query → Reports in S3 → Warehouse + Dashboards
IoT Devices
│
▼ (real-time)
┌─────────────────┐ Every 1 min ┌─────────────┐
│ Kinesis Data │───────────────────►│ Ingestion │
│ Streams │ │ Bucket (S3) │
└─────────────────┘ └──────┬──────┘
│ │
┌────┴────┐ (optional)
▼ │ │
┌─────────┐ │ ┌────▼────┐
│ Kinesis │ │ │ SQS │
│ Firehose│◄──┘ └────┬────┘
└────┬────┘ │
│ ┌────▼────┐ Pull data
│ Lambda │ Lambda │◄──────────┐
│ (transform) └────┬────┘ │
▼ │ │
┌────▼────┐ ┌─────┴─────┐
│ Athena │────►│ Reporting │
│ (SQL) │ │ Bucket │
└─────────┘ └─────┬─────┘
│
┌────────────────┼────────────────┐
▼ ▼ ▼
QuickSight Redshift (other BI)
(dashboards) Serverless
Pipeline Components:
| Stage | Service | Why |
|---|---|---|
| Ingest real-time | Kinesis Data Streams | Real-time data collection |
| Buffer + Deliver | Kinesis Firehose | Near real-time delivery to S3 (1 min) |
| Transform | Lambda + Firehose | Data transformations during delivery |
| Store | S3 (Ingestion Bucket) | Durable storage, triggers events |
| Decouple | SQS (optional) | Buffer between S3 and processing |
| Query | Athena | Serverless SQL on S3 |
| Output | S3 (Reporting Bucket) | Query results storage |
| Visualize | QuickSight / Redshift | Dashboards and analytics |
Key Points:
⚠️ Exam trap: “Serverless” + “real-time ingestion” + “SQL query” + “dashboards” → This full pipeline. Know each component’s role!
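The "Transform" stage (Lambda + Firehose) follows a simple contract: Firehose hands the function a batch of base64-encoded records, and the function returns each `recordId` with a `result` and the re-encoded `data`. A minimal sketch of such a handler follows. The payload fields `temperature_c` and `temperature_f` are hypothetical and exist only to have something to transform.

```python
import base64
import json


def lambda_handler(event, context):
    """Firehose data-transformation handler: decode, enrich, re-encode each record."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Hypothetical enrichment: add a Fahrenheit reading next to the Celsius one.
            payload["temperature_f"] = round(payload["temperature_c"] * 9 / 5 + 32, 1)
            data = base64.b64encode((json.dumps(payload) + "\n").encode()).decode()
            output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
        except (ValueError, KeyError, TypeError):
            # Malformed input: hand the original bytes back and mark the record as failed.
            output.append(
                {"recordId": record["recordId"], "result": "ProcessingFailed", "data": record["data"]}
            )
    return {"records": output}
```

Appending a newline to each record keeps the objects delivered to S3 line-delimited, which is convenient when Athena later reads them as JSON.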
Database Selection Guide:
| Need | Use |
|---|---|
| SQL, ACID, complex queries | RDS / Aurora |
| Key-value, massive scale, single-digit ms | DynamoDB |
| Key-value, large objects (100MB+ files) | S3 |
| Caching, sessions, leaderboards | ElastiCache (Redis/Memcached) |
| Data warehouse, analytics (PB scale) | Redshift |
| Graph relationships (social, fraud) | Neptune |
| Time series (IoT, metrics) | Timestream |
| Document store (MongoDB compatible) | DocumentDB |
| Immutable ledger (financial) | QLDB |
| ETL / Data catalog | Glue |
⚠️ Exam traps:
⚠️ Exam trap - “In-memory + caching SQL queries + HIPAA”:
RDBMS (RDS/Aurora): Structured data, complex joins, ACID transactions, fixed schema
NoSQL (DynamoDB): Flexible schema, massive scale, key-value access, millisecond latency
Rule: Need JOINs or transactions? → RDS/Aurora. Need scale + flexibility? → DynamoDB.
This is THE most tested concept:
Key insight: Multi-AZ standby is for failover ONLY. It cannot be read from.
Aurora is AWS’s cloud-optimized relational DB. Same concept as RDS, but:
If question mentions RDS + wants better performance/features → think Aurora.
| Cache | Code Changes? | Works With |
|---|---|---|
| ElastiCache | ✅ Required | Any application |
| DAX | ❌ Not required | DynamoDB only |
DAX uses the same DynamoDB API. ElastiCache requires application modifications.
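The "code changes required" column for ElastiCache comes from the cache-aside (lazy loading) pattern: the application itself checks the cache, falls back to the database, and writes the result back. A minimal in-memory sketch of that logic is shown here. A plain dict stands in for the cache cluster, and `load_from_db` is a hypothetical callable standing in for a real query.

```python
import time


class CacheAside:
    """Lazy-loading cache in front of a slower data source."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db   # hypothetical callable standing in for a database query
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}          # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]         # cache hit
        value = self._load(key)     # cache miss: the application queries the database itself
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        """Call after a write so the next read reloads fresh data."""
        self._entries.pop(key, None)
```

With DAX this hit/miss logic lives behind the DynamoDB-compatible client, so the application keeps calling the same API.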
You cannot encrypt an existing unencrypted database directly. Solution: Snapshot → Restore as encrypted → Switch applications
Same applies: Master not encrypted → Replicas CANNOT be encrypted.
| Service | Cross-Region Feature | Behavior |
|---|---|---|
| RDS | Read Replica (cross-region) | Manual promotion, costs $$$ |
| Aurora | Global Database | <1s replication, <1min failover |
| DynamoDB | Global Tables | Active-active (writes anywhere!) |
Key insight: Only DynamoDB Global Tables allows writes in multiple regions.
Restoring from backup/snapshot ALWAYS creates a new database instance:
Exception: Aurora Backtrack → in-place rewind (no new DB created).
Match the data type to the purpose-built database:
What's the requirement?
│
┌─────────────┬───────────┼───────────┬─────────────┬────────────┐
▼ ▼ ▼ ▼ ▼ ▼
SQL + Joins Key-Value Caching Analytics Specialized Big Objects
│ │ │ │ │ │
▼ ▼ ▼ ▼ ▼ ▼
RDS/Aurora DynamoDB ElastiCache Redshift See Step 3 S3
/DAX /Athena
Need relational database?
│
┌─────────────────────┼─────────────────────┐
▼ ▼ ▼
Standard needs Cloud-optimized Full OS access?
│ │ │
▼ ▼ ▼
RDS Aurora RDS Custom
│ │ (Oracle/SQL only)
│ │
▼ ▼
Cross-region? Unpredictable
│ workload?
▼ │
Read Replica ▼
(manual failover) Aurora Serverless
| If the data is… | Use… |
|---|---|
| Graph relationships (social, fraud) | Neptune |
| Time series (IoT, metrics, logs) | Timestream |
| Immutable ledger (financial, compliance) | QLDB |
| MongoDB-compatible JSON documents | DocumentDB |
| Cassandra-compatible wide-column | Keyspaces |
| Free-text search | OpenSearch |
| Blockchain (decentralized) | Managed Blockchain |
| If question mentions… | Answer is… |
|---|---|
| “Users don’t see updated data” + Read Replica | Expected behavior (ASYNC lag) |
| “Analytics slowing production” | Offload to Read Replica |
| “Cross-region disaster recovery” + Aurora | Aurora Global Database |
| “Dev/test” + “unused most of time” | Aurora Serverless |
| “Production data ASAP” + “read/write tests” | Aurora Cloning |
| “Full OS customization” + Oracle/SQL Server | RDS Custom |
| “Lambda” + “DB connections” + “failover” | RDS Proxy |
| “Users keep logging out” + Auto Scaling | ElastiCache (sessions) |
| “Real-time leaderboard” + “ranked” | Redis Sorted Sets |
| “DynamoDB” + “microsecond reads” | DAX |
| “Multi-region active-active writes” | DynamoDB Global Tables |
| “Social network” + “relationships” | Neptune |
| “IoT” + “time-series” | Timestream |
| “Immutable” + “financial audit” | QLDB |
What do you need to do with streaming data?
│
┌───────────────┼───────────────┬───────────────────┐
▼ ▼ ▼ ▼
INGEST DELIVER ANALYZE KAFKA
(collect) (to S3/etc) (real-time) (compatible)
│ │ │ │
▼ ▼ ▼ ▼
Kinesis Kinesis Kinesis Data Amazon
Data Streams Firehose Analytics/Flink MSK
| Service | Purpose | Key Feature |
|---|---|---|
| Kinesis Data Streams | Ingest real-time data | Custom consumers, 1-365 day retention |
| Kinesis Firehose | Deliver to destinations | Near real-time (1 min buffer), auto-scaling |
| Kinesis Data Analytics | Real-time analytics | Apache Flink, SQL on streams |
| Amazon MSK | Managed Kafka | Kafka-compatible, 10 MB messages, unlimited retention |
| Cannot… | Instead… |
|---|---|
| Read from Multi-AZ standby | Use Read Replica for read scaling |
| Write to Read Replica | Promote it first (breaks replication) |
| Encrypt existing DB directly | Snapshot → Restore as encrypted |
| Use IAM Auth with Oracle/SQL Server | Only MySQL, PostgreSQL, MariaDB |
| Use Backtrack on RDS | Aurora-only feature |
| Use DAX with non-DynamoDB | Use ElastiCache instead |
| Use ElastiCache without code changes | Use DAX for DynamoDB (same API) |
| Cross-region failover with Multi-AZ | Multi-AZ = same region only |
| Use “Redshift Global cluster” | Doesn’t exist! Use cross-region snapshot copy |
| Read from Firehose with Flink | Flink reads from Streams or MSK only |
Keywords: OLTP, auto-scaling, maximum replicas, transactional
Answer: Aurora
Why: OLTP = relational (not NoSQL). Aurora has auto-scaling storage (10GB→128TB) + 15 replicas. RDS storage requires manual provisioning.
Keywords: reporting, analytics, BI tools, production performance
Answer: Create Read Replica for analytics workload
Why: Read Replicas are ASYNC, so heavy queries won’t affect the master.
Keywords: stale data, eventually consistent, lag, Read Replica
Answer: This is expected behavior (ASYNC replication)
Why: Read Replica uses ASYNC replication. If strong consistency needed → read from master.
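Why asynchronous replication produces stale reads can be seen in a toy model. In this sketch `Primary` and `ReadReplica` are hypothetical classes: the primary acknowledges a write immediately, and the replica only sees it once it applies the pending change log.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.log = []                  # ordered change log shipped to replicas

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))  # returns without waiting for any replica (async)


class ReadReplica:
    def __init__(self, primary):
        self._primary = primary
        self._applied = 0              # how many log entries have been applied so far
        self.data = {}

    def read(self, key):
        return self.data.get(key)

    def apply_pending(self):
        """Simulate the replica catching up after some replication lag."""
        for key, value in self._primary.log[self._applied:]:
            self.data[key] = value
        self._applied = len(self._primary.log)


primary = Primary()
replica = ReadReplica(primary)
primary.write("profile:1", "new name")
stale = replica.read("profile:1")      # the replica has not applied the change yet
replica.apply_pending()
fresh = replica.read("profile:1")
print(stale, fresh)
```

Reading from the primary right after the write would return the new value, which is the "read from master" option when strong consistency is needed.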
Keywords: cross-region, DR, RTO <1 minute, Aurora
Answer: Aurora Global Database
Why: <1 second replication, <1 minute RTO. RDS cross-region Read Replica = manual promotion.
Keywords: development, testing, intermittent, unpredictable, minimize costs
Answer: Aurora Serverless
Why: Scales to zero, pay per second. Provisioned = pay even when idle.
Keywords: clone production, read/write tests, staging environment, fast copy
Answer: Aurora Cloning (instant copy-on-write)
Why: Snapshot/restore = slow (copies all data). Read Replica = read-only. Cloning = instant + writable.
Keywords: customize OS, install patches, SSH access, Oracle/SQL Server
Answer: RDS Custom
Why: Standard RDS = no SSH. EC2 = no AWS management. RDS Custom = both.
Keywords: key-value, large files, 100MB, store files, durable storage
Answer: S3 (NOT DynamoDB!)
Why: S3 IS a key-value store (key = path, value = object). DynamoDB has 400KB item limit. For files 100MB+ → S3.
Keywords: Lambda, connection pooling, many connections, failover time
Answer: RDS Proxy
Why: Connection pooling reduces DB load. 66% faster failover. Works great with Lambda.
Keywords: sessions, logged out, ALB, Auto Scaling, stateless
Answer: ElastiCache (session store) or DynamoDB with TTL
Why: Sessions stored in shared cache → any instance can retrieve. NOT sticky sessions (uneven load).
Keywords: leaderboard, ranking, sorted scores, real-time
Answer: Redis Sorted Sets
Why: Redis Sorted Sets guarantee uniqueness + ordering. Memcached has no sorted sets.
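The two properties that make sorted sets fit a leaderboard, one entry per member and ordering by score, can be emulated in plain Python. This sketch does not talk to Redis: `Leaderboard` is a hypothetical class, and its tie-break (alphabetical) is a choice made here, not Redis behavior. In Redis the corresponding commands would be `ZADD`, `ZREVRANGE` and `ZREVRANK`.

```python
class Leaderboard:
    """In-memory emulation of a sorted set: unique members ordered by score."""

    def __init__(self):
        self._scores = {}

    def add(self, member: str, score: float) -> None:
        # Like ZADD: a member appears once; adding it again overwrites its score.
        self._scores[member] = score

    def top(self, n: int) -> list:
        # Highest score first; ties broken alphabetically (an assumption of this sketch).
        ordered = sorted(self._scores.items(), key=lambda item: (-item[1], item[0]))
        return ordered[:n]

    def rank(self, member: str) -> int:
        """Zero-based position from the top."""
        return [name for name, _ in self.top(len(self._scores))].index(member)


board = Leaderboard()
board.add("alice", 50)
board.add("bob", 70)
board.add("alice", 90)   # alice's score is replaced, not duplicated
print(board.top(2), board.rank("bob"))
```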
Keywords: DynamoDB, faster reads, microsecond, cache
Answer: DAX (DynamoDB Accelerator)
Why: 10x faster reads, no code changes (same API). ElastiCache = different API.
Keywords: active-active, write to any region, global users
Answer: DynamoDB Global Tables
Why: Aurora Global = read-only replicas. Only DynamoDB Global Tables = writes in any region.
Keywords: MongoDB, migrate, no code changes, same drivers, existing application
Answer: DocumentDB
Why: DocumentDB is MongoDB-compatible (same API/drivers). Application code works unchanged. Note: NOT RDS — there’s no “RDS for MongoDB”!
Keywords: MongoDB, NoSQL, serverless, global, no server management
Answer: DynamoDB (NOT DocumentDB!)
Why: DocumentDB requires provisioned instances (not serverless). DynamoDB = truly serverless + Global Tables. “MongoDB” in question is a distractor — focus on requirements.
Keywords: friends of friends, social graph, relationships, connections, likes, multi-hop queries
Answer: Neptune (Graph database)
Why: Graph databases are optimized for relationship traversals. Example: “likes on posts by friends of Mike” = multi-hop graph query. RDS would need complex JOINs; DynamoDB can’t do JOINs at all.
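A two-hop "friends of friends" traversal is short when the data is already shaped as a graph. The sketch below uses a plain adjacency dict rather than Neptune or a graph query language; the names in `graph` are made up for illustration.

```python
def friends_of_friends(graph: dict, person: str) -> set:
    """People exactly two hops away: friends of friends who are not already friends."""
    direct = graph.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {person}


graph = {
    "mike": {"ana", "bo"},
    "ana": {"mike", "cy"},
    "bo": {"mike", "ana", "dee"},
}
print(friends_of_friends(graph, "mike"))
```

In a relational schema the same question needs a self-join per hop, which is the "complex JOINs" cost mentioned above.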
Keywords: Cassandra, migrate, CQL, wide-column, no code changes
Answer: Amazon Keyspaces
Why: Keyspaces is Cassandra-compatible (CQL). Existing Cassandra code works unchanged. Fully managed, serverless, highly available.
Keywords: IoT, sensors, time-series, metrics, trends, readings per second, temperature, humidity, pressure, fast analytics, predict
Answer: Timestream
Why: Purpose-built for time-series data. 1000x faster + 1/10th cost vs relational. Built-in analytics functions for pattern detection.
Keywords: immutable, ledger, financial, compliance, audit, cannot modify
Answer: QLDB
Why: Cryptographically verifiable history. Note: QLDB ≠ blockchain (centralized, no decentralization).
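"Cryptographically verifiable history" can be pictured as a hash chain: each entry's hash covers the previous entry's hash, so changing an old entry breaks every later link. The sketch below is a deliberately simplified illustration with hypothetical function names, not QLDB's actual journal format or API.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry


def _entry_hash(prev_hash: str, data: dict) -> str:
    body = json.dumps(data, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode()).hexdigest()


def append_entry(journal: list, data: dict) -> None:
    """Append-only write: the new entry is chained to the previous one."""
    prev_hash = journal[-1]["hash"] if journal else GENESIS
    journal.append({"data": data, "prev_hash": prev_hash, "hash": _entry_hash(prev_hash, data)})


def verify(journal: list) -> bool:
    """Recompute every link; any edit to past data makes this return False."""
    prev_hash = GENESIS
    for entry in journal:
        if entry["prev_hash"] != prev_hash or entry["hash"] != _entry_hash(prev_hash, entry["data"]):
            return False
        prev_hash = entry["hash"]
    return True


journal = []
append_entry(journal, {"account": "A", "amount": 100})
append_entry(journal, {"account": "A", "amount": -30})
print(verify(journal))
```

There is still a single owner of the journal here, which matches the "centralized, no decentralization" distinction from blockchain.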
Keywords: encrypt, existing database, unencrypted, enable encryption
Answer: Snapshot → Restore as encrypted
Why: Cannot enable encryption on existing DB. Must create new encrypted DB from snapshot.
Keywords: search any field, partial match, full-text search, DynamoDB + search
Answer: DynamoDB + OpenSearch
Why: DynamoDB only queries by primary key/indexes. OpenSearch enables full-text search. Use DynamoDB Streams → Lambda → OpenSearch to sync data.
Keywords: logs, search, CloudWatch, real-time, analytics, dashboards
Answer: CloudWatch Logs → OpenSearch (via Lambda or Firehose)
Why: OpenSearch provides search + OpenSearch Dashboards for visualization. Lambda = real-time, Firehose = near real-time.
Keywords: logs in S3, serverless, quick analysis, SQL, ad-hoc
Answer: Amazon Athena
Why: Athena = serverless SQL directly on S3. No infrastructure to manage. Pay $5/TB scanned.
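Because the bill depends on data scanned, the effect of a columnar format can be estimated with simple arithmetic. This is a rough sketch with assumptions stated up front: the $5/TB figure is the one quoted above, all columns are assumed to be the same size, and compression and any per-query minimum are ignored. Both function names are hypothetical.

```python
def athena_cost(tb_scanned: float, price_per_tb: float = 5.0) -> float:
    """Query cost in dollars for a given amount of data scanned."""
    return tb_scanned * price_per_tb


def columnar_scan_tb(table_tb: float, columns_read: int, total_columns: int) -> float:
    """Data scanned when only some equally sized columns are read (columnar formats)."""
    return table_tb * columns_read / total_columns


row_format_cost = athena_cost(1.0)                       # CSV/JSON: the whole 1 TB is scanned
columnar_cost = athena_cost(columnar_scan_tb(1.0, 3, 30))  # Parquet/ORC: 3 of 30 columns
print(row_format_cost, columnar_cost)
```

Under these assumptions the same query drops from about $5.00 to about $0.50, before any savings from compression or partition pruning.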
Keywords: data warehouse, OLAP, columnar, analytics, QuickSight, Tableau, dashboards
Answer: Amazon Redshift + QuickSight
Why: Redshift = OLAP data warehouse (columnar storage). QuickSight = native BI integration for dashboards.
Keywords: convert JSON/CSV to Parquet, optimize Athena, reduce costs
Answer: AWS Glue ETL
Why: Glue transforms data formats. Parquet = columnar = Athena scans less = cheaper.
Keywords: real-time analytics, stream processing, Kinesis, SQL on streams
Answer: Kinesis Data Analytics (Amazon Managed Service for Apache Flink)
Why: Flink processes streams in real-time. Reads from Kinesis Data Streams or MSK. NOT Firehose!
Keywords: Apache Kafka, migrate, Kafka-compatible, existing application
Answer: Amazon MSK
Why: MSK = managed Kafka. Same APIs, no code changes. Kinesis requires code changes.
Keywords: Redshift, cross-region, DR, disaster recovery
Answer: Cross-region snapshot copy
Why: Enable automated snapshots + configure cross-region copy. Restore in DR region. “Redshift Global” doesn’t exist!
| Feature | RDS | Aurora |
|---|---|---|
| Engines | PostgreSQL, MySQL, MariaDB, Oracle, SQL Server, DB2 | PostgreSQL, MySQL |
| Performance | Standard | 5x MySQL, 3x PostgreSQL |
| Storage | EBS-backed, auto-scaling | 6 copies across 3 AZ, 128TB max |
| Read Replicas | Up to 15 | Up to 15, <10ms lag |
| Failover | Slower | <30 seconds |
| Backtrack | ❌ No | ✅ Yes (in-place rewind) |
| Serverless | ❌ No | ✅ Yes |
| Global Database | Read Replica only | <1s replication, <1min RTO |
| Cloning | Snapshot only | Instant copy-on-write |
| Feature | Read Replica | Multi-AZ | Aurora Global |
|---|---|---|---|
| Purpose | Read scaling | HA/Failover | Cross-region DR |
| Replication | ASYNC | SYNC | ASYNC (<1 sec) |
| Serve reads? | ✅ Yes | ❌ No | ✅ Yes |
| Auto failover? | ❌ Manual | ✅ Auto | ✅ Auto (<1 min) |
| Cross-region? | ✅ Yes | ❌ No | ✅ Yes |
| Feature | ElastiCache Redis | ElastiCache Memcached | DAX |
|---|---|---|---|
| Works with | Any app | Any app | DynamoDB only |
| Code changes | ✅ Required | ✅ Required | ❌ Not required |
| HA | Multi-AZ + failover | ❌ No | Multi-AZ |
| Persistence | ✅ AOF | ❌ No | N/A |
| Sorted Sets | ✅ Yes | ❌ No | N/A |
| Service | Data Model | Compatible With | Use Case |
|---|---|---|---|
| DynamoDB | Key-value | - | Serverless, sessions |
| DocumentDB | Document | MongoDB | MongoDB workloads |
| Keyspaces | Wide-column | Cassandra | Cassandra workloads |
| Neptune | Graph | Gremlin, SPARQL | Social, fraud |
| Timestream | Time series | SQL | IoT, metrics |
| QLDB | Ledger | SQL | Immutable audit |
| Item | Value |
|---|---|
| Read Replicas max | 15 |
| Aurora storage max | 128 TB |
| Aurora copies | 6 across 3 AZ |
| Aurora failover | <30 seconds |
| Aurora Global replication | <1 second |
| Aurora Global RTO | <1 minute |
| DynamoDB item size limit | 400 KB |
| S3 object size max | 5 TB |
| Automated backup retention | 1-35 days |
| Manual snapshot retention | Unlimited |
| RDS Proxy failover improvement | 66% faster |
| Question Contains | → Instant Answer |
|---|---|
| SQL + Joins + Transactions | RDS / Aurora |
| “OLTP” + “auto-scaling storage” | Aurora |
| “5x MySQL performance” | Aurora |
| “cross-region” + “RTO <1 min” | Aurora Global |
| “intermittent workload” | Aurora Serverless |
| “in-place rewind” | Aurora Backtrack |
| “clone production instantly” | Aurora Cloning |
| “OS access” + Oracle/SQL Server | RDS Custom |
| “analytics slowing production” | Read Replica |
| “can’t read from standby” | Multi-AZ (expected) |
| “Lambda + connections” | RDS Proxy |
| “66% faster failover” | RDS Proxy |
| “key-value” + “large files” (MB+) | S3 (not DynamoDB!) |
| “serverless NoSQL” | DynamoDB |
| “serverless” + “global” + NoSQL | DynamoDB (not DocumentDB) |
| “microsecond DynamoDB reads” | DAX |
| “active-active writes” | DynamoDB Global Tables |
| “sessions across instances” | ElastiCache / DynamoDB TTL |
| “leaderboard + rankings” | Redis Sorted Sets |
| “HA + persistence cache” | Redis |
| “MongoDB” + “no code changes” | DocumentDB |
| “MongoDB compatible” (same drivers) | DocumentDB |
| “MongoDB” + “serverless” + “global” | DynamoDB (trap!) |
| “RDS for MongoDB” | Doesn’t exist! (trap) |
| “graph + relationships” | Neptune |
| “social network analysis” | Neptune |
| “friends of friends” queries | Neptune |
| “likes on posts by friends” | Neptune |
| “fraud detection patterns” | Neptune |
| “IoT + time-series” | Timestream |
| “sensors” + “readings per second” | Timestream |
| “temperature/humidity/pressure” | Timestream |
| “immutable financial ledger” | QLDB |
| “Cassandra compatible” | Keyspaces |
| “free-text search” | OpenSearch |
| “partial match” + “any field” | OpenSearch |
| “search DynamoDB data” | DynamoDB + OpenSearch |
| “logs to dashboards” | CloudWatch → OpenSearch |
| “ETL + data catalog” | Glue |
| “convert CSV to Parquet” | Glue ETL |
| “centralized metadata” | Glue Data Catalog |
| “streaming ETL” | Glue Streaming ETL |
| “prevent re-processing” | Glue Job Bookmarks |
| “serverless SQL on S3” | Athena |
| “PB-scale analytics” | Redshift |
| “BI dashboards” | QuickSight |
| “visualizations from Athena/Redshift” | QuickSight |
| “embeddable analytics” | QuickSight |
| “data lake” | Lake Formation |
| “row/column-level security” | Lake Formation (data lake) or QuickSight Enterprise (dashboards) |
| “centralized data lake permissions” | Lake Formation |
| “column-level security” + “dashboards” | QuickSight Enterprise |
| “Kafka on AWS” | Amazon MSK |
| “migrate Kafka” | Amazon MSK |
| “message > 1 MB streaming” | MSK (up to 10 MB) |
| “unlimited stream retention” | MSK |
| “Apache Flink” | Managed Service for Apache Flink |
| “real-time stream analytics” | Managed Service for Apache Flink |
| “Flink + Kinesis or MSK” | Managed Service for Apache Flink |
| “logs in S3” + “quick analysis” | Athena |
| “OLAP” + “columnar” + “warehouse” | Redshift |
| “Redshift Global cluster” | Doesn’t exist! (trap) |
| “Redshift cross-region DR” | Cross-region snapshot copy |
| “COPY/UNLOAD through VPC” | Enhanced VPC Routing |
| “Spark/Hive/Presto” + “big data” | EMR |
| “open source big data frameworks” | EMR |
| “deliver to S3/Redshift” + “near real-time” | Kinesis Firehose |
| “ingest real-time data” | Kinesis Data Streams |
When stuck between options, eliminate systematically:
□ Is it about READING from standby?
→ Multi-AZ standby can't serve reads
→ Use Read Replica for read scaling
□ Is it CROSS-REGION?
→ Multi-AZ = same region only (eliminate it)
→ Aurora Global or DynamoDB Global Tables
□ Does it need WRITES in multiple regions?
→ Aurora Global = read-only replicas (eliminate it)
→ DynamoDB Global Tables = active-active writes
□ Is it about CACHING without code changes?
→ ElastiCache requires code changes (eliminate it)
→ DAX works with DynamoDB API (no changes)
□ Does it mention ORACLE or SQL SERVER customization?
→ Standard RDS = no SSH (eliminate it)
→ RDS Custom allows full access
□ Is it asking for INSTANT clone?
→ RDS Snapshot = slow (eliminate it)
→ Aurora Cloning = instant
□ Is it GRAPH data?
→ DynamoDB/RDS = complex (eliminate them)
→ Neptune is purpose-built
□ Is it TIME SERIES?
→ DynamoDB/RDS = not optimized (eliminate them)
→ Timestream is purpose-built
□ Is it IMMUTABLE ledger?
→ DynamoDB = mutable (eliminate it)
→ QLDB is immutable
Amazon Elastic Container Service (Amazon ECS) is a fully managed container orchestration service that helps you easily deploy, manage, and scale containerized applications. Amazon ECS integrates with Application Load Balancer (ALB).
Types of provisioning:
Amazon Elastic Container Registry (ECR) is a public or private registry to store container images, so they can be run by ECS.
The ECS task execution role grants permissions to the ECS agent (and container instance), e.g.:
AWS Lambda is a serverless, autoscaled, event-driven service to run on-demand virtual functions. Supports many programming languages.
Amazon API Gateway is a fully managed, serverless and scalable service for developers to easily create, publish, maintain and monitor APIs. Supports RESTful APIs and WebSocket APIs.
AWS Batch is a fully managed batch processing service at any scale. Batch dynamically launches EC2 instances or Spot Instances. Batch jobs are defined as Docker images and run on ECS. Unlike AWS Lambda, AWS Batch has no time limit, is not limited to specific runtimes (anything packaged in a Docker container works), and relies on EBS or instance store for disk space.
Serverless = paradigm where developers don’t manage servers — just deploy code/functions.
AWS Serverless Services:
| Service | Type |
|---|---|
| AWS Lambda | Compute (FaaS) |
| DynamoDB | Database (NoSQL) |
| Aurora Serverless | Database (SQL) |
| API Gateway | API management |
| S3 | Object storage |
| SNS & SQS | Messaging |
| Kinesis Data Firehose | Streaming |
| Step Functions | Workflow orchestration |
| Fargate | Serverless containers |
| Cognito | Authentication |
| CloudFront | CDN |
Lambda vs EC2:
| Aspect | Lambda | EC2 |
|---|---|---|
| Management | Virtual functions — no servers | Virtual servers to manage |
| Duration | Limited by time (15 min max) | Continuously running |
| Execution | On-demand, event-driven | Always on |
| Scaling | Automatic | Manual intervention |
| RAM/CPU | Limited (up to 10GB RAM) | Choose instance type |
Lambda Benefits:
⚠️ Exam trap: “Which service has NO built-in caching?” → Lambda. Lambda is stateless by design. API Gateway has response caching, DynamoDB has DAX. Lambda needs external cache (ElastiCache, DAX).
Lambda Language Support:
| Runtime | Languages |
|---|---|
| Native | Node.js, Python, Java, C#/.NET, PowerShell, Ruby |
| Custom Runtime API | Rust, Golang (community-supported) |
| Container Image | Any language (must implement Lambda Runtime API) |
⚠️ Exam trap: Lambda Container Image ≠ arbitrary Docker. Must implement Lambda Runtime API. For arbitrary Docker → ECS/Fargate.
Lambda Integrations (Main ones):
Lambda Use Cases:
Lambda SnapStart:
SnapStart Enabled: SnapStart Disabled:
invoke invoke
↓ ↓
Lambda (pre-initialized) Lambda
↓ ↓ Init
Invoke Invoke
↓ ↓
Shutdown Shutdown
⚠️ Exam trap: “Reduce Lambda cold start” → SnapStart (or Provisioned Concurrency).
Lambda Concurrency:
| Type | Purpose | Cold Start | Cost |
|---|---|---|---|
| Unreserved | Default pool | Yes | Pay per use |
| Reserved | Guarantee capacity for function | Yes | Pay per use |
| Provisioned | Pre-warm instances | No | Pay for provisioned + invocations |
Throttling Behavior:
Lambda Concurrency Issue Example:
Many users → ALB → Lambda (1000 executions) ✓
Few users → API Gateway → Lambda → THROTTLE! ❌
SDK/CLI → Lambda → THROTTLE! ❌
Without reserved concurrency, one source can consume all capacity.
⚠️ Exam trap: “Lambda throttling from one service” → Use Reserved Concurrency to limit/isolate capacity per function.
⚠️ Exam trap: The 1000 concurrent limit is shared across ALL functions in the account/region. One busy function can starve others → use Reserved Concurrency to isolate.
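How one busy function starves the others, and how a reservation isolates it, can be modelled with a few lines of arithmetic. This is a simplified simulation, not Lambda's scheduler: `admit` is a hypothetical function, requests are served in dict order, and details of the real service beyond the shared 1000 limit and per-function reservations are ignored.

```python
ACCOUNT_LIMIT = 1000  # default concurrent executions per account per region


def admit(requests: dict, reserved: dict) -> dict:
    """Return how many concurrent executions each function gets.

    requests: function name -> concurrent executions wanted
    reserved: function name -> reserved concurrency (acts as guarantee and cap)
    """
    admitted = {}
    unreserved_pool = ACCOUNT_LIMIT - sum(reserved.values())
    for name, wanted in requests.items():
        if name in reserved:
            admitted[name] = min(wanted, reserved[name])
        else:
            granted = min(wanted, unreserved_pool)
            unreserved_pool -= granted
            admitted[name] = granted
    return admitted


demand = {"alb_fn": 1000, "api_fn": 50}
print(admit(demand, reserved={}))               # alb_fn takes everything, api_fn is throttled
print(admit(demand, reserved={"alb_fn": 800}))  # alb_fn is capped, api_fn gets its 50
```

The reservation works in both directions: it guarantees capacity for the function that has it and stops that function from consuming the rest of the pool.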
Amazon API Gateway = fully managed, serverless service to create, publish, maintain, and monitor APIs.
Lambda Layers:
⚠️ Exam trap: “Share code/libraries between Lambda functions” → Lambda Layers. Not copying code into each function.
Lambda Destinations:
⚠️ Exam trap: “Route Lambda async result on success” → Destinations. DLQ only handles failures.
AWS Batch = fully managed batch processing at any scale.
AWS Batch Architecture:
Job Queue ──► Compute Environment ──► Job Execution
│
┌────────────┼────────────┐
▼ ▼ ▼
On-Demand Spot Fargate
EC2 EC2 (serverless)
Batch Components:
| Component | Description |
|---|---|
| Job Definition | How to run: Docker image, vCPU, memory, IAM role |
| Job Queue | Where jobs wait; priority-based |
| Compute Environment | Managed EC2/Spot/Fargate instances |
AWS Batch Use Cases:
Lambda vs Batch:
| Aspect | Lambda | AWS Batch |
|---|---|---|
| Time limit | 15 minutes | No limit |
| RAM | 10 GB max | Up to 100s GB |
| Disk | 10 GB /tmp | EBS volumes (TBs) |
| Runtime | Limited languages | Any Docker image |
| Invocation | Event-driven, sync/async | Job queue, scheduled |
| Scaling | Instant (1000 concurrent) | Launches instances (minutes) |
| Pricing | Per request + duration | Per EC2/Spot/Fargate time |
| Use case | Short, event-driven | Long-running batch jobs |
When to Choose Batch over Lambda:
| Scenario | Why Batch |
|---|---|
| Job > 15 min | Lambda hard limit |
| Needs > 10 GB RAM | Lambda hard limit |
| Needs > 10 GB disk | Lambda hard limit |
| GPU required | Lambda has no GPU |
| Large file processing | EBS storage available |
| Cost optimization | Spot instances (up to 90% savings) |
| Complex dependencies | Full Docker flexibility |
⚠️ Exam trap: “Batch job > 15 minutes” or “needs Docker flexibility” or “> 10 GB memory/disk” → AWS Batch. “Event-driven, quick tasks” → Lambda.
⚠️ Exam trap: “Cost-optimize long-running batch jobs” → AWS Batch with Spot Instances. Up to 90% savings vs On-Demand. Lambda has no Spot option.
⚠️ Exam trap: “Serverless batch processing” → AWS Batch on Fargate (no EC2 to manage). Still not Lambda if > 15 min.
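The "Batch over Lambda" rules above can be written as a small decision helper. This is a study aid under stated assumptions, not an AWS tool: the function name `choose_compute` and its parameters are hypothetical, and the thresholds are the Lambda limits quoted in these notes (15 minutes, 10 GB RAM, 10 GB /tmp).

```python
LAMBDA_MAX_MINUTES = 15
LAMBDA_MAX_MEMORY_GB = 10
LAMBDA_MAX_TMP_GB = 10


def choose_compute(duration_min, memory_gb=1.0, disk_gb=0.5,
                   needs_gpu=False, cost_sensitive=False):
    """Return (service, reasons) where reasons lists the Lambda disqualifiers."""
    reasons = []
    if duration_min > LAMBDA_MAX_MINUTES:
        reasons.append("runs longer than 15 minutes")
    if memory_gb > LAMBDA_MAX_MEMORY_GB:
        reasons.append("needs more than 10 GB RAM")
    if disk_gb > LAMBDA_MAX_TMP_GB:
        reasons.append("needs more than 10 GB disk")
    if needs_gpu:
        reasons.append("needs a GPU")
    if not reasons:
        # No disqualifier: short, event-driven work fits Lambda.
        return "Lambda", reasons
    # Any disqualifier pushes the job to Batch; Spot is the cost lever.
    service = "AWS Batch + Spot" if cost_sensitive else "AWS Batch"
    return service, reasons


print(choose_compute(duration_min=5))
print(choose_compute(duration_min=120, cost_sensitive=True))
```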
Lambda Pricing:
| Component | Free Tier | After Free Tier |
|---|---|---|
| Requests | 1M requests/month | $0.20 per 1M requests |
| Duration | 400K GB-seconds/month | $0.0000166667 per GB-second (about $1.00 per 60K GB-seconds) |
Invocation vs Duration:
| Metric | What It Is | Depends On | Cost Impact |
|---|---|---|---|
| Invocation | 1 call = 1 request | Number of triggers | $0.20 per 1M |
| Duration | Time function runs | Code complexity, I/O | GB-seconds |
Cost = (Invocations × $0.20/1M) + (GB-seconds × $1.00/60K)
└── count only ──┘ └── complexity matters ──┘
Example: Simple vs complex function
Duration examples:
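As a rough worked example of the request and duration components, the sketch below estimates a monthly bill. The per-GB-second rate is an assumption (the commonly published x86 price of $0.0000166667); actual prices vary by Region and architecture, and the function name `lambda_monthly_cost` is hypothetical.

```python
REQUEST_PRICE_PER_MILLION = 0.20      # USD, from the table above
PRICE_PER_GB_SECOND = 0.0000166667    # USD, assumed x86 rate
FREE_REQUESTS = 1_000_000             # monthly free tier
FREE_GB_SECONDS = 400_000             # monthly free tier


def lambda_monthly_cost(invocations, avg_duration_ms, memory_mb,
                        include_free_tier=True):
    """Estimate monthly Lambda cost in USD (requests + duration only)."""
    # GB-seconds = invocations x seconds per invocation x memory in GB.
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    billable_requests = invocations
    if include_free_tier:
        billable_requests = max(0, invocations - FREE_REQUESTS)
        gb_seconds = max(0.0, gb_seconds - FREE_GB_SECONDS)
    request_cost = billable_requests / 1_000_000 * REQUEST_PRICE_PER_MILLION
    duration_cost = gb_seconds * PRICE_PER_GB_SECOND
    return request_cost + duration_cost


# Same invocation count, different duration: only the duration term changes.
simple = lambda_monthly_cost(3_000_000, avg_duration_ms=200, memory_mb=512)
heavy = lambda_monthly_cost(3_000_000, avg_duration_ms=2000, memory_mb=512)
print(f"simple: ${simple:.2f}  heavy: ${heavy:.2f}")
```

The invocation count sets the request charge, while duration and configured memory together set the GB-second charge.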
Lambda Limits (per region):
| Limit | Value |
|---|---|
| Memory | 128 MB – 10 GB (1 MB increments) |
| Max execution time | 900 seconds (15 minutes) |
| Environment variables | 4 KB |
| /tmp disk | 512 MB – 10 GB |
| Concurrency | 1000 (can increase via support ticket) |
| Deployment (zip) | 50 MB compressed |
| Deployment (uncompressed) | 250 MB (code + dependencies) |
⚠️ Exam trap: Lambda limit questions — know the key numbers: 15 min timeout, 10 GB RAM, 1000 concurrency, 250 MB uncompressed.
⚠️ Exam trap — Lambda Disqualifiers: If question mentions ANY of these → Lambda is WRONG answer:
Lambda vs Alternatives Decision:
| Scenario | Best Choice | Why |
|---|---|---|
| Event-driven, < 15 min | Lambda | Instant scaling, pay per use |
| Batch job > 15 min | AWS Batch | No time limit, Docker |
| Long-running + cost optimize | AWS Batch + Spot | 90% savings |
| Containers, always running | ECS/Fargate | Long-running services |
| GPU, HPC, ML training | AWS Batch | GPU instances available |
⚠️ Exam trap: “Long job + retry + can pause/resume days later” → SQS + AWS Batch or SQS + EC2. SQS retains messages up to 14 days. SNS has no retention (push and forget). Lambda has 15 min limit.
⚠️ Exam trap: Default Lambda timeout = 3 seconds. “Timeout error after 3 seconds” = default wasn’t changed, but if job needs > 15 min, Lambda is wrong choice entirely.
Cold Starts & Provisioned Concurrency:
| Solution | Cold Start? | Cost |
|---|---|---|
| Default | Yes | Pay per use |
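| SnapStart | Greatly reduced (Java/Python/.NET only) | No extra cost for Java; snapshot caching and restore charges for Python/.NET |
| Provisioned Concurrency | No | Pay for provisioned capacity |
⚠️ Exam trap: “Eliminate cold starts” → Provisioned Concurrency or SnapStart. SnapStart is limited to Java/Python/.NET and has no extra cost for Java (Python/.NET incur snapshot caching and restore charges).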
Default Lambda Deployment:
┌─────────────────────────────────┐
│ AWS Cloud │
Internet ◄────────────►│ Lambda ──────► DynamoDB ✓ │
(Public) │ │ │
│ │ ┌──────────────────────┐ │
│ └──►│ VPC & Private Subnet │ │
│ │ Private RDS ✗ │ │
│ └──────────────────────┘ │
└─────────────────────────────────┘
Lambda in VPC Configuration:
⚠️ Exam trap: “Lambda access RDS in private subnet” → Must configure Lambda in VPC. Default Lambda cannot reach private resources.
⚠️ Exam trap: “Lambda can read DynamoDB but can’t write to SQS” → IAM Role missing permissions (needs sqs:SendMessage). Not security groups — SQS is accessed via API, not network. SQS doesn’t have security groups.
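The missing permission in the SQS case above is an IAM statement on the function's execution role. A minimal sketch of such a policy document follows; the helper name `sqs_send_policy` and the queue ARN are placeholders for illustration.

```python
import json


def sqs_send_policy(queue_arn):
    """Build an IAM policy document that allows sending to one SQS queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
            }
        ],
    }


# Placeholder ARN; substitute the real queue ARN.
policy = sqs_send_policy("arn:aws:sqs:us-east-1:123456789012:example-queue")
print(json.dumps(policy, indent=2))
```

Attaching a document like this to the Lambda execution role addresses the access error; no security group change is involved.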
Customization at the Edge:
Two Types:
Use Cases:
CloudFront Functions vs Lambda@Edge:
| Aspect | CloudFront Functions | Lambda@Edge |
|---|---|---|
| Runtime | JavaScript only | Node.js, Python |
| Scale | Millions req/sec | Thousands req/sec |
| Triggers | Viewer Request/Response only | Viewer + Origin Request/Response |
| Max Execution | < 1 ms | 5 seconds (viewer triggers), 30 seconds (origin triggers) |
| Max Memory | 2 MB | 128 MB – 10 GB |
| Package Size | 10 KB | 1 MB – 50 MB |
| Network Access | No | Yes |
| File System Access | No | Yes |
| Request Body Access | No | Yes |
| Pricing | Free tier, 1/6 price of @Edge | No free tier, per request + duration |
| Managed In | CloudFront console | Lambda (us-east-1 only) |
CloudFront Request/Response Flow:
User ──► Viewer Request ──► Origin Request ──► Origin
│ │
│ │
CloudFront Func Lambda@Edge
or Lambda@Edge only
Origin ──► Origin Response ──► Viewer Response ──► User
│ │
Lambda@Edge CloudFront Func
only or Lambda@Edge
When to Use Which:
| Use Case | Best Choice |
|---|---|
| Cache key normalization | CloudFront Functions |
| Header manipulation | CloudFront Functions |
| URL rewrites/redirects | CloudFront Functions |
| JWT validation (simple) | CloudFront Functions |
| Needs AWS SDK | Lambda@Edge |
| Access request body | Lambda@Edge |
| External API calls | Lambda@Edge |
| Complex processing | Lambda@Edge |
⚠️ Exam trap: “Millions of requests, simple manipulation” → CloudFront Functions. “Need network/file access or origin manipulation” → Lambda@Edge.
⚠️ Exam trap: “Authenticate at CloudFront Edge” or “auth before reaching origin” → Lambda@Edge (or CloudFront Functions for simple JWT). Not API Gateway — it lives in one region, not at edge.
⚠️ Exam trap: Lambda@Edge must be authored in us-east-1 — CloudFront replicates globally.
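To make "auth before reaching origin" concrete, here is a minimal sketch of a Lambda@Edge viewer-request handler in Python. The token check `is_valid_token` is a hypothetical placeholder; a real function would verify a JWT signature and claims instead.

```python
def is_valid_token(token):
    """Hypothetical placeholder check; replace with real JWT verification."""
    return token.startswith("Bearer ") and len(token) > len("Bearer ")


def handler(event, context):
    # CloudFront passes the request under Records[0].cf.request.
    request = event["Records"][0]["cf"]["request"]
    # Header names are lowercase keys; each maps to a list of {key, value}.
    auth = request.get("headers", {}).get("authorization", [])
    token = auth[0]["value"] if auth else ""
    if not is_valid_token(token):
        # Returning a response object short-circuits: the origin is never hit.
        return {
            "status": "401",
            "statusDescription": "Unauthorized",
            "body": "Unauthorized",
        }
    # Returning the request lets CloudFront continue to cache/origin.
    return request
```

Returning the request forwards it; returning a response dictionary answers from the edge without contacting the origin.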
📌 RDS & Aurora Lambda Integration — see Database section above for details (RDS Event Notifications vs Invoke Lambda from Aurora).
Overview:
⚠️ Exam trap: “Provision EC2 for DynamoDB” = False. DynamoDB is serverless — no servers/instances. Unlike RDS where you choose instance type.
DynamoDB Basics:
| Concept | Details |
|---|---|
| Structure | Tables → Items (rows) → Attributes (columns) |
| Primary Key | Must be decided at creation time |
| Items | Infinite number per table, max 400 KB per item |
| Schema | Flexible — attributes can be added over time |
DynamoDB Indexes:
| Index Type | When Created | Key | Separate Throughput |
|---|---|---|---|
| LSI (Local Secondary Index) | Table creation only | Same partition key, different sort key | No (uses table’s) |
| GSI (Global Secondary Index) | Anytime | Different partition key | Yes (own RCU/WCU) |
⚠️ Exam trap: “Query by different attribute” → add GSI. “Alternative sort key, same partition” → LSI (must define at creation).
Data Types:
⚠️ Exam trap: “Schema must rapidly evolve” or “flexible schema” → DynamoDB (NoSQL). RDS requires schema migrations.
Read/Write Capacity Modes:
| Mode | Capacity Planning | Pricing | Best For |
|---|---|---|---|
| Provisioned (default) | Specify RCU/WCU upfront | Pay for provisioned | Predictable workloads |
| On-Demand | Automatic, instant scaling | 2-3x more expensive | Unpredictable, steep spikes |
⚠️ Exam trap: “Load increases from thousands to millions in < 1 minute” or “unpredictable steep spikes” → On-Demand Mode. Provisioned auto-scaling is too slow for sudden bursts.
⚠️ Exam trap: “Cost-effective” + mixed workloads → Match mode to pattern:
What is DAX?
DAX Architecture:
┌─────────────┐
│ Application │
└──────┬──────┘
▼
┌─────────────────────────┐
│ DAX Cluster │
│ ┌─────┐ ┌─────┐ ┌─────┐ │
│ │Cache│ │Cache│ │Cache│ │
│ └─────┘ └─────┘ └─────┘ │
└──────────┬──────────────┘
▼
┌─────────────────────────┐
│ Amazon DynamoDB │
│ ┌───┐ ┌───┐ ┌───┐ │
│ │Tbl│ │Tbl│ │Tbl│ │
│ └───┘ └───┘ └───┘ │
└─────────────────────────┘
DAX vs ElastiCache:
| Use Case | Solution |
|---|---|
| Cache individual objects, Query/Scan results | DAX |
| Store aggregation results (computed data) | ElastiCache |
Application
│
├── Aggregation Results ──────► ElastiCache
│
└── Individual objects ───────► DAX ──► DynamoDB
Query & Scan cache
⚠️ Exam trap: “Cache DynamoDB reads” → DAX. “Store computed/aggregated results” → ElastiCache.
⚠️ Exam trap: “ProvisionedThroughputExceededException” + “hot keys/popular items” → DAX. Caches hot keys, offloads reads, prevents throughput errors. Increasing RCU alone won’t fix hot partition problem.
⚠️ Exam trap: “Migrate to Aurora/RDS” vs “Add DAX” → Choose DAX. Migration = dev effort, downtime risk, loses serverless benefits. DAX = no code changes, immediate fix, stays serverless.
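Why a cache helps with hot keys can be shown with a toy read-through cache. This is a conceptual model only, not the DAX client API: the class `ReadThroughCache`, the loader callback, and the injectable clock are assumptions for illustration, and the 300-second TTL mirrors the 5-minute DAX default quoted later in these notes.

```python
import time


class ReadThroughCache:
    """Toy read-through cache: serve from memory, fall back to the table."""

    def __init__(self, load_from_table, ttl_seconds=300, clock=time.monotonic):
        self._load = load_from_table
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expiry time)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]          # cache hit: no table read consumed
        value = self._load(key)      # cache miss: one read against the table
        self._entries[key] = (value, now + self._ttl)
        return value


table_reads = []


def load(key):
    table_reads.append(key)          # stands in for a DynamoDB GetItem
    return {"id": key}


cache = ReadThroughCache(load)
for _ in range(1000):
    cache.get("popular-item")
print(len(table_reads), "table read(s) for 1000 requests")
```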
What are Streams?
Use Cases:
⚠️ Exam trap: “React to DynamoDB changes” (e.g., “send email when user signs up”) → DynamoDB Streams + Lambda. Never poll/scan — use event-driven streams.
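A minimal sketch of the Streams + Lambda pattern: the handler below walks the stream records and reacts only to inserts. The attribute name `email` and the `send_email` callback are hypothetical (in practice the callback would wrap an SES call), and the stream is assumed to be configured with a view type that includes the new image.

```python
def handle_stream_event(event, send_email):
    """Send a welcome email for every newly inserted item; return the count."""
    sent = 0
    for record in event.get("Records", []):
        # Stream records carry INSERT, MODIFY or REMOVE as eventName.
        if record.get("eventName") != "INSERT":
            continue
        # NewImage uses DynamoDB's typed JSON, e.g. {"email": {"S": "..."}}.
        new_image = record["dynamodb"].get("NewImage", {})
        email = new_image.get("email", {}).get("S")
        if email:
            send_email(email)
            sent += 1
    return sent


# Hand-built sample event with one insert and one update.
sample = {
    "Records": [
        {"eventName": "INSERT",
         "dynamodb": {"NewImage": {"email": {"S": "new.user@example.com"}}}},
        {"eventName": "MODIFY",
         "dynamodb": {"NewImage": {"email": {"S": "old.user@example.com"}}}},
    ]
}
print(handle_stream_event(sample, send_email=print))
```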
DynamoDB Streams Architecture:
App ──► Table ──► DynamoDB Streams ──┬──► Lambda/KCL ──► SNS (notifications)
│ │ ──► DDB Table (filtering)
│ │
▼ │
Kinesis Data ────┴──► Kinesis Firehose ──► S3 (archiving)
Streams ──► Redshift (analytics)
──► OpenSearch (indexing)
DynamoDB Streams vs Kinesis Data Streams:
| Feature | DynamoDB Streams | Kinesis Data Streams |
|---|---|---|
| Retention | 24 hours | 1 year |
| Consumers | Limited (2 simultaneous) | High # of consumers |
| Processing | Lambda Triggers, KCL Adapter | Lambda, Analytics, Firehose, Glue |
| Ordering | Per-item ordered | Per-shard ordered |
| Cost | Included (no extra charge) | Pay for shards |
When to Use Which:
| Scenario | Best Choice |
|---|---|
| Simple Lambda trigger on DDB changes | DynamoDB Streams |
| Need > 2 consumers reading same stream | Kinesis Data Streams |
| Retention > 24 hours needed | Kinesis Data Streams |
| Archive to S3/Redshift/OpenSearch | Kinesis Data Streams → Firehose |
| Real-time analytics on changes | Kinesis Data Streams → Analytics |
| Just trigger notifications/updates | DynamoDB Streams → Lambda |
⚠️ Exam trap: “Multiple consumers” or “long retention” or “replay” or “analytics pipeline” or “GB/sec real-time” → Kinesis Data Streams. “Simple Lambda trigger” → DynamoDB Streams. SQS/SNS have no replay.
What are Global Tables?
GLOBAL TABLE
┌─────────────────────────────────────────┐
│ │
│ ┌──────────┐ two-way ┌──────────┐ │
│ │ Table │◄────────►│ Table │ │
│ │US-EAST-1 │replication│AP-SE-2 │ │
│ └──────────┘ └──────────┘ │
│ Read+Write Read+Write │
└─────────────────────────────────────────┘
Key Points:
Global Tables vs RDS Read Replicas:
| Aspect | DynamoDB Global Tables | RDS Read Replicas |
|---|---|---|
| Write | Any region (active-active) | Primary only |
| Read | Any region | Any replica |
| Replication | Two-way (bi-directional) | One-way (primary → replica) |
| Use case | Global apps, DR | Read scaling |
⚠️ Exam trap: “Low latency global access to DynamoDB” → Global Tables. Requires Streams enabled (Streams provide changelog for replication). Not DAX (caching), not Backups (recovery), not “Versioning” (doesn’t exist).
⚠️ Exam trap: Global Tables = active-active (write anywhere). RDS Read Replicas = active-passive (write to primary only).
Time To Live (TTL):
⚠️ Exam trap: “Web session handling” + “auto-expire” → DynamoDB with TTL. Sessions stored in DynamoDB, TTL auto-cleans expired sessions.
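For session handling with TTL, the item needs a Number attribute holding the expiry time in Unix epoch seconds. The sketch below builds such an item in DynamoDB's typed JSON; the attribute names (`session_id`, `user_id`, `expires_at`) are hypothetical, and TTL must be enabled on the table for the chosen attribute.

```python
import time


def session_item(session_id, user_id, lifetime_seconds=3600, now=None):
    """Build a session item whose TTL attribute is an epoch-seconds Number."""
    now = int(time.time()) if now is None else int(now)
    return {
        "session_id": {"S": session_id},
        "user_id": {"S": user_id},
        # Number values are sent as strings in the low-level format.
        "expires_at": {"N": str(now + lifetime_seconds)},
    }


def is_expired(item, now=None):
    """TTL deletion is not immediate, so readers should still check expiry."""
    now = int(time.time()) if now is None else int(now)
    return int(item["expires_at"]["N"]) <= now


item = session_item("sess-123", "user-42")
print(item["expires_at"], is_expired(item))
```

Because expired items may linger for a while before DynamoDB removes them, filtering on the expiry attribute at read time keeps stale sessions from being honored.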
Backups:
| Type | Details |
|---|---|
| PITR (Point-in-Time Recovery) | Last 35 days, continuous, creates new table |
| On-Demand | Manual, long-term retention, no performance impact |
| AWS Backup | Cross-region copy support |
S3 Integration:
| Operation | Details |
|---|---|
| Export to S3 | Requires PITR, last 35 days, DynamoDB JSON or ION format, no RCU consumed |
| Import from S3 | CSV/JSON/ION, creates new table, no write capacity consumed |
⚠️ Exam trap: “Export DynamoDB for analytics” → Export to S3 (native feature). Not Lambda — Export uses PITR backup, no RCU, no code. Transfer Family/DataSync are for files, not databases.
Overview:
Features:
Integrations:
| Integration Type | Use Case | Example |
|---|---|---|
| Lambda | Serverless backend | REST API → Lambda |
| HTTP | Existing HTTP endpoints | On-prem API, ALB |
| AWS Service | Direct AWS API exposure | Start Step Function, post to SQS |
⚠️ Exam trap: “Serverless REST API” → API Gateway + Lambda. Why others fail:
API Gateway → Kinesis Data Streams Example:
Client ──► API Gateway ──► Kinesis Data ──► Kinesis Data ──► S3
(requests) Streams Firehose (.json files)
Endpoint Types:
| Type | Description | CloudFront |
|---|---|---|
| Edge-Optimized (default) | Global clients, routed via CloudFront edge | Built-in |
| Regional | Same-region clients | Optional (manual) |
| Private | VPC only, via Interface Endpoint (ENI) | N/A |
⚠️ Exam trap: “Edge-Optimized API Gateway lives in all regions” = False. Requests route through global CloudFront edges, but API Gateway itself stays in ONE region.
Security:
| Method | Use Case |
|---|---|
| IAM Roles | Internal applications |
| Cognito | External users (mobile apps) |
| Custom Authorizer | Your own auth logic (Lambda) |
API Gateway Limits:
| Limit | Value |
|---|---|
| Throttling | 10,000 req/sec (account level, can increase) |
| Burst | 5,000 concurrent requests |
| Timeout | 29 seconds max (Lambda can run 15 min, but API GW times out at 29s) |
| Payload | 10 MB max |
⚠️ Exam trap: “API Gateway timeout” = 29 seconds (not Lambda’s 15 min). Long-running → use async pattern (API GW → SQS → Lambda).
HTTPS/Certificates:
⚠️ Exam trap: “API Gateway + global users” → Edge-Optimized. Certificate must be in us-east-1.
Overview:
Use Cases:
Step Functions Workflow Types:
| Type | Duration | Execution | Pricing | Use Case |
|---|---|---|---|---|
| Standard | Up to 1 year | Exactly-once | Per state transition | Long-running, audit |
| Express | Up to 5 min | At-least-once | Per execution + duration | High-volume, short |
⚠️ Exam trap: “Serverless workflow” + “human approval” → Step Functions. Only service with built-in human approval feature.
⚠️ Exam trap: “High-volume, short-lived workflows” → Express Workflows. “Long-running, exactly-once” → Standard Workflows.
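One common way to model a human approval step is a Task state that pauses until a callback returns its task token. The sketch below builds such a state machine definition in Amazon States Language as a Python dictionary; the state names and the notifier function ARN are placeholders, and how the approver is contacted is left to that hypothetical function.

```python
import json


def approval_workflow(notify_function_arn):
    """Return an ASL definition that waits for an external approval callback."""
    return {
        "Comment": "Pause until an approver sends the task token back",
        "StartAt": "WaitForApproval",
        "States": {
            "WaitForApproval": {
                "Type": "Task",
                # The .waitForTaskToken suffix pauses the execution until
                # SendTaskSuccess or SendTaskFailure is called with the token.
                "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
                "Parameters": {
                    "FunctionName": notify_function_arn,
                    "Payload": {"taskToken.$": "$$.Task.Token"},
                },
                "Next": "Approved",
            },
            "Approved": {"Type": "Succeed"},
        },
    }


definition = approval_workflow(
    "arn:aws:lambda:us-east-1:123456789012:function:notify-approver"
)
print(json.dumps(definition, indent=2))
```

A wait that can last days fits Standard workflows, since Express workflows are capped at 5 minutes.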
Overview:
Cognito vs IAM:
| Aspect | Cognito | IAM |
|---|---|---|
| Users | Hundreds/thousands/millions | Handful (employees, services) |
| Type | External users (customers) | Internal users (admins, devs) |
| Scale | Web/mobile app users | AWS account management |
| Federation | SAML, social (Google, FB) | SAML, OIDC (for roles) |
⚠️ Exam trap keywords → Cognito:
Two Components:
| Component | Purpose | Key Feature |
|---|---|---|
| User Pools (CUP) | Authentication (sign-in) | Serverless user database |
| Identity Pools | Authorization (AWS credentials) | Temporary AWS access |
Features:
⚠️ Exam trap: “Easiest/best way to add authentication” to serverless app → Cognito User Pools. Not DynamoDB/S3 + KMS (DIY auth = complex), not Secrets Manager (for app secrets, not user auth).
Integrations:
[CUP + API Gateway] [CUP + ALB]
Cognito User Pools Cognito User Pools
(authenticate, get token) (authenticate)
▲ ▲
│ │
▼ ▼
User ──► API Gateway ──► Lambda User ─────► ALB ──► Target Group
(REST API + token) (authenticate)
(evaluate Cognito token)
⚠️ Exam trap: CUP integrates with API Gateway and ALB for authentication.
Purpose:
Identity Sources:
Cognito Identity Pools Flow:
Web/Mobile App ──► Identity Provider ──► Cognito Identity Pools ──► AWS Services
(Google, Facebook, (validate, exchange (S3, DynamoDB)
SAML, CUP) for AWS credentials)
│
IAM policies define
what user can access
Key Points:
IAM policy variables (${cognito-identity.amazonaws.com:sub}) for per-user S3 folders
⚠️ Exam trap: “Mobile app needs direct access to S3/DynamoDB” → Cognito Identity Pools (provides temporary AWS credentials).
⚠️ Exam trap: “Per-user personal space in S3” → Cognito Identity Pools + IAM policy variables. Not IAM users (doesn’t scale), not public bucket (no security).
⚠️ Exam trap: User Pools = WHO you are (authentication). Identity Pools = WHAT you can access (authorization/credentials).
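The per-user folder restriction can be sketched as an IAM policy that uses the Cognito identity ID as the key prefix. The helper name `per_user_s3_policy` and the bucket name are placeholders; the policy variable is the one named in these notes.

```python
import json


def per_user_s3_policy(bucket):
    """Policy for the identity pool's authenticated role: one folder per user."""
    # IAM resolves this variable to the caller's Cognito identity ID.
    user_prefix = "${cognito-identity.amazonaws.com:sub}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # List only the caller's own folder.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {
                    "StringLike": {"s3:prefix": [f"{user_prefix}/*"]}
                },
            },
            {   # Read and write objects only under the caller's own folder.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{user_prefix}/*",
            },
        ],
    }


print(json.dumps(per_user_s3_policy("example-user-files"), indent=2))
```

One role and one policy cover every user, which is why this scales where per-user IAM users do not.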
Requirements → Solution Mapping:
| Requirement | AWS Solution |
|---|---|
| REST API with HTTPS | API Gateway |
| Serverless architecture | Lambda, DynamoDB, Cognito, S3 |
| Users interact with own S3 folder | Cognito Identity Pools (per-user IAM policy) |
| Managed serverless authentication | Cognito User Pools |
| Mostly reads, some writes | DAX (caching layer for read throughput) |
| Database scales, high read throughput | DynamoDB + DAX |
Complete Architecture:
┌──────────┐
Store/retrieve │ S3 │
files ──────────►│ (files) │
│ └──────────┘
Permissions
(Cognito)
│
┌────────────┐ REST HTTPS ┌─────────────┐ ┌────────┐ ┌─────┐ ┌──────────┐
│ Mobile │◄────────────────►│ API Gateway │─────►│ Lambda │────►│ DAX │────►│ DynamoDB │
│ Client │ │ (caching) │ │ │ │cache│ │ │
└────────────┘ └──────┬──────┘ └────────┘ └─────┘ └──────────┘
│ │
│ authenticate │ verify auth
▼ ▼
┌─────────────────┐
│ Amazon Cognito │
│ (User Pools + │
│ Identity Pools)│
└─────────────────┘
Why each service:
⚠️ Exam trap: “Read-heavy workload” + “DynamoDB” → add DAX for caching. “Per-user S3 access” → Cognito Identity Pools.
Requirements → Solution Mapping:
| Requirement | AWS Solution |
|---|---|
| Scale globally | CloudFront (CDN, edge locations) |
| Rarely written, often read | DynamoDB + DAX (caching) |
| Static files | S3 + CloudFront |
| Dynamic REST API | API Gateway + Lambda |
| Caching where possible | CloudFront (static) + DAX (DB reads) |
| Welcome email on signup | DynamoDB Streams + Lambda + SES |
| Thumbnail on photo upload | S3 trigger + Lambda |
Architecture Overview:
STATIC CONTENT (Global):
OAC: Origin Access Control
Client ◄───────► CloudFront ◄──────────────────► S3 (static files)
(edge locations) Bucket policy: only CloudFront
DYNAMIC API:
Client ◄──REST──► API Gateway ──► Lambda ──► DAX ──► DynamoDB
cache
PHOTO UPLOAD + THUMBNAIL:
Client ──► CloudFront ──► S3 (photos) ──► Lambda (trigger) ──► S3 (thumbnails)
(Transfer Acceleration) OAC │
▼ optional
SQS / SNS
WELCOME EMAIL:
DynamoDB ──► DynamoDB Streams ──► Lambda ──► SES (send email)
(new user) (trigger)
Key Patterns:
| Pattern | Implementation |
|---|---|
| Static hosting | S3 + CloudFront + OAC (bucket only allows CloudFront) |
| Global distribution | CloudFront edge locations |
| Read-heavy DB | DynamoDB + DAX caching |
| Event-driven processing | S3 trigger → Lambda (thumbnails) |
| React to DB changes | DynamoDB Streams → Lambda (welcome email) |
| Fast uploads | CloudFront + S3 Transfer Acceleration |
OAC (Origin Access Control):
⚠️ Exam trap: “Static website + global” → S3 + CloudFront. “Secure S3 from direct access” → OAC (Origin Access Control).
⚠️ Exam trap: “Generate thumbnail on upload” → S3 event → Lambda. “Welcome email on signup” → DynamoDB Streams → Lambda → SES.
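For the thumbnail pattern, the Lambda function first has to read the bucket and key out of the S3 event. The sketch below covers only that parsing step, under stated assumptions: the function name `thumbnail_targets`, the `thumbnails/` prefix, and the bucket names are hypothetical, and the actual image resizing and upload are omitted.

```python
from urllib.parse import unquote_plus


def thumbnail_targets(event, thumbnail_bucket):
    """Map each uploaded object to (src_bucket, src_key, dst_bucket, dst_key)."""
    targets = []
    for record in event.get("Records", []):
        src_bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded in S3 events (spaces become '+').
        src_key = unquote_plus(record["s3"]["object"]["key"])
        targets.append(
            (src_bucket, src_key, thumbnail_bucket, f"thumbnails/{src_key}")
        )
    return targets


sample = {
    "Records": [
        {"s3": {"bucket": {"name": "example-photos"},
                "object": {"key": "uploads/my+cat.jpg"}}}
    ]
}
print(thumbnail_targets(sample, "example-thumbnails"))
```

Writing thumbnails to a separate bucket (or a prefix excluded from the trigger) avoids the function re-triggering itself on its own output.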
Summary — Serverless Website Key Points:
| Component | Purpose |
|---|---|
| CloudFront + S3 | Static content distribution |
| API Gateway + Lambda | Serverless REST API (public, no Cognito needed) |
| DynamoDB Global Tables | Global data serving (alternative: Aurora Global) |
| DynamoDB Streams → Lambda | React to DB changes (new user → welcome email) |
| Lambda + SES | Serverless email sending (Lambda needs IAM role for SES) |
| S3 Events | Trigger SQS / SNS / Lambda on upload |
⚠️ Exam trap: “Public API” → no Cognito needed, just API Gateway + Lambda. “Global database” → DynamoDB Global Tables or Aurora Global Database.
Why Microservices?
Communication Patterns:
| Pattern | Services | Use Case |
|---|---|---|
| Synchronous | API Gateway, Load Balancers | Direct request/response |
| Asynchronous | SQS, Kinesis, SNS, Lambda triggers (S3) | Decoupled, event-driven |
Architecture Example:
Route 53 (DNS)
│
┌───────────────────┼───────────────────┐
▼ ▼ ▼
service1.example.com service2.example.com service3.example.com
│ │ │
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ ELB │ │ API Gateway │ │ ELB │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ ECS │ │ Lambda │ │ EC2 + ASG │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ DynamoDB │ │ ElastiCache │ │ RDS │
└─────────────┘ └─────────────┘ └─────────────┘
Microservices Challenges:
| Challenge | Description |
|---|---|
| Repeated overhead | Creating each new microservice requires setup |
| Server utilization | Hard to optimize density across services |
| Version complexity | Running multiple versions simultaneously |
| Client SDK proliferation | Clients need to integrate with many services |
How Serverless Helps:
| Challenge | Serverless Solution |
|---|---|
| Overhead | API Gateway + Lambda = minimal setup |
| Scaling | Automatic scaling, pay per usage |
| Environments | Clone API, reproduce environments easily |
| Client SDKs | Generate SDK through Swagger/OpenAPI integration |
⚠️ Exam trap: “Reduce microservices overhead” → API Gateway + Lambda. “Generate client SDK” → API Gateway + Swagger/OpenAPI.
Problem:
Solution: Add CloudFront
BEFORE (expensive):
Users ──► EC2 (ASG) ──► distributes updates
scales up high CPU, network cost
AFTER (optimized):
┌────────────────────────────────┐
│ Auto Scaling group │
│ ┌────────────────────────┐ │
│ │ Availability Zone 1 │ │
│ │ ┌────┐ ┌────┐ │ │
│ │ │ M5 │ │ M5 │ │ │
Users ──► CloudFront ──► ALB ────────┼──┤ └────┘ └────┘ │ │──► EFS
(edge cache) (AZ 1-3) │ ├────────────────────────┤ │ (shared storage)
handles load │ │ Availability Zone 2 │ │
│ │ ┌────┐ ┌────┐ │ │
│ │ │ M5 │ │ M5 │ │ │
│ │ └────┘ └────┘ │ │
│ ├────────────────────────┤ │
│ │ Availability Zone 3 │ │
│ │ ┌────┐ │ │
│ │ │ M5 │ │ │
│ │ └────┘ │ │
│ └────────────────────────┘ │
└────────────────────────────────┘
Why CloudFront Works:
| Benefit | Explanation |
|---|---|
| No architecture changes | Just add CloudFront in front |
| Edge caching | Software files cached globally |
| Static content | Update files don’t change = perfect for CDN |
| Automatic scaling | CloudFront is serverless and scales automatically; EC2 is not |
| Cost savings | Less ASG scaling, less EC2, less bandwidth |
EFS for Multi-AZ Shared Storage:
⚠️ Exam trap: “Reduce EC2 load for static file distribution” + “no architecture changes” → CloudFront. Works with existing EC2, caches at edge, reduces origin load. ALB has no caching feature.
⚠️ Exam trap: “Multi-AZ EC2 + shared filesystem” → EFS. Not EBS (single AZ), not S3 (object storage, not filesystem).
Serverless means AWS manages infrastructure. You don’t provision, patch, or scale servers.
Derive: If question asks “provision instance” for DynamoDB/Lambda → Wrong answer.
Lambda is NOT suitable for every workload. Hard limits exist:
Derive: Any job > 15 min, > 10 GB RAM/disk, or needs GPU → Lambda is wrong. AWS Batch is the natural alternative for batch workloads.
Lambda functions are stateless. Each invocation is independent.
Derive: “Cache results between invocations” → needs external cache (DAX, ElastiCache).
Derive: “Lambda can’t write to SQS” → IAM role issue, not security groups. SQS has no SG.
| Pattern | Behavior | Retry | Use Case |
|---|---|---|---|
| Sync | Wait for response | No auto-retry | API calls, user-facing |
| Async | Fire and forget | Auto-retry (2 retries by default; event kept up to 6 hrs) | Background jobs, events |
Derive: “Retry on failure” → Async invocation or SQS. “Immediate response” → Sync.
| Service | Retention | Replay |
|---|---|---|
| SNS | None (push & forget) | ❌ |
| SQS | 14 days max | ❌ (deleted after read) |
| Kinesis | 1-365 days | ✅ Multiple consumers |
| DynamoDB Streams | 24 hours | ✅ |
Derive: “Pause for a day, resume later” → SQS. “Replay events” → Kinesis. “Multiple consumers” → Kinesis.
Derive: “Auth at edge before reaching origin” → Lambda@Edge. “Millions req/sec simple” → CloudFront Functions.
User → CloudFront (edge cache) → API Gateway (response cache) → Lambda → DAX → DynamoDB
Each layer reduces load on the next. Know which service provides which cache.
React to changes without polling:
Derive: “Send email when user signs up” → DynamoDB Streams + Lambda + SES. Not polling.
| Cognito Component | What It Does |
|---|---|
| User Pools | WHO you are (authentication, tokens) |
| Identity Pools | WHAT you can access (temporary AWS creds) |
Derive: “Mobile app login” → User Pools. “Direct S3 access from mobile” → Identity Pools.
"Serverless" mentioned?
├── REST API → API Gateway + Lambda
├── Database → DynamoDB (NoSQL) or Aurora Serverless (SQL)
├── Workflow → Step Functions
├── Auth → Cognito
└── Containers → Fargate
"Long-running job" (> 15 min)?
├── Batch workload → AWS Batch (Docker, Spot, no time limit)
├── Always-on service → ECS/Fargate
├── Custom/legacy → EC2
└── < 15 min → Lambda OK
"Cold start" problem?
├── Java/Python/.NET → SnapStart (no extra cost for Java)
└── All languages → Provisioned Concurrency (costs $)
"Cache DynamoDB reads"?
├── Individual objects → DAX
└── Aggregated/computed → ElastiCache
"Global distribution"?
├── Static content → S3 + CloudFront
├── Dynamic API → API Gateway (Edge-Optimized) + Lambda
└── Database → DynamoDB Global Tables or Aurora Global
"React to changes"?
├── S3 upload → S3 Event → Lambda
├── DynamoDB insert → DynamoDB Streams → Lambda
└── Multiple consumers/replay → Kinesis
"Per-user S3 folders"?
└── Cognito Identity Pools + IAM policy variables
The CANNOT List:
| What | Why |
|---|---|
| Lambda > 15 min | Hard limit |
| Lambda > 10 GB RAM | Hard limit |
| Lambda > 10 GB disk | Hard limit |
| Lambda GPU | Not supported → AWS Batch |
| Lambda arbitrary Docker | Must implement Runtime API |
| API Gateway > 29 sec | Timeout limit |
| DynamoDB change LSI after creation | LSI defined at table creation |
| SNS message replay | No retention |
| SQS message replay | Deleted after processing |
| ALB caching | ALB has no cache |
| SQS security groups | API-based, no network access control |
Keywords: video, encoding, > 15 min, long-running
Answer: SQS + EC2 (or ECS/Batch)
Why: Lambda max 15 min. SQS provides retry + retention up to 14 days.
Keywords: welcome email, new user, react to signup
Answer: DynamoDB Streams → Lambda → SES
Why: Event-driven. Streams capture new items, Lambda sends email via SES.
Keywords: thumbnail, upload, S3, image processing
Answer: S3 Event → Lambda → S3 (thumbnails)
Why: S3 triggers Lambda on PutObject. Lambda processes and saves.
Keywords: static files, reduce load, no architecture changes
Answer: CloudFront (CDN)
Why: Edge caching, no origin changes needed. ALB has no caching.
Keywords: mobile, direct access, temporary credentials
Answer: Cognito Identity Pools
Why: Provides temporary AWS credentials with IAM policies.
Keywords: per-user, personal space, S3 folders
Answer: Cognito Identity Pools + IAM policy variables
Why: ${cognito-identity.amazonaws.com:sub} in policy restricts to user’s folder.
Keywords: read-heavy, DynamoDB, cache, hot keys
Answer: DAX
Why: In-memory cache, microsecond latency, no code changes.
Keywords: throughput exceeded, hot partition, popular items
Answer: DAX
Why: Caches hot keys, offloads reads. RCU increase alone doesn’t fix hot partition.
Keywords: unpredictable, millions, instant scaling, spikes
Answer: DynamoDB On-Demand mode
Why: Instant scaling. Provisioned auto-scaling is gradual.
Keywords: global, multi-region, low latency, DynamoDB
Answer: DynamoDB Global Tables
Why: Active-active replication. Requires Streams enabled.
Keywords: human approval, manual step, workflow
Answer: Step Functions
Why: Built-in human approval feature. No other service has it.
Keywords: client SDK, API, mobile/web developers
Answer: API Gateway + Swagger/OpenAPI
Why: API Gateway generates SDKs from OpenAPI specs.
Keywords: edge authentication, before origin, CDN auth
Answer: Lambda@Edge
Why: Runs at edge, can validate JWT/tokens before hitting origin.
Keywords: multiple consumers, replay, analytics pipeline
Answer: Kinesis Data Streams
Why: Multiple consumers, 1-365 day retention, replay capability.
Keywords: millions, simple, headers, URL rewrite
Answer: CloudFront Functions
Why: Sub-millisecond, JavaScript only, cheaper than Lambda@Edge.
Keywords: batch, long-running, hours, cost-effective, Spot
Answer: AWS Batch with Spot Instances
Why: No time limit, Docker flexibility, up to 90% savings with Spot.
Keywords: video, transcoding, encoding, media processing
Answer: AWS Batch (or Elastic Transcoder/MediaConvert)
Why: Variable duration (could be hours), Docker flexibility, Spot for cost.
Lambda Limits:
| Limit | Value |
|---|---|
| Timeout | 15 min (900 sec) |
| RAM | 128 MB - 10 GB |
| /tmp disk | 512 MB - 10 GB |
| Deployment (zip) | 50 MB compressed, 250 MB uncompressed |
| Concurrency | 1000 default (regional) |
| Layers | 5 per function |
AWS Batch Capabilities (vs Lambda):
| Capability | AWS Batch | Lambda |
|---|---|---|
| Time limit | Unlimited | 15 min |
| RAM | 100s of GB | 10 GB |
| Disk | EBS (TBs) | 10 GB |
| GPU | ✅ Yes | ❌ No |
| Spot pricing | ✅ Yes (90% savings) | ❌ No |
| Docker | Any image | Runtime API required |
| Startup time | Minutes | Milliseconds |
API Gateway Limits:
| Limit | Value |
|---|---|
| Timeout | 29 seconds |
| Throttle | 10,000 req/sec (account) |
| Payload | 10 MB |
DynamoDB Numbers:
| Metric | Value |
|---|---|
| Item size max | 400 KB |
| Streams retention | 24 hours |
| On-Demand cost | 2-3x Provisioned |
| DAX TTL default | 5 minutes |
| PITR window | 35 days |
Retention Comparison:
| Service | Retention |
|---|---|
| SNS | 0 (immediate delivery) |
| SQS | 1 min - 14 days |
| Kinesis | 1 - 365 days |
| DynamoDB Streams | 24 hours |
| Question Contains | → Instant Answer |
|---|---|
| “Serverless REST API” | API Gateway + Lambda |
| “Job > 15 minutes” | NOT Lambda → EC2/ECS/Batch |
| “Cold start” | SnapStart or Provisioned Concurrency |
| “Cache DynamoDB” | DAX |
| “Cache aggregated results” | ElastiCache |
| “React to DynamoDB changes” | DynamoDB Streams + Lambda |
| “React to S3 upload” | S3 Event + Lambda |
| “Global static website” | S3 + CloudFront |
| “Global DynamoDB” | Global Tables (needs Streams) |
| “Send email serverless” | Lambda + SES |
| “Per-user S3 folders” | Cognito Identity Pools |
| “Mobile app auth” | Cognito User Pools |
| “Mobile direct AWS access” | Cognito Identity Pools |
| “Workflow with human approval” | Step Functions |
| “Generate client SDK” | API Gateway + Swagger |
| “Edge authentication” | Lambda@Edge |
| “Millions req/sec simple” | CloudFront Functions |
| “Multiple stream consumers” | Kinesis Data Streams |
| “Replay events” | Kinesis Data Streams |
| “Pause/resume days later” | SQS (14 day retention) |
| “Reduce EC2 load, no changes” | CloudFront |
| “Share code between Lambdas” | Lambda Layers |
| “Route Lambda success/failure” | Lambda Destinations |
| “High-volume short workflows” | Step Functions Express |
| “Long audit workflows” | Step Functions Standard |
| “Query by different attribute” | DynamoDB GSI |
| “Steep instant scaling” | DynamoDB On-Demand |
| “Predictable steady load” | DynamoDB Provisioned |
| “Lambda timeout 3 sec” | Default not changed |
| “Lambda can’t reach RDS” | Configure Lambda in VPC |
| “Lambda can’t write to SQS” | IAM Role missing permissions |
| “Long-running batch job” | AWS Batch |
| “Cost-optimize batch processing” | AWS Batch + Spot |
| “GPU required” | AWS Batch (not Lambda) |
| “> 10 GB RAM/disk” | AWS Batch (not Lambda) |
| “Video/media transcoding” | AWS Batch or MediaConvert |
| “ETL, data processing hours” | AWS Batch |
□ Does it need > 15 min execution?
→ Yes = Eliminate Lambda → AWS Batch preferred for batch jobs
→ No = Lambda possible
□ Does it need > 10 GB RAM or disk?
→ Yes = Eliminate Lambda → AWS Batch
→ No = Lambda possible
□ Does it need GPU?
→ Yes = Eliminate Lambda → AWS Batch (GPU instances)
□ Is it "serverless REST API"?
→ API Gateway + Lambda (not ALB+EC2)
□ Does it mention "cache"?
→ DynamoDB reads = DAX
→ Aggregated data = ElastiCache
→ Static content = CloudFront
→ API responses = API Gateway caching
□ Does it mention "global"?
→ Static = CloudFront
→ Database = Global Tables / Aurora Global
□ Does it need "replay" or "multiple consumers"?
→ Kinesis (not SQS/SNS)
□ Does it mention "edge"?
→ Simple/fast = CloudFront Functions
→ Complex/network = Lambda@Edge
□ "Security group" for SQS/SNS/DynamoDB?
→ Wrong answer (API services, not network)
□ "Provision instance" for DynamoDB/Lambda?
→ Wrong answer (serverless)
Amazon Lightsail: a simplified alternative to the core AWS services, used for simple web applications (has templates for LAMP, Nginx, MEAN, Node.js..), websites (templates for WordPress, Magento, Plesk, Joomla), and Dev/Test environments. Has high availability but no auto-scaling and limited AWS integrations.
Pricing Models in AWS:
Examples of spending categories:
Free services & free tier in AWS:
EC2 Instances Purchasing Options:
Spot request states: open, active, disabled, cancelled; must cancel request first, then terminate instances;
Spot Fleet allocation strategies: lowestPrice (cost optimization, short workload), diversified (across all pools, availability, long workload), capacityOptimized (optimal capacity), priceCapacityOptimized (highest capacity then lowest price, recommended for most);
EC2 Image Builder: only pay for the underlying resources.
EBS Storage billed:
EFS (Elastic File System):
S3 Pricing:
ECS pricing:
Lambda pricing:
Snowball Family Pricing: AWS Snowball offers significantly discounted pricing (up to 62%) for 1-year usage and 3-year usage commitments for Edge compute use cases.
Database pricing - RDS:
CloudFront pricing:
Pricing Calculator: estimate the cost for your solution architecture.
AWS Billing Dashboard: home page for an overview of your AWS cloud financial management data and to help you make faster and more informed decisions.
AWS Free Tier Dashboard: tracks your AWS Free Tier usage.
Cost Allocation Tags: use cost allocation tags to track AWS costs on a detailed level.
Tagging and Resource Groups:
Cost and Usage Reports: lists AWS usage for each service category used by an account and its IAM users in hourly or daily line items, as well as any tags that customer activated for cost allocation purposes, including additional metadata about AWS services, pricing and reservations.
Cost Explorer: visualize, understand, and manage your AWS costs and usage over time. Create custom reports that analyze cost and usage data.
Billing Alarms in CloudWatch: a simple alarm on actual cost (not projected cost), based on the billing metric data stored in CloudWatch.
Create billing alert for free tier (Details):
AWS Budgets: set custom budgets to track your costs and usage, and respond quickly to alerts received from email or SNS notifications if you exceed your threshold.
AWS Cost Anomaly Detection: continuously monitor your cost and usage using ML to detect unusual spends. It learns your unique, historic spend patterns to detect one-time cost spike and/or continuous cost increases — no need to define thresholds (ML does it). Monitor by: AWS services, member accounts, cost allocation tags, or cost categories. Sends anomaly detection report with root-cause analysis. Get notified with individual alerts or daily/weekly summary via SNS.
AWS Service Quotas: notifies you when you’re close to a service quota value threshold. Create CloudWatch Alarms. Request a quota increase from AWS Service Quotas or shutdown resources before limit is reached.
AWS Trusted Advisor: analyzes your AWS accounts and provides recommendations in 6 categories:
AWS Compute Optimizer: uses ML to analyze existing resources’ configurations and their CloudWatch utilization metrics; helps you choose optimal configurations and right-size your workloads (over/under provisioned). Supports: EC2 Instances, EC2 Auto Scaling Groups, EBS volumes, Lambda functions. Recommendations can be exported to S3.
⚠️ Exam trap — Cost Explorer vs Compute Optimizer:
AWS Basic Support (free):
AWS Developer Support Plan:
AWS Business Support Plan (24/7):
AWS Enterprise On-Ramp Support Plan (24/7):
AWS Enterprise Support Plan (24/7):
Free tier gets 7 core checks only. Full checks require Business or Enterprise support plan. Categories: Cost optimization, Performance, Security, Fault tolerance, Service limits, Operational Excellence.
Basic (free) → Developer → Business → Enterprise On-Ramp → Enterprise. Key differentiators: response time, TAM access, Trusted Advisor access. Business = first plan with 24/7 phone/chat + full Trusted Advisor + API access.
Cost Allocation Tags → track costs per project/team/environment. Resource Groups → view resources sharing common tags. Without tags, you can’t do granular cost analysis.
What cost question are you answering?
│
├─ "Estimate cost BEFORE building" → Pricing Calculator
├─ "Visualize/analyze PAST costs" → Cost Explorer
├─ "Set budget ALERTS" → AWS Budgets
├─ "Detect UNUSUAL spending (ML)" → Cost Anomaly Detection
├─ "Detailed cost AUDIT report" → Cost and Usage Reports
├─ "Right-size resources" → Compute Optimizer
└─ "Check best practices" → Trusted Advisor
Which Support Plan?
│
├─ "Just documentation + forums" → Basic (free)
├─ "Email support, business hours" → Developer
├─ "24/7 phone + full Trusted Advisor" → Business
├─ "Pool of TAMs, <30 min critical" → Enterprise On-Ramp
└─ "Designated TAM, <15 min critical" → Enterprise
Keywords: unusual spending, ML, automatic detection Answer: AWS Cost Anomaly Detection Why: ML learns patterns, so no manual thresholds are needed. Sends root-cause analysis via SNS.
Keywords: budget, threshold, alert, notification Answer: AWS Budgets Why: Budgets support cost/usage/reservation thresholds with email/SNS alerts.
Keywords: forecast, predict, future cost Answer: Cost Explorer (forecast feature)
Keywords: over-provisioned, under-utilized, right-size Answer: AWS Compute Optimizer
Keywords: production workloads, 24/7, phone support Answer: Business Support Plan (minimum for this)
Keywords: TAM, Technical Account Manager, designated Answer: Enterprise Support Plan (On-Ramp has a pool, not designated)
| Tool | Purpose | Trigger |
|---|---|---|
| Pricing Calculator | Estimate cost before building | Manual |
| Cost Explorer | Visualize past costs, forecast 12mo | On-demand |
| AWS Budgets | Alert when approaching/exceeding threshold | Threshold-based |
| Cost Anomaly Detection | ML detects unusual spending | Automatic (ML) |
| Cost & Usage Reports | Detailed line-item audit | Scheduled |
| Compute Optimizer | Right-size recommendations | ML analysis |
| Trusted Advisor | Best practice checks (6 categories) | Continuous |
| Support Plan | Response (Critical) | Trusted Advisor | TAM |
|---|---|---|---|
| Basic | — | 7 core checks | ❌ |
| Developer | 12h (business hrs) | 7 core checks | ❌ |
| Business | <1h | Full + API | ❌ |
| Enterprise On-Ramp | <30 min | Full + API | Pool |
| Enterprise | <15 min | Full + API | Designated |
| Question Contains | → Instant Answer |
|---|---|
| “Estimate cost before building” | Pricing Calculator |
| “Visualize past costs” | Cost Explorer |
| “Forecast future spending” | Cost Explorer (12 mo) |
| “Set budget alert” | AWS Budgets |
| “Detect unusual spending (ML)” | Cost Anomaly Detection |
| “No thresholds, automatic detection” | Cost Anomaly Detection |
| “Detailed cost audit per service” | Cost & Usage Reports |
| “Right-size EC2/EBS/Lambda” | Compute Optimizer |
| “Best practices check” | Trusted Advisor |
| “Track costs by project/team” | Cost Allocation Tags |
| “24/7 phone support” | Business plan (minimum) |
| “Full Trusted Advisor + API” | Business plan (minimum) |
| “Designated TAM” | Enterprise plan |
| “Pool of TAMs” | Enterprise On-Ramp |
| “<15 min response critical” | Enterprise plan |
| “Service quota approaching limit” | Service Quotas |
| “Stop dev instances after hours” | Instance Scheduler |
□ Is it about ESTIMATING cost before building?
→ Yes = Pricing Calculator
→ No = analyzing existing costs
□ Is it about DETECTING unusual spending automatically?
→ Yes + no thresholds = Cost Anomaly Detection (ML)
→ Yes + specific threshold = AWS Budgets
□ Is it about VISUALIZING past costs or FORECASTING?
→ Visualize/forecast = Cost Explorer
→ Detailed line-item audit = Cost & Usage Reports
□ Do they need 24/7 PHONE support?
→ Yes = Business plan (minimum)
→ Email only = Developer plan
□ Do they need a TAM?
→ Designated = Enterprise
→ Pool = Enterprise On-Ramp
→ None = Business or lower
□ Do they need FULL Trusted Advisor?
→ Yes = Business plan (minimum)
→ 7 core checks only = Basic/Developer
□ Is it about RIGHT-SIZING resources?
→ Yes = Compute Optimizer
→ Cost visualization = Cost Explorer (different!)
□ Is it about TRACKING costs per project/team?
→ Yes = Cost Allocation Tags first
→ No tags = can't do granular tracking
Scalability means that an application or system can handle greater loads by adapting.
High Availability: survivability of a data center loss (disaster). Running application or system in at least two AZs.
Fault-tolerant systems emphasize maintaining continuous operation during unexpected failures, while high-availability infrastructures prioritize keeping services up and running despite scheduled maintenance or potential bottlenecks.
Scalability vs Elasticity vs Agility:
Elastic Load Balancer (ELB) - managed load balancer that automatically distributes incoming application traffic across multiple resources, such as Amazon EC2 instances.
Load Balancer Flows:
ALB Flow (Layer 7):
┌─────────────────┐
Internet │ ALB │
─────────────────────►│ SSL Termination│
HTTPS :443 └────────┬────────┘
│ HTTP :80
┌─────────────┼─────────────┐
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ EC2 │ │ EC2 │ │ EC2 │
└─────────┘ └─────────┘ └─────────┘
Target Group
ALB Routing Rules:
┌──────────────────────────────────────┐
│ ALB Listener :443 (HTTPS+SNI) │
└──────────────────┬───────────────────┘
┌──────────────────────┼──────────────────────┐
│ /api/* │ /images/* │ default
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ API Servers │ │ S3/Lambda │ │ Web Servers │
└──────────────┘ └──────────────┘ └──────────────┘
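The routing diagram above can be read as an ordered rule list with a default. The sketch below models that idea with glob-style path matching from the standard library; the target group names and the `route` function are hypothetical, and a real ALB also supports host, header, and query-string conditions that are not modeled here.

```python
from fnmatch import fnmatchcase

# (path pattern, target group), checked in priority order.
RULES = [
    ("/api/*", "api-servers"),
    ("/images/*", "s3-or-lambda"),
]
DEFAULT_TARGET = "web-servers"


def route(path: str) -> str:
    """Return the target group of the first matching rule, else the default."""
    for pattern, target in RULES:
        if fnmatchcase(path, pattern):
            return target
    return DEFAULT_TARGET


for path in ("/api/orders/42", "/images/logo.png", "/index.html"):
    print(path, "->", route(path))
```

Because the first match wins, more specific patterns belong earlier in the list and the default only catches what no rule claimed.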
NLB with Static IP (Layer 4):
Client (needs static IP for firewall whitelist)
│
▼
┌─────────────────────┐
│ NLB │
│ Elastic IP: 1.2.3.4│ ◄── Static IP per AZ
│ (Layer 4) │
└──────────┬──────────┘
│ TCP passthrough
│ (Client IP preserved)
▼
┌───────────────┐
│ Target Group │
└───────────────┘
GLB for Security Appliances:
┌──────────┐
Traffic ────────►│ GLB │
│(Layer 3) │
└────┬─────┘
│ GENEVE :6081
▼
┌─────────────────────┐
│ Security Appliance │
│ (Firewall/IDS/IPS) │
│ Inspect ───────►│──► Allow/Block
└─────────────────────┘
│
▼
Your Application
Types of load balancers:
| Feature | ALB | NLB | GLB | CLB |
|---|---|---|---|---|
| Layer | 7 (HTTP/S) | 4 (TCP/UDP) | 3 (IP) | 4 & 7 |
| Use Case | Web apps, microservices | Ultra-low latency, static IP | Firewalls, IDS/IPS | Legacy (deprecated) |
| Performance | Moderate | Millions req/sec | High throughput | Moderate |
| Static IP | ❌ DNS only | ✅ Elastic IP per AZ | ❌ | ❌ |
| SNI (multi-cert) | ✅ | ✅ | N/A | ❌ |
| Cross-Zone Default | ✅ Enabled (free) | ❌ Disabled (paid) | ❌ Disabled (paid) | ❌ Disabled (free) |
| Hostname | XXX.region.elb.amazonaws.com | XXX.region.elb.amazonaws.com | XXX.region.elb.amazonaws.com | Fixed hostname |
Target Group Support:
| Target Type | ALB | NLB | GLB | CLB |
|---|---|---|---|---|
| EC2 Instances | ✅ | ✅ | ✅ | ✅ |
| IP Addresses (private) | ✅ | ✅ | ✅ | ❌ |
| Lambda Functions | ✅ (HTTP→JSON) | ❌ | ❌ | ❌ |
| ALB | ❌ | ✅ | ❌ | ❌ |
| ECS Tasks | ✅ | ✅ | ❌ | ✅ |
When to Use:
| Scenario | Choose |
|---|---|
| HTTP routing (path/host/headers/query string) | ALB |
| WebSockets, HTTP/2 | ALB |
| Containers with dynamic ports | ALB |
| Need static/Elastic IP (IP whitelisting) | NLB |
| Millions req/sec, ultra-low latency | NLB |
| TCP/UDP non-HTTP traffic | NLB |
| 3rd party security appliances | GLB |
| Deep packet inspection | GLB |
Details:
⚠️ Exam trap: ELB target registration — instance ID vs IP address
| Register By | Routing Behavior | Use Case |
|---|---|---|
| Instance ID | Routes to primary private IP on primary ENI | Default, simplest |
| IP Address | Routes to the specific IP you chose | Multiple IPs per instance, non-EC2 targets (on-prem, containers) |
❌ Public IP / Elastic IP → never used for target routing (ELB routes within VPC via private IPs)
❌ Instance ID as routable address → instance ID is a reference, NLB resolves it to primary private IP
Gateway Load Balancer [Layer 3 - IP Packets]: Deploy/scale 3rd party network virtual appliances;
Load Balancer Details:
| Feature | CLB | ALB | NLB | GLB |
|---|---|---|---|---|
| Layer | 4 & 7 (deprecated) | 7 (HTTP/S) | 4 (TCP/UDP) | 3 (IP) |
| Use Case | Legacy | Microservices, containers | Ultra-low latency, static IP | Firewalls, IDS/IPS |
| Routing | Basic | Path/host/headers/query | - | - |
| Target Groups | EC2 only | EC2, ECS, Lambda, IPs | EC2, IPs, ALB | EC2, IPs |
| Static IP | ❌ | ❌ DNS only | ✅ Elastic IP/AZ | ❌ |
| Health Checks | TCP, HTTP | HTTP, HTTPS | TCP, HTTP, HTTPS | TCP, HTTP, HTTPS |
| Dynamic Port Mapping | ❌ | ✅ | ❌ | ❌ |
| Client Info | Preserved | X-Forwarded-* headers | Preserved | Preserved |
| Protocol | TCP/HTTP | HTTP/HTTPS | TCP/UDP | GENEVE (port 6081) |
Sticky Sessions (Session Affinity): client always redirected to same instance behind load balancer.
Cookie Types:
Cross-Zone Load Balancing: distributes traffic evenly across all registered instances in all AZs (not just per-node).
| Load Balancer | Default | Inter-AZ Data Charges |
|---|---|---|
| ALB | ✅ Enabled | No charges |
| NLB & GLB | ❌ Disabled | Charges if enabled |
| CLB | ❌ Disabled | No charges |
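
The effect of cross-zone load balancing is easiest to see with uneven AZs. The sketch below is a simplified model, assuming client traffic is split evenly across one load balancer node per AZ and that every AZ has at least one instance; `traffic_share_per_instance` is a hypothetical helper, not an AWS API.

```python
def traffic_share_per_instance(instances_per_az: dict, cross_zone: bool) -> dict:
    """Percent of total traffic each instance in an AZ receives."""
    total_instances = sum(instances_per_az.values())
    az_count = len(instances_per_az)
    shares = {}
    for az, count in instances_per_az.items():
        if cross_zone:
            # Every instance is an equal target, regardless of its AZ.
            shares[az] = 100 / total_instances
        else:
            # Each AZ's node keeps its slice and divides it among local instances.
            shares[az] = (100 / az_count) / count
    return shares


fleet = {"az-a": 2, "az-b": 8}
print(traffic_share_per_instance(fleet, cross_zone=True))
print(traffic_share_per_instance(fleet, cross_zone=False))
```

With 2 instances in one AZ and 8 in the other, disabling cross-zone leaves the two instances in the small AZ carrying half of all traffic between them.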
SSL/TLS: encrypts traffic in transit (in-flight encryption) between clients and load balancer.
Load Balancer - SSL Certificates:
HTTP → HTTPS Redirect:
⚠️ Exam trap: DNS cannot redirect HTTP→HTTPS (DNS only resolves names to IPs, no protocol handling).
SNI (Server Name Indication): allows multiple SSL certs on one server (multiple websites).
users.example.com + checkout.example.com with different certs;
⚠️ Exam traps:
| Load Balancer | SSL Certificates | SNI Support |
|---|---|---|
| CLB | 1 only (need multiple CLBs for multiple domains) | ❌ No |
| ALB | Multiple (via multiple listeners) | ✅ Yes |
| NLB | Multiple (via multiple listeners) | ✅ Yes |
Connection Draining / Deregistration Delay: time to complete in-flight requests while instance is de-registering or unhealthy.
Auto Scaling Group (ASG): ensures optimal capacity by automatically scaling EC2 instances.
Launch Template vs Launch Configuration:
| Feature | Launch Configuration (legacy) | Launch Template (recommended) |
|---|---|---|
| Multiple instance types | ❌ Single type only | ✅ Multiple types |
| Mixed On-Demand + Spot | ❌ | ✅ |
| Versioning | ❌ | ✅ |
| Capacity Reservations | ❌ | ✅ |
| Status | Legacy (deprecated) | Recommended |
⚠️ Exam trap: “Mix On-Demand + Spot across multiple instance types in ASG” → Launch Template only. Launch Configuration supports single instance type, single purchase option. AWS recommends Launch Templates for all new ASGs.
ASG + ALB Integration:
┌─────────────────────────────────────────────────┐
│ Auto Scaling Group │
│ ┌───────┐ ┌───────┐ ┌───────┐ │
│ │ EC2 │ │ EC2 │ │ EC2 │ ... │
│ └───┬───┘ └───┬───┘ └───┬───┘ │
│ └───────────┴───────────┘ │
│ ▲ Health Checks │
│ ALB ────────┘ │
│ │
│ Scale Out ◄── CloudWatch Alarm (CPU>70%) │
│ Scale In ◄── CloudWatch Alarm (CPU<30%) │
└─────────────────────────────────────────────────┘
ASG Health Check Types:
| Type | What it checks | Status |
|---|---|---|
| EC2 | Instance running (hardware/hypervisor) | Always on |
| ELB | App responds on health endpoint | Optional (additive) |
Unhealthy instance behavior: ASG terminates instance → launches new one.
⚠️ Exam trap: ASG never “restarts the app” or “detaches and leaves running” - always terminates + replaces.
Auto Scaling Groups - Capacity characteristics:
Auto Scaling Groups - Scaling Strategies:
- Manual Scaling: update the size of an ASG manually.
- Dynamic Scaling: respond to changing demand.
  - Simple / Step Scaling: threshold-based, you define actions.
    - Example: CPU > 70% → add 2 units; CPU < 30% → remove 1 unit.
  - Target Tracking Scaling: “keep metric at X”, ASG auto-adjusts (like a thermostat).
    - Example: avg 1000 connections/instance, 70% CPU, 50 requests/target.
- Scheduled Scaling: time-based, for predictable patterns.
  - Example: scale to 10 instances every Monday 9am, scale down Friday 6pm.
- Predictive Scaling: ML-based, proactive.
  - Uses Machine Learning to predict future traffic ahead of time.
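The difference between step scaling and target tracking shows up in the arithmetic. The sketch below uses the step thresholds from the example above and, for target tracking, a proportional rule that is an assumption for illustration (AWS does not publish the exact algorithm); both function names and the minimum capacity of 1 are hypothetical.

```python
import math


def step_scaling(current: int, cpu: float) -> int:
    """Fixed adjustments: CPU > 70% adds 2 units, CPU < 30% removes 1 unit."""
    if cpu > 70:
        return current + 2
    if cpu < 30:
        return max(current - 1, 1)  # assumed minimum capacity of 1
    return current


def target_tracking(current: int, metric: float, target: float) -> int:
    """Proportional estimate of the capacity that returns the metric to target."""
    return max(math.ceil(current * metric / target), 1)


print(step_scaling(4, cpu=85))                      # fixed step up
print(target_tracking(4, metric=90, target=70))     # sized to the overshoot
```

Step scaling always moves by the amount you configured, while the proportional rule scales the change to how far the metric is from its target.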
Custom Metrics for Scaling:
⚠️ Exam trap: “Detailed Monitoring” only increases EC2 metric frequency (1min vs 5min) - does NOT add new metric types. For app-specific metrics like “DB requests/min” → Custom Metric required.
Why decouple? Synchronous communication can be problematic with sudden traffic spikes.
Application Communication Patterns:
1) Synchronous (app-to-app): 2) Asynchronous (app-to-queue-to-app):
┌─────────┐ ┌──────────┐ ┌─────────┐ ┌───────┐ ┌──────────┐
│ Buying │◀────▶│ Shipping │ │ Buying │───▶│ Queue │───▶│ Shipping │
│ Service │ │ Service │ │ Service │ │ │ │ Service │
└─────────┘ └──────────┘ └─────────┘ └───────┘ └──────────┘
(tight coupling) (decoupled, scales independently)
Decoupling services:
These services scale independently from your application!
⚠️ Exam trap - “Services that buffer or throttle traffic spikes”:
Amazon SQS (Simple Queue Service) is a fully managed, serverless messaging service used to decouple applications. Send, store, and receive messages between software components, without losing messages or requiring other services to be available. In Amazon SQS, an application sends messages into a queue. A user or service retrieves a message from the queue, processes it, and then deletes it from the queue.
Amazon SNS is a fully managed, serverless, publish/subscribe notification service. Using Amazon SNS topics, a publisher publishes messages to subscribers:
Amazon MQ is a managed message broker service for RabbitMQ and ActiveMQ.
Amazon Kinesis is a managed service to collect, process and analyze real-time streaming data at any scale.
• Oldest offering (over 10 years old)
• Fully managed service, used to decouple applications
• Attributes:
  • Unlimited throughput, unlimited number of messages in queue
  • Default retention of messages: 4 days, maximum of 14 days
  • Low latency (<10 ms on publish and receive)
  • Limitation of 256KB per message sent
• Can have duplicate messages (at least once delivery, occasionally)
• Can have out of order messages (best effort ordering)
SQS Queue - Multiple Producers & Consumers:
┌──────────┐ ┌──────────┐
│ Producer │──┐ ┌────▶│ Consumer │
└──────────┘ │ │ └──────────┘
┌──────────┐ │ ┌───────────┐ │ ┌──────────┐
│ Producer │──┼───▶│ SQS Queue │────┼────▶│ Consumer │
└──────────┘ │ └───────────┘ │ └──────────┘
┌──────────┐ │ (Send messages) │ ┌──────────┐
│ Producer │──┘ └────▶│ Consumer │
└──────────┘ └──────────┘
(Poll messages)
Producing Messages:
(SendMessage API)
Consuming Messages:
DeleteMessage API
SQS Message Flow:
Poll/Receive Process
SQS Queue ─────────────────▶ Consumer ─────────────▶ RDS
▲ │
│ DeleteMessage │
└────────────────────────────┘
Multiple Consumers (Horizontal Scaling):
SQS with Auto Scaling Group:
SendMessage ReceiveMessages
┌────────────┐ │ ┌───────────┐ │ ┌────────────┐
│ Front-end │────────┼───────▶│ SQS Queue │─────────┼───────▶│ Back-end │
│ Web App │ │ │(infinitely│ │ │ Processing │
│ (ASG) │ │ │ scalable) │ │ │ (ASG) │
└────────────┘ │ └─────┬─────┘ │ └────────────┘
│
┌────────────────┴────────────────┐
▼ ▼
CloudWatch Metric CloudWatch Alarm
(ApproximateNumberOfMessages) │
▼
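Scale ASG
⚠️ Exam trap: “Scale consumers based on queue depth” → use a CloudWatch Alarm on queue depth (the CloudWatch metric is named ApproximateNumberOfMessagesVisible; it reports the queue's ApproximateNumberOfMessages attribute)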
At-Least-Once Delivery: SQS prioritizes never losing messages over exactly-once delivery. Duplicates can occur when:
Solution: Make consumers idempotent (processing same message twice = same result) or use FIFO queue (exactly-once)
⚠️ Exam trap: “Prevent duplicate processing in SQS” → use FIFO queue or idempotent consumers
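An idempotent consumer can be as simple as remembering which message IDs it has already applied. The sketch below is a minimal in-memory illustration; the `handle` function, the message IDs, and the running total are hypothetical, and a real consumer would keep the processed IDs in a durable store.

```python
processed_ids = set()      # assumption: in production this is a durable store
balance = {"total": 0}


def handle(message_id: str, amount: int) -> bool:
    """Apply the message once; return False if it was a duplicate delivery."""
    if message_id in processed_ids:
        return False
    balance["total"] += amount
    processed_ids.add(message_id)
    return True


# "m-1" is delivered twice, as can happen with at-least-once delivery.
deliveries = [("m-1", 50), ("m-2", 20), ("m-1", 50)]
results = [handle(mid, amt) for mid, amt in deliveries]
print(results, balance["total"])
```

The duplicate is acknowledged but changes nothing, so processing the same message twice gives the same result as processing it once.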
| Security Layer | Options |
|---|---|
| In-flight encryption | HTTPS API |
| At-rest encryption | KMS keys (SSE-KMS) |
| Client-side encryption | Customer manages encrypt/decrypt |
| Access Controls | IAM policies for SQS API access |
| SQS Access Policies | Resource-based (like S3 bucket policies) |
SQS Access Policies use cases:
⚠️ Exam trap: “Allow S3 to send notifications to SQS” → SQS Access Policy (not IAM policy)
| Polling Type | Behavior | API Calls | Cost |
|---|---|---|---|
| Short Polling | Returns immediately (even if empty) | High | Higher (pay per request!) |
| Long Polling | Waits up to 20 sec for messages | Low | Lower |
Why Long Polling saves $$$: SQS pricing = per 1 million requests. Short polling = constant empty responses = wasted money.
Enable Long Polling:
Queue level: ReceiveMessageWaitTimeSeconds attribute
API level: WaitTimeSeconds parameter on ReceiveMessage
⚠️ Exam trap: “Reduce SQS costs” or “reduce empty responses” → enable Long Polling
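A quick back-of-the-envelope calculation shows why long polling is cheaper. The sketch below assumes a single consumer polling an empty queue for a 30-day month, ignores any free tier, and uses an assumed price of 0.40 USD per million requests; check current SQS pricing before relying on the dollar figures.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # 30-day month
PRICE_PER_MILLION = 0.40                # assumed USD price per 1M requests


def monthly_requests(seconds_between_polls: int) -> int:
    """Number of ReceiveMessage calls made by one consumer on an empty queue."""
    return SECONDS_PER_MONTH // seconds_between_polls


def monthly_cost(requests: int) -> float:
    return requests / 1_000_000 * PRICE_PER_MILLION


short_polls = monthly_requests(1)    # short polling, one call per second
long_polls = monthly_requests(20)    # long polling, 20-second wait
print(short_polls, round(monthly_cost(short_polls), 2))
print(long_polls, round(monthly_cost(long_polls), 2))
```

Under these assumptions the 20-second wait cuts the request count by a factor of 20, whatever the per-request price is.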
Visibility Timeout Timeline:
ReceiveMessage ReceiveMessage ReceiveMessage ReceiveMessage
│ │ │ │
▼ ▼ ▼ ▼
──────┬───────────────────────────────────────┬────────────────────────▶ Time
│ Visibility Timeout │
│◄─────────────────────────────────────▶│
│ │
Message Not returned Message returned
returned (invisible) (again!)
| Visibility Timeout | Risk |
|---|---|
| Too low (seconds) | Duplicates (message reappears before processing done) |
| Too high (hours) | Long wait if consumer crashes (re-processing delayed) |
Extend timeout: Call ChangeMessageVisibility API to get more time
⚠️ Exam trap: “Consumer needs more time” → use ChangeMessageVisibility API
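The timeline above can be reproduced with a tiny in-memory model of one message. This is a sketch for intuition only, not the SQS API: `TinyQueue` and its methods are hypothetical, time is passed in as plain integers, and `change_visibility` is assumed to restart the timer from the moment of the call.

```python
class TinyQueue:
    """In-memory model of a single message and its visibility timeout."""

    def __init__(self, body: str, visibility_timeout: int):
        self.body = body
        self.visibility_timeout = visibility_timeout
        self.invisible_until = 0   # message is visible once now >= this value
        self.deleted = False

    def receive(self, now: int):
        """Return the message if visible, and hide it for the timeout."""
        if self.deleted or now < self.invisible_until:
            return None
        self.invisible_until = now + self.visibility_timeout
        return self.body

    def change_visibility(self, now: int, timeout: int) -> None:
        """Model of ChangeMessageVisibility: new timeout counted from now."""
        self.invisible_until = now + timeout

    def delete(self) -> None:
        self.deleted = True


q = TinyQueue("order-1", visibility_timeout=30)
print(q.receive(now=0))    # delivered to the first consumer
print(q.receive(now=10))   # hidden while the first consumer works
print(q.receive(now=30))   # timeout expired without a delete, delivered again
```

The third call is the duplicate the table warns about: the consumer was still working, never deleted the message, and did not extend the timeout.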
Queue name must end with the .fifo suffix.
FIFO Queue Flow:
┌──────────┐ Send messages ┌─────────────┐ Poll messages ┌──────────┐
│ Producer │───────────────────▶│ FIFO Queue │──────────────────▶│ Consumer │
└──────────┘ [4][3][2][1] │ ▶|||◀ │ [4][3][2][1] └──────────┘
└─────────────┘
(same order in/out)
| Feature | Standard Queue | FIFO Queue |
|---|---|---|
| Throughput | Unlimited | 300 msg/s (3000 with batching) |
| Ordering | Best effort | Guaranteed (by Message Group ID) |
| Delivery | At-least-once | Exactly-once (Deduplication ID) |
| Duplicates | Possible | Removed via Deduplication ID |
FIFO Required Parameters:
| Parameter | Type | Purpose | Example |
|---|---|---|---|
| Message Group ID | Tag | Groups messages for ordering. Same group = processed in order | customer_123 or order_456 |
| Message Deduplication ID | Token | Prevents duplicates. Same ID within 5 min = rejected | txn_789 or hash of message body |
Use Cases:
Message Group ID = order_id so all updates for the same order are processed in sequence
Message Deduplication ID = payment_id so a payment won’t be processed twice if a retry happens
⚠️ Exam trap: “Need ordering” → FIFO queue. “Need exactly-once” → FIFO queue with Deduplication ID
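The two FIFO parameters can be modeled in a few lines. The sketch below is an in-memory illustration, not the SQS API: `TinyFifoQueue` is hypothetical, time is passed in as seconds, and returning `False` stands for "duplicate dropped" (as far as I know, real SQS still acknowledges such a send but does not enqueue a second copy).

```python
DEDUP_WINDOW_SECONDS = 5 * 60   # the 5-minute deduplication interval


class TinyFifoQueue:
    def __init__(self):
        self._first_seen = {}   # deduplication ID -> time it was accepted
        self._groups = {}       # message group ID -> bodies in arrival order

    def send(self, group_id: str, dedup_id: str, body: str, now: int) -> bool:
        """Enqueue unless the same deduplication ID was seen within the window."""
        seen_at = self._first_seen.get(dedup_id)
        if seen_at is not None and now - seen_at < DEDUP_WINDOW_SECONDS:
            return False
        self._first_seen[dedup_id] = now
        self._groups.setdefault(group_id, []).append(body)
        return True

    def messages(self, group_id: str) -> list:
        """Bodies for one group, in the order they were accepted."""
        return list(self._groups.get(group_id, []))


q = TinyFifoQueue()
q.send("order_456", "txn_1", "created", now=0)
q.send("order_456", "txn_1", "created", now=10)   # retry inside the window
q.send("order_456", "txn_2", "paid", now=20)
print(q.messages("order_456"))
```

Ordering is kept per message group, so unrelated orders do not block each other, while the deduplication ID only protects against repeats inside the 5-minute window.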
Problem: Direct writes to DB under heavy load → transactions lost
Without SQS (transactions may be lost):
requests ───▶ ┌─────────────┐ ─── Insert ───▶ ┌─────────────┐
│ Application │ transactions │ RDS / │
│ (ASG) │ ────────────────▶│ Aurora / │
└─────────────┘ │ DynamoDB │
│ └─────────────┘
(overwhelmed)
With SQS Buffer (no data loss):
┌─────────────┐ ┌─────────────┐
requests ───▶ │ Enqueue App │ SendMessage │ Dequeue App │ insert
│ (ASG) │ ───────────────▶ │ (ASG) │ ──────────▶ DB
└─────────────┘ ┌───────┐ └─────────────┘
│ SQS │
│Queue │
│(buffer)│
└───────┘
(infinitely scalable)
Use case: Protect database from write spikes, decouple producers from consumers
⚠️ Exam trap: “Database overwhelmed by writes” → use SQS as buffer
Pub/Sub model: One message to many receivers (vs SQS point-to-point)
Direct Integration vs Pub/Sub:
Direct (tight coupling): Pub/Sub (decoupled):
┌─────────┐ ──▶ Email ┌─────────┐ ┌─────────────┐
│ Buying │ ──▶ Fraud Service │ Buying │──▶ SNS ─┼──▶ Email │
│ Service │ ──▶ Shipping │ Service │ Topic │──▶ Fraud │
│ │ ──▶ SQS Queue └─────────┘ │──▶ Shipping │
└─────────┘ (1 publish) │──▶ SQS Queue│
(4 integrations to maintain) └─────────────┘
(add subscribers easily)
SNS Limits:
Subscribers: SQS, Lambda, Kinesis Data Firehose, Email, SMS, HTTP(S) endpoints
AWS Services → SNS (built-in integrations):
┌─────────────────────────────────────────────┐
│ CloudWatch Alarms │ AWS Budgets │ Lambda │ publish
│ ASG (Notifications)│ S3 (Events) │ DynamoDB│ ───────────▶ SNS
│ CloudFormation │ AWS DMS │ RDS │
│ (State Changes) │ (New Replica) │ Events │
└─────────────────────────────────────────────┘
| Method | Use Case | Steps |
|---|---|---|
| Topic Publish (SDK) | Standard notifications | Create topic → Create subscription(s) → Publish |
| Direct Publish (Mobile SDK) | Mobile push | Create platform app → Create endpoint → Publish |
Mobile Push Platforms: Google GCM, Apple APNS, Amazon ADM
⚠️ Exam trap: “Send notification to multiple services at once” → SNS (not SQS)
Push once to SNS, receive in all SQS queues that are subscribers
SNS + SQS Fan Out:
┌─────────┐ ┌───────────┐ ┌───────────┐ ┌─────────────┐
│ Buying │────────▶│ SNS Topic │────────▶│ SQS Queue │────────▶│ Fraud │
│ Service │ └─────┬─────┘ └───────────┘ │ Service │
└─────────┘ │ └─────────────┘
│ ┌───────────┐ ┌─────────────┐
└──────────────▶│ SQS Queue │────────▶│ Shipping │
└───────────┘ │ Service │
└─────────────┘
Benefits:
Required: SQS queue access policy must allow SNS to write
Need fan out + ordering + deduplication? Use SNS FIFO + SQS FIFO
SNS FIFO + SQS FIFO Fan Out:
┌─────────┐ ┌────────────┐ ┌────────────┐ ┌─────────┐
│ Buying │────────▶│ SNS FIFO │────────▶│ SQS FIFO │────────▶│ Fraud │
│ Service │ │ Topic │ │ Queue │ │ Service │
└─────────┘ └─────┬──────┘ └────────────┘ └─────────┘
│ ┌────────────┐ ┌─────────┐
└───────────────▶│ SQS FIFO │────────▶│Shipping │
│ Queue │ │ Service │
└────────────┘ └─────────┘
SNS FIFO Topic:
JSON policy to filter messages per subscription (subscribers only get what they need)
SNS Message Filtering:
Message:
Order: 1036
┌─────────┐ Product: Pencil ┌─────────────────────────────────────┐
│ Buying │──────▶ State: Placed ──────▶ │ SNS Topic │
│ Service │ └──────────────┬──────────────────────┘
└─────────┘ │
│
┌─────────────────────────────────────┼─────────────────────┐
│ │ │
Filter: State=Placed Filter: State=Cancelled No Filter
│ │ │
▼ ▼ ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ SQS (Placed) │ │ SQS (Cancelled)│ │ SQS (All) │
└───────────────┘ └───────────────┘ └───────────────┘
No filter policy = receives ALL messages
⚠️ Exam trap: “Route different message types to different queues” → SNS Filter Policy
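The filtering diagram above can be sketched as a small matcher. This models only exact string matching on message attributes, which is an assumption made for brevity (real SNS filter policies support more operators); the subscription names and the `matches` and `route` functions are hypothetical.

```python
def matches(filter_policy: dict, attributes: dict) -> bool:
    """True if every policy key has an attribute value in its allowed list."""
    return all(
        attributes.get(key) in allowed
        for key, allowed in filter_policy.items()
    )


# Subscription name -> filter policy; an empty policy means "no filter".
subscriptions = {
    "placed-queue": {"State": ["Placed"]},
    "cancelled-queue": {"State": ["Cancelled"]},
    "all-queue": {},
}


def route(attributes: dict) -> list:
    """Names of the subscriptions that would receive the message."""
    return [name for name, policy in subscriptions.items()
            if matches(policy, attributes)]


message = {"Order": "1036", "Product": "Pencil", "State": "Placed"}
print(route(message))
```

The subscription without a policy receives everything, which is why an unfiltered queue is a convenient audit or archive target.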
Problem: S3 allows only one event rule per combination of event type + prefix
Solution: S3 → SNS → Fan out to multiple SQS queues
S3 Events Fan Out:
┌───────────┐ events ┌───────────┐ fan-out ┌───────────┐
│ S3 Object │───────────▶│ SNS Topic │────────────▶│ SQS Queue │
│ Created │ └─────┬─────┘ └───────────┘
└───────────┘ │ ┌───────────┐
├──────────────────▶│ SQS Queue │
│ └───────────┘
│ ┌───────────┐
└──────────────────▶│ Lambda │
└───────────┘
⚠️ Exam trap: “S3 event to multiple destinations” → S3 → SNS → Fan out
SNS can send to Kinesis Data Firehose → then to any KDF destination
SNS → Kinesis Data Firehose → S3:
┌─────────┐ ┌───────────┐ ┌─────────────────┐ ┌────────┐
│ Buying │────────▶│ SNS Topic │────────▶│ Kinesis Data │────────▶│ S3 │
│ Service │ └───────────┘ │ Firehose │ └────────┘
└─────────┘ └─────────────────┘
(or any KDF destination)
| Security Layer | Options |
|---|---|
| In-flight encryption | HTTPS API |
| At-rest encryption | KMS keys |
| Client-side encryption | Customer manages encrypt/decrypt |
| Access Controls | IAM policies for SNS API access |
| SNS Access Policies | Resource-based (like S3 bucket policies) |
SNS Access Policies use cases:
⚠️ Exam trap: “Allow S3 to publish to SNS” → SNS Access Policy (not IAM policy)
| Feature | SQS | SNS | Kinesis |
|---|---|---|---|
| Model | Pull (consumers poll) | Push (to subscribers) | Pull (standard) / Push (enhanced fan-out) |
| Data persistence | Deleted after consumed | Not persisted (lost if not delivered) | Retained up to 365 days |
| Replay capability | No | No | Yes |
| Consumers/Subscribers | Unlimited workers | 12.5M subscribers, 100K topics | 2 MB/shard (standard), 2 MB/shard/consumer (enhanced) |
| Throughput | No provisioning needed | No provisioning needed | Provisioned or On-demand |
| Ordering | FIFO queues only | FIFO topics (for SQS FIFO) | Per shard (Partition Key) |
| Delay | Individual message delay | No | No |
| Use case | Decouple apps, buffer | Fan-out notifications | Real-time big data, analytics, ETL |
Collect and store streaming data in real-time
Kinesis Data Streams Flow:
┌─────────────────┐ ┌──────────────────┐
│ Click Streams │ │ Application │
│ IoT Devices │──┐ ┌──────────────────────┐ ┌──▶│ Lambda │
│ Metrics & Logs │ │ │ Kinesis Data Streams │ │ │ Data Firehose │
└─────────────────┘ │ │ ┌────┬────┬────┐ │ │ │ Apache Flink │
├───▶│ │Shard│Shard│Shard│ │───┘ └──────────────────┘
┌──────────────────┐ │ │ └────┴────┴────┘ │ Consumers
│ Producers: │ │ └──────────────────────┘
│ - Applications │─┘
│ - Kinesis Agent │
└──────────────────┘
Key Features:
Libraries:
| Mode | Provisioning | Throughput | Scaling | Pricing |
|---|---|---|---|---|
| Provisioned | Choose # of shards | 1 MB/s in, 2 MB/s out per shard | Manual | Per shard/hour |
| On-Demand | Automatic | Default 4 MB/s in | Auto (based on last 30 days peak) | Per stream/hour + data in/out |
Switching modes: Console or CLI, no downtime, but limited to 2 switches per 24 hours
ProvisionedThroughputExceeded: Add more shards or switch to On-Demand mode
⚠️ Exam trap: “Unpredictable traffic spikes in Kinesis” → On-demand mode
⚠️ Exam trap: “ProvisionedThroughputExceeded in Kinesis” → Add shards or use On-Demand mode
⚠️ Exam trap: Why NOT “SQS as buffer to Kinesis”? It seems logical (SQS handles any spike and buffers for Kinesis), but it adds latency (no longer real-time) and complexity, and the bottleneck just MOVES to where SQS writes to Kinesis. Solution: scale Kinesis directly (add shards) instead of working around it.
⚠️ Exam trap: “Need to replay streaming data” → Kinesis Data Streams (not Firehose, not SQS)
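For provisioned mode, the per-shard limits in the table above (1 MB/s in, 2 MB/s out) give a rough shard count. The sketch below is a sizing estimate under those two limits only; it ignores the per-shard records-per-second limit and enhanced fan-out, and `shards_needed` is a hypothetical helper.

```python
import math

INGEST_MB_PER_SHARD = 1   # write limit per shard, from the table above
EGRESS_MB_PER_SHARD = 2   # read limit per shard (standard consumers)


def shards_needed(ingest_mb_per_s: float, egress_mb_per_s: float) -> int:
    """Smallest shard count that satisfies both the write and the read limit."""
    for_writes = math.ceil(ingest_mb_per_s / INGEST_MB_PER_SHARD)
    for_reads = math.ceil(egress_mb_per_s / EGRESS_MB_PER_SHARD)
    return max(for_writes, for_reads, 1)


print(shards_needed(ingest_mb_per_s=4.5, egress_mb_per_s=6))   # write-bound
print(shards_needed(ingest_mb_per_s=2, egress_mb_per_s=10))    # read-bound
```

If the estimate keeps changing because traffic is unpredictable, that is the signal the notes above point to for On-Demand mode.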
Load streaming data into destinations (fully managed, no code)
Data Firehose Flow:
┌─────────────────┐ ┌─────────────────────┐
│ Producers: │ │ AWS Destinations: │
│ - Kinesis Streams│ ┌─────────────────────┐ │ - S3 │
│ - CloudWatch │ │ │ │ - Redshift │
│ - AWS IoT │────▶│ Data Firehose │───────▶│ - OpenSearch │
│ - SNS │ │ (batch writes) │ ├─────────────────────┤
│ - SDK/Agent │ │ │ │ │ 3rd Party: │
└─────────────────┘ │ ▼ │ │ - Splunk, Datadog │
│ Lambda Transform │ │ - MongoDB, NewRelic│
Record up to 1MB └─────────────────────┘ ├─────────────────────┤
│ │ Custom: HTTP endpoint│
▼ └─────────────────────┘
S3 Backup Bucket
(all or failed data)
Note: SQS is NOT a Firehose producer (SQS → Firehose requires Lambda in between)
Key Features:
| Feature | Kinesis Data Streams | Data Firehose |
|---|---|---|
| Purpose | Streaming data collection | Load data to destinations |
| Management | Producer/Consumer code needed | Fully managed |
| Latency | Real-time (~200ms) | Near real-time (buffering) |
| Scaling | Provisioned / On-Demand | Automatic |
| Data Storage | Up to 365 days | No storage |
| Replay | ✅ Yes | ❌ No |
| Destinations | Custom consumers | S3, Redshift, OpenSearch, 3rd party, HTTP |
| Data Transformation | ❌ No (raw data) | ✅ Yes (Lambda, format conversion) |
⚠️ Exam trap: “Real-time streaming” → Kinesis Data Streams. “Near real-time” → Data Firehose
⚠️ Exam trap: “Transform data while streaming to S3” → Data Firehose (only service with built-in transformation)
⚠️ Exam trap: “Load streaming data directly to S3” → Data Firehose (not Kinesis Data Streams)
Managed message broker for RabbitMQ and ActiveMQ (migration path for on-prem apps)
When to use Amazon MQ vs SQS/SNS:
| Feature | SQS/SNS | Amazon MQ |
|---|---|---|
| Protocols | AWS proprietary | MQTT, AMQP, STOMP, OpenWire, WSS |
| Scaling | Serverless, unlimited | Runs on servers, limited scaling |
| Use case | New cloud-native apps | Migrate existing on-prem apps |
| Features | Queue (SQS) OR Topic (SNS) | Both queue AND topic features |
Amazon MQ Supported Protocols:
⚠️ Exam trap: “Migrate on-prem app using MQTT/AMQP/STOMP” → Amazon MQ (SQS/SNS don’t support these protocols)
Amazon MQ High Availability (Multi-AZ):
Region (us-east-1)
┌─────────────────────────────────┐
│ AZ (us-east-1a) │
│ ┌─────────────────┐ │
┌───────▶│ │ ACTIVE Broker │◀────┐ │
│ │ └─────────────────┘ │ │
│ ├────────────────────────────┼────┤
┌────────┐ │ │ AZ (us-east-1b) │ │
│ Client │─┤ │ ┌─────────────────┐ │ │ ┌─────────┐
└────────┘ │ │ │ STANDBY Broker │◀────┼────┼───▶│ Amazon │
│ │ └─────────────────┘ │ │ │ EFS │
└───────▶│ (failover) │ │ │(storage)│
└─────────────────────────────────┘ └─────────┘
High Availability:
⚠️ Exam trap: “Migrate on-prem RabbitMQ/ActiveMQ to AWS” → Amazon MQ (not SQS/SNS)
⚠️ Exam trap: “SNS FIFO topic subscribers” → SQS queues only (Standard or FIFO)
The entire point of messaging services is breaking tight coupling. When you see:
SQS is the universal buffer. Standard queues scale to nearly unlimited throughput and store messages durably until they are consumed or the retention period expires. When anything is overwhelmed (database, API, service), put SQS in front.
Two fundamental patterns:
Fan Out = SNS + SQS combined. SNS pushes to multiple SQS queues. Each queue processes independently. Best of both worlds.
| Service | Stores Data? | Replay? | Retention |
|---|---|---|---|
| SQS | Until consumed | ❌ No | 4 days default, 14 max |
| SNS | Never | ❌ No | None (deliver or lose) |
| Kinesis Streams | Up to 365 days | ✅ Yes | 1 day default, 365 max |
| Firehose | Never | ❌ No | None (pass-through) |
If they need to reprocess old data → Kinesis Data Streams is the ONLY option.
Firehose is “lazy Kinesis” — easier but slower. If latency matters, use Streams.
Standard queues/topics are fast but messy (duplicates possible, order not guaranteed).
FIFO queues/topics trade throughput (300 msg/s) for guarantees:
Cross-account? Other AWS service (S3, SNS)? → Resource-based policy.
SQS/SNS = AWS proprietary SDKs. Great for new apps. Amazon MQ = Open protocols (MQTT, AMQP, STOMP). For migrating existing apps.
Keyword “migrate” + “existing broker” + “no code changes” = Amazon MQ
| Service | How to Scale |
|---|---|
| SQS | Automatic (just add consumers) |
| SNS | Automatic |
| Kinesis | Add shards (Provisioned) or use On-Demand |
| Firehose | Automatic |
ProvisionedThroughputExceeded = add shards or switch to On-Demand.
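As a rough sizing aid for Provisioned mode, the shard count can be estimated from the per-shard limits in these notes (1 MB/s in, 2 MB/s out). The sketch below also assumes the 1,000 records/s per-shard write limit, which the tables here do not list; `shards_needed` is a hypothetical helper, not an AWS API:

```python
import math


def shards_needed(write_mb_per_s, records_per_s, read_mb_per_s):
    """Estimate the shard count as the tightest of the three per-shard limits."""
    by_write_bytes = write_mb_per_s / 1.0     # 1 MB/s ingest per shard
    by_write_count = records_per_s / 1000.0   # 1,000 records/s ingest per shard (assumed)
    by_read_bytes = read_mb_per_s / 2.0       # 2 MB/s shared read per shard
    return max(1, math.ceil(max(by_write_bytes, by_write_count, by_read_bytes)))
```

For example, 4.5 MB/s of writes needs five shards even when the record rate and read volume alone would fit in fewer.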
What's the pattern?
│
┌─────────────────────┼─────────────────────┐
▼ ▼ ▼
One-to-One One-to-Many Continuous Stream
│ │ │
▼ ▼ ▼
SQS SNS Need replay?
(or Fan Out) │
┌───────┴───────┐
▼ ▼
Yes No
│ │
▼ ▼
Kinesis DS Firehose
| If question mentions… | Answer is… |
|---|---|
| “ordering” or “sequence” | SQS/SNS FIFO |
| “exactly-once” or “no duplicates” | FIFO + Deduplication ID |
| “replay” or “reprocess” | Kinesis Data Streams |
| “transform while streaming” | Data Firehose + Lambda |
| “load directly to S3/Redshift” | Data Firehose |
| “multiple destinations from one event” | SNS Fan Out |
| “filter messages per subscriber” | SNS Filter Policy |
| “MQTT/AMQP/STOMP protocol” | Amazon MQ |
| “cross-account access” | Resource-based policy (SQS/SNS Access Policy) |
| “reduce costs” + SQS | Long Polling |
| “database overwhelmed” | SQS as buffer |
| “unpredictable traffic” + Kinesis | On-Demand mode |
| “ProvisionedThroughputExceeded” | Add shards or On-Demand |
| “consumer needs more time” | ChangeMessageVisibility API |
| “scale based on queue depth” | CloudWatch Alarm on ApproximateNumberOfMessages |
| Statement | Why It’s Wrong |
|---|---|
| SNS for replay | SNS does NOT persist messages |
| SQS pushes to consumers | SQS is pull-only (consumers poll) |
| Kinesis Streams transforms data | Streams = raw data only, Firehose transforms |
| Firehose for replay | Firehose does NOT store (pass-through) |
| SQS/SNS with MQTT | Use Amazon MQ for MQTT/AMQP/STOMP |
| Lambda subscribes to SNS FIFO | SNS FIFO → SQS queues ONLY |
| SQS as Firehose producer | SQS needs Lambda to feed Firehose |
| Kinesis Streams loads to S3 directly | Use Firehose for S3/Redshift loading |
Keywords: overwhelmed, spikes, lost transactions, protect database, decouple
Answer: SQS as buffer
requests ──▶ [Front-end ASG] ──▶ [SQS Queue] ──▶ [Back-end ASG] ──▶ Database
(absorbs spike) (processes safely)
Keywords: notify multiple services, fan out, broadcast, S3 event to multiple queues
Answer: SNS (or SNS + SQS Fan Out)
[Producer] ──▶ [SNS Topic] ──┬──▶ [SQS Queue 1] ──▶ Service A
├──▶ [SQS Queue 2] ──▶ Service B
└──▶ [Lambda] ──▶ Service C
Keywords: replay, reprocess, audit trail, re-analyze, multiple consumers read same data
Answer: Kinesis Data Streams
Why: Only service that stores data (up to 365 days) and allows multiple reads.
Keywords: load streaming data, store in S3, analytics destination, transform while streaming
Answer: Data Firehose
[Any Source] ──▶ [Firehose] ──▶ (optional Lambda) ──▶ S3/Redshift/OpenSearch
Keywords: real-time, sub-second, immediate, ~200ms, IoT, clickstream
Answer: Kinesis Data Streams (NOT Firehose — it buffers)
Keywords: ordering, sequence, exactly-once, financial transactions, no duplicates
Answer: FIFO queue/topic
Remember: Queue name ends with .fifo. Throughput = 300-3000 msg/s.
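To make the FIFO guarantees concrete, the sketch below builds the arguments for an SQS `send_message` call: `MessageGroupId` scopes ordering, and `MessageDeduplicationId` lets SQS discard repeats. Hashing the body with SHA-256 is one reasonable way to derive the deduplication ID; the helper name and the sample values are assumptions for illustration:

```python
import hashlib


def fifo_send_params(queue_url, body, group_id):
    """Build keyword arguments for sending one message to a FIFO queue."""
    if not queue_url.endswith(".fifo"):
        raise ValueError("FIFO queue names must end with .fifo")
    # The same body always yields the same ID, so a retried send is deduplicated.
    dedup_id = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        "MessageGroupId": group_id,          # ordering is preserved within a group
        "MessageDeduplicationId": dedup_id,
    }
```

With boto3 installed and credentials configured, the result could be passed as `boto3.client("sqs").send_message(**params)`.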
Keywords: migrate, existing application, RabbitMQ, ActiveMQ, MQTT, AMQP, no code changes
Answer: Amazon MQ
Why: Supports open protocols. SQS/SNS require AWS SDK = code changes.
Keywords: reduce API calls, empty responses, cost optimization, SQS
Answer: Long Polling (set WaitTimeSeconds up to 20 sec)
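A minimal sketch of a long-polling receive. The `sqs` argument is assumed to be a boto3 SQS client (or any object exposing the same `receive_message` method); `poll_once` is a hypothetical helper name:

```python
def poll_once(sqs, queue_url):
    """Issue one long-polling receive and return the messages (possibly none)."""
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # upper bound for a single ReceiveMessage call
        WaitTimeSeconds=20,      # wait up to 20 s instead of returning empty at once
    )
    # The "Messages" key is absent when nothing arrived within the wait time.
    return response.get("Messages", [])
```

Because the call blocks until a message arrives or the wait expires, an idle consumer makes far fewer billable requests than one that polls in a tight loop.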
Keywords: timeout, visibility, need more time, duplicate processing
Answer: Increase Visibility Timeout or call ChangeMessageVisibility API
Keywords: unpredictable, variable load, spikes, promotional campaign
Answer: On-Demand mode (auto-scales based on last 30 days peak)
Keywords: throughput exceeded, throttling, Kinesis errors
Answer: Add more shards OR switch to On-Demand
Keywords: cross-account, allow S3 to write, allow SNS to write
Answer: Resource-based policy (SQS/SNS Access Policy)
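As an illustration, a queue access policy that lets one SNS topic deliver to one SQS queue might look like the sketch below. The ARNs are supplied by the caller, and `sns_to_sqs_policy` is a hypothetical helper:

```python
import json


def sns_to_sqs_policy(queue_arn, topic_arn):
    """Return a JSON access policy allowing one SNS topic to send to one queue."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            # Restrict the grant to this topic rather than to all of SNS.
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    }
    return json.dumps(policy)
```

The resulting string is what would be stored in the queue's `Policy` attribute.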
Keywords: filter, route by attribute, different processing per type
Answer: SNS Filter Policy (JSON policy per subscription)
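The matching idea behind a filter policy can be sketched in a few lines. This is a simplified illustration, not SNS's actual implementation: it handles only exact values and a single numeric comparison, and the attribute names (`order_type`, `amount`) are hypothetical:

```python
import operator

_OPS = {"=": operator.eq, ">": operator.gt, ">=": operator.ge,
        "<": operator.lt, "<=": operator.le}


def _match_one(rule, value):
    """Match one policy entry: an exact value or a {"numeric": [op, bound]} rule."""
    if isinstance(rule, dict) and "numeric" in rule:
        op, bound = rule["numeric"]
        return isinstance(value, (int, float)) and _OPS[op](value, bound)
    return rule == value


def matches(filter_policy, attributes):
    """Every policy key must be present and match at least one listed entry."""
    for key, allowed in filter_policy.items():
        if key not in attributes:
            return False
        if not any(_match_one(rule, attributes[key]) for rule in allowed):
            return False
    return True


policy = {"order_type": ["premium", "vip"], "amount": [{"numeric": [">=", 100]}]}
```

A subscription carrying `policy` would receive a premium order of 250 but not a basic order, and not a message that lacks the `amount` attribute.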
Keywords: S3 notification, multiple queues, multiple Lambda
Answer: S3 → SNS → Fan Out (S3 allows only one rule per event+prefix combo)
| Feature | SQS | SNS | Kinesis Streams | Firehose | Amazon MQ |
|---|---|---|---|---|---|
| Model | Pull | Push | Pull/Push | Push | Pull/Push |
| Throughput | Unlimited | Unlimited | Per shard | Auto | Limited |
| Ordering | FIFO only | FIFO only | Per shard | No | Yes |
| Persistence | Until consumed | No | Up to 365 days | No | Yes |
| Replay | ❌ | ❌ | ✅ | ❌ | ❌ |
| Transform | ❌ | ❌ | ❌ | ✅ | ❌ |
| Protocols | AWS SDK | AWS SDK | AWS SDK | AWS SDK | MQTT/AMQP/STOMP |
| Service | Throughput |
|---|---|
| SQS Standard | Unlimited |
| SQS FIFO | 300 msg/s (3000 batched) |
| SNS Standard | Unlimited (12.5M subscribers/topic) |
| SNS FIFO | 300 msg/s (3000 batched) |
| Kinesis Provisioned | 1 MB/s in, 2 MB/s out per shard |
| Kinesis On-Demand | Auto (default 4 MB/s in) |
| Data Firehose | Auto-scales |
| Service | Max Message/Record Size |
|---|---|
| SQS | 256 KB |
| SNS | 256 KB |
| Kinesis | 1 MB |
| Firehose | 1 MB |
| API | Service | Purpose |
|---|---|---|
| SendMessage | SQS | Send message to queue |
| ReceiveMessage | SQS | Poll messages (up to 10) |
| DeleteMessage | SQS | Remove processed message |
| ChangeMessageVisibility | SQS | Extend processing time |
| Publish | SNS | Send to topic |
| PutRecord / PutRecords | Kinesis | Send to stream |
| Question Contains | → Instant Answer |
|---|---|
| “replay” / “reprocess” | Kinesis Data Streams |
| “fan out” / “multiple destinations” | SNS (+ SQS for persistence) |
| “buffer” / “overwhelmed” / “protect DB” | SQS |
| “MQTT” / “AMQP” / “migrate broker” | Amazon MQ |
| “ordering” / “sequence” / “exactly-once” | FIFO |
| “transform while streaming” | Data Firehose |
| “load to S3” (from stream) | Data Firehose |
| “real-time” + streaming | Kinesis Data Streams |
| “near real-time” + streaming | Data Firehose |
| “reduce SQS costs” | Long Polling |
| “empty responses” | Long Polling |
| “unpredictable Kinesis traffic” | On-Demand mode |
| “ProvisionedThroughputExceeded” | Add shards / On-Demand |
| “cross-account” / “allow S3/SNS” | Resource-based policy |
| “filter per subscriber” | SNS Filter Policy |
| “S3 event multiple destinations” | S3 → SNS → Fan Out |
| “consumer needs more time” | ChangeMessageVisibility |
| “scale on queue depth” | CloudWatch ApproximateNumberOfMessages |
| “no code changes” + “migrate” | Amazon MQ |
| “RabbitMQ/ActiveMQ to AWS” | Amazon MQ |
When stuck between options, eliminate systematically:
□ Do they need REPLAY?
→ No = eliminate Kinesis Data Streams
→ Yes = Kinesis Data Streams is likely answer
□ Do they need PUSH to multiple?
→ No = eliminate SNS
→ Yes = SNS or Fan Out pattern
□ Do they need ORDERING?
→ No = eliminate FIFO options
→ Yes = must be FIFO
□ Do they need REAL-TIME?
→ No = Firehose acceptable
→ Yes = must be Kinesis Data Streams
□ Do they mention OPEN PROTOCOLS (MQTT/AMQP)?
→ No = eliminate Amazon MQ
→ Yes = Amazon MQ is likely answer
□ Do they need DATA TRANSFORMATION?
→ No = Kinesis Streams acceptable
→ Yes = must be Firehose (with Lambda)
□ Is it CROSS-ACCOUNT or OTHER SERVICE access?
→ No = IAM policy
→ Yes = Resource-based policy
Stateless Web App Evolution: WhatIsTheTime.com
A simple app that returns the current time; no database needed.
Growth Steps:
| Step | Architecture | Problem Solved | New Problem |
|---|---|---|---|
| 1 | EC2 + Public IP | Works! | IP changes on restart |
| 2 | EC2 + Elastic IP | Static IP | Single point of failure, no scaling |
| 3 | EC2 + Route 53 (A record) | DNS-based, no Elastic IP needed | Still single instance |
| 4 | ELB + multiple EC2 | Horizontal scaling, health checks | Manual instance management |
| 5 | ELB + ASG | Auto-scaling, self-healing | Single AZ failure risk |
| 6 | ELB + ASG + Multi-AZ | High availability across AZs | ✅ Production ready! |
┌──────────┐
│ Route 53 │ Alias Record
│ DNS │ api.whatisthetime.com
└────┬─────┘
│
▼
┌─────────┐ AZ 1-3
│ ELB │◄─── Health Checks
│Multi-AZ │ + Multi-AZ
└────┬────┘
│
┌───────┼───────┐
▼ ▼ ▼
┌───┐ ┌───┐ ┌───┐
│M5 │ │M5 │ │M5 │ ◄── Auto Scaling Group
│AZ1│ │AZ2│ │AZ3│ (spans 3 AZs)
└───┘ └───┘ └───┘
Key Concepts Covered:
⚠️ Exam trap - Cost Optimization with ASG:
Stateful Web App Evolution: MyClothes.com (Session State)
E-commerce app with a shopping cart that must maintain user state across requests.
The Problem: With multiple EC2 instances behind an ELB, a user may hit a different server on each request → loses the shopping cart!
Growth Steps:
| Step | Solution | How It Works | Trade-off |
|---|---|---|---|
| 1 | ELB Sticky Sessions | Cookie ties user to same EC2 | Instance failure = lost cart |
| 2 | User Cookies | Store cart in browser cookie | Limited size, security risk |
| 3 | ElastiCache (Sessions) | Store session in Redis/Memcached | Sub-ms latency, shared state |
| 4 | DynamoDB (Sessions) | Alternative to ElastiCache | Serverless, auto-scaling |
| 5 | RDS (User Data) | Persist user details, addresses | Need read replicas for scale |
| 6 | ElastiCache (Caching) | Cache RDS queries | Reduce DB load |
| 7 | Multi-AZ Everything | RDS + ElastiCache Multi-AZ | ✅ Production ready! |
┌──────────┐
│ Route 53 │
└────┬─────┘
│
┌────────────────────────┴──────────────────────┐
│ Multi-AZ │
│ ┌─────────┐ │
│ │ ELB │◄── Open HTTP/HTTPS to 0.0.0.0/0 │
│ └────┬────┘ │
│ │ Restrict to ELB SG only │
│ ┌────┴────┬─────────┐ Auto Scaling Group │
│ ▼ ▼ ▼ │
│┌────┐ ┌────┐ ┌────┐ │
││ M5 │ │ M5 │ │ M5 │ AZ1, AZ2, AZ3 │
│└──┬─┘ └─┬──┘ └──┬─┘ │
└──┼─────────┼──────────┼───────────────────────┘
│ │ │
│ Restrict to EC2 SG only
▼ ▼ ▼
┌─────────┐ ┌─────────┐
│Elasti- │ │ RDS │
│Cache │ │Multi-AZ │
│(sessions│ │+Replicas│
│+caching)│ └─────────┘
└─────────┘
3-Tier Security (SG Chaining):
| Layer | Security Group Rule |
|---|---|
| ELB | Inbound: HTTP/HTTPS from 0.0.0.0/0 |
| EC2 | Inbound: Only from ELB SG |
| RDS/ElastiCache | Inbound: Only from EC2 SG |
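The chaining in the table can be expressed as ingress rules that reference security group IDs instead of IP ranges. The sketch below builds arguments in the shape accepted by EC2's `authorize_security_group_ingress`; the group IDs and the ports (443 for the ELB, 80 for the app, 3306 for a MySQL database) are assumptions for illustration:

```python
def ingress_from_internet(target_sg, port):
    """Allow TCP traffic on one port from anywhere (ELB tier only)."""
    return {"GroupId": target_sg, "IpPermissions": [{
        "IpProtocol": "tcp", "FromPort": port, "ToPort": port,
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
    }]}


def ingress_from_sg(target_sg, source_sg, port):
    """Allow TCP traffic on one port only from members of another security group."""
    return {"GroupId": target_sg, "IpPermissions": [{
        "IpProtocol": "tcp", "FromPort": port, "ToPort": port,
        "UserIdGroupPairs": [{"GroupId": source_sg}],
    }]}


def three_tier_rules(elb_sg, app_sg, db_sg):
    """Internet -> ELB -> EC2 -> database, each tier trusting only the previous one."""
    return [
        ingress_from_internet(elb_sg, 443),
        ingress_from_sg(app_sg, elb_sg, 80),
        ingress_from_sg(db_sg, app_sg, 3306),
    ]
```

Referencing the source group rather than a CIDR block keeps the rules valid as instances come and go in the Auto Scaling Group.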
Key Concepts:
⚠️ Exam trap - Stateless Session Storage:
| Storage | Stateless? | Why |
|---|---|---|
| ElastiCache | ✅ Yes | Shared across all EC2s |
| RDS/DynamoDB | ✅ Yes | Shared across all EC2s |
| HTTP Cookies | ✅ Yes | Client carries state |
| EBS | ❌ No | Single AZ, single EC2 only |
EBS makes the app stateful: a user who lands on a different EC2 instance loses the session!
Typical 3-Tier Web App Architecture
Reference diagram showing a production-ready AWS web app with all components:
┌──────────┐
│ Route 53 │
└────┬─────┘
│
┌──────────────────────────────────┴──────────────────────────────────────┐
│ PUBLIC SUBNET │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ ELB (Multi-AZ) │ │
│ │ ◄─ Open HTTP/HTTPS to 0.0.0.0/0 │ │
│ └─────────────────────────┬───────────────────────────────────────┘ │
└──────────────────────────────┼──────────────────────────────────────────┘
│
┌──────────────────────────────┴──────────────────────────────────────────┐
│ PRIVATE SUBNET Auto Scaling Group │
│ ┌─────────────┬─────────────┬─────────────┐ │
│ │ ┌─────┐ │ ┌─────┐ │ ┌─────┐ │ │
│ │ │ M5 │ │ │ M5 │ │ │ M5 │ │ │
│ │ │ AZ1 │ │ │ AZ2 │ │ │ AZ3 │ │ │
│ │ └──┬──┘ │ └──┬──┘ │ └──┬──┘ │ │
│ └─────────────┴─────────────┴─────────────┘ │
└─────────────────┼───────────┼───────────┼───────────────────────────────┘
│ │ │
┌─────────────────┴───────────┴───────────┴───────────────────────────────┐
│ DATA SUBNET │
│ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │ ElastiCache │ │ Amazon RDS │ │
│ │ ───────────── │ │ ───────────── │ │
│ │ Session storage │ │ Read/write data │ │
│ │ + Query cache │ │ (Multi-AZ) │ │
│ └─────────────────────┘ └─────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
| Subnet | Contains | Access |
|---|---|---|
| Public | ELB | Open to internet (0.0.0.0/0) |
| Private | EC2 (ASG) | Only from ELB SG |
| Data | RDS, ElastiCache | Only from EC2 SG |
Stateful Web App Evolution: MyWordPress.com (Shared File Storage)
Scalable WordPress with image uploads and a MySQL database.
The Problem: Images uploaded to one EC2 won’t be visible from other EC2 instances!
Growth Steps:
| Step | Solution | Problem Solved | Limitation |
|---|---|---|---|
| 1 | Single EC2 + EBS | Simple, works | Single AZ, no scaling |
| 2 | Multi EC2 + EBS each | Scaling | Images not shared across instances! |
| 3 | Multi EC2 + EFS | Shared storage across AZs | ✅ All instances see all images |
| 4 | Aurora MySQL | Multi-AZ + Read Replicas built-in | ✅ Production ready! |
EBS vs EFS for Distributed Apps:
| Storage | Scope | Use Case |
|---|---|---|
| EBS | Single EC2 in single AZ | Single instance apps |
| EFS | Shared across EC2s + AZs | Distributed apps (WordPress, CMS) |
┌──────────┐
│ Route 53 │
└────┬─────┘
│
┌────┴────┐
│ ELB │ Multi-AZ
└────┬────┘
│
┌───────┴───────┐
▼ ▼
┌────────┐ ┌────────┐
│ M5 │ │ M5 │
│ AZ 1 │ │ AZ 2 │
└───┬────┘ └───┬────┘
│ ENI │ ENI
│ │
└──────┬───────┘
│
▼
┌───────┐
│ EFS │ ◄── Shared storage
│ │ (images visible
└───────┘ from all EC2s)
Key Concepts:
⚠️ Exam trap: “Shared file storage across multiple EC2 instances” → EFS (not EBS!)
⚠️ Exam trap - Software updates on 100s of EC2s:
Instantiating Applications Quickly
Launching a full stack (EC2, EBS, RDS) can be slow: you install apps, configure them, and insert data. Use these strategies to speed things up:
Golden AMI = AMI standardized through configuration, consistent security patching, and hardening. Contains pre-approved agents for logging, security, and performance monitoring. In Beanstalk, you can specify a custom AMI instead of the standard platform AMI to improve provisioning times.
| Resource | Fast Launch Strategy | What It Does |
|---|---|---|
| EC2 | Golden AMI | Pre-baked image with OS, apps, dependencies |
| EC2 | User Data | Bootstrap script for dynamic config at launch |
| EC2 | Hybrid | Golden AMI + User Data (Elastic Beanstalk approach) |
| RDS | Restore from Snapshot | DB with schemas + data ready instantly |
| EBS | Restore from Snapshot | Pre-formatted disk with data |
Golden AMI vs User Data:
| Approach | Speed | Flexibility | Use Case |
|---|---|---|---|
| Golden AMI | ⚡ Fastest | Low (requires rebuild) | Stable configs, rarely change |
| User Data | Slower | High (scripts) | Dynamic config, secrets |
| Hybrid | Balanced | Medium | Best of both worlds |
⚠️ Exam trap - “Speed up EC2 launch / scale-out”:
⚠️ Exam trap — EC2 User Data facts:
- Runs as root (sudo not needed)
- Runs only at first boot by default; running on every boot requires [scripts-user, always] in cloud-init (non-default)
⚠️ Exam trap - “Static + dynamic installation, reduce boot time”:
Developer-centric view of deploying apps on AWS — just upload code, Beanstalk handles the rest.
| Feature | Details |
|---|---|
| What it manages | EC2, ASG, ELB, RDS, CloudWatch, etc. |
| Your responsibility | Application code only |
| Control | Full control over configuration if needed |
| Cost | Free (pay only for underlying resources) |
Workflow:
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Create │───→│ Upload │───→│ Launch │───→│ Manage │
│ Application │ │ Version │ │ Environment │ │ Environment │
└─────────────┘ └──────┬──────┘ └─────────────┘ └──────┬──────┘
│ │
│◄────── deploy new version ──────────┘
│
└──────── update version ─────────────→
Components:
| Component | Description |
|---|---|
| Application | Container for environments, versions, configs |
| Application Version | Iteration of your code (stored in S3) |
| Environment | AWS resources running ONE version at a time |
| Environment Tier | Web Server or Worker |
Environment Tiers:
| Tier | Use Case | Components |
|---|---|---|
| Web Server | HTTP requests | ELB + ASG + EC2 |
| Worker | Background tasks | SQS + ASG + EC2 |
┌─────────────────────────────────────┐ ┌─────────────────────────────────────┐
│ Web Server Tier │ │ Worker Tier │
│ (myapp.us-east-1.elasticbeanstalk) │ │ │
├─────────────────────────────────────┤ ├─────────────────────────────────────┤
│ ┌─────┐ │ │ ┌───────────┐ │
│ │ ELB │ │ │ │ SQS Queue │ │
│ └──┬──┘ │ │ └─────┬─────┘ │
│ │ │ │ pull messages │
│ ┌───────────┴───────────┐ │ │ ┌───────────┴───────────┐ │
│ ▼ ▼ │ │ ▼ ▼ │
│ ┌────────┐ ASG ┌────────┐ │ │ ┌────────┐ ASG ┌────────┐ │
│ │ EC2 │ │ EC2 │ │ │ │ EC2 │ │ EC2 │ │
│ │(WebSrv)│ │(WebSrv)│ │ │ │(Worker)│ │(Worker)│ │
│ └────────┘ └────────┘ │ │ └────────┘ └────────┘ │
│ AZ 1 AZ 2 │ │ AZ 1 AZ 2 │
└─────────────────────────────────────┘ └─────────────────────────────────────┘
Worker Tier Details:
Deployment Modes:
| Mode | Components | Use Case |
|---|---|---|
| Single Instance | Elastic IP + EC2 + RDS | Dev/test |
| High Availability | ALB + ASG + Multi-AZ RDS | Production |
┌─────────────────────────┐ ┌─────────────────────────────────────────────┐
│ Single Instance │ │ High Availability with Load Balancer │
│ (Great for dev) │ │ (Great for prod) │
├─────────────────────────┤ ├─────────────────────────────────────────────┤
│ Elastic IP │ │ ┌─────┐ │
│ │ │ │ │ ALB │ │
│ ▼ │ │ └──┬──┘ │
│ ┌────────┐ │ │ ┌───────────┴───────────┐ │
│ │ EC2 │ │ │ ▼ ▼ │
│ └────────┘ │ │ ┌────────┐ ASG ┌────────┐ │
│ │ │ │ │ EC2 │ │ EC2 │ │
│ ▼ │ │ └────────┘ └────────┘ │
│ ┌────────┐ │ │ AZ 1 AZ 2 │
│ │ RDS │ │ │ │ │ │
│ │ Master │ │ │ ▼ ▼ │
│ └────────┘ │ │ ┌────────┐ ┌────────┐ │
│ AZ 1 │ │ │ RDS │ │ RDS │ │
│ │ │ │ Master │ │Standby │ │
└─────────────────────────┘ │ └────────┘ └────────┘ │
│ AZ 1 AZ 2 │
└─────────────────────────────────────────────┘
Supported Platforms: Go, Java SE, Java/Tomcat, .NET Core/Linux, .NET/Windows, Node.js, PHP, Python, Ruby, Docker (Single/Multi-container), Packer Builder
⚠️ Exam trap: Beanstalk is free — you pay for EC2, RDS, ELB, etc. that it provisions!
⚠️ Exam trap - Slow Beanstalk deployments:
Deployment Strategies:
| Strategy | Downtime | Deploy Time | Rollback | Use Case |
|---|---|---|---|---|
| All-at-once | Yes ⚠️ | ⚡ Fastest | Redeploy | Dev/test |
| Rolling | No | Slow | Redeploy | Prod, cost-conscious |
| Rolling with additional batch | No | Slower | Redeploy | Prod, maintain capacity |
| Immutable | No | Slowest | Terminate new ASG | Prod, safest |
| Blue/Green | No | Fast | Swap URL | Prod, instant rollback |
Deployment Details:
| Strategy | How It Works |
|---|---|
| All-at-once | Deploy to all at same time — brief outage |
| Rolling | Deploy to batches, old instances serve while updating |
| Rolling with additional batch | Like rolling, but spins up NEW instances first (maintains capacity) |
| Immutable | New ASG with new instances → swap → terminate old ASG |
| Blue/Green | New environment → Route 53/ELB swap → terminate old env |
⚠️ Exam trap - Deployment strategies:
.ebextensions:
.ebextensions/*.config (YAML/JSON)
Saved Configurations:
Amazon CloudWatch is a service that monitors applications, responds to performance changes, optimizes resource use, and provides insights into operational health. By collecting data across AWS resources, CloudWatch gives visibility into system-wide performance and allows users to set alarms, automatically react to changes, and gain a unified view of operational health.
Important metrics:
Amazon CloudWatch Alarms are used to trigger notifications for any metric.
Amazon CloudWatch Logs:
Amazon EventBridge is a serverless event bus that ingests data from your own apps, SaaS apps, and AWS services and routes that data to targets.
AWS CloudTrail is an AWS service that helps you enable operational and risk auditing, governance, and compliance of your AWS account. Actions taken by a user, role, or an AWS service are recorded as events in CloudTrail. Get a history of events and API calls made through the AWS Management Console, AWS Command Line Interface, and AWS SDKs and APIs. CloudTrail logs can be delivered to CloudWatch Logs or S3, giving an audit of all users’ events and activities.
AWS X-Ray provides a complete view of requests as they travel through your application and filters visual data across payloads, functions, traces, services, APIs, and more with no-code and low-code motions.
Amazon CodeGuru is a static application security testing (SAST) tool that combines machine learning (ML) and automated reasoning to identify vulnerabilities in your code, provide recommendations on how to fix the identified vulnerabilities, and track the status of the vulnerabilities until closure.
AWS Health Dashboard: view the overall status and health of AWS services. AWS Health Dashboard - Your Account provides alerts and remediation guidance when AWS is experiencing events that may impact you.
Log Structure:
Log Sources:
Log Destinations:
Encryption:
| Method | Latency | Use Case |
|---|---|---|
| S3 Export | Up to 12 hours | Batch archive, compliance |
| Subscriptions | Real-time / Near real-time | Live processing, streaming |
S3 Export:
CreateExportTask
Get real-time log events from CloudWatch Logs for processing and analysis.
Subscription Filter = filter which logs are delivered to destination
CloudWatch Logs ──► Subscription Filter ──┬──► Lambda (real-time) ──► OpenSearch
│
├──► Kinesis Firehose (near real-time) ──► S3
│
└──► Kinesis Data Streams ──► KDF/KDA/EC2/Lambda
Subscription Destinations:
| Destination | Latency | Use Case |
|---|---|---|
| Lambda | Real-time | Transform, send to OpenSearch |
| Kinesis Firehose | Near real-time | Deliver to S3, Redshift, OpenSearch |
| Kinesis Data Streams | Real-time | Custom consumers, analytics |
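When Lambda is the subscription destination, the log batch arrives base64-encoded and gzip-compressed under `event["awslogs"]["data"]`. A minimal sketch of the decoding step; what the handler does with the messages afterwards (here it simply returns them) is a placeholder:

```python
import base64
import gzip
import json


def decode_subscription_event(event):
    """Decode the CloudWatch Logs payload delivered to a subscribed Lambda."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def lambda_handler(event, context):
    payload = decode_subscription_event(event)
    # Each entry in logEvents carries an id, a timestamp and the raw message.
    return [log_event["message"] for log_event in payload["logEvents"]]
```

A real handler would typically forward the decoded events to a store such as OpenSearch instead of returning them.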
Aggregate logs from multiple accounts and regions into a central location:
Account A / Region 1 ──► Subscription Filter ──┐
│
Account B / Region 2 ──► Subscription Filter ──┼──► Kinesis Data Streams ──► Firehose ──► S3
│ (near real-time)
Account B / Region 3 ──► Subscription Filter ──┘
⚠️ Exam trap: “Aggregate logs from multiple accounts/regions” → Subscription Filters to central Kinesis Data Streams, then Firehose to S3.
Continually stream CloudWatch metrics to a destination with near-real-time delivery.
CloudWatch Metrics ──► Kinesis Data Firehose ──┬──► S3 ──► Athena
(near real-time) │
├──► Redshift
│
└──► OpenSearch
Destinations:
Features:
⚠️ Exam trap: “Stream metrics to S3/Redshift/3rd party” → CloudWatch Metric Streams via Firehose.
By default, NO logs from EC2 go to CloudWatch!
Agent Types:
| Agent | Metrics | Logs | Notes |
|---|---|---|---|
| CloudWatch Logs Agent | ❌ | ✅ | Old version, logs only |
| CloudWatch Unified Agent | ✅ | ✅ | Recommended, more metrics |
CloudWatch Unified Agent:
Unified Agent Metrics (Linux/EC2):
| Category | Metrics |
|---|---|
| CPU | active, guest, idle, system, user, steal |
| Disk | free, used, total |
| Disk IO | writes, reads, bytes, iops |
| RAM | free, inactive, used, total, cached |
| Netstat | TCP/UDP connections, net packets, bytes |
| Processes | total, dead, blocked, idle, running, sleep |
| Swap | free, used, used % |
⚠️ Exam trap: “Monitor RAM on EC2” or “EC2 memory usage” → CloudWatch Unified Agent required! RAM is NOT a default EC2 metric.
Alarm States:
| State | Meaning |
|---|---|
| OK | Metric within threshold |
| ALARM | Metric breached threshold |
| INSUFFICIENT_DATA | Not enough data yet |
Period:
Alarm Targets:
| Target | Action |
|---|---|
| EC2 | Stop, Terminate, Reboot, or Recover |
| Auto Scaling | Trigger scaling action (scale out/in) |
| SNS | Send notification (then trigger Lambda, etc.) |
Composite Alarms:
Create alarms based on CloudWatch Logs using Metric Filters:
CW Logs ──► Metric Filter ──► CW Metric ──► CW Alarm ──► SNS (alert)
(pattern match) (count) (threshold)
How it works:
Example: RDS Error Alerting
RDS DB Logs ──► CloudWatch Logs ──► Metric Filter ──► Metric ──► Alarm ──► SNS
("Error") (count) (>0)
⚠️ Exam trap: “Alert on keyword in logs” (Error, Exception, etc.) → Metric Filter + Alarm.
⚠️ Exam trap: Don’t use Lambda polling (expensive, not real-time). Don’t use Config (monitors resource config, not log content).
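The Metric Filter + Alarm pattern amounts to two API calls: `put_metric_filter` on CloudWatch Logs and `put_metric_alarm` on CloudWatch. The sketch below only builds their arguments; the filter name, metric name, namespace `MyApp/Logs`, and alarm name are assumptions for illustration:

```python
def error_metric_filter(log_group):
    """Arguments for logs.put_metric_filter: count log events containing "Error"."""
    return {
        "logGroupName": log_group,
        "filterName": "error-count",
        "filterPattern": '"Error"',
        "metricTransformations": [{
            "metricName": "ErrorCount",
            "metricNamespace": "MyApp/Logs",
            "metricValue": "1",  # each matching log event adds 1 to the metric
        }],
    }


def error_alarm(sns_topic_arn):
    """Arguments for cloudwatch.put_metric_alarm: alert when any error is counted."""
    return {
        "AlarmName": "log-errors",
        "Namespace": "MyApp/Logs",
        "MetricName": "ErrorCount",
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no matching logs means no errors
        "AlarmActions": [sns_topic_arn],
    }
```

With boto3, these would be passed as `boto3.client("logs").put_metric_filter(**error_metric_filter(group))` and `boto3.client("cloudwatch").put_metric_alarm(**error_alarm(topic_arn))`.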
Status Checks:
| Check | What it monitors |
|---|---|
| Instance status | EC2 VM (software) |
| System status | Underlying hardware |
| Attached EBS status | EBS volumes |
Recovery with CloudWatch Alarm:
EC2 Instance ◄── monitor ── CloudWatch Alarm ──► alert ──► SNS Topic
│ (StatusCheckFailed_System)
│
└── EC2 Instance Recovery
What’s preserved after recovery:
⚠️ Exam trap: “Auto-recover EC2 on hardware failure” → CloudWatch Alarm on StatusCheckFailed_System → EC2 Recovery action.
⚠️ Exam trap: “Most cost-optimal way to auto-reboot/stop/recover EC2” → CloudWatch Alarm → EC2 Action (direct). NOT CW Alarm → SNS → Lambda → EC2 API (over-engineered, 3 services). NOT EventBridge → Lambda (unnecessary compute). CW Alarms have built-in EC2 actions (Stop, Terminate, Reboot, Recover) — no Lambda needed.
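The direct alarm-to-EC2 action is expressed through a special action ARN rather than a Lambda function. A sketch of the `put_metric_alarm` arguments for auto-recovery; the alarm name, period, and evaluation count are illustrative choices, not required values:

```python
def recover_alarm(instance_id, region):
    """Arguments for cloudwatch.put_metric_alarm that recover an impaired instance."""
    return {
        "AlarmName": f"recover-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 2,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Built-in EC2 action: no SNS topic or Lambda function is involved.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }
```

Swapping the last ARN segment for `stop`, `terminate`, or `reboot` selects the other built-in actions.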
Test alarms manually using CLI:
aws cloudwatch set-alarm-state \
--alarm-name "myalarm" \
--state-value ALARM \
--state-reason "testing purposes"
Monitor network issues between AWS and on-premises data center.
AWS Cloud
┌────────────────────────────┐
│ ┌──────────────────────┐ │
│ │ Private Subnet │ │
│ │ ┌────────────┐ │ │
│ │ │ EC2 Instance│ │ │
│ │ └────────────┘ │ │
│ └──────────────────────┘ │
│ │
│ CloudWatch Metrics ◄──────┼──── DX Connection ──┬──► Corporate Data Center
│ │ or │ │
└────────────────────────────┘ VPN Connection │ Server
│
Features:
⚠️ Exam trap: “Monitor network connectivity to on-premises” or “detect packet loss/latency over DX/VPN” → CloudWatch Network Synthetic Monitor.
| Insight Type | Target | Use Case |
|---|---|---|
| Container Insights | ECS, EKS, K8s on EC2, Fargate | Metrics + logs from containers |
| Lambda Insights | Lambda functions | Cold starts, memory, CPU, shutdowns |
| Contributor Insights | CloudWatch Logs | Find top-N talkers, bad hosts, heavy users |
| Application Insights | EC2 apps (Java, .NET, IIS) | Auto-dashboard for app troubleshooting |
⚠️ Exam trap: “Find top talkers” or “heaviest network users from logs” → Contributor Insights.
⚠️ Exam trap: “Monitor Lambda cold starts” or “Lambda memory/CPU” → Lambda Insights (Lambda Layer).
⚠️ Exam trap: “Auto-dashboard for .NET/Java app issues” → Application Insights (SageMaker-powered).
Distributed tracing for analyzing and debugging applications.
Client ──► API Gateway ──► Lambda ──► DynamoDB
│ │ │
└──────────────┴────────────┘
X-Ray collects traces
│
▼
┌─────────────┐
│ Service Map │ ◄── Visual representation
│ (latency, │ of request flow
│ errors) │
└─────────────┘
Key Concepts:
| Concept | Description |
|---|---|
| Segments | Data about work done by a service |
| Subsegments | More granular timing (e.g., DB calls) |
| Trace | End-to-end path of a request |
| Annotations | Key-value pairs for filtering traces (indexed) |
| Metadata | Key-value pairs for additional data (NOT indexed) |
Sampling Rules:
X-Ray Daemon:
Integrations:
| Service | How to Enable |
|---|---|
| Lambda | Enable “Active Tracing” in config |
| API Gateway | Enable X-Ray in stage settings |
| ECS/EKS | Run X-Ray daemon as sidecar |
| Elastic Beanstalk | .ebextensions config |
| EC2 | Install and run X-Ray daemon |
| ELB | Automatically adds trace header |
X-Ray APIs:
| API | Purpose |
|---|---|
| PutTraceSegments | Upload segment documents |
| PutTelemetryRecords | Upload telemetry |
| GetSamplingRules | Retrieve sampling rules |
| GetSamplingTargets | Get sampling decisions |
| GetServiceGraph | Get visual service map |
| GetTraceSummaries | Get trace IDs and annotations |
| BatchGetTraces | Get full traces by ID |
⚠️ Exam trap: “Debug microservices” or “trace request across services” → X-Ray.
⚠️ Exam trap: “Filter traces by custom attribute” → Use Annotations (indexed), NOT Metadata.
⚠️ Exam trap: X-Ray daemon listens on UDP 2000 — ensure Security Group allows it.
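To show where annotations and metadata live, the sketch below assembles a segment document of the kind uploaded with `PutTraceSegments`. The service name and the example keys are hypothetical; in practice the X-Ray SDK builds this document for you:

```python
import json
import os
import time


def new_trace_id(now=None):
    """Trace ID format: version 1, epoch seconds as 8 hex digits, 96 random bits."""
    epoch = int(now if now is not None else time.time())
    return f"1-{epoch:08x}-{os.urandom(12).hex()}"


def build_segment(name, start, end, annotations, metadata):
    """Assemble a minimal segment document as a plain dictionary."""
    return {
        "name": name,
        "id": os.urandom(8).hex(),     # 64-bit segment ID as 16 hex digits
        "trace_id": new_trace_id(start),
        "start_time": start,
        "end_time": end,
        "annotations": annotations,    # indexed: usable in trace filter expressions
        "metadata": metadata,          # not indexed: extra context only
    }


segment = build_segment(
    "checkout", 1700000000.0, 1700000000.25,
    annotations={"customer_tier": "gold"},
    metadata={"cart": {"items": 3}},
)
document = json.dumps(segment)
```

Only `customer_tier` could be used to search for this trace later; the `cart` details travel with the trace but cannot be filtered on.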
Configurable scripts that monitor endpoints and APIs.
CloudWatch Synthetics
│
▼
┌───────────────┐ ┌──────────────┐ ┌─────────────┐
│ Canary │────►│ Endpoint/ │────►│ CloudWatch │
│ (scheduled) │ │ API │ │ Metrics │
└───────────────┘ └──────────────┘ └─────────────┘
│ │
▼ ▼
S3 (screenshots, CW Alarms
HAR files) (alert on failure)
Key Features:
Canary Blueprints:
| Blueprint | Use Case |
|---|---|
| Heartbeat Monitor | Load URL, store screenshot, check availability |
| API Canary | Test REST APIs (GET, POST, etc.) |
| Broken Link Checker | Check all links on a page |
| Visual Monitoring | Compare screenshots against baseline |
| Canary Recorder | Record actions in Chrome, generate script |
| GUI Workflow Builder | Test multi-step workflows (login, checkout) |
Schedule: Run once or on schedule (rate or cron expression)
⚠️ Exam trap: “Monitor website availability” or “test API endpoint regularly” → Synthetics Canaries.
⚠️ Exam trap: Canaries are NOT for load testing — they’re for monitoring.
Two components:
| Dashboard | Scope | Purpose |
|---|---|---|
| Service Health | All AWS | Global AWS service status |
| Your Account Health | Your account | Events affecting YOUR resources |
Your Account Health Dashboard:
EventBridge Integration:
AWS Health Event ──► EventBridge ──► Lambda/SNS/etc.
(your account) (rule) (automate response)
Use cases:
Health Event Types:
| Type | Description |
|---|---|
| Scheduled Change | Planned maintenance |
| Account Notification | Account-specific issues |
| Issue | Ongoing service problem |
⚠️ Exam trap: “React to AWS service issues affecting my resources” → Health Dashboard + EventBridge.
⚠️ Exam trap: Service Health = public status. Your Account Health = personalized to your resources.
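An EventBridge rule for Health events is driven by an event pattern. The sketch below shows one such pattern, restricted to ongoing issues for EC2, along with a simplified matcher to illustrate how a pattern selects events; the matcher is an illustration only and covers just exact-value matching:

```python
health_issue_pattern = {
    "source": ["aws.health"],
    "detail-type": ["AWS Health Event"],
    "detail": {"eventTypeCategory": ["issue"], "service": ["EC2"]},
}


def matches(pattern, event):
    """Simplified check: each pattern key must exist and hold one of the listed values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            # Nested pattern: descend into the matching part of the event.
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True
```

The pattern itself would be supplied as a JSON string when creating the rule, for example through the `EventPattern` argument of EventBridge's `put_rule`.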
Feature flags and A/B testing for applications.
Application ──► Evidently ──► Feature Flag / Variation
│
▼
Metrics collected
│
▼
Analyze results
Key Features:
| Feature | Description |
|---|---|
| Feature Flags | Safely launch features (enable/disable remotely) |
| A/B Testing | Compare variations to measure impact |
| Launches | Gradual rollout to percentage of users |
| Experiments | Compare metrics between variations |
Use Cases:
⚠️ Exam trap: “Gradual feature rollout” or “A/B testing” → CloudWatch Evidently.
⚠️ Exam trap: Evidently is for application features, NOT infrastructure testing.
Event buses:
| Bus Type | Description |
|---|---|
| Default | Receives events from AWS services |
| Custom | Your application events |
| Partner | SaaS integrations (Datadog, Zendesk, etc.) |
Event Flow:
Event Sources EventBridge Targets
┌─────────────┐ ┌───────────┐ ┌─────────┐
│ AWS Services│──────────►│ │ │ Lambda │
├─────────────┤ │ Event │ Rules ├─────────┤
│ Custom Apps │──────────►│ Bus │────(filter)────►│ SQS/SNS │
├─────────────┤ │ │ ├─────────┤
│ SaaS Partners│─────────►│ │ │ Step Fn │
└─────────────┘ └───────────┘ └─────────┘
Schema Registry:
Resource-based Policies:
Account A ──► EventBridge (Account A) ──► Event Bus (Account B - central)
Account B ──► EventBridge (Account B) ──► Event Bus (Account B - central)
Account C ──► EventBridge (Account C) ──► Event Bus (Account B - central)
│
▼
Central processing
⚠️ Exam trap: “Aggregate events from multiple accounts” → EventBridge Resource-based Policy for cross-account access.
⚠️ Exam trap: Schema Registry = auto-discover event structure, NOT define schemas manually.
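A sketch of the resource-based policy that the central event bus would carry so that other accounts can send events to it. The account IDs, region and bus name are hypothetical placeholders; the policy is built in Python only so it can be printed and checked.

```python
import json

CENTRAL_ACCOUNT = "222222222222"                     # hypothetical "Account B"
SENDER_ACCOUNTS = ["111111111111", "333333333333"]   # hypothetical accounts A and C
REGION = "us-east-1"                                 # assumed region
BUS_NAME = "central-bus"                             # hypothetical bus name


def central_bus_policy(senders, central_account, region, bus_name):
    """Build a policy that lets the sender accounts call PutEvents on the bus."""
    bus_arn = f"arn:aws:events:{region}:{central_account}:event-bus/{bus_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCrossAccountPutEvents",
                "Effect": "Allow",
                "Principal": {
                    "AWS": [f"arn:aws:iam::{acct}:root" for acct in senders]
                },
                "Action": "events:PutEvents",
                "Resource": bus_arn,
            }
        ],
    }


policy = central_bus_policy(SENDER_ACCOUNTS, CENTRAL_ACCOUNT, REGION, BUS_NAME)
print(json.dumps(policy, indent=2))
```

Each sender account would then use a rule whose target is the central bus ARN.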
Governance, compliance, and audit for your AWS Account.
Sources CloudTrail Destinations
┌─────────┐ ┌─────────────────┐
│ SDK │──┐ ┌──►│ CloudWatch Logs │
├─────────┤ │ │ └─────────────────┘
│ CLI │──┼──► CloudTrail ──► Inspect & Audit ──┤
├─────────┤ │ │ ┌─────────────────┐
│ Console │──┤ └──►│ S3 Bucket │
├─────────┤ │ └─────────────────┘
│IAM Users│──┘
│& Roles │
└─────────┘
Key Points:
| Event Type | Default | What it logs |
|---|---|---|
| Management Events | ✅ Enabled | Operations on resources (IAM, EC2, CloudTrail config) |
| Data Events | ❌ Disabled | High-volume: S3 object-level, Lambda Invoke |
| Insights Events | ❌ Disabled | Unusual activity detection |
Management Events:
Data Events:
Detect unusual activity in your account:
Management Events ──► Continuous ──► CloudTrail ──► Insights ──┬──► CloudTrail Console
analysis Insights Events │
├──► S3 Bucket
│
└──► EventBridge (automation)
How it works:
| Storage | Retention | Use Case |
|---|---|---|
| CloudTrail | 90 days | Quick lookup, recent activity |
| S3 Bucket | Long-term | Compliance, historical analysis |
Long-term analysis: Log to S3 → query with Athena
Event Types: CloudTrail S3 Bucket Athena
┌──────────────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ Management Events│────►│ │ │ │ │ │
│ Data Events │────►│ 90 days │───log───►│Long-term│──SQL──►│ Analyze │
│ Insights Events │────►│retention│ │retention│ │ │
└──────────────────┘ └─────────┘ └─────────┘ └─────────┘
⚠️ Exam trap: “Who deleted the resource?” or “API call history” → CloudTrail.
⚠️ Exam trap: “Detect unusual IAM activity” or “burst of API calls” → CloudTrail Insights.
⚠️ Exam trap: “Keep CloudTrail logs beyond 90 days” → Log to S3, query with Athena.
⚠️ Exam trap: “Data Events disabled by default” — S3 object-level and Lambda Invoke need explicit enabling.
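To show what "who deleted the resource?" looks like in practice, the sketch below filters CloudTrail-style records by `eventName` and reads the caller from `userIdentity`. The records are made-up samples trimmed to a few fields that CloudTrail records contain; the helper name is hypothetical.

```python
from typing import Iterable

# Made-up records, trimmed to a handful of CloudTrail record fields.
sample_records = [
    {
        "eventTime": "2025-01-15T10:02:11Z",
        "eventSource": "dynamodb.amazonaws.com",
        "eventName": "DeleteTable",
        "userIdentity": {
            "type": "IAMUser",
            "arn": "arn:aws:iam::111122223333:user/alice",
        },
        "sourceIPAddress": "203.0.113.10",
        "requestParameters": {"tableName": "orders"},
    },
    {
        "eventTime": "2025-01-15T10:05:42Z",
        "eventSource": "ec2.amazonaws.com",
        "eventName": "DescribeInstances",
        "userIdentity": {
            "type": "AssumedRole",
            "arn": "arn:aws:sts::111122223333:assumed-role/ops/bob",
        },
        "sourceIPAddress": "203.0.113.20",
        "requestParameters": None,
    },
]


def who_called(records: Iterable[dict], event_name: str) -> list:
    """Return (time, caller ARN, source IP) for every record with this eventName."""
    return [
        (r["eventTime"], r["userIdentity"]["arn"], r["sourceIPAddress"])
        for r in records
        if r["eventName"] == event_name
    ]


for when, who, ip in who_called(sample_records, "DeleteTable"):
    print(f"{when}: {who} from {ip}")
```

The same filter expressed as SQL over the S3 log files is what the Athena step in the diagram above would run for records older than 90 days.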
Store events and replay them later — built-in feature, no custom code needed.
| Feature | Description |
|---|---|
| Archive | Store events from any event bus (indefinitely or set retention) |
| Replay | Re-send archived events to same or different event bus |
| Filter | Archive only matching events (use event patterns) |
| Use case | Replay production events in dev/test environment |
Production Event Bus Dev Event Bus
│ ▲
▼ │
Archive ──────► Stored Events ─────► Replay │
(filter) (S3, managed) (6 months later)
Key use case: Store production events → replay in dev environment for testing (periodically or on-demand).
⚠️ Exam trap: “Store EventBridge events for later replay” → Archive and Replay (NOT Lambda + S3/DynamoDB — over-engineered).
⚠️ Exam trap: “Most efficient and cost-effective way to store and replay events” → built-in feature wins over custom Lambda solutions.
Pattern: React to any API call with alerts/automation.
User ──► API Call ──► AWS Service ──► CloudTrail ──► EventBridge ──► SNS/Lambda
(logs API) (event) (alert/automate)
Examples:
| Trigger | Flow |
|---|---|
| User assumes IAM Role | IAM (AssumeRole) → CloudTrail → EventBridge → SNS |
| Security Group modified | EC2 (AuthorizeSecurityGroupIngress) → CloudTrail → EventBridge → SNS |
| DynamoDB table deleted | DynamoDB (DeleteTable) → CloudTrail → EventBridge → SNS |
Example 1: IAM Role Assumption Alert
User ──► AssumeRole ──► IAM ──► CloudTrail ──► EventBridge ──► SNS
(API Call log) (event) (alert)
Example 2: Security Group Change Alert
User ──► Edit SG Rules ──► EC2 ──► CloudTrail ──► EventBridge ──► SNS
(AuthorizeSecurityGroupIngress)
Example 3: DynamoDB Table Deletion Alert
User ──► DeleteTable ──► DynamoDB ──► CloudTrail ──► EventBridge ──► SNS
Key insight: CloudTrail logs all API calls → EventBridge can react to any of them!
⚠️ Exam trap: “Alert when user assumes role” or “notify on Security Group changes” → CloudTrail + EventBridge + SNS.
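The three examples above can be sketched as EventBridge event patterns. The `detail-type` value is the one EventBridge uses for CloudTrail-recorded API calls; the `eventSource` values are not stated in these notes and are filled in here as the service endpoints CloudTrail records for those calls. The helper and dictionary names are hypothetical.

```python
import json


def api_call_pattern(event_source: str, event_name: str) -> dict:
    """Event pattern for one API call recorded by CloudTrail."""
    return {
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {
            "eventSource": [event_source],
            "eventName": [event_name],
        },
    }


PATTERNS = {
    # AssumeRole is an STS API, so CloudTrail records it under sts.amazonaws.com.
    "role-assumed": api_call_pattern("sts.amazonaws.com", "AssumeRole"),
    "sg-ingress-changed": api_call_pattern(
        "ec2.amazonaws.com", "AuthorizeSecurityGroupIngress"
    ),
    "table-deleted": api_call_pattern("dynamodb.amazonaws.com", "DeleteTable"),
}

for name, pattern in PATTERNS.items():
    print(name, json.dumps(pattern))
```

Each pattern would be attached to its own rule with an SNS topic as the target.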
Auditing and recording compliance of AWS resources over time.
Use cases:
Key Points:
Evaluate whether resources are compliant with desired configurations.
| Rule Type | Description |
|---|---|
| AWS Managed Rules | 75+ pre-built rules |
| Custom Rules | Defined in Lambda |
Examples:
Rule Triggers:
⚠️ Config Rules does NOT prevent actions (no deny) — only evaluates compliance!
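A sketch of the kind of check a custom rule could run. The input shape (`ingress_rules`, `from_port`, `to_port`, `cidrs`) is a simplified, hypothetical structure rather than the real Config configuration item, and the step that reports the result back to Config is omitted.

```python
def evaluate_ssh_exposure(security_group: dict) -> str:
    """Return 'NON_COMPLIANT' if any ingress rule opens port 22 to 0.0.0.0/0."""
    for rule in security_group.get("ingress_rules", []):
        covers_ssh = rule["from_port"] <= 22 <= rule["to_port"]
        open_to_world = "0.0.0.0/0" in rule["cidrs"]
        if covers_ssh and open_to_world:
            return "NON_COMPLIANT"
    return "COMPLIANT"


# Hypothetical sample inputs.
open_sg = {
    "group_id": "sg-open",
    "ingress_rules": [{"from_port": 22, "to_port": 22, "cidrs": ["0.0.0.0/0"]}],
}
locked_sg = {
    "group_id": "sg-locked",
    "ingress_rules": [
        {"from_port": 22, "to_port": 22, "cidrs": ["10.0.0.0/16"]},
        {"from_port": 443, "to_port": 443, "cidrs": ["0.0.0.0/0"]},
    ],
}

for sg in (open_sg, locked_sg):
    print(sg["group_id"], evaluate_ssh_exposure(sg))
```

Note that the function only labels the resource; nothing here blocks the change, which mirrors the detect-only behaviour described above.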
Two notification patterns:
Pattern 1: EventBridge (filtered, action-oriented)
AWS Resources ──► AWS Config ──► NON_COMPLIANT ──► EventBridge ──┬──► Lambda
(monitor) (trigger) ├──► SNS
└──► SQS
Pattern 2: SNS (all events)
AWS Resources ──► AWS Config ──► All events ──► SNS ──► Admin
(monitor) (config changes, (notification)
compliance state)
Use SNS Filtering or client-side filtering for Pattern 2.
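For Pattern 1, a rule pattern along these lines would select only NON_COMPLIANT results. The sample event is made up and trimmed to the fields used here, and `summarize` is a hypothetical helper standing in for whatever the Lambda or SNS target would do.

```python
import json

# Pattern for a rule that fires only when a resource becomes NON_COMPLIANT.
NON_COMPLIANT_PATTERN = {
    "source": ["aws.config"],
    "detail-type": ["Config Rules Compliance Change"],
    "detail": {"newEvaluationResult": {"complianceType": ["NON_COMPLIANT"]}},
}

# Made-up sample event, trimmed to the fields read below.
sample_event = {
    "source": "aws.config",
    "detail-type": "Config Rules Compliance Change",
    "detail": {
        "resourceType": "AWS::EC2::SecurityGroup",
        "resourceId": "sg-0123456789abcdef0",
        "configRuleName": "restricted-ssh",
        "newEvaluationResult": {"complianceType": "NON_COMPLIANT"},
    },
}


def summarize(event: dict) -> str:
    """Build a one-line alert message from a compliance change event."""
    detail = event["detail"]
    status = detail["newEvaluationResult"]["complianceType"]
    return (
        f"{detail['resourceType']} {detail['resourceId']} is {status} "
        f"for rule {detail['configRuleName']}"
    )


print(json.dumps(NON_COMPLIANT_PATTERN, indent=2))
print(summarize(sample_event))
```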
Auto-fix non-compliant resources using SSM Automation Documents.
Non-Compliant Resource ──► AWS Config ──► SSM Automation ──► Auto-Remediation
(e.g., expired IAM key) (detect) Document (deactivate key)
(Retries: 5)
Remediation Options:
| Option | Description |
|---|---|
| AWS-Managed Documents | Pre-built remediation actions |
| Custom Documents | Your own automation (can invoke Lambda) |
| Remediation Retries | Retry if still non-compliant after auto-fix |
Example:
AWSConfigRemediation-RevokeUnusedIAMUserCredentials → deactivate key
⚠️ Exam trap: “Auto-remediate non-compliant resources” → AWS Config + SSM Automation Documents.
⚠️ Exam trap: “Config Rules” = detect/evaluate only. “Auto-remediation” = SSM Automation.
⚠️ Exam trap: Config does NOT prevent actions (no deny) — it only detects non-compliance after the fact.
| Service | Purpose | Question it Answers |
|---|---|---|
| CloudWatch | Performance monitoring, dashboards, alerts, logs | How is my app performing? |
| CloudTrail | API call history, audit | WHO made changes? |
| Config | Configuration compliance, change timeline | Is my resource compliant? How did it change? |
Quick Decision:
"Performance/metrics/dashboard" → CloudWatch
"Who did it? / API calls / audit" → CloudTrail
"Is it compliant? / config history" → Config
| Service | ELB Use Case |
|---|---|
| CloudWatch | Monitor incoming connections, visualize error codes %, dashboard for performance |
| Config | Track SG rules, track config changes, ensure SSL certificate always assigned (compliance) |
| CloudTrail | Track WHO made changes to the Load Balancer (API calls) |
⚠️ Exam trap: This comparison is exam-favorite! Remember:
| Wrong Answer | Why Wrong |
|---|---|
| ❌ Lambda polling logs hourly | Expensive compute, not real-time, over-engineered |
| ❌ AWS Config Rule | Config monitors resource configuration, not log content |
| ❌ CloudTrail | CloudTrail logs API calls, not application logs |
| ✅ Metric Filter + Alarm | Built-in, near real-time, cost-effective |
| Wrong Answer | Why Wrong |
|---|---|
| ❌ CloudWatch default metrics | RAM is NOT a default metric (only CPU, Disk, Network) |
| ❌ Enable detailed monitoring | Detailed = 1-minute instead of 5-minute, still no RAM |
| ❌ CloudTrail | CloudTrail is for API audit, not metrics |
| ✅ CloudWatch Unified Agent | Required for OS-level metrics (RAM, processes, disk IO) |
| Wrong Answer | Why Wrong |
|---|---|
| ❌ CloudWatch Logs | Logs application output, not API calls |
| ❌ AWS Config | Config tracks config state over time, not who made changes |
| ❌ VPC Flow Logs | Network traffic, not API calls |
| ✅ CloudTrail | Records ALL API calls with user identity |
| Wrong Answer | Why Wrong |
|---|---|
| ❌ CloudTrail | Shows API calls, not current config state or compliance |
| ❌ CloudWatch | Performance metrics, not configuration compliance |
| ❌ IAM Access Analyzer | Analyzes IAM policies, not general resource config |
| ✅ AWS Config | Records config changes + evaluates compliance rules |
| Wrong Answer | Why Wrong |
|---|---|
| ❌ AWS Config Rules | Config detects after the fact, doesn’t prevent |
| ❌ CloudTrail | Audit only, no enforcement |
| ✅ SCPs | Prevent at Organization level |
| ✅ IAM Policies | Prevent at user/role level |
| Wrong Answer | Why Wrong |
|---|---|
| ❌ Config Rules alone | Rules only detect, don’t fix |
| ❌ CloudWatch Alarms | Alarms alert, don’t remediate config |
| ❌ Lambda (without trigger) | No automatic invocation mechanism |
| ✅ Config + SSM Automation | Config detects → SSM Document remediates |
| Wrong Answer | Why Wrong |
|---|---|
| ❌ S3 Export (CreateExportTask) | Batch only, up to 12 hours latency |
| ❌ CloudWatch Logs Insights | Query engine, not real-time stream |
| ✅ Subscription Filters | Real-time to Lambda/Kinesis |
| Wrong Answer | Why Wrong |
|---|---|
| ❌ CloudWatch Logs | Shows individual service logs, not request path |
| ❌ CloudWatch Metrics | Shows aggregated metrics, not individual traces |
| ❌ VPC Flow Logs | Network packets, not application-level tracing |
| ✅ X-Ray | Traces requests across services, shows latency per hop |
| Wrong Answer | Why Wrong |
|---|---|
| ❌ CloudWatch Alarms | Only for metric thresholds, not AWS events |
| ❌ SNS alone | Needs something to trigger it |
| ❌ Lambda scheduled | Polling, not event-driven |
| ✅ EventBridge | Native integration with 100+ AWS services |
| Wrong Answer | Why Wrong |
|---|---|
| ❌ Increase CloudTrail retention | Not configurable, always 90 days in console |
| ❌ CloudWatch Logs | Different service, not where CloudTrail stores |
| ✅ S3 + Athena | S3 for storage, Athena for SQL queries |
| Wrong Answer | Why Wrong |
|---|---|
| ❌ CloudWatch Alarm | Alarms are for metrics, not API events |
| ❌ Config Rule | Config checks compliance, not individual API calls |
| ❌ GuardDuty | Threat detection, not general API alerting |
| ✅ CloudTrail + EventBridge + SNS | CloudTrail logs → EventBridge triggers → SNS alerts |
| Wrong Answer | Why Wrong |
|---|---|
| ❌ CloudWatch Metrics | Metrics = performance (CPU, network) — not config state |
| ❌ CloudTrail | Logs API calls (who changed SG) — not current config state |
| ❌ Lambda on schedule | Works but over-engineered — Config has built-in rules |
| ✅ Config Rules | Continuously monitors Security Group configurations |
Example Config Rules for Security Groups:
- restricted-ssh: checks if SSH (port 22) is restricted
- restricted-common-ports: checks for unrestricted access on common ports
- vpc-sg-open-only-to-authorized-ports: checks that unrestricted access is allowed only on the ports you authorize
| Wrong Answer | Why Wrong |
|---|---|
| ❌ Config Rules | Evaluate compliance — doesn’t send notifications itself |
| ❌ Config Remediations | Auto-fix resources — not for alerting |
| ✅ Config Notifications | Send alerts via SNS on config changes |
Config Features Quick Reference:
| Feature | Purpose |
|---|---|
| Config Rules | EVALUATE — is it compliant? |
| Config Notifications | ALERT — send SNS/email |
| Config Remediations | FIX — auto-correct via SSM |
| Wrong Answer | Why Wrong |
|---|---|
| ❌ Lambda → S3 | Over-engineered — built-in feature exists |
| ❌ Lambda → DynamoDB | Not designed for event replay |
| ❌ Kinesis Firehose → S3 | Extra service, no native replay |
| ✅ EventBridge Archive and Replay | Native feature, cost-effective, replay to any event bus |
| If Question Says | NOT This | Use This Instead |
|---|---|---|
| “Monitor RAM/memory” | Default CW metrics | Unified Agent |
| “Who did it / audit” | CloudWatch, Config | CloudTrail |
| “Is it compliant” | CloudTrail | Config |
| “Prevent action” | Config Rules | SCPs / IAM |
| “Real-time logs” | S3 Export | Subscriptions |
| “Log keyword alert” | Lambda polling, Config | Metric Filter |
| “Distributed tracing” | CW Logs, VPC Flow | X-Ray |
| “React to events” | CW Alarms | EventBridge |
| “Auto-fix non-compliant” | Config alone | Config + SSM |
| “Port/SG exposed” | CloudWatch, CloudTrail | Config Rules |
| “Store/replay events” | Lambda+S3, DynamoDB | EventBridge Archive |
| “Monitor website/API” | CloudWatch Metrics | Synthetics Canaries |
| “A/B testing / feature flags” | Lambda, custom code | Evidently |
| “AWS outage affects me” | Service Health | Your Account Health + EventBridge |
WHY: AWS separates concerns into three distinct services because each answers a fundamentally different question:
| Service | Question | Data Type |
|---|---|---|
| CloudWatch | “How is it performing?” | Metrics, logs, dashboards |
| CloudTrail | “Who did what?” | API call history |
| Config | “Is it compliant?” | Resource configuration state |
Application: When you see keywords, map to the pillar:
WHY: Exam tests whether you know which service provides real-time data vs batch processing.
| Need | NOT This (Batch) | Use This (Real-time) |
|---|---|---|
| Log processing | S3 Export (12h) | Subscription Filters |
| Metrics streaming | Pull from API | Metric Streams |
| Event reaction | Scheduled Lambda | EventBridge |
WHY: EC2 hypervisor can only see certain metrics. OS-level metrics require an agent.
| Metric Type | Default (Hypervisor) | Agent Required |
|---|---|---|
| CPU | ✅ | - |
| Network | ✅ | - |
| Disk (high-level) | ✅ | - |
| RAM/Memory | ❌ | ✅ Unified Agent |
| Processes | ❌ | ✅ Unified Agent |
| Disk IO detailed | ❌ | ✅ Unified Agent |
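As an illustration of the agent-required rows, this sketch builds a small CloudWatch agent configuration that collects memory and disk usage on a Linux instance. It is a minimal example, not a complete recommended configuration; it is generated from Python only so the JSON can be printed and checked.

```python
import json

# Minimal agent configuration (Linux measurement names).
agent_config = {
    "metrics": {
        "metrics_collected": {
            # RAM usage is not visible to the hypervisor, so the agent reports it.
            "mem": {"measurement": ["mem_used_percent"]},
            # Filesystem usage for all mounted volumes.
            "disk": {"measurement": ["used_percent"], "resources": ["*"]},
        },
        # Tag each datapoint with the instance it came from.
        "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
    }
}

print(json.dumps(agent_config, indent=2))
```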
WHY: Config ONLY detects. It cannot prevent or fix.
PREVENT DETECT REMEDIATE
│ │ │
▼ ▼ ▼
SCPs/IAM ───► Config Rules ───► SSM Automation
(before) (after the fact) (auto-fix)
Application:
WHY: CloudTrail console only keeps 90 days. For long-term, you MUST export.
CloudTrail (90 days) ──► S3 (unlimited) ──► Athena (query)
WHY: X-Ray traces request flow across services. It’s NOT for:
Application: “Debug latency in microservices” or “find bottleneck between services” → X-Ray
WHY: EventBridge connects AWS services, custom apps, and SaaS. CloudWatch Alarms only handle metric thresholds.
| Event Type | Service |
|---|---|
| Metric crosses threshold | CloudWatch Alarm |
| AWS service state change | EventBridge |
| API call made | CloudTrail → EventBridge |
| Scheduled task | EventBridge Scheduler |
WHY: They solve different problems:
| Service | Perspective | Use Case |
|---|---|---|
| Synthetics Canaries | External (customer view) | Is my site/API up? |
| X-Ray | Internal (developer view) | Where’s the bottleneck? |
START: What does the question ask about?
│
├─► "Who did it?" / "audit" / "API history"
│ └─► CloudTrail
│
├─► "Is it compliant?" / "configuration state" / "rules"
│ └─► Config
│
├─► "Performance" / "metrics" / "dashboard" / "alarm"
│ └─► CloudWatch
│ │
│ ├─► "RAM/memory on EC2" → Unified Agent
│ ├─► "Keyword in logs" → Metric Filter + Alarm
│ ├─► "Real-time log stream" → Subscription Filter
│ └─► "Stream metrics to S3" → Metric Streams
│
├─► "Trace requests" / "microservices debug" / "latency between services"
│ └─► X-Ray
│
├─► "React to event" / "automate on state change"
│ └─► EventBridge
│ │
│ ├─► "Store/replay events" → Archive and Replay
│ └─► "Cross-account events" → Resource-based Policy
│
├─► "Monitor website/API availability"
│ └─► Synthetics Canaries
│
├─► "Feature flags" / "A/B testing" / "gradual rollout"
│ └─► Evidently
│
└─► "AWS service issue affecting me"
└─► Health Dashboard + EventBridge
| Service | CANNOT Do |
|---|---|
| CloudWatch default metrics | Monitor RAM/memory |
| Config Rules | Prevent resource creation (only detect after) |
| CloudTrail | Keep logs beyond 90 days (without S3) |
| S3 Export (logs) | Real-time processing (up to 12h delay) |
| CloudWatch Alarms | React to AWS service events (only metrics) |
| X-Ray | Show aggregated metrics (only traces) |
| Metric Filters | Filter BEFORE logs arrive (only after) |
| Insight Type | What it Monitors | Key Feature |
|---|---|---|
| Container Insights | ECS/EKS/Fargate | Container metrics + logs |
| Lambda Insights | Lambda functions | Cold starts, memory |
| Contributor Insights | Log data | Top-N talkers |
| Application Insights | Java/.NET apps | SageMaker-powered dashboards |
| Destination | Latency | Use Case |
|---|---|---|
| S3 Export | Up to 12 hours | Archive, compliance |
| Subscription → Lambda | Real-time | Transform, OpenSearch |
| Subscription → Firehose | Near real-time | S3, Redshift delivery |
| Subscription → Kinesis | Real-time | Custom analytics |
| Event Type | Default | Examples |
|---|---|---|
| Management Events | ✅ ON | IAM, EC2, CloudTrail config |
| Data Events | ❌ OFF | S3 object-level, Lambda Invoke |
| Insights Events | ❌ OFF | Unusual activity detection |
| Question Contains | → Instant Answer |
|---|---|
| “Monitor RAM/memory EC2” | Unified Agent |
| “Who deleted resource” | CloudTrail |
| “API call history” | CloudTrail |
| “Is resource compliant” | Config |
| “Track config changes over time” | Config |
| “Prevent non-compliant creation” | SCPs / IAM Policies |
| “Auto-remediate non-compliant” | Config + SSM Automation |
| “Alert on log keyword” | Metric Filter + Alarm |
| “Real-time log processing” | Subscription Filters |
| “Stream metrics to S3” | Metric Streams via Firehose |
| “Aggregate logs multi-account” | Subscriptions → Kinesis |
| “Debug microservices” | X-Ray |
| “Trace request across services” | X-Ray |
| “Filter traces by attribute” | X-Ray Annotations |
| “React to AWS service event” | EventBridge |
| “Schedule task (cron)” | EventBridge Scheduler |
| “Store/replay events” | EventBridge Archive |
| “Cross-account event bus” | EventBridge Resource Policy |
| “CloudTrail beyond 90 days” | S3 + Athena |
| “Unusual IAM activity” | CloudTrail Insights |
| “Monitor website/API up” | Synthetics Canaries |
| “A/B testing” | Evidently |
| “Feature flags” | Evidently |
| “Gradual feature rollout” | Evidently |
| “AWS outage affecting me” | Health Dashboard + EventBridge |
| “Top network users in logs” | Contributor Insights |
| “Lambda cold starts” | Lambda Insights |
| “Java/.NET app dashboard” | Application Insights |
| “Container metrics ECS/EKS” | Container Insights |
| “Network to on-premises” | Network Synthetic Monitor |
| “SG port exposure check” | Config Rules |
□ Is it about WHO did something?
→ Yes = CloudTrail
→ No = Continue
□ Is it about COMPLIANCE or configuration state?
→ Yes = Config
→ No = Continue
□ Is it about PREVENTING creation?
→ Yes = SCPs/IAM (Config can't prevent)
→ No = Continue
□ Is it about REAL-TIME log processing?
→ Yes = Subscription Filters (S3 Export has 12h delay)
→ No = Continue
□ Is it about RAM/memory metrics?
→ Yes = Unified Agent (not default metrics)
→ No = Continue
□ Is it about distributed tracing?
→ Yes = X-Ray
→ No = Continue
□ Is it about reacting to AWS events?
→ Yes = EventBridge (not CloudWatch Alarms)
→ No = Continue
□ Is it about website/API monitoring?
→ Yes = Synthetics Canaries
→ No = Continue
□ Is it about feature rollout/A/B testing?
→ Yes = Evidently
→ No = Continue
Security Group (Firewall) controls how traffic is allowed into or out of EC2 Instances or other Security Groups. Can be attached to multiple instances. Locked down to a region/VPC combination. Lives “outside” the EC2 instance: if traffic is blocked, the instance won’t see it.
By default, a Security Group denies all inbound traffic and allows all outbound traffic. It contains only allow rules.
Security Group Rules regulate:
Best practices:
Troubleshooting:
IAM Access Analyzer for S3:
DDoS Protection on AWS:
AWS WAF: filters specific requests based on rules and protects web applications from common web exploits (Layer 7). Deploy on Application Load Balancer, API Gateway and CloudFront.
Define Web ACL (Web Access Control List):
AWS Network Firewall protects an entire Amazon VPC (from layer 3 to layer 7). Inspects traffic in these directions:
AWS Firewall Manager manages security rules in all accounts of an AWS Organization. Rules are applied to new resources as they are created (good for compliance) across all current and future accounts in your Organization.
Security policies:
AWS Acceptable use policy:
Eight services that are allowed without prior approval from AWS to carry out security assessments or penetration tests:
Prohibited Activities:
AWS Abuse: report suspected AWS resources used for abusive or illegal purposes.
Abusive & prohibited behaviors are:
AWS KMS (Key Management Service): a service that helps manage encryption keys.
Encryption Opt-in:
Encryption Automatically enabled:
AWS CloudHSM (Cloud Hardware Security Module): provisions dedicated encryption hardware. The customer manages all encryption keys.
CloudHSM vs KMS:
| Feature | AWS KMS | AWS CloudHSM |
|---|---|---|
| Key Management | AWS manages keys | Customer manages keys |
| Access Control | IAM policies + Key policies | You manage users in HSM |
| Hardware | Shared (multi-tenant) | Dedicated hardware (single-tenant) |
| FIPS 140-2 | Level 2 | Level 3 (tamper-evident) |
| High Availability | AWS managed | You must set up cluster across AZs |
| Integration | Native with 100+ AWS services | Custom integration needed |
| Cost | Pay per key + API calls | ~$1.50/hour per HSM |
| Use Case | Most encryption needs | Strict compliance, BYOK, SSL/TLS offload |
Key Insight:
KMS: CloudHSM:
┌─────────────────────────┐ ┌─────────────────────────┐
│ AWS manages │ │ Customer manages │
│ ┌─────────────────┐ │ │ ┌─────────────────┐ │
│ │ Key Material │ │ │ │ Key Material │ │
│ │ (AWS controls) │ │ │ │ (YOU control) │ │
│ └────────▲────────┘ │ │ └────────▲────────┘ │
│ │ │ │ │ │
│ IAM Policy │ │ HSM Users/Certs │
│ (access control) │ │ (you manage) │
└─────────────────────────┘ └─────────────────────────┘
⚠️ Exam trap: “Customer needs to manage their own encryption keys with FIPS 140-2 Level 3” → CloudHSM (KMS is Level 2)
⚠️ Exam trap: “AWS should NOT have access to encryption keys” → CloudHSM (with KMS, AWS manages key material)
⚠️ Exam trap: “Multi-region” + “Global database” + “client-side encryption” → KMS Multi-Region Keys (NOT CloudHSM)
Multi-Region Keys (key ID prefix mrk-) = same key ID works across all regions
| Scenario | Answer | Why NOT other |
|---|---|---|
| Aurora Global + client-side encryption | KMS Multi-Region Keys | CloudHSM can’t replicate keys across regions |
| FIPS 140-2 Level 3 compliance | CloudHSM | KMS is Level 2 only |
| AWS must NOT access keys | CloudHSM | KMS = AWS manages key material |
Types of KMS Keys (based on who creates them, who manages them, and their rotation policies):
AWS Certificate Manager (ACM) is a managed service to provision, manage, and deploy public and private SSL/TLS certificates with AWS services and internal connected resources. Integrated with Elastic Load Balancer, CloudFront Distributions, APIs on API Gateway.
AWS Secrets Manager helps you manage, retrieve, and rotate database credentials, API keys, and other secrets throughout their lifecycles. Integrated with AWS Lambda, Amazon RDS (MySQL, PostgreSQL, Aurora).
Rotating secrets is the process of periodically updating a secret. When you rotate a secret, you update the credentials in both the secret and the database or service. In Secrets Manager, you can set up automatic rotation for your secrets.
AWS Systems Manager Parameter Store provides secure, hierarchical storage for configuration data management and secrets management. You can store data such as passwords, database strings, and license codes as parameter values.
Parameter Store doesn’t provide automatic rotation services for stored secrets.
Amazon GuardDuty: intelligent threat discovery (uses Machine Learning) to protect your AWS account.
Input data includes:
Amazon Inspector automatically discovers workloads, such as Amazon EC2 instances, containers, and Lambda functions, and scans them for software vulnerabilities and unintended network exposure.
AWS Config is a configuration tool that helps you assess, audit, and evaluate the configurations and relationships of your resources. Configuration data can be stored in S3 (and analyzed with Athena), and you can receive alerts (SNS notifications) for any changes. Per-region service, but can be aggregated across regions and accounts.
Amazon Macie is a fully managed data security and data privacy service that uses machine learning and pattern matching to discover and protect your sensitive data in AWS. Macie helps identify and alert you to sensitive data, such as personally identifiable information (PII).
Amazon Detective analyzes, investigates and quickly identifies the root cause of security issues or suspicious activities (using ML and graphs). Automatically collects and processes events from VPC Flow Logs, CloudTrail and GuardDuty, and creates a unified view.
AWS Security Hub: central security tool to manage security across several AWS accounts and automate security checks. Integrated dashboards show current security and compliance status so you can take action quickly. Not enabled by default.
Automatically aggregates alerts:
AWS Artifact (not really a service): a portal that provides customers with on-demand access to AWS compliance documentation and AWS agreements. Can be used to support internal audit or compliance.
IAM Access Analyzer: finds out which resources are shared outside of a defined Zone of Trust (AWS account or AWS Organization):
| Type | Where Encrypted | Who Has Keys | Use Case |
|---|---|---|---|
| Encryption in Flight (TLS/SSL) | During transmission | TLS certificates | Protect data in transit, prevent MITM attacks |
| Server-Side Encryption (SSE) | At rest, on server | Server (AWS manages) | S3, EBS, RDS - data protected at rest |
| Client-Side Encryption | Before sending | Client only | Server should NOT see plaintext (zero-trust) |
Client-Side Encryption:
┌──────────┐ ┌─────────────────┐
│ Client │ encrypted data │ Storage (S3) │
│ ┌──────┐ │ ─────────────────────────► │ │
│ │ Key │ │ │ Encrypted blob │
│ └──────┘ │ ◄───────────────────────── │ (can't decrypt) │
└──────────┘ encrypted data └─────────────────┘
Server-Side Encryption:
┌──────────┐ plaintext ┌─────────────────────────────────┐
│ Client │ ────────────► │ AWS Service (S3) │
│ │ HTTPS │ ┌─────┐ encrypt ┌────────┐ │
│ │ │ │ Key │ ─────────► │ Stored │ │
│ │ ◄──────────── │ └─────┘ decrypt └────────┘ │
└──────────┘ plaintext └─────────────────────────────────┘
⚠️ Exam trap: “[Service] Client-side Encryption” terminology
When a question says “data must not be disclosed even by company admins”: → Client-side encryption (service stores only ciphertext, can’t decrypt)
AWS KMS manages encryption keys for AWS services. Anytime you hear “encryption” for an AWS service, it’s most likely KMS.
KMS Key Types:
| Key Type | Description | Access to Key Material |
|---|---|---|
| Symmetric (AES-256) | Single key for encrypt/decrypt | Never (must use KMS API) |
| Asymmetric (RSA/ECC) | Public + Private key pair | Public key downloadable, private never |
Asymmetric Key Usage (IMPORTANT):
| Key Type | Key Usage | Can Do | Cannot Do |
|---|---|---|---|
| RSA | ENCRYPT_DECRYPT | Encrypt, Decrypt | Sign, Verify |
| RSA | SIGN_VERIFY | Sign, Verify | Encrypt, Decrypt |
| ECC | SIGN_VERIFY only | Sign, Verify | Encrypt, Decrypt (never!) |
⚠️ Exam trap: “Asymmetric key for encryption AND signing” → IMPOSSIBLE with single key. Need TWO separate keys.
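The key-usage table above can be encoded as a small lookup, which makes the "one key, one usage" rule easy to check. This is a sketch of the table only, not a KMS client; all names are hypothetical.

```python
# Usages a key family may be created with, as listed in the table above.
ALLOWED_USAGES = {
    "RSA": {"ENCRYPT_DECRYPT", "SIGN_VERIFY"},
    "ECC": {"SIGN_VERIFY"},
}

# Operations each usage permits.
OPERATIONS = {
    "ENCRYPT_DECRYPT": {"Encrypt", "Decrypt"},
    "SIGN_VERIFY": {"Sign", "Verify"},
}


def can_perform(family: str, key_usage: str, operation: str) -> bool:
    """True if a key of this family, created with this usage, supports the operation."""
    if key_usage not in ALLOWED_USAGES.get(family, set()):
        raise ValueError(f"{family} keys cannot be created with usage {key_usage}")
    return operation in OPERATIONS[key_usage]


print(can_perform("RSA", "ENCRYPT_DECRYPT", "Encrypt"))
print(can_perform("RSA", "ENCRYPT_DECRYPT", "Sign"))
```

Since the usage is fixed when the key is created, covering both encryption and signing takes two keys, one per usage.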
KMS Key Ownership & Pricing:
| Key Type | Cost | Example | Rotation |
|---|---|---|---|
| AWS Owned | Free | SSE-S3, SSE-SQS, SSE-DDB | AWS manages |
| AWS Managed | Free | aws/rds, aws/ebs | Auto every 1 year |
| Customer Managed (created) | $1/month + API calls | Your keys | Must enable, auto every 1 year |
| Customer Managed (imported) | $1/month + API calls | BYOK | Manual only (use alias) |
KMS Key Rotation Deep Dive:
| Key Type | Auto Rotation | Period | Notes |
|---|---|---|---|
| AWS Managed | ✅ Always ON | 1 year | Cannot disable |
| Customer Managed | Optional (must enable) | 1 year | On-demand also available |
| Imported | ❌ Not available | N/A | Manual only via alias |
How rotation works:
Before rotation: After rotation:
┌─────────────────┐ ┌─────────────────┐
│ Key ID: abc-123 │ │ Key ID: abc-123 │ ◄── Same ID!
│ ┌─────────────┐ │ │ ┌─────────────┐ │
│ │ Key Material│ │ │ │OLD Material │ │ ◄── Kept for decrypt
│ │ (v1) │ │ │ │(v1) │ │
│ └─────────────┘ │ │ ├─────────────┤ │
└─────────────────┘ │ │NEW Material │ │ ◄── Used for encrypt
│ │(v2) │ │
│ └─────────────┘ │
└─────────────────┘
⚠️ Exam trap: Rotation period = 1 year FIXED (cannot be changed to 90 days, 6 months, etc.)
⚠️ Exam trap: Imported keys can ONLY be rotated manually (no automatic rotation)
Manual Rotation (for custom rotation periods):
If policy requires rotation more frequently than 1 year (e.g., 6 months):
6 months ago: Now (after manual rotation):
┌─────────────────┐ ┌─────────────────┐
│ Alias: my-key │──────────┐ │ Alias: my-key │──────────┐
└─────────────────┘ │ └─────────────────┘ │
▼ ▼
┌─────────────┐ ┌─────────────┐
│ CMK-OLD │ │ CMK-NEW │
│ (key-111) │ │ (key-222) │
└─────────────┘ └─────────────┘
│ │
│ Still exists! │
│ (decrypt old data) │ (new encryptions)
▼ ▼
⚠️ Exam trap: “Rotate every 6 months” → Manual rotation with aliases (auto rotation is 1 year only, cannot configure)
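The alias-based manual rotation can be sketched with a toy registry. This is a plain Python simulation, not the KMS API: `AliasRegistry` and its methods are hypothetical, and the key IDs are the placeholders from the diagram.

```python
class AliasRegistry:
    """Toy stand-in for KMS aliases: maps an alias name to a key ID."""

    def __init__(self):
        self._targets = {}
        self.keys = set()  # every key created stays available for decryption

    def create_key(self, key_id: str) -> str:
        self.keys.add(key_id)
        return key_id

    def point_alias(self, alias: str, key_id: str) -> None:
        if key_id not in self.keys:
            raise KeyError(f"unknown key {key_id}")
        self._targets[alias] = key_id

    def resolve(self, alias: str) -> str:
        return self._targets[alias]


registry = AliasRegistry()
registry.point_alias("alias/my-key", registry.create_key("key-111"))
old_target = registry.resolve("alias/my-key")

# Manual rotation: create a new key and repoint the alias to it.
registry.point_alias("alias/my-key", registry.create_key("key-222"))
new_target = registry.resolve("alias/my-key")

print(old_target, "->", new_target)
```

Applications keep referring to the alias, so they pick up the new key without a code change, while the old key remains in place to decrypt data written before the switch.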
Wrong answers explained:
KMS Access Control (Key Policies):
Key Policy = PRIMARY way to control access to KMS keys (resource-based policy).
| Access Method | Description | Required? |
|---|---|---|
| Key Policy | Resource-based policy ON the key | ✅ Always required |
| IAM Policy | Identity-based policy on user/role | Optional (works WITH key policy) |
| Grants | Temporary, delegated access | Optional |
Critical difference from other AWS services:
S3 Access: KMS Access:
IAM Policy ──► S3 Bucket IAM Policy ──┐
│ │
└─► Access granted! ▼
Key Policy ──► KMS Key
│
└─► BOTH needed!
Default Key Policy:
Custom Key Policy - use cases:
⚠️ Exam trap: “KMS IAM Policy” alone → NOT enough! Key Policy is required. IAM policies work only if Key Policy allows it.
⚠️ Exam trap: “KMS ACL” → Does NOT exist! (Unlike S3, KMS has no ACLs)
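For reference, a sketch of what the default key policy statement looks like: it grants the account root principal full access to the key, which is what lets IAM policies in that account take effect. The account ID is a placeholder and the helper name is hypothetical.

```python
import json


def default_key_policy(account_id: str) -> dict:
    """Key policy that delegates access decisions to IAM policies in the account."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Enable IAM User Permissions",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "kms:*",
                "Resource": "*",
            }
        ],
    }


policy = default_key_policy("111122223333")  # placeholder account ID
print(json.dumps(policy, indent=2))
```

A custom key policy would replace or extend this statement, for example to name specific roles or another account.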
KMS Grants:
Copying Snapshots Across Accounts:
Copying Snapshots Across Regions:
KMS ReEncrypt to change encryption key during copy
Region A (eu-west-2) Region B (ap-southeast-2)
┌─────────────┐ ┌─────────────┐
│ EBS Volume │ │ EBS Volume │
│ (KMS Key A) │ │ (KMS Key B) │
└──────┬──────┘ └──────▲──────┘
│ │
▼ │
┌─────────────┐ ReEncrypt with ┌─────────────┐
│ Snapshot │ KMS Key B │ Snapshot │
│ (Key A) │ ─────────────────► │ (Key B) │
└─────────────┘ └─────────────┘
KMS Multi-Region Keys:
arn:aws:kms:<region>:111122223333:key/mrk-... (note mrk- prefix)
⚠️ Exam trap: “The same KMS key cannot exist in two regions” → FALSE with Multi-Region keys. Regular KMS keys are regional, but Multi-Region keys CAN exist in multiple regions with same key ID.
┌─────────────────┐
│ us-west-2 │
│ Replica Key │
│ mrk-1234... │
└────────▲────────┘
│ sync
┌─────────────────┐ │ ┌─────────────────┐
│ us-east-1 │──────────┴──────────│ eu-west-1 │
│ PRIMARY Key │ sync │ Replica Key │
│ mrk-1234... │─────────────────────│ mrk-1234... │
└─────────────────┘ └─────────────────┘
Multi-Region Key Use Cases:
⚠️ Exam trap: Multi-Region keys are NOT “global keys” - each replica is managed independently in its region
AMI Sharing with KMS Encryption:
When sharing encrypted AMI across accounts, you must share BOTH the AMI AND the KMS key access.
Account A (Source) Account B (Target)
┌────────────────────────────────┐ ┌────────────────────────────────┐
│ │ │ │
│ ┌──────────────┐ │ │ ┌──────────────┐ │
│ │ AMI │ │ │ │ EC2 Instance │ │
│ │ (encrypted) │──────────────┼──────►│──────────────│ (launched) │ │
│ └──────┬───────┘ Share AMI │ │ Launch └──────────────┘ │
│ │ │ │ ▲ │
│ │ encrypted with │ │ │ │
│ ▼ │ │ │ │
│ ┌──────────────┐ │ │ uses key to │
│ │ KMS Key │──────────────┼──────►│──────────────decrypt │
│ │ (CMK) │ Share Key │ │ │
│ └──────────────┘ (Key Policy)│ │ │
│ │ │ │
└────────────────────────────────┘ └────────────────────────────────┘
Steps to share encrypted AMI:
LaunchPermission to target account
⚠️ Exam trap: Cannot share AMI encrypted with AWS Managed Key (aws/ebs) - must use Customer Managed Key
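The "share the key" half of this setup happens in the key policy of the customer managed key. A minimal sketch of such a statement follows; the account ID `444455556666` and the exact action list are assumptions for illustration, not values from these notes.

```python
import json


def cross_account_ami_key_statement(target_account_id: str) -> dict:
    """Build a key policy statement that lets another account use a CMK.

    The action list is an assumed, commonly used set for launching
    instances from a shared encrypted AMI; adjust it to your needs.
    """
    return {
        "Sid": "AllowTargetAccountToUseKey",
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{target_account_id}:root"},
        "Action": [
            "kms:Decrypt",
            "kms:DescribeKey",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*",
            "kms:CreateGrant",
        ],
        "Resource": "*",  # in a key policy, "*" means "this key"
    }


# Hypothetical target account ID, for illustration only.
statement = cross_account_ami_key_statement("444455556666")
print(json.dumps(statement, indent=2))
```

The target account still needs its own IAM policy allowing these actions on the key, in addition to the AMI launch permission.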
| Encryption Type | Replication Behavior |
|---|---|
| Unencrypted | Replicated by default |
| SSE-S3 | Replicated by default |
| SSE-C (customer provided key) | Can be replicated |
| SSE-KMS | Must enable option explicitly |
SSE-KMS Replication Requirements:
kms:Decrypt (source key) + kms:Encrypt (target key)
⚠️ Exam trap: Multi-Region KMS keys are treated as independent keys by S3 - object is still decrypted then re-encrypted (no optimization)
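The two KMS permissions above can be sketched as IAM statements for the replication role. The key ARNs below are hypothetical placeholders, and a real replication role also needs S3 permissions that are omitted here.

```python
def replication_kms_statements(source_key_arn: str, target_key_arn: str) -> list:
    """KMS statements for an S3 replication role handling SSE-KMS objects."""
    return [
        # Decrypt objects in the source bucket with the source key.
        {"Effect": "Allow", "Action": ["kms:Decrypt"], "Resource": source_key_arn},
        # Re-encrypt replicas in the destination bucket with the target key.
        {"Effect": "Allow", "Action": ["kms:Encrypt"], "Resource": target_key_arn},
    ]


# Hypothetical ARNs, for illustration only.
statements = replication_kms_statements(
    "arn:aws:kms:eu-west-2:111122223333:key/source-key-id",
    "arn:aws:kms:ap-southeast-2:111122223333:key/target-key-id",
)
```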
AWS Secrets Manager stores and manages secrets with automatic rotation.
Multi-Region Secrets:
us-east-1 (Primary) us-west-2 (Secondary)
┌─────────────────┐ replicate ┌─────────────────┐
│ Secrets Manager │ ─────────────────► │ Secrets Manager │
│ MySecret-A │ │ MySecret-A │
│ (primary) │ │ (replica) │
└─────────────────┘ └─────────────────┘
SSM Parameter Store vs Secrets Manager:
| Feature | SSM Parameter Store | Secrets Manager |
|---|---|---|
| Cost | Free tier (Standard), charges for Advanced | $0.40/secret/month + API calls |
| Auto Rotation | ❌ No | ✅ Yes (built-in Lambda) |
| RDS Integration | Manual | ✅ Native (MySQL, PostgreSQL, Aurora) |
| KMS Encryption | Optional (SecureString) | ✅ Always encrypted |
| Hierarchy | ✅ Path-based (/app/dev/db-password) | ❌ Flat |
| Multi-Region | ❌ No | ✅ Yes (replicas) |
| Version Tracking | ✅ Built-in | ✅ Built-in |
| Pull from CF/CDK | ✅ Direct reference | ✅ Direct reference |
SSM Parameter Store - Version Tracking:
aws ssm get-parameter --name /my/param
aws ssm get-parameter --name /my/param:3
Parameter: /app/db-password
┌─────────────────────────────────────────┐
│ Version 1: "oldpass123" (2024-01-01) │
│ Version 2: "newpass456" (2024-06-01) │
│ Version 3: "latestpass789" (2025-01-01) │ ◄── Current
└─────────────────────────────────────────┘
⚠️ Exam trap: “Track secret values over time” → SSM Parameter Store (built-in versioning)
⚠️ Exam trap: “KMS Versioning” → Does NOT exist! KMS has key rotation (new key material), not value versioning
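The `name:version` selector used in the CLI example above can be split with a few lines of code. This is a sketch of the naming convention only: the helper name is hypothetical, and label selectors (`name:label`) are deliberately not handled.

```python
from typing import Optional, Tuple


def split_parameter_selector(selector: str) -> Tuple[str, Optional[int]]:
    """Split '/my/param:3' into ('/my/param', 3).

    Returns (selector, None) when no numeric version suffix is present,
    which corresponds to asking for the latest version.
    """
    name, sep, version = selector.rpartition(":")
    if sep and version.isdigit():
        return name, int(version)
    return selector, None


latest = split_parameter_selector("/my/param")
pinned = split_parameter_selector("/my/param:3")
```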
Where to Store Configuration/Secrets - Decision Guide:
| Requirement | Best Service | Why NOT others |
|---|---|---|
| Config values + version history | SSM Parameter Store | DynamoDB (overkill), S3 (not designed for this), EBS (storage volume) |
| DB credentials + auto rotation | Secrets Manager | SSM (no auto rotation), KMS (encryption only) |
| Hierarchical config (/app/prod/db) | SSM Parameter Store | Secrets Manager (flat structure) |
| Sensitive + multi-region | Secrets Manager | SSM (no multi-region) |
⚠️ Exam trap: “RDS password + automatic rotation” → Secrets Manager
Why SSM Parameter Store for “externally maintain config”:
Wrong answers explained:
When to use which:
⚠️ Exam trap: “Automatic rotation for DB credentials” → Secrets Manager (Parameter Store has NO auto rotation)
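The decision guide above reduces to a small rule of thumb. The sketch below encodes only the criteria listed in these notes; the function and parameter names are hypothetical.

```python
def choose_secret_store(auto_rotation: bool = False, multi_region: bool = False) -> str:
    """Pick a service using the criteria from the decision guide.

    Automatic rotation or multi-region replication points to Secrets Manager;
    otherwise Parameter Store (versioned, hierarchical, free tier) is enough.
    """
    if auto_rotation or multi_region:
        return "Secrets Manager"
    return "SSM Parameter Store"


rds_password_choice = choose_secret_store(auto_rotation=True)
app_config_choice = choose_secret_store()
```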
Lambda + Secrets - Security Options (worst to best):
| Option | Security Level | Why |
|---|---|---|
| ❌ Embed in code | WORST | Visible in source control, logs, anyone with code access |
| ❌ Plaintext env var | BAD | Visible in Lambda console, CloudWatch logs |
| ✅ Encrypted env var + KMS | GOOD | Encrypted at rest, decrypted at runtime |
| ✅✅ Secrets Manager/SSM | BEST | Centralized, audit trail, rotation, no env vars |
Encrypted Environment Variable Flow:
1. Store secret as encrypted env var (using KMS)
┌─────────────────────────────────────────┐
│ Lambda Config │
│ DB_PASSWORD = AQICAHh...encrypted... │
└─────────────────────────────────────────┘
│
2. At runtime, Lambda decrypts using KMS
│
▼
┌──────────┐ decrypt ┌──────────┐
│ Lambda │ ────────────► │ KMS │
│ code │ ◄──────────── │ CMK │
└──────────┘ plaintext └──────────┘
│
3. Use decrypted value to connect to DB
│
▼
┌──────────┐
│ RDS │
└──────────┘
Why encrypted env var is “most secure” in the question:
kms:Decrypt permission on the CMK
⚠️ Exam context: If Secrets Manager is an option, it’s usually the BEST answer (centralized + rotation + audit). But among the 3 options given, encrypted env var wins.
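Step 2 of the flow above (decrypting the environment variable at runtime) might look roughly like this. The sketch assumes the variable holds base64-encoded ciphertext and that the function's role has `kms:Decrypt`; `FakeKms` is a hypothetical stand-in so the example runs without AWS access. In a real function you would pass `boto3.client("kms")` instead, and some setups also require an encryption context, which is omitted here.

```python
import base64
import os


def decrypt_env_var(name, kms_client, environ=os.environ):
    """Decrypt a base64-encoded, KMS-encrypted environment variable."""
    ciphertext = base64.b64decode(environ[name])
    # Same call shape as the boto3 KMS client's decrypt operation.
    response = kms_client.decrypt(CiphertextBlob=ciphertext)
    return response["Plaintext"].decode("utf-8")


class FakeKms:
    """Offline stand-in for a KMS client; 'decrypts' by reversing bytes."""

    def decrypt(self, CiphertextBlob):
        return {"Plaintext": CiphertextBlob[::-1]}


# Toy environment: the "ciphertext" is just the reversed password.
env = {"DB_PASSWORD": base64.b64encode(b"terces").decode("ascii")}
password = decrypt_env_var("DB_PASSWORD", FakeKms(), env)
```

Decrypting once outside the handler and caching the result avoids a KMS call on every invocation.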
ACM provisions, manages, and deploys TLS/SSL certificates.
ACM Integrations:
| Service | Notes |
|---|---|
| ELB (CLB, ALB, NLB) | Provision certs directly |
| CloudFront | Must be in us-east-1 |
| API Gateway | Edge-optimized or Regional |
⚠️ Exam trap: Cannot use ACM with EC2 directly (private key can’t be extracted)
ACM + API Gateway - Certificate Region Rules:
| API Gateway Type | Certificate Location |
|---|---|
| Edge-Optimized | ACM cert must be in us-east-1 (CloudFront region) |
| Regional | ACM cert must be in same region as API Gateway |
Memory trick: “Where does TLS terminate?”
API Gateway Endpoint Types Explained:
| Type | Audience | How It Works | ACM Region |
|---|---|---|---|
| Edge-Optimized (default) | Global clients | Requests routed via CloudFront edge locations → reduces latency | us-east-1 only |
| Regional | Same-region clients | Direct access, can add your own CloudFront for more control | Same as API Gateway |
| Private | VPC only | Access via VPC Interface Endpoint (ENI) | Same as API Gateway |
Edge-Optimized (default):
Regional:
Edge-Optimized: Regional:
┌─────────────┐ ┌─────────────────┐
│ us-east-1 │ │ ap-southeast-2 │
│ ┌─────────┐ │ │ ┌─────────────┐ │
│ │ ACM │─┼──► CloudFront │ │ API Gateway │ │
│ └─────────┘ │ (AWS managed) │ └──────┬──────┘ │
└─────────────┘ │ │ │ │
▼ │ ┌──────▼──────┐ │
┌─────────────┐ │ │ ACM │ │
│ API Gateway │ │ │ (same rgn) │ │
│ (any region)│ │ └─────────────┘ │
└─────────────┘ └─────────────────┘
Regional + Custom CloudFront (more DDoS control):
┌─────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ us-east-1 │ │ ap-southeast-2 │ │ ap-southeast-2 │
│ ┌─────────┐ │ │ ┌─────────────┐ │ │ ┌─────────────┐ │
│ │ ACM │─┼────►│ │ CloudFront │─┼────►│ │ API Gateway │ │
│ │(for CF) │ │ │ │(your own) │ │ │ │ Regional │ │
│ └─────────┘ │ │ └─────────────┘ │ │ └─────────────┘ │
└─────────────┘ └─────────────────┘ └─────────────────┘
+ WAF attached here
⚠️ Exam trap: Edge-Optimized uses CloudFront but certificate must be in us-east-1, NOT in the API Gateway’s region
Route 53 Setup:
ACM + ALB - HTTP to HTTPS Redirect:
User ──► HTTP ──► ALB ──► Redirect to HTTPS
◄── 301 ◄──────┘
User ──► HTTPS ──► ALB ──► EC2 (Auto Scaling)
│
▼
ACM (provision/maintain certs)
Importing Public Certificates:
acm-certificate-expiration-check
⚠️ Exam trap: ACM certificate expiry monitoring (created vs imported):
| Feature | ACM-Created Certs | Imported (Third-Party) Certs |
|---|---|---|
| Auto-renewal | ✅ Yes (60 days before) | ❌ No — must manually re-import |
| CW DaysToExpiry metric | ✅ Yes | ❌ No |
| EventBridge events | ✅ Yes (daily, 45 days before) | ✅ Yes (daily, 45 days before) |
| AWS Config rule | ✅ Works | ✅ Works — best for imported |
acm-certificate-expiration-check → SNS
DaysToExpiry alarm
DaysToExpiry for imported certs → metric only exists for ACM-created certs!
AWS WAF protects web apps from Layer 7 (HTTP) exploits.
Deploy on:
Web ACL Rules:
| Rule Type | Description |
|---|---|
| IP Set | Up to 10,000 IPs (use multiple rules for more) |
| String match | HTTP headers, body, URI strings |
| SQL injection | Block SQLi attacks |
| XSS | Block Cross-Site Scripting |
| Size constraints | Limit request size |
| Geo-match | Block countries |
| Rate-based | DDoS protection (count events) |
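To make the rate-based rule type concrete, here is a simplified sliding-window counter per source IP. It is a toy model of the idea, not how AWS WAF is implemented; the class name, default limit, and window length are illustrative assumptions.

```python
from collections import defaultdict, deque


class RateBasedRule:
    """Allow a request unless the IP exceeded `limit` hits in the window."""

    def __init__(self, limit=2000, window_seconds=300):
        self.limit = limit
        self.window = window_seconds
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now):
        recent = self.hits[ip]
        # Drop timestamps that have fallen out of the window.
        while recent and recent[0] <= now - self.window:
            recent.popleft()
        recent.append(now)
        return len(recent) <= self.limit


# Tiny limit so the behaviour is visible: the 4th request in the window is blocked.
rule = RateBasedRule(limit=3, window_seconds=300)
decisions = [rule.allow("203.0.113.7", t) for t in (0, 1, 2, 3, 400)]
```

The last request is allowed again because the earlier hits have aged out of the window by then.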
WAF + Fixed IP (Load Balancer):
⚠️ Exam trap: “Attach WAF to NLB” → IMPOSSIBLE! WAF = Layer 7 (HTTP), NLB = Layer 4 (TCP/UDP). Use ALB instead, or put Global Accelerator in front for fixed IPs.
WAF-Compatible Services:
| ✅ Supported | ❌ NOT Supported |
|---|---|
| ALB | NLB |
| API Gateway | EC2 directly |
| CloudFront | Route 53 |
| AppSync | CLB (Classic) |
| Cognito User Pool | |
Users ──► Global Accelerator ──► ALB ◄── WAF (WebACL)
(Fixed IP: 1.2.3.4) │ (same region)
▼
EC2 Instances
| Service | Use Case | Scope |
|---|---|---|
| WAF | Granular protection, Web ACL rules | Single resource |
| Firewall Manager | Manage WAF across accounts, auto-protect new resources | AWS Organization |
| Shield Advanced | DDoS protection, SRT support, cost protection | Enhanced DDoS |
Decision Guide:
| Feature | Shield Standard | Shield Advanced |
|---|---|---|
| Cost | Free (all customers) | $3,000/month/org |
| Layer | Layer 3/4 | Layer 3/4/7 |
| Protection | SYN/UDP floods, reflection | + sophisticated attacks |
| Resources | All | EC2, ELB, CloudFront, Global Accelerator, Route 53 |
| DDoS Response Team | ❌ | ✅ 24/7 access to the Shield Response Team (SRT) |
| Cost Protection | ❌ | ✅ (no higher fees during attack) |
| Auto WAF rules | ❌ | ✅ (creates rules for L7 attacks) |
Firewall Manager manages security rules across all accounts in AWS Organization.
Manages:
WAF rules (ALB, API Gateway, CloudFront)
AWS Shield Advanced
Security Groups (EC2, ALB, ENI in VPC)
AWS Network Firewall (VPC level)
Route 53 Resolver DNS Firewall
Policies created at region level
Rules applied to new resources automatically (compliance)
⚠️ Exam keywords → Firewall Manager:
Why NOT others for “centrally manage across accounts”:
AWS DDoS Best Practices Reference Architecture:
AWS Edge Services
┌─────────────────────────────────────────────────────────────────────┐
│ BP1: Global Accelerator BP3: Route 53 BP1/BP2: CloudFront │
│ (fixed IPs, Shield) (DNS at edge) (cache + WAF) │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ Region │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ VPC (BP5) │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────────┐ │ │
│ │ │ Public │ │ BP6: ELB │ │ Private Subnet │ │ │
│ │ │ Subnet │───►│ + WAF (BP2) │───►│ BP7: Auto │ │ │
│ │ │ (NACLs) │ │ + API GW │ │ Scaling Group │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ Security Groups │ │
│ └──────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
BP Summary Table:
| BP | Service | Layer | Purpose |
|---|---|---|---|
| BP1 | CloudFront, Global Accelerator | Edge | Absorb DDoS at edge, reduce origin load |
| BP2 | WAF | L7 | Filter malicious requests, rate limiting |
| BP3 | Route 53 | DNS | DNS at edge, shuffle sharding, health checks |
| BP4 | API Gateway | L7 | Hide backend, burst limits, API keys |
| BP5 | VPC (SG + NACL) | L3/L4 | Filter IPs at subnet/ENI level |
| BP6 | ELB | L4/L7 | Distribute traffic, scales automatically |
| BP7 | Auto Scaling | Infra | Scale EC2 during traffic surges |
1. Edge Location Mitigation (BP1, BP3):
Internet ──► CloudFront (BP1) ──► Origin
│
├─ Caches static content (reduces origin requests)
├─ Absorbs L3/L4 attacks (SYN floods, UDP reflection)
└─ Geo-blocking available
Internet ──► Global Accelerator (BP1) ──► ALB/NLB/EC2
│
├─ Fixed Anycast IPs (2 IPs)
├─ Routes via AWS backbone (not public internet)
├─ Shield integration
└─ Use when CloudFront not compatible (non-HTTP)
Internet ──► Route 53 (BP3) ──► Your resources
│
├─ DNS resolution at edge
├─ Built-in DDoS protection
└─ Health checks + failover
When to use which:
2. Infrastructure Layer Defense (BP1, BP3, BP6, BP7):
DDoS Attack
│
▼
┌─────────────────────────────────────────┐
│ Edge Services │
│ (absorb volumetric attacks) │
└────────────────────┬────────────────────┘
│ reduced traffic
▼
┌─────────────────────────────────────────┐
│ ELB (BP6) - scales auto │
│ (distributes across instances) │
└────────────────────┬────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Auto Scaling Group (BP7) │
│ (adds instances during surge) │
│ ┌────┐ ┌────┐ ┌────┐ ┌────┐ │
│ │EC2 │ │EC2 │ │EC2 │ │EC2 │ │
│ └────┘ └────┘ └────┘ └────┘ │
└─────────────────────────────────────────┘
Key point: ELB + Auto Scaling = absorb legitimate traffic surges AND DDoS
3. Application Layer Defense (BP1, BP2):
Malicious Request ──► CloudFront ──► WAF (BP2) ──► ALB ──► App
│ │
│ ├─ SQL injection? BLOCK
│ ├─ XSS? BLOCK
│ ├─ Rate > 2000/5min? BLOCK IP
│ ├─ Bad IP reputation? BLOCK
│ └─ Geo = blocked country? BLOCK
│
└─ Cached? Return from edge (origin never hit)
WAF Rules for DDoS:
Shield Advanced (BP1, BP2, BP6):
4. Attack Surface Reduction (BP1, BP4, BP5, BP6):
Attacker
│
▼
┌─────────────────┐
│ CloudFront │ ◄── Only this IP is public
│ (or API GW) │
└────────┬────────┘
│
┌──────────────┼──────────────┐
│ │ │
▼ ▼ ▼
❌ Can't reach ❌ Can't reach ❌ Can't reach
EC2 IPs Lambda RDS
directly directly directly
Obfuscation = hide your backend:
Security Groups + NACLs (BP5):
GuardDuty is intelligent threat discovery using ML.
Core Data Sources (always analyzed):
| ✅ Source | What It Detects |
|---|---|
| CloudTrail Management Events | Unusual API calls, create VPC, create trail |
| CloudTrail S3 Data Events | Get/list/delete object anomalies |
| VPC Flow Logs | Unusual traffic, suspicious IPs |
| DNS Logs | Compromised EC2 sending encoded DNS queries |
Optional Features: EKS Audit Logs, RDS & Aurora login, EBS, Lambda, S3 Data Events
NOT a GuardDuty data source:
| ❌ NOT Scanned | Why it’s a trap |
|---|---|
| CloudWatch Logs | Common confusion - GuardDuty uses its own log analysis |
| Application logs | GuardDuty = infrastructure threats, not app logs |
| Custom logs | Not supported |
⚠️ Exam trap: “GuardDuty scans CloudWatch Logs” → FALSE! GuardDuty scans CloudTrail, VPC Flow Logs, DNS Logs (not CloudWatch Logs)
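GuardDuty publishes its findings to EventBridge, where a rule can forward them to SNS or Lambda. A sketch of an event pattern for high-severity findings follows; the severity threshold of 7 is an assumption, and `matches` is a hypothetical local helper that mimics the pattern for testing rather than EventBridge's own matching engine.

```python
# Event pattern you could attach to an EventBridge rule.
GUARDDUTY_HIGH_SEVERITY_PATTERN = {
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Finding"],
    "detail": {"severity": [{"numeric": [">=", 7]}]},
}


def matches(event: dict, min_severity: float = 7) -> bool:
    """Local approximation of the pattern above, for illustration only."""
    return (
        event.get("source") == "aws.guardduty"
        and event.get("detail-type") == "GuardDuty Finding"
        and event.get("detail", {}).get("severity", 0) >= min_severity
    )


high = {"source": "aws.guardduty", "detail-type": "GuardDuty Finding", "detail": {"severity": 8.0}}
low = {"source": "aws.guardduty", "detail-type": "GuardDuty Finding", "detail": {"severity": 2.0}}
```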
Memory hook - GuardDuty sources: “CVD”
┌─────────────────┐
│ VPC Flow Logs │──┐
├─────────────────┤ │ ┌───────────┐ ┌─────────────┐
│ CloudTrail Logs │──┼────►│ GuardDuty │────►│ EventBridge │──► SNS/Lambda
├─────────────────┤ │ └───────────┘ └─────────────┘
│ DNS Logs │──┘
└─────────────────┘
+ Optional: S3, EBS, Lambda, RDS, EKS
❌ NOT: CloudWatch Logs
Inspector performs automated security assessments.
Scans:
| Target | What’s Scanned | Requires |
|---|---|---|
| EC2 instances | OS vulnerabilities, network reachability | SSM Agent |
| ECR Container Images | Vulnerabilities on push | - |
| Lambda Functions | Code vulnerabilities, package dependencies | - |
Lambda ──────┐
│
SSM Agent ───┼────► Inspector ────► Security Hub
(EC2) │ │ EventBridge
│ ▼
ECR Images ──┘ Findings + Risk Score
GuardDuty vs Inspector vs Macie vs Config:
| Service | What It Does | Looks At | Use Case |
|---|---|---|---|
| GuardDuty | Threat detection | CloudTrail, VPC Flow, DNS | “Is someone attacking me?” |
| Inspector | Vulnerability scanning | EC2 OS, ECR images, Lambda | “Do I have unpatched CVEs?” |
| Macie | Sensitive data discovery | S3 buckets | “Do I have exposed PII?” |
| Config | Configuration compliance | Resource configs | “Are my resources compliant?” |
⚠️ Exam trap keywords:
Wrong answers for “OS vulnerabilities” question:
Macie discovers and protects sensitive data using ML and pattern matching.
S3 Buckets ────► Macie ────► EventBridge ────► integrations
(discover PII) (notify) (Lambda, SNS, etc.)
AWS security is about WHO controls the keys and WHERE encryption happens.
Most AWS Control ◄──────────────────────────────────► Most Customer Control
SSE-S3 SSE-KMS SSE-KMS (CMK) SSE-C Client-Side
(AWS owns) (AWS managed) (Customer managed) (Customer key) (Customer encrypts)
Key insight: The more control you want, the more responsibility you have.
Unlike S3/Lambda, KMS requires Key Policy — IAM policy alone is NOT enough.
Why? KMS keys are highly sensitive. AWS designed it so you MUST explicitly allow access at the key level.
Derivation: If question mentions “IAM policy for KMS” → check if Key Policy allows it. No Key Policy = No Access.
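The rule can be written as a tiny truth function. This is a deliberately simplified same-account model (no grants, no explicit denies, no cross-account cases), and the parameter names are hypothetical.

```python
def kms_access_allowed(
    key_policy_allows_principal: bool,
    key_policy_delegates_to_iam: bool,
    iam_policy_allows: bool,
) -> bool:
    """Simplified KMS authorization check.

    Access works if the key policy names the principal directly, or if the
    key policy delegates to IAM AND an IAM policy allows the action.
    """
    if key_policy_allows_principal:
        return True
    return key_policy_delegates_to_iam and iam_policy_allows


# IAM policy alone, with a key policy that says nothing: denied.
iam_only = kms_access_allowed(False, False, True)
# Key policy delegates to IAM and IAM allows: permitted.
delegated = kms_access_allowed(False, True, True)
```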
Understanding where services “live” determines where certificates/keys must be.
| Service | Scope | Certificate/Key Location |
|---|---|---|
| CloudFront | Global (us-east-1) | ACM in us-east-1 |
| API Gateway Edge-Optimized | Uses CloudFront | ACM in us-east-1 |
| API Gateway Regional | Regional | ACM in same region |
| KMS | Regional | Must re-encrypt when crossing regions |
| KMS Multi-Region | Multi-region | Same key ID across regions |
| CloudHSM | Regional | No cross-region replication |
Derivation: “Where does TLS terminate?” → that’s where cert must be.
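The table above boils down to one branch: anything fronted by CloudFront needs its certificate in us-east-1, everything else uses the resource's own region. A sketch, with hypothetical service labels:

```python
# Services whose TLS terminates at CloudFront (labels are illustrative).
CLOUDFRONT_BACKED = {"cloudfront", "apigateway-edge"}


def acm_certificate_region(service: str, resource_region: str) -> str:
    """Return the region where the ACM certificate must live."""
    if service in CLOUDFRONT_BACKED:
        return "us-east-1"
    return resource_region


edge_region = acm_certificate_region("apigateway-edge", "ap-southeast-2")
alb_region = acm_certificate_region("alb", "ap-southeast-2")
```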
Security services fall into three categories:
| Category | Services | Action |
|---|---|---|
| Detection | GuardDuty, Inspector, Macie, Config | Find problems |
| Protection | WAF, Shield, Network Firewall | Block attacks |
| Management | Firewall Manager, Security Hub | Centralize/aggregate |
Derivation: “Centrally manage across accounts” → Management category → Firewall Manager
Network attacks happen at different layers:
| Layer | Attacks | Protection |
|---|---|---|
| L3/L4 | SYN floods, UDP reflection | Shield, NACLs, Security Groups |
| L7 | SQL injection, XSS, DDoS | WAF, API Gateway throttling |
Derivation: “NLB + WAF” → IMPOSSIBLE (NLB = L4, WAF = L7)
Don’t confuse these:
| Concept | What Changes | Service |
|---|---|---|
| Key Rotation | New key material, same key ID | KMS (1 year fixed) |
| Secret Rotation | New password/credential | Secrets Manager (configurable) |
| Version History | Track all previous values | SSM Parameter Store |
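Derivation: “Rotate every 6 months” → Manual rotation with aliases is the expected exam answer (the classic assumption is that KMS auto-rotation is fixed at 1 year; newer KMS releases also let you set a custom rotation period of 90 to 2,560 days on customer managed keys)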
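Manual rotation with an alias means creating a new key and repointing the alias, so applications that reference the alias pick up the new key without code changes. The in-memory model below only illustrates that idea; `KeyRing` is a hypothetical class, and in AWS the repointing step corresponds to updating the alias's target key.

```python
import uuid


class KeyRing:
    """Toy model: aliases point at key IDs; old keys are kept for decryption."""

    def __init__(self):
        self.aliases = {}  # alias name -> current key ID
        self.keys = []     # every key ever created (never deleted here)

    def rotate_manually(self, alias: str) -> str:
        new_key_id = str(uuid.uuid4())  # stands in for creating a new KMS key
        self.keys.append(new_key_id)
        self.aliases[alias] = new_key_id  # repoint the alias to the new key
        return new_key_id


ring = KeyRing()
first = ring.rotate_manually("alias/app-key")
second = ring.rotate_manually("alias/app-key")
```

Old keys stay in place because data encrypted under them still needs them for decryption.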
“[Service] Client-Side Encryption” means YOUR APP encrypts before sending to [Service].
Derivation: Client-side encryption question → identify the STORAGE service
Most services do NOT auto-rotate:
| Service | Auto-Rotation? |
|---|---|
| Secrets Manager | ✅ Yes (built-in) |
| KMS | ✅ Yes (1 year only) |
| SSM Parameter Store | ❌ No |
| IAM Access Keys | ❌ No |
Derivation: “DB credentials + auto rotation” → Secrets Manager (only option)
Need encryption?
│
├─► "DB credentials" + "auto rotation"
│ └─► Secrets Manager
│
├─► "Config values" + "version history"
│ └─► SSM Parameter Store
│
├─► "FIPS 140-2 Level 3" OR "AWS cannot access keys"
│ └─► CloudHSM
│
├─► "Multi-region" + "Global DB"
│ └─► KMS Multi-Region Keys (NOT CloudHSM)
│
├─► "Admins cannot see data"
│ └─► Client-Side Encryption
│
└─► Standard encryption
└─► KMS (default choice)
Security question?
│
├─► "Threat" / "attack" / "compromised" / "unusual API"
│ └─► GuardDuty
│
├─► "Vulnerability" / "CVE" / "patch" / "OS security"
│ └─► Inspector
│
├─► "PII" / "sensitive data" / "S3 data discovery"
│ └─► Macie
│
├─► "Compliance" / "configuration audit"
│ └─► Config
│
├─► "Centrally manage" / "across accounts" / "Organization"
│ └─► Firewall Manager
│
├─► "DDoS protection"
│ └─► Shield (Standard=free, Advanced=$3k/mo)
│
└─► "Layer 7" / "SQL injection" / "XSS" / "rate limiting"
└─► WAF
Where to put ACM certificate?
│
├─► CloudFront distribution?
│ └─► us-east-1
│
├─► Edge-Optimized API Gateway?
│ └─► us-east-1 (uses CloudFront behind scenes)
│
├─► Regional API Gateway?
│ └─► Same region as API Gateway
│
└─► ALB?
└─► Same region as ALB
| ❌ Impossible | Why |
|---|---|
| WAF + NLB | WAF = L7, NLB = L4 |
| ACM + EC2 directly | Can’t extract private key |
| KMS auto-rotate < 1 year | Fixed at 1 year |
| Imported key auto-rotation | Manual only via alias |
| CloudHSM multi-region replication | Single-region only |
| GuardDuty scan CloudWatch Logs | Uses CloudTrail, VPC Flow, DNS only |
| Single asymmetric key for encrypt + sign | Choose one at creation |
| Share AMI with AWS Managed Key | Must use Customer Managed Key |
Keywords: RDS, password, credentials, automatic rotation Answer: Secrets Manager Why: Only service with native RDS rotation integration
Keywords: FIPS, Level 3, compliance, tamper-evident Answer: CloudHSM Why: KMS is Level 2 only; CloudHSM is Level 3
Keywords: AWS cannot access, customer-managed hardware Answer: CloudHSM Why: KMS = AWS manages key material; CloudHSM = you manage entirely
Keywords: Global database, multi-region, client-side, encrypt Answer: KMS Multi-Region Keys Why: CloudHSM can’t replicate keys across regions
Keywords: centrally, manage, multiple accounts, Organization Answer: Firewall Manager Why: Only service that manages security rules across Organization
Keywords: Edge-Optimized, API Gateway, certificate, SSL Answer: us-east-1 Why: Edge-Optimized uses CloudFront → CloudFront = us-east-1
Keywords: certificate expiry, notification, X days before Answer: Depends on cert type:
Imported certs: AWS Config rule acm-certificate-expiration-check → SNS
ACM-created certs: EventBridge (or CloudWatch DaysToExpiry alarm)
Why: CW DaysToExpiry metric doesn’t exist for imported certs. Config rule works for both.
Keywords: fixed IP, static IP, WAF, DDoS Answer: Global Accelerator + ALB + WAF Why: WAF can’t attach to NLB; Global Accelerator provides fixed IPs to ALB
Keywords: vulnerability, CVE, patch, EC2, OS Answer: Inspector (with SSM Agent) Why: Inspector scans for CVEs; GuardDuty detects threats, not vulnerabilities
Keywords: configuration, history, version, changes Answer: SSM Parameter Store (for config values) or Config (for resources) Why: Built-in versioning for every change
Keywords: PII, sensitive data, S3, discover Answer: Macie Why: ML-based PII discovery specifically for S3
Keywords: unusual API, suspicious activity, threat, compromise Answer: GuardDuty Why: Analyzes CloudTrail for anomalous API patterns
Keywords: SQL injection, XSS, Layer 7, web exploits Answer: WAF Why: WAF has managed rules for common web attacks
Keywords: DDoS, response team, SRT, cost protection Answer: Shield Advanced Why: Shield Advanced includes DDoS Response Team access
Keywords: rotate, 6 months, 90 days, custom period Answer: Manual rotation with Key Alias Why: Auto-rotation is fixed at 1 year; manual rotation for custom periods
| Service | Detects | Scans | Output |
|---|---|---|---|
| GuardDuty | Threats, attacks | CloudTrail, VPC Flow, DNS | EventBridge |
| Inspector | Vulnerabilities | EC2 OS, ECR, Lambda | Security Hub |
| Macie | Sensitive data | S3 buckets | EventBridge |
| Config | Non-compliance | Resource configs | SNS, S3 |
| Requirement | Service |
|---|---|
| DB credentials + auto rotation | Secrets Manager |
| Config values + versioning | SSM Parameter Store |
| Hierarchical config paths | SSM Parameter Store |
| Multi-region secrets | Secrets Manager |
| Free tier needed | SSM Parameter Store |
| Key Type | Auto-Rotation | Period | Manual Rotation |
|---|---|---|---|
| AWS Managed | Always ON | 1 year | N/A |
| Customer Managed | Optional | 1 year | Via alias |
| Imported | ❌ Never | N/A | Via alias only |
| ✅ Works | ❌ Doesn’t Work |
|---|---|
| ALB | NLB |
| CloudFront | CLB |
| API Gateway | EC2 directly |
| AppSync | Route 53 |
| Cognito User Pool | |
| Question Contains | → Instant Answer |
|---|---|
| “RDS” + “auto rotation” | Secrets Manager |
| “FIPS 140-2 Level 3” | CloudHSM |
| “AWS cannot access keys” | CloudHSM |
| “Multi-region” + “encryption” + “Global DB” | KMS Multi-Region Keys |
| “Edge-Optimized” + “certificate” | us-east-1 |
| “Regional API Gateway” + “certificate” | Same region as API |
| “CloudFront” + “certificate” | us-east-1 |
| “Centrally manage” + “accounts” | Firewall Manager |
| “Security Groups” + “Organization” | Firewall Manager |
| “WAF” + “multiple accounts” | Firewall Manager |
| “OS vulnerability” / “CVE” | Inspector |
| “Threat” / “unusual API” / “compromised” | GuardDuty |
| “PII” / “sensitive data” + “S3” | Macie |
| “Configuration compliance” | Config |
| “Fixed IP” + “WAF” | Global Accelerator + ALB + WAF |
| “NLB” + “WAF” | IMPOSSIBLE |
| “Rotate every 6 months” | Manual rotation with alias |
| “Asymmetric” + “encrypt AND sign” | Two separate keys needed |
| “Version history” + “config” | SSM Parameter Store |
| “Certificate expiry notification” | Imported = Config rule; ACM-created = EventBridge/CW |
| “DDoS” + “response team” | Shield Advanced |
| “DDoS” + “free” | Shield Standard |
| “SQL injection” / “XSS” | WAF |
| “Layer 7 protection” | WAF |
| “Layer 3/4 protection” | Shield, Security Groups, NACLs |
| “Cross-account encrypted AMI” | Customer Managed Key + Key Policy |
| “Admins cannot see data” | Client-Side Encryption |
| “Lambda Client-side Encryption” | WRONG ANSWER (Lambda isn’t storage) |
| “KMS IAM Policy” alone | NOT enough (Key Policy required) |
| “ACM” + “EC2 directly” | IMPOSSIBLE |
| “GuardDuty” + “CloudWatch Logs” | FALSE (not a data source) |
□ Does it mention "auto rotation" for DB?
→ Yes = Secrets Manager
→ No = continue
□ Does it mention "FIPS Level 3" or "AWS cannot access"?
→ Yes = CloudHSM
→ No = continue
□ Does it mention "multi-region" + "Global database"?
→ Yes = KMS Multi-Region Keys
→ No = continue
□ Does it mention "config values" or "version history"?
→ Yes = SSM Parameter Store
→ No = probably KMS
□ Is it about DETECTION (finding problems)?
→ Threats/attacks = GuardDuty
→ Vulnerabilities/CVE = Inspector
→ Sensitive data = Macie
→ Configuration = Config
□ Is it about PROTECTION (blocking attacks)?
→ Layer 7 (HTTP) = WAF
→ Layer 3/4 (network) = Shield, SG, NACL
□ Is it about MANAGEMENT (centralize/aggregate)?
→ Across accounts = Firewall Manager
→ Aggregate findings = Security Hub
□ Is CloudFront involved (directly or Edge-Optimized)?
→ Yes = us-east-1
→ No = same region as the service
DaysToExpiry
AWS CloudFormation is a declarative way of outlining and creating your AWS Infrastructure, for any resources, in the right order and with exact configuration that you specify.
CloudFormation Service Role:
cloudformation:* + iam:PassRole permissions
User (cloudformation:*, iam:PassRole) ──► CloudFormation ──► Service Role (s3:*, ec2:*) ──► Resources
AWS Infrastructure Composer (formerly Application Composer): visually design and build serverless applications quickly on AWS. Deploy AWS infrastructure code without needing to be an expert in AWS. Configure how your resources interact with each other. Ability to import existing CloudFormation / SAM templates to visualize them. Helps to visualize, build, and deploy modern applications from all AWS services that are supported by AWS CloudFormation.
AWS Cloud Development Kit (CDK) accelerates cloud development by letting you model your applications in common programming languages and deploy infrastructure and application runtime code together.
AWS Elastic Beanstalk is a managed Platform as a Service (PaaS) offering with a developer-centric view of deploying an application on AWS (using EC2, ASG, ELB, RDS, etc.). Instance configuration, OS handling, deployment strategy, capacity provisioning, load balancing and auto-scaling, and application health monitoring and responsiveness are all handled by Elastic Beanstalk; only the application code is your responsibility.
Elastic Beanstalk automatically handles capacity provisioning, load balancing, autoscaling and application health monitoring.
AWS CodeDeploy is a fully managed deployment service that automates software deployments to various compute services, such as Amazon Elastic Compute Cloud (EC2), Amazon Elastic Container Service (ECS), AWS Lambda, and your on-premises servers. Use CodeDeploy to automate software deployments, eliminating the need for error-prone manual operations.
AWS CodeCommit is a fully managed, scalable and highly available code repository, using Git technology. Collaborate with others on code. Code changes are automatically versioned.
AWS CodeBuild is a fully managed, serverless, scalable and highly available code building service in the cloud. It compiles source code, runs tests and produces packages that are ready to be deployed.
AWS CodePipeline is a fully managed continuous delivery service that helps you automate your release pipelines for fast and reliable application and infrastructure updates. Compatible with CodeCommit, CodeBuild, CodeDeploy, Elastic Beanstalk, CloudFormation, GitHub, etc.
AWS CodeArtifact is a secure, scalable, and cost-effective package management service for software development.
AWS CodeStar is a unified UI to easily manage software development activities in one place.
AWS Cloud9 is a cloud IDE (Integrated Development Environment) for writing, running and debugging code.
AWS Step Functions: serverless visual workflow to orchestrate Lambda functions. Sequence, parallel, conditions, timeouts, error handling, human approval feature etc. Integrates with EC2, ECS, on-premises servers, API Gateway, SQS queues, etc.
AWS Step Functions excel in complex workflow orchestration scenarios, offering advanced features such as state management, error handling, and parallel execution.
AWS Amplify: a set of tools and services that helps you develop and deploy scalable full stack web and mobile applications. Authentication, Storage, API (REST, GraphQL), CI/CD, PubSub, Analytics, AI/ML Predictions, Monitoring, Source Code from AWS, GitHub, etc.
Amplify's serverless architecture simplifies maintenance and scales automatically. There is no need to provision or manage EC2 instances. Lambda and API Gateway handle availability and response to traffic spikes automatically. Upload code and let Amplify handle deploying and running it.
AWS Device Farm: fully-managed service that tests your web and mobile apps against desktop browsers, real mobile devices and tablets. Run tests concurrently on multiple devices. Ability to configure device settings (GPS, language, WiFi, Bluetooth, etc).
AWS Systems Manager (SSM) — hybrid AWS service to manage infrastructure at scale (EC2 + on-premises servers). Requires SSM Agent installed on managed instances.
SSM Session Manager:
SSM Run Command:
SSM Patch Manager:
AWS-RunPatchBaseline document
SSM Maintenance Windows:
Maintenance Windows ── trigger (e.g., every 24h) ──► Run Command ──► EC2 (with SSM Agent)
SSM Automation:
⚠️ Exam trap: “Secure shell access without SSH/port 22” → SSM Session Manager. “Run script on 100s of instances” → SSM Run Command. “Automate patching schedule” → SSM Patch Manager + Maintenance Windows.
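For the "run script on 100s of instances" case, Run Command targets instances by tag instead of by SSH host. The sketch below only builds the request arguments (the same keyword shape the boto3 SSM client's `send_command` accepts) so it runs without AWS access; the tag `Environment=prod` and the command are hypothetical.

```python
def run_command_request(commands, tag_key: str, tag_value: str) -> dict:
    """Build arguments for running shell commands on tagged instances."""
    return {
        "DocumentName": "AWS-RunShellScript",  # AWS-managed shell document
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        "Parameters": {"commands": list(commands)},
    }


# Hypothetical fleet tag and command, for illustration only.
request = run_command_request(["uptime"], "Environment", "prod")
# With credentials available, this could be sent as:
#   boto3.client("ssm").send_command(**request)
```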
AWS CI/CD chain: CodeCommit (source) → CodeBuild (build/test) → CodeDeploy (deploy) → orchestrated by CodePipeline. Each service is independent and can integrate with third-party tools.
Users don’t need permissions to underlying resources. They only need cloudformation:* + iam:PassRole. The Service Role has the resource permissions. This enables delegation without over-granting.
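The user-side half of this delegation can be sketched as an IAM policy document. The role ARN is a hypothetical placeholder, and scoping `iam:PassRole` to that single role is an assumed hardening choice rather than something these notes prescribe.

```python
def cloudformation_deployer_policy(service_role_arn: str) -> dict:
    """IAM policy for a user who deploys stacks through a service role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Manage stacks, but hold no direct permissions on the resources.
            {"Effect": "Allow", "Action": "cloudformation:*", "Resource": "*"},
            # Allow handing exactly this role to CloudFormation.
            {"Effect": "Allow", "Action": "iam:PassRole", "Resource": service_role_arn},
        ],
    }


# Hypothetical service role ARN, for illustration only.
policy = cloudformation_deployer_policy("arn:aws:iam::111122223333:role/cfn-service-role")
```

The service role itself carries the resource permissions (for example `s3:*`, `ec2:*`), so the user never needs them.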
SSM Agent enables secure management of EC2 + on-prem servers. Five key features map to five different exam patterns:
CodeDeploy works on EC2, ECS, Lambda, AND on-premises — it’s the deployment tool that bridges cloud and on-prem.
How do you want to define infrastructure?
│
├─ "YAML/JSON templates, full control" → CloudFormation
├─ "Programming language (Python/TS)" → CDK
├─ "Just upload my code, handle the rest" → Elastic Beanstalk
├─ "Full-stack web/mobile app" → Amplify
└─ "Visual drag-and-drop designer" → Infrastructure Composer
What CI/CD step do you need?
│
├─ SOURCE (store code) → CodeCommit (or GitHub)
├─ BUILD (compile/test) → CodeBuild
├─ DEPLOY (push to infra) → CodeDeploy
├─ ORCHESTRATE (chain all) → CodePipeline
├─ PACKAGE MGMT → CodeArtifact
└─ UNIFIED UI → CodeStar
What do you need to do on EC2/on-prem?
│
├─ "Shell access without SSH" → Session Manager
├─ "Run script on many instances" → Run Command
├─ "Automate OS patching" → Patch Manager
├─ "Schedule maintenance tasks" → Maintenance Windows
└─ "Complex task automation / Config fix" → Automation (Runbooks)
Keywords: IaC, template, declarative, all resources Answer: CloudFormation Why: Native AWS IaC, supports almost all resources, custom resources for unsupported.
Keywords: least privilege, deploy stacks, iam:PassRole Answer: CloudFormation Service Role Why: Service Role has resource permissions; user only needs cloudformation:* + iam:PassRole.
Keywords: PaaS, developer-centric, auto-scaling, health monitoring Answer: Elastic Beanstalk Why: Handles capacity, load balancing, scaling, monitoring. You only write code.
Keywords: programming language, CDK, familiar syntax Answer: AWS CDK Why: CDK compiles to CloudFormation. Use familiar languages instead of YAML/JSON.
Keywords: CI/CD, pipeline, automate releases Answer: CodePipeline (orchestrates CodeCommit + CodeBuild + CodeDeploy)
Keywords: deploy, multi-target, on-premises Answer: CodeDeploy Why: Only AWS deployment service that supports both cloud and on-prem.
Keywords: no SSH, no bastion, no port 22, secure shell Answer: SSM Session Manager Why: Uses SSM Agent + IAM permissions. Logs to S3/CloudWatch.
Keywords: fleet, multiple instances, run command, no SSH Answer: SSM Run Command
Keywords: patch, schedule, compliance, OS updates Answer: SSM Patch Manager + Maintenance Windows
Keywords: Config, remediate, auto-fix, non-compliant Answer: AWS Config + SSM Automation (Runbooks)
Keywords: web app, mobile app, full-stack, serverless, Amplify Answer: AWS Amplify
Keywords: mobile testing, real devices, browsers Answer: AWS Device Farm
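The CloudFormation Service Role pattern above (user holds only `cloudformation:*` plus `iam:PassRole`, the role holds the resource permissions) can be sketched as two IAM policy documents. This is a minimal illustration, not a hardened policy: the account ID and the role name `cfn-deploy-role` are hypothetical placeholders, and the `iam:PassedToService` condition is an optional tightening that the notes do not mention.

```python
import json

# Hypothetical placeholders: replace with your own account ID and role name.
ACCOUNT_ID = "123456789012"
SERVICE_ROLE_ARN = f"arn:aws:iam::{ACCOUNT_ID}:role/cfn-deploy-role"


def deployer_policy(service_role_arn: str) -> dict:
    """Identity policy for the user: manage stacks and pass one specific role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "cloudformation:*",
                "Resource": "*",
            },
            {
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": service_role_arn,
                # Optional tightening: the role may only be passed to CloudFormation.
                "Condition": {
                    "StringEquals": {
                        "iam:PassedToService": "cloudformation.amazonaws.com"
                    }
                },
            },
        ],
    }


def service_role_trust_policy() -> dict:
    """Trust policy that lets CloudFormation assume the service role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "cloudformation.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(deployer_policy(SERVICE_ROLE_ARN), indent=2))
    print(json.dumps(service_role_trust_policy(), indent=2))
```

The permissions to create the actual resources (S3, EC2, and so on) would be attached to the service role itself, which is why the user never needs them directly.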
| Service | What It Does | Abstraction |
|---|---|---|
| CloudFormation | IaC templates → AWS resources | Low (full control) |
| CDK | Code → CloudFormation templates | Medium |
| Beanstalk | Upload code → full environment | High (PaaS) |
| Amplify | Full-stack web/mobile framework | High (serverless) |
| Infrastructure Composer | Visual CloudFormation designer | Visual |
| Service | Role | Serverless? |
|---|---|---|
| CodeCommit | Source repository (Git) | ✅ |
| CodeBuild | Build + test | ✅ |
| CodeDeploy | Deploy to EC2/ECS/Lambda/on-prem | ✅ |
| CodePipeline | Orchestrate pipeline | ✅ |
| CodeArtifact | Package management | ✅ |
| Feature | Purpose | Key Differentiator |
|---|---|---|
| Session Manager | Secure shell | No SSH/port 22, IAM-based |
| Run Command | Execute scripts on fleet | No SSH, EventBridge trigger |
| Patch Manager | Automate patching | Compliance reports |
| Maintenance Windows | Schedule operations | Schedule + duration + tasks |
| Automation | Complex task runbooks | Config remediation trigger |
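As a rough illustration of Run Command from the table above, the sketch below builds a request that targets a fleet by tag instead of by SSH host list. It assumes `boto3` is installed and credentials are configured when `send` is actually called; the tag `Environment=prod` and the shell commands are hypothetical examples.

```python
from typing import Dict, List


def build_run_command_request(tag_key: str, tag_value: str, commands: List[str]) -> Dict:
    """Build keyword arguments for SSM SendCommand, targeting instances by tag."""
    return {
        "DocumentName": "AWS-RunShellScript",  # AWS-managed document for Linux shell commands
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        "Parameters": {"commands": commands},
        "Comment": "Fleet-wide command, no SSH required",
    }


def send(request: Dict) -> str:
    """Send the command; needs boto3, credentials, and SSM Agent on the targets."""
    import boto3  # imported lazily so the builder above works without boto3

    response = boto3.client("ssm").send_command(**request)
    return response["Command"]["CommandId"]


# Hypothetical tag and commands, for illustration only.
request = build_run_command_request("Environment", "prod", ["uptime", "df -h"])
print(request["Targets"])
```

Keeping the request builder separate from the API call makes the targeting logic easy to check without touching any instances.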
| Question Contains | → Instant Answer |
|---|---|
| “IaC, declarative templates” | CloudFormation |
| “Custom resources for unsupported” | CloudFormation |
| “Service Role, iam:PassRole” | CloudFormation Service Role |
| “Visual designer for CloudFormation” | Infrastructure Composer |
| “Define infra in Python/TypeScript” | CDK |
| “PaaS, upload code, handles rest” | Elastic Beanstalk |
| “Full-stack web/mobile serverless” | Amplify |
| “Automate CI/CD pipeline” | CodePipeline |
| “Source code repository (Git)” | CodeCommit |
| “Build and test code” | CodeBuild |
| “Deploy to EC2/ECS/Lambda/on-prem” | CodeDeploy |
| “Package management” | CodeArtifact |
| “No SSH, no port 22, secure shell” | SSM Session Manager |
| “Run script on fleet of instances” | SSM Run Command |
| “Automate OS patching” | SSM Patch Manager |
| “Schedule maintenance tasks” | SSM Maintenance Windows |
| “Config auto-remediation” | SSM Automation + Config |
| “Workflow orchestration, state machine” | Step Functions |
| “Test on real mobile devices” | Device Farm |
Amazon Rekognition automates image recognition and video analysis for your applications without machine learning (ML) experience.
Rekognition Content Moderation:
⚠️ Exam trap: “Moderate user-uploaded images” or “detect inappropriate content” → Rekognition Content Moderation + optionally A2I for human review.
Amazon Transcribe automatically converts speech to text using a deep learning process called automatic speech recognition (ASR).
⚠️ Exam trap: “Remove PII from audio/transcripts” → Amazon Transcribe with PII Redaction enabled.
Amazon Polly turns text into lifelike speech using deep learning, letting you create applications that talk.
Amazon Translate provides natural and accurate language translation.
Amazon Lex (the technology that powers Alexa) lets you add AI that understands intent, maintains context, and automates simple tasks across many languages, for building chatbots and call center bots.
Amazon Connect is an omnichannel cloud contact center that helps companies provide superior customer service at a lower cost. Amazon Connect provides a seamless experience across voice and chat for customers and agents.
Amazon Comprehend is a fully managed, serverless natural language processing (NLP) service that uses machine learning to find insights and relationships in text: it detects the language, extracts key phrases, and analyzes sentiment, among other things. It can also group a collection of articles by the topics it uncovers.
Amazon Comprehend Medical:
DetectPHI API

Amazon SageMaker is a fully managed service for developers and data scientists to build ML models.
Amazon Forecast is a fully managed service that uses ML to deliver highly accurate forecasts (product demand planning, financial planning, resource planning, etc).
Amazon Kendra is a fully managed document search service powered by ML.
Kendra Architecture:
Data Sources ──► indexing ──► Knowledge Index ──► "Where is IT support?" ──► "1st floor"
(S3, RDS, etc) (powered by ML) (natural language) (answer)

Amazon Personalize is a fully managed ML service for building apps with real-time personalized recommendations.
Personalize Architecture:
S3 (historical data) ─────────┐
├──► Amazon Personalize ──► Websites, Mobile Apps, SMS, Emails
Personalize API (real-time) ──┘ (customized API)

Amazon Textract automatically extracts text, handwriting, and data from scanned documents using AI and ML.
Textract Flow:
Document (ID, form, etc) ──► analyze ──► Amazon Textract ──► Structured JSON
{"Document ID": "123",
"Name": "...",
"DOB": "23.05.1997"}

Lex + Connect Integration (Call Center Pattern):
Phone Call ──► Connect ──► stream ──► Lex ──► invoke ──► Lambda ──► CRM
(schedule (contact (audio) (intent (action) (database)
appointment) center) recognized)

| Service | Purpose | Key Feature |
|---|---|---|
| Rekognition | Image/video analysis | Face detection, content moderation |
| Transcribe | Speech → Text | PII redaction, multi-language |
| Polly | Text → Speech | Lexicons, SSML |
| Translate | Language translation | Localization |
| Lex | Chatbots | ASR + NLU (powers Alexa) |
| Connect | Contact center | 80% cheaper, cloud-based |
| Comprehend | NLP text analysis | Sentiment, topics, entities |
| Comprehend Medical | Clinical text NLP | PHI detection |
| SageMaker | Build custom ML models | Full ML workflow |
| Forecast | Time-series predictions | Demand/resource planning |
| Kendra | Document search | Natural language, incremental learning |
| Personalize | Recommendations | Same as Amazon.com |
| Textract | Document data extraction | Forms, tables, handwriting |
| Bedrock | Generative AI (Foundation Models) | Claude, Llama, Titan, Stable Diffusion |
Amazon Bedrock is a fully managed service for building generative AI applications using foundation models (FMs).
⚠️ Exam trap: “Generative AI” or “Foundation Models” or “LLM on AWS” → Amazon Bedrock. SageMaker = build your own ML models.
Amazon Augmented AI (A2I) provides human review workflows for ML predictions.
⚠️ Exam trap: “Human review of ML predictions” or “manual review when confidence low” → Amazon A2I
What do you need to do?
│
┌────┴────┬─────────┬──────────┬──────────┬────────────┐
▼ ▼ ▼ ▼ ▼ ▼
VISION SPEECH TEXT/NLP SEARCH PREDICT GEN AI
│ │ │ │ │ │
▼ ▼ ▼ ▼ ▼ ▼
Rekognition ┌─┴─┐ ┌──┴──┐ ┌──┴──┐ ┌──┴───┐ Bedrock
│ │ │ │ │ │ │ │
Speech→ Text→ Comprehend Kendra Forecast (Foundation
Text Speech (NLP) (docs) (time) Models)
│ │ │ │
▼ ▼ ▼ ▼
Transcribe Polly Medical? Personalize
│ (recommend)
▼
Comprehend
Medical

When to use which search service?
Search Type?
│
├── Document Q&A ("Where is IT support?") ──► Kendra
│ (natural language answers)
│
└── Full-text search (logs, partial match) ──► OpenSearch
(search engine, analytics)

When to use which text extraction?
Extract from documents?
│
├── Forms, tables, structured data ──► Textract
│ (invoices, IDs, medical records)
│
└── Text in images/videos ──► Rekognition
(signs, banners, license plates)

Custom ML vs Managed Services?
Need ML capability?
│
├── Pre-built solution exists? ──► Use managed service
│ (Rekognition, Comprehend, Forecast, etc.)
│
└── Need custom model? ──► SageMaker
(your own algorithms, data)

⚠️ Exam trap: Kendra vs OpenSearch:
⚠️ Exam trap: Textract vs Rekognition text:
⚠️ Exam trap: SageMaker vs Managed Services:
⚠️ Exam trap: Bedrock vs SageMaker:
⚠️ Exam trap: Comprehend vs Comprehend Medical:
AWS offers two paths:
Rule: If a managed service exists for your use case → use it. Custom ML only when needed.
| Purpose | Service |
|---|---|
| Image/Video analysis | Rekognition |
| Speech → Text | Transcribe |
| Text → Speech | Polly |
| Translation | Translate |
| Chatbots | Lex |
| Contact Center | Connect |
| Text NLP | Comprehend |
| Document Q&A | Kendra |
| Recommendations | Personalize |
| Document extraction | Textract |
| Time-series forecast | Forecast |
| Custom ML | SageMaker |
| Generative AI | Bedrock |
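The rule "use a managed service if one exists, custom ML only when needed" can be sketched as a lookup with a SageMaker fallback. The mapping simply mirrors the table above; the function name and the normalized purpose strings are illustrative choices, not an AWS API.

```python
# Purpose -> managed service, mirroring the table above.
MANAGED_SERVICES = {
    "image/video analysis": "Rekognition",
    "speech to text": "Transcribe",
    "text to speech": "Polly",
    "translation": "Translate",
    "chatbots": "Lex",
    "contact center": "Connect",
    "text nlp": "Comprehend",
    "document q&a": "Kendra",
    "recommendations": "Personalize",
    "document extraction": "Textract",
    "time-series forecast": "Forecast",
    "generative ai": "Bedrock",
}


def pick_service(purpose: str) -> str:
    """Prefer a managed service; fall back to SageMaker for custom ML."""
    return MANAGED_SERVICES.get(purpose.strip().lower(), "SageMaker")


print(pick_service("Document Q&A"))
print(pick_service("Predict churn with our own algorithm"))
```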
Common AWS ML patterns:
When ML confidence is low, route to human review:
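A minimal sketch of that routing logic in plain Python: predictions at or above a confidence threshold are accepted automatically, and the rest are queued for human review (the step A2I would handle on AWS). The `Prediction` type, the item IDs, and the 0.8 threshold are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Prediction:
    item_id: str
    label: str
    confidence: float  # 0.0 to 1.0


def route(predictions: List[Prediction], threshold: float = 0.8) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into auto-accepted and needs-human-review lists."""
    accepted = [p for p in predictions if p.confidence >= threshold]
    review = [p for p in predictions if p.confidence < threshold]
    return accepted, review


predictions = [
    Prediction("img-1", "safe", 0.97),
    Prediction("img-2", "unsafe", 0.55),
    Prediction("img-3", "safe", 0.80),
]
accepted, review = route(predictions)
print([p.item_id for p in accepted])  # ['img-1', 'img-3']
print([p.item_id for p in review])    # ['img-2']
```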
| Question Contains | → Instant Answer |
|---|---|
| “image recognition” | Rekognition |
| “video analysis” | Rekognition |
| “face detection” | Rekognition |
| “content moderation” + images | Rekognition |
| “celebrity recognition” | Rekognition |
| “speech to text” | Transcribe |
| “transcribe calls” | Transcribe |
| “remove PII from audio” | Transcribe (Redaction) |
| “closed captioning” | Transcribe |
| “text to speech” | Polly |
| “applications that talk” | Polly |
| “translate languages” | Translate |
| “localize content” | Translate |
| “chatbot” | Lex |
| “conversational bot” | Lex |
| “powers Alexa” | Lex |
| “call center” | Connect |
| “contact center” | Connect |
| “80% cheaper contact” | Connect |
| “sentiment analysis” | Comprehend |
| “NLP” + “text insights” | Comprehend |
| “clinical text” + “PHI” | Comprehend Medical |
| “physician notes” | Comprehend Medical |
| “document search” + “Q&A” | Kendra |
| “natural language search” | Kendra |
| “incremental learning” | Kendra |
| “product recommendations” | Personalize |
| “same as Amazon.com” | Personalize |
| “personalized marketing” | Personalize |
| “extract from forms/tables” | Textract |
| “invoice processing” | Textract |
| “ID documents” | Textract |
| “handwriting extraction” | Textract |
| “demand forecasting” | Forecast |
| “time-series prediction” | Forecast |
| “build custom ML model” | SageMaker |
| “train ML model” | SageMaker |
| “generative AI” | Bedrock |
| “foundation models” | Bedrock |
| “LLM on AWS” | Bedrock |
| “Claude/Llama/Titan” | Bedrock |
| “human review ML” | A2I |
| “manual review when low confidence” | A2I |
| Confusion | Clarification |
|---|---|
| Kendra vs OpenSearch | Kendra = document Q&A; OpenSearch = full-text search/logs |
| Textract vs Rekognition | Textract = forms/tables; Rekognition = text in images |
| SageMaker vs Bedrock | SageMaker = custom models; Bedrock = use foundation models |
| Comprehend vs Medical | Comprehend = general; Medical = clinical/PHI |
| Polly vs Transcribe | Polly = text→speech; Transcribe = speech→text |
| Lex vs Connect | Lex = chatbot logic; Connect = phone/contact center |
Amazon WorkSpaces: managed Desktop as a Service (DaaS) solution to easily provision Windows or Linux desktops. Cloud alternative to managing on-premises Virtual Desktop Infrastructure (VDI). Scales to thousands of users. Integrates with KMS. Pay-as-you-go pricing.
Amazon AppStream 2.0: desktop application streaming service. The application is delivered within a web browser. The instance type (CPU, RAM, GPU) can be configured per application type.
AWS IoT Core: serverless, secure service that lets you easily connect IoT devices to the AWS Cloud and scales to billions of messages.
AWS AppSync: store and sync data across mobile and web apps in real time. Uses GraphQL (an API query language that originated at Facebook). Integrates with DynamoDB and Lambda.
AWS Ground Station: fully managed service that lets you control satellite communications, process data, and scale your satellite operations (weather forecasting, surface imaging, video broadcasting, etc). Provides a global network of satellite ground stations near AWS Regions. Lets you download satellite data to your AWS VPC within seconds and send it to S3 or EC2 instances.
Amazon Pinpoint: scalable two-way (outbound/inbound) marketing communications service. Supports email, SMS, push, voice and in-app messaging. Ability to segment and personalize messages with right content to customers. Possibility to receive replies. Scales to billions of messages per day. Use cases: run campaigns by sending marketing, bulk, transactional SMS messages. Stream events (TEXT_SUCCESS, TEXT_DELIVERED) → SNS, Kinesis Data Firehose, CloudWatch Logs. Versus Amazon SNS or Amazon SES: In SNS & SES you manage each message’s audience, content, and delivery schedule. In Pinpoint, you create message templates, delivery schedules, highly-targeted segments, and full campaigns.
Amazon Simple Email Service (SES): fully managed service to send emails securely, globally, and at scale. Allows inbound/outbound emails. Reputation dashboard, performance insights, anti-spam feedback. Statistics: email deliveries, bounces, feedback loop results, email open rates. Supports DKIM and SPF. Flexible IP deployment: shared, dedicated, customer-owned. Send via AWS Console, APIs, or SMTP. Use cases: transactional, marketing, and bulk email communications.
Amazon AppFlow: fully managed integration service to securely transfer data between SaaS applications and AWS. Sources: Salesforce, SAP, Zendesk, Slack, ServiceNow. Destinations: S3, Redshift, or non-AWS (Snowflake, Salesforce). Frequency: schedule, event-driven, or on-demand. Data transformation: filtering and validation. Encrypted over public internet or privately over AWS PrivateLink.
Instance Scheduler on AWS: AWS solution (deployed via CloudFormation, not a service) to automatically start/stop AWS services to reduce costs (up to 70%).
AWS Marketplace: digital catalog with thousands of software listings from independent software vendors (third parties).
AWS Data Exchange: find, subscribe to, and use third-party data in the cloud. Data providers publish data products → subscribers consume via S3, API, or Lake Formation. Use cases: financial data, weather, healthcare. No need to build custom ETL pipelines for external data.
AWS Data Pipeline: managed ETL service to process and move data between AWS compute and storage services, and on-premises sources. Defines data-driven workflows (dependencies). Runs on EC2 or EMR. Retries on failure. Legacy — prefer AWS Glue or Step Functions for new workloads.
⚠️ Exam trap: “Data Pipeline” on exam is usually legacy — modern answer is Glue (serverless ETL) or Step Functions (orchestration). But know Data Pipeline exists.
AWS Proton: fully managed delivery service for container and serverless applications. Platform teams create templates → developers deploy using self-service. Manages infrastructure provisioning + CI/CD. Think “Service Catalog for containers/serverless.”
AWS Wavelength: deploy AWS compute/storage at 5G telecom edge locations. Ultra-low latency for mobile devices. Extends VPC to Wavelength Zones. Use cases: real-time gaming, ML inference at edge, AR/VR, connected vehicles.
Amazon ECS Anywhere / EKS Anywhere: run ECS or EKS on on-premises or customer-managed infrastructure.
⚠️ Exam trap: “Run containers on-premises but manage from AWS” → ECS/EKS Anywhere. “Fully self-managed Kubernetes, same as EKS” → EKS Distro.
Amazon Elastic Transcoder: transcode media files (video/audio) stored in S3 into formats needed by consumer devices (phones, tablets, PCs). Pay per transcoding minute. Being replaced by AWS Elemental MediaConvert (more features, same purpose).
AWS License Manager: manage software licenses from vendors (Microsoft, SAP, Oracle). Track license usage, set rules, enforce limits. Integrates with EC2, RDS. Prevent license violations. Shared via AWS RAM across accounts.
Amazon Managed Grafana: fully managed Grafana for operational dashboards and observability. Queries from CloudWatch, Prometheus, X-Ray, Elasticsearch, Timestream. Workspace-based, integrates with IAM Identity Center for access.
Amazon Managed Service for Prometheus: fully managed, serverless Prometheus-compatible monitoring for containers (EKS, ECS). Stores metrics at scale. Query with PromQL. Pairs with Managed Grafana for visualization.
⚠️ Exam trap: “Container monitoring with Prometheus” → Managed Prometheus (metrics) + Managed Grafana (dashboards). NOT CloudWatch Container Insights (different approach).
AWS Audit Manager: continuously audit AWS usage to assess risk and compliance. Maps to frameworks (GDPR, HIPAA, SOC 2, PCI DSS). Collects evidence automatically from CloudTrail, Config, Security Hub. Generates audit-ready reports.
⚠️ Exam trap: “Continuous compliance auditing with evidence collection” → Audit Manager. “Compliance documents/agreements” → AWS Artifact. Different purposes.
Amazon Fraud Detector: fully managed service to identify potentially fraudulent online activities (online payment fraud, fake account creation, etc). Uses ML models trained on your data + Amazon’s fraud detection expertise. No ML experience needed.
AWS Serverless Application Repository: managed repository to deploy and publish serverless applications. Find pre-built Lambda functions and SAM templates. Supports public and private sharing.
Amazon Kinesis Video Streams: securely stream video from devices to AWS for analytics, ML, playback. Use cases: smart home cameras, industrial monitoring, computer vision with Rekognition.
The key distinction is WHO manages the campaign logic:
Instance Scheduler on AWS is deployed via CloudFormation. It uses DynamoDB + Lambda + resource tags, and supports cross-account/cross-region scheduling. Key for cost optimization questions.
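To see where a savings figure of roughly 70% can come from, here is a small sketch of the scheduling idea: an instance runs only during a weekday business-hours window. The 08:00 to 18:00 window is a hypothetical schedule, and the calculation counts instance-hours only (storage and other charges are ignored).

```python
from datetime import datetime

# Hypothetical schedule: weekdays, 08:00 to 18:00 local time.
RUN_DAYS = {0, 1, 2, 3, 4}  # Monday=0 ... Friday=4
START_HOUR, STOP_HOUR = 8, 18


def should_be_running(now: datetime) -> bool:
    """Return True if an instance on this schedule should be running at `now`."""
    return now.weekday() in RUN_DAYS and START_HOUR <= now.hour < STOP_HOUR


def weekly_savings_fraction() -> float:
    """Fraction of instance-hours saved versus running 24x7."""
    running_hours = len(RUN_DAYS) * (STOP_HOUR - START_HOUR)
    return 1 - running_hours / (7 * 24)


print(should_be_running(datetime(2024, 1, 8, 9, 30)))   # a Monday morning
print(should_be_running(datetime(2024, 1, 13, 9, 30)))  # a Saturday morning
print(f"{weekly_savings_fraction():.0%} of instance-hours saved")
```

With 50 running hours out of 168 per week, about 70% of instance-hours are avoided, which lines up with the "up to 70%" claim for this kind of schedule.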
| Question Contains | → Instant Answer |
|---|---|
| “managed virtual desktop” | WorkSpaces |
| “DaaS, VDI replacement” | WorkSpaces |
| “stream desktop application via browser” | AppStream 2.0 |
| “IoT devices to cloud” | IoT Core |
| “GraphQL, real-time sync” | AppSync |
| “satellite communications” | Ground Station |
| “marketing campaigns, segments” | Pinpoint |
| “two-way SMS/email campaigns” | Pinpoint |
| “transactional email, DKIM, SPF” | SES |
| “bulk email at scale” | SES |
| “transfer data from Salesforce/SAP to S3” | AppFlow |
| “SaaS integration” | AppFlow |
| “PrivateLink for SaaS data transfer” | AppFlow |
| “stop/start EC2 to save costs” | Instance Scheduler |
| “schedule EC2 on/off business hours” | Instance Scheduler |
| “third-party software catalog” | Marketplace |
| “subscribe to third-party data” | Data Exchange |
| “ETL pipeline, legacy orchestration” | Data Pipeline (prefer Glue/Step Functions) |
| “platform team templates for containers” | Proton |
| “5G edge, ultra-low latency mobile” | Wavelength |
| “run ECS/EKS on-premises” | ECS/EKS Anywhere |
| “self-managed Kubernetes like EKS” | EKS Distro |
| “transcode video in S3” | Elastic Transcoder / MediaConvert |
| “manage software licenses” | License Manager |
| “Grafana dashboards, observability” | Managed Grafana |
| “Prometheus container metrics” | Managed Prometheus |
| “continuous compliance audit” | Audit Manager |
| “detect online fraud with ML” | Fraud Detector |
| “pre-built Lambda/SAM templates” | Serverless App Repository |
| “stream video from devices” | Kinesis Video Streams |
| Confusion | Clarification |
|---|---|
| WorkSpaces vs AppStream | WorkSpaces = full desktop; AppStream = one app in browser |
| Pinpoint vs SNS | Pinpoint = campaigns/segments/templates; SNS = per-message notifications |
| Pinpoint vs SES | Pinpoint = marketing campaigns; SES = transactional/bulk email |
| AppFlow vs Glue | AppFlow = SaaS sources; Glue = AWS data sources (S3, RDS, etc.) |
| AppSync vs API Gateway | AppSync = GraphQL + real-time; API Gateway = REST/HTTP/WebSocket |
| Instance Scheduler vs ASG | Scheduler = stop/start on schedule; ASG = scale based on demand |
| Data Pipeline vs Glue | Data Pipeline = legacy ETL (EC2/EMR); Glue = modern serverless ETL |
| Audit Manager vs Artifact | Audit Manager = continuous audit with evidence; Artifact = download compliance docs |
| Proton vs Service Catalog | Proton = container/serverless templates; Service Catalog = any CloudFormation product |
| Managed Prometheus vs CloudWatch | Prometheus = PromQL, container-native; CloudWatch = AWS-native metrics |
| EKS Anywhere vs EKS Distro | Anywhere = AWS-managed on your infra; Distro = fully self-managed |
| Elastic Transcoder vs MediaConvert | Transcoder = legacy; MediaConvert = modern replacement (more features) |
AWS Backup: fully managed service to centrally manage and automate backups across AWS services. On-demand and scheduled backups. Supports PITR (Point-in-time Recovery). Retention Periods, Lifecycle Management, Backup Policies. Cross-Region Backup. Cross-Account backup (using AWS Organizations).
Supported services: EC2, EBS, S3, RDS (all engines), Aurora, DynamoDB, DocumentDB, Neptune, EFS, FSx (Lustre & Windows), Storage Gateway (Volume Gateway)
Backup Plans:
AWS Backup Vault Lock:
⚠️ Exam trap: “Prevent anyone including root from deleting backups” → Backup Vault Lock (WORM). Similar to S3 Object Lock but for AWS Backup.
AWS DataSync: move large amounts of data from on-premises to AWS (or between AWS storage services).
⚠️ Exam trap: DataSync = data movement/sync (on-prem ↔ AWS, AWS ↔ AWS). AWS Backup = backup automation across AWS services. DataSync moves files; Backup creates snapshots/backups.
AWS Elastic Disaster Recovery (DRS): quickly and easily recover physical, virtual, and cloud-based servers into AWS.
⚠️ Exam trap: “Continuous replication of servers for DR” → DRS (Elastic Disaster Recovery). “Lift-and-shift migration” → MGN. Both use agents + continuous replication, but DRS = DR (failover/failback), MGN = one-time migration.
AWS Fault Injection Service (FIS, formerly Fault Injection Simulator): fully managed service for chaos engineering on AWS workloads.
⚠️ Exam trap: “Test resilience by randomly terminating instances” → FIS. “Netflix Simian Army” → inspiration for FIS but not an AWS service.
Disaster = any event that negatively impacts business continuity or finances. Disaster Recovery (DR) = preparing for and recovering from a disaster.
DR scenarios:
RPO (Recovery Point Objective) — how much data loss you can tolerate (time between last backup and disaster). RTO (Recovery Time Objective) — how much downtime you can tolerate (time between disaster and recovery).
◄─── Data loss ───►◄─── Downtime ──►
│
● ⚡ ●
RPO Disaster RTO
(last backup) (back online)

⚠️ Exam trap: RPO = data loss (backward-looking). RTO = downtime (forward-looking). Don’t confuse them: “minimize data loss” → optimize RPO. “Minimize downtime” → optimize RTO.
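The two windows on the timeline can be computed directly from three timestamps. The sketch below uses made-up times and a hypothetical 4-hour RTO target purely to show which interval belongs to which objective.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, disaster: datetime) -> timedelta:
    """Interval compared against the RPO: last good backup to disaster."""
    return disaster - last_backup


def downtime(disaster: datetime, back_online: datetime) -> timedelta:
    """Interval compared against the RTO: disaster to service restoration."""
    return back_online - disaster


# Hypothetical incident timeline.
last_backup = datetime(2024, 3, 1, 2, 0)
disaster = datetime(2024, 3, 1, 14, 30)
back_online = datetime(2024, 3, 1, 18, 0)

loss = data_loss_window(last_backup, disaster)
down = downtime(disaster, back_online)
print(f"Data loss window: {loss}")  # 12:30:00
print(f"Downtime:         {down}")  # 3:30:00
print("Meets 4h RTO target:", down <= timedelta(hours=4))
```

A nightly backup bounds the data loss window at about 24 hours; shrinking it requires more frequent backups or continuous replication, independent of how fast the system comes back online.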
Four strategies, ordered from slowest/cheapest to fastest/most expensive:
Slower RTO ◄─────────────────────────────────────► Faster RTO
Cheaper Expensive
┌──────────┬──────────┬──────────┬──────────┐
│ Backup & │ Pilot │ Warm │ Multi │
│ Restore │ Light │ Standby │ Site │
└──────────┴──────────┴──────────┴──────────┘
Hours 10s min Minutes Seconds

1. Backup & Restore (High RPO/RTO, cheapest)
On-prem ──► Storage Gateway / Snowball ──► S3 ──► Glacier (lifecycle)
AWS: EBS / RDS / Redshift ──► Scheduled Snapshots
Recovery: Snapshots ──► AMI ──► EC2 + RDS restore

2. Pilot Light (Faster than backup)
On-prem (active) AWS Cloud
┌────────────┐ ┌────────────────────┐
│ App Server │ │ EC2 (NOT running) │
│ Primary DB │──repl──► │ RDS (running) │
└────────────┘ └────────────────────┘
Route 53 (failover)

3. Warm Standby (Minutes RTO)
On-prem (active) AWS Cloud
┌────────────┐ ┌──────────────────────┐
│ App Server │ │ EC2 ASG (minimum) │
│ Primary DB │──repl──► │ RDS Secondary │
└────────────┘ └──────────────────────┘
Route 53 → scale up on failover

4. Multi Site / Hot Site (Seconds RTO, most expensive)
On-prem (active) AWS Cloud (active)
┌────────────┐ ┌──────────────────────┐
│ App Server │◄──R53──► │ ELB → EC2 ASG (full) │
│ Primary DB │──repl──► │ RDS Secondary │
└────────────┘ └──────────────────────┘
Route 53 active-active (or Aurora Global)

All AWS Multi Region = same as Multi Site but both sides are AWS:
Comparison Table:
| Strategy | RTO | RPO | Cost | What’s Running in AWS |
|---|---|---|---|---|
| Backup & Restore | Hours | High | 💰 | Nothing (just backups in S3) |
| Pilot Light | 10s of min | Medium | 💰💰 | DB only (EC2 stopped) |
| Warm Standby | Minutes | Low | 💰💰💰 | Everything at minimum size |
| Multi Site / Hot Site | Seconds | Very low | 💰💰💰💰 | Everything at full production |
⚠️ Exam trap: Pilot Light vs Warm Standby — both have DB replicating. The difference: Pilot Light has EC2 stopped (need to start), Warm Standby has EC2 running at minimum (need to scale up).
⚠️ Exam trap: “Cheapest DR” → Backup & Restore. “Lowest RTO/RPO” → Multi Site. “Balance cost and recovery” → Warm Standby.
⚠️ Exam trap: “Critical infrastructure up and running” = Pilot Light (only critical = DB). “Everything running at minimum” = Warm Standby. “Nothing running” = Backup & Restore. Key word is “critical” → Pilot Light.
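The cost versus recovery-time trade-off can be sketched as "pick the cheapest strategy whose RTO still fits the target". The cost ranks follow the comparison table; the RTO minutes are illustrative stand-ins for "hours", "10s of minutes", "minutes", and "seconds", not AWS-published figures.

```python
from typing import NamedTuple, Optional


class Strategy(NamedTuple):
    name: str
    cost_rank: int      # 1 = cheapest, 4 = most expensive
    rto_minutes: float  # illustrative upper bound, not an AWS figure


STRATEGIES = [
    Strategy("Backup & Restore", 1, 24 * 60),
    Strategy("Pilot Light", 2, 60),
    Strategy("Warm Standby", 3, 10),
    Strategy("Multi Site", 4, 1),
]


def cheapest_strategy(target_rto_minutes: float) -> Optional[Strategy]:
    """Return the cheapest strategy whose illustrative RTO fits the target."""
    candidates = [s for s in STRATEGIES if s.rto_minutes <= target_rto_minutes]
    return min(candidates, key=lambda s: s.cost_rank) if candidates else None


for target in (48 * 60, 90, 15, 2):
    choice = cheapest_strategy(target)
    print(f"RTO target {target:>5} min -> {choice.name if choice else 'none fits'}")
```

Tightening the target walks the answer from Backup & Restore toward Multi Site, which is the same left-to-right movement as the spectrum diagram.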
AWS DMS — quickly and securely migrate databases to AWS.
Migration types:
Homogeneous: Source DB ──► EC2 (DMS) ──► Target DB (same engine)
Heterogeneous: Source DB ──► SCT (schema) + DMS (data) ──► Target DB (different engine)

Continuous Replication (CDC):
Corporate DC AWS Cloud (VPC)
┌──────────────┐ ┌─────────────────────────────┐
│ Oracle DB │── data migration ─►│ DMS Replication Instance │
│ (source) │ │ (Full load + CDC) │
│ │ │ Public Subnet │
│ Server with │ │ │ │
│ AWS SCT │── schema convert ─►│ ▼ │
│ │ │ RDS MySQL (target) │
└──────────────┘ │ Private Subnet │
└─────────────────────────────┘

DMS Sources: On-prem DBs (Oracle, SQL Server, MySQL, MariaDB, PostgreSQL, MongoDB, SAP, DB2), Azure SQL, RDS (all incl. Aurora), S3, DocumentDB

DMS Targets: On-prem DBs, RDS, Redshift, DynamoDB, S3, OpenSearch, Kinesis Data Streams, Apache Kafka, DocumentDB, Neptune, Redis, Babelfish
AWS SCT (Schema Conversion Tool):
⚠️ Exam trap: “Migrate database with minimal downtime, source stays available” → DMS. “Different DB engines” → DMS + SCT. “Same engine, different platform” (e.g., on-prem PostgreSQL → RDS PostgreSQL) → DMS only, no SCT needed.
⚠️ Exam trap: SCT converts schema, DMS migrates data — never reversed. Heterogeneous migration order: SCT first (convert schema) → DMS second (move data into converted schema).
RDS MySQL → Aurora MySQL:
External MySQL → Aurora MySQL:
⚠️ Exam trap: RDS → Aurora = snapshot (native, cheapest, simplest). S3 dump path is for external/on-prem MySQL → Aurora. If both source and target are inside AWS, snapshot is always the best and most cost-effective option.
RDS PostgreSQL → Aurora PostgreSQL:
External PostgreSQL → Aurora PostgreSQL:
Both databases running? → Use DMS for continuous replication
| Strategy | Description | AWS Service | Example |
|---|---|---|---|
| Retire | Turn off what you don’t need | — | Kill legacy apps (save up to 20%) |
| Retain | Keep on-prem for now | — | Compliance, unresolved dependencies |
| Relocate | Move to cloud version as-is | VMware Cloud on AWS | VMware SDDC → VMware Cloud on AWS |
| Rehosting (Lift & Shift) | Move as-is to AWS, no optimizations | MGN | VM → EC2 (save ~30%) |
| Replatforming (Lift & Reshape) | Minor cloud optimizations, no core changes | DMS, Beanstalk | MySQL → RDS MySQL |
| Repurchasing (Drop & Shop) | Switch to SaaS product | — | CRM → Salesforce, HR → Workday |
| Refactoring (Re-architect) | Rebuild cloud-native | Lambda, DynamoDB | Monolith → microservices |
⚠️ Exam trap: “Lift-and-shift” = Rehosting (MGN). “Move to RDS without code changes” = Replatforming. “Rewrite as serverless” = Refactoring. “Move VMware SDDC to VMware Cloud on AWS” = Relocate. Don’t confuse Rehosting with Replatforming — rehosting changes nothing, replatforming makes small optimizations.
⚠️ Exam trap: The course says 7 R’s (includes Relocate). Some sources say 6 R’s (no Relocate). Know both — exam may reference either count.
AWS MGN — the “AWS evolution” of CloudEndure Migration, replacing AWS Server Migration Service (SMS).
Corporate DC / Any Cloud AWS Cloud
┌──────────────────────┐ ┌───────────────────────────────┐
│ OS ┐ │ │ Staging Production │
│ Apps ├─► Replication │── continuous ──► │ Low-cost EC2 → Target EC2 │
│ DB │ Agent │ replication │ & EBS volumes & EBS vols │
│ Disks ┘ │ │ (cutover) ──► │
└──────────────────────┘ └───────────────────────────────┘

⚠️ Exam trap: “Lift-and-shift to AWS, minimal downtime” → AWS MGN (Application Migration Service). NOT DMS (that’s for databases only). NOT SMS (deprecated, replaced by MGN).
VMware Cloud on AWS — extend VMware-based on-prem data centers to AWS while keeping VMware Cloud software.
Example: 200 TB, 100 Mbps internet connection:
| Method | Setup Time | Transfer Time | Notes |
|---|---|---|---|
| Internet / VPN | Immediate | ~185 days | 200TB × 8 / 100 Mbps |
| Direct Connect 1 Gbps | >1 month | ~18.5 days | Faster but long setup |
| Snowball | ~1 week | ~1 week | End-to-end, can combine with DMS |
For ongoing replication: Site-to-Site VPN or DX with DMS or DataSync
⚠️ Exam trap: “Transfer 200 TB quickly” → Snowball (~1 week). NOT internet (185 days). NOT DX (setup alone >1 month). For ongoing sync after initial transfer → DMS or DataSync.
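The transfer times in the table follow from a one-line calculation: data size in bits divided by link speed. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits/s) and a fully utilized link, which real networks rarely achieve; the `utilization` parameter is an added assumption for modelling that.

```python
def transfer_days(data_tb: float, link_mbps: float, utilization: float = 1.0) -> float:
    """Days needed to move `data_tb` terabytes over a `link_mbps` link.

    Uses decimal units (1 TB = 10**12 bytes, 1 Mbps = 10**6 bits/s).
    """
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * utilization)
    return seconds / 86_400


print(f"100 Mbps internet:     {transfer_days(200, 100):.1f} days")   # 185.2 days
print(f"1 Gbps Direct Connect: {transfer_days(200, 1000):.1f} days")  # 18.5 days
```

At a more realistic 80% utilization the 100 Mbps figure grows to roughly 231 days, which only strengthens the case for a physical device.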
Lower RPO/RTO = more money. RPO = how much data you can lose (backward). RTO = how much downtime you can accept (forward). Every DR strategy is a position on the cost ↔ speed spectrum. If the question says “regardless of cost” → Multi-Site. If “cheapest” → Backup & Restore.
The 4 strategies differ by how much infrastructure is pre-provisioned in the DR region:
Key insight: You don’t need to memorize RTO numbers. Just ask: “How much needs to start/scale on failover?” More startup = more time = higher RTO.
Each tool moves a specific thing. The exam tests whether you can pick the right one:
SCT converts schema (structure). DMS migrates data (content). They’re never reversed. If engines are the same → no SCT needed. If engines differ → SCT first, DMS second.
Derivation trick: Schema = blueprint of the house. Data = furniture. You must build the house (SCT) before moving furniture in (DMS).
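The "SCT first, DMS second" rule reduces to a single comparison of source and target engines. This is a deliberately simplified sketch: engines are compared by name only, so engine families (for example MySQL and Aurora MySQL) are not treated as equivalent here.

```python
from typing import List


def migration_steps(source_engine: str, target_engine: str) -> List[str]:
    """Return the ordered tools for a database migration.

    Engines are compared by name only; recognizing engine families
    is left out of this sketch.
    """
    same_engine = source_engine.strip().lower() == target_engine.strip().lower()
    if same_engine:
        return ["DMS (migrate data)"]
    return ["SCT (convert schema)", "DMS (migrate data)"]


print(migration_steps("PostgreSQL", "postgresql"))  # homogeneous
print(migration_steps("Oracle", "PostgreSQL"))      # heterogeneous
```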
Same engine (homogeneous) eliminates complexity everywhere:
Different engine (heterogeneous) always adds an extra step (SCT, conversion).
When both source and target are AWS services, use native AWS operations (snapshot, read replica promotion). S3 dump/import path is for external databases entering AWS. The exam tests this: “most cost-effective RDS → Aurora” = snapshot, NOT S3.
DMS is not serverless — it runs on a replication instance (EC2). You choose instance type. Multi-AZ deployment gives HA for the replication instance itself. The source DB stays available during migration (non-disruptive).
Three different things that sound similar:
Historical evolution: CloudEndure Migration + AWS SMS → AWS MGN. If the exam mentions CloudEndure or SMS, the modern answer is MGN. Similarly, CloudEndure Disaster Recovery → AWS DRS.
Physics: shipping a device is faster than transferring hundreds of TB over a wire. Rule of thumb: if transfer calculation shows weeks/months → Snowball wins. Direct Connect needs >1 month setup, so for urgent large transfers it’s too slow.
What does the question ask for?
│
├─ "Cheapest" / "lowest cost" / "budget"
│ └─► Backup & Restore
│
├─ "Critical infrastructure running" / "core running"
│ └─► Pilot Light
│
├─ "Everything running at minimum" / "scaled down"
│ └─► Warm Standby
│
├─ "Lowest RTO" / "fastest recovery" / "regardless of cost" / "active-active"
│ └─► Multi-Site
│
└─ "Balance cost and recovery"
└─► Warm Standby

What are you migrating?
│
├─ DATABASE
│ ├─ Same engine? → DMS only
│ ├─ Different engine? → SCT + DMS
│ ├─ RDS → Aurora (same family)? → Snapshot restore
│ └─ External DB → Aurora? → S3 import (Percona/mysqldump) or DMS
│
├─ SERVERS / VMs / APPLICATIONS
│ ├─ Migration (one-time move)? → MGN
│ └─ DR (ongoing failover)? → DRS
│
├─ FILES / DATA
│ ├─ On-prem ↔ AWS sync? → DataSync
│ └─ Bulk physical transfer? → Snowball
│
└─ BACKUP MANAGEMENT
└─ Centralized backup across services? → AWS Backup

| You CANNOT… | Why |
|---|---|
| Use SCT for data migration | SCT only converts schema |
| Use DMS for schema conversion | DMS only moves data |
| Skip SCT for heterogeneous migration | Different engines need schema conversion |
| Use S3 dump for RDS → Aurora (cost-effectively) | Snapshot is native and free |
| Use MGN for database migration | MGN migrates servers, not databases |
| Delete Backup Vault Lock backups (even root) | WORM protection (compliance mode) |
| Use DataSync without an agent (on-prem) | Agent required for on-prem source |
| Set up Direct Connect in < 1 month | Physical provisioning required |
Keywords: different engines, Oracle, Aurora
Answer: SCT (convert schema) + DMS (migrate data)
Why: Heterogeneous migration. Oracle ≠ PostgreSQL, so schema conversion is required first.

Keywords: RDS → Aurora, same engine family, cost-effective
Answer: Create snapshot from RDS → restore as Aurora
Why: Native AWS operation, no intermediate storage cost. The S3 path is for external databases.

Keywords: lift-and-shift, servers, minimal downtime
Answer: AWS MGN (Application Migration Service)
Why: MGN continuously replicates servers, then cuts over with minimal downtime. NOT DMS (databases only).

Keywords: critical, running, DR
Answer: Pilot Light
Why: Only critical components (DB) are always on. EC2 stays stopped until disaster. “Critical” is the keyword.

Keywords: lowest RTO, fastest, regardless of cost
Answer: Multi-Site / Hot Site
Why: Full production on both sides, active-active routing = seconds RTO.

Keywords: large data, TB, quickly
Answer: AWS Snowball
Why: Internet = months, DX = weeks + setup time. Snowball = ~1 week end-to-end.

Keywords: centrally manage, backups, multiple services
Answer: AWS Backup
Why: Only service that orchestrates backups across all these services. S3 Lifecycle only manages S3 objects.

Keywords: prevent deletion, root, immutable, compliance
Answer: AWS Backup Vault Lock (WORM)
Why: WORM = Write Once Read Many. In compliance mode, even root can’t delete. Similar to S3 Object Lock.

Keywords: same engine, different platform
Answer: DMS only (no SCT needed)
Why: Same engine = homogeneous migration. SCT is only needed when engines differ.

Keywords: continuous, ongoing, replication, CDC
Answer: DMS with CDC (Change Data Capture)
Why: DMS supports continuous replication, not just one-time migration.

Keywords: discovery, planning, inventory, on-premises
Answer: AWS Application Discovery Service → Migration Hub
Why: Agentless (VM inventory) or agent-based (processes, network). Results are viewed in Migration Hub.

Keywords: DR, servers, continuous replication, failover/failback
Answer: AWS DRS (Elastic Disaster Recovery)
Why: DRS = ongoing DR with failover. MGN = one-time migration. Both use continuous replication, but for different purposes.

Keywords: files, on-prem, NFS, SMB, sync
Answer: AWS DataSync
Why: Agent-based, preserves file permissions, incremental sync. Not DMS (databases) or Snowball (physical).

Keywords: VMware, vSphere, hybrid, extend
Answer: VMware Cloud on AWS
Why: Runs vSphere/vSAN/NSX on dedicated AWS hardware. Keep VMware tools, access AWS services.

Keywords: chaos, fault injection, resilience, stress test
Answer: AWS FIS (Fault Injection Service, formerly Fault Injection Simulator)
Why: Managed chaos engineering: CPU stress, stop instances, API errors. Pre-built templates.

Keywords: business case, cost analysis, baseline, current state
Answer: AWS Migration Evaluator
Why: Agentless Collector discovers on-prem footprint → analyzes → builds data-driven migration plan. NOT Application Discovery Service (that discovers servers, not costs).

Keywords: track, central dashboard, migration status, MGN + DMS
Answer: AWS Migration Hub (+ Orchestrator for enterprise app templates)
Why: Central location aggregating status from MGN and DMS. Orchestrator has pre-built templates for SAP, SQL Server.
| Service | Migrates | Direction | Key Feature |
|---|---|---|---|
| DMS | Databases | Any direction | CDC, source stays available |
| SCT | DB Schema | N/A (conversion) | Heterogeneous engine conversion |
| MGN | Servers/VMs/Apps | To AWS | Lift-and-shift, replaces SMS |
| DRS | Servers (DR) | To AWS | Failover/failback, replaces CloudEndure DR |
| DataSync | Files/Data | On-prem ↔ AWS, AWS ↔ AWS | Agent-based, incremental |
| Snowball | Bulk data | Physical shipping | Large one-time transfers |
| AWS Backup | Backups | Within AWS | Centralized backup management |
| Migration Evaluator | Business case | Assessment | Data-driven cost analysis |
| Migration Hub | Tracking | Central dashboard | Tracks MGN + DMS progress |
| App Discovery | Server inventory | On-prem → AWS | Agentless or agent-based |
| | Backup & Restore | Pilot Light | Warm Standby | Multi-Site |
|---|---|---|---|---|
| RTO | Hours | 10s of min | Minutes | Seconds |
| Cost | 💰 | 💰💰 | 💰💰💰 | 💰💰💰💰 |
| DB | Snapshots only | Running | Running | Running |
| App servers | Nothing | Stopped | Min capacity | Full prod |
| Route 53 | Manual update | Failover | Failover | Active-active |
| On failover | Restore everything | Start EC2, scale | Scale up | Already active |
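
The DR comparison above, combined with the question cues from the decision tree, can be expressed as a small lookup. The following is a rough study aid only, assuming simple substring matching on exam-style phrasing; `DrStrategy`, `CUES`, and `pick_strategy` are hypothetical names, and the cost tier is just the number of 💰 symbols in the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DrStrategy:
    name: str
    rto: str
    cost_tier: int  # 1 = cheapest, 4 = most expensive (count of money bags in the table)


STRATEGIES = {
    s.name: s
    for s in (
        DrStrategy("Backup & Restore", "Hours", 1),
        DrStrategy("Pilot Light", "10s of min", 2),
        DrStrategy("Warm Standby", "Minutes", 3),
        DrStrategy("Multi-Site", "Seconds", 4),
    )
}

# Phrase cues, checked in the same order as the decision tree.
CUES = [
    (("cheapest", "lowest cost", "budget"), "Backup & Restore"),
    (("critical infrastructure running", "core running"), "Pilot Light"),
    (("everything running at minimum", "scaled down", "balance cost and recovery"), "Warm Standby"),
    (("lowest rto", "fastest recovery", "regardless of cost", "active-active"), "Multi-Site"),
]


def pick_strategy(question: str) -> Optional[DrStrategy]:
    """Return the first strategy whose cue phrase appears in the question, else None."""
    text = question.lower()
    for phrases, name in CUES:
        if any(phrase in text for phrase in phrases):
            return STRATEGIES[name]
    return None


if __name__ == "__main__":
    choice = pick_strategy("Keep critical infrastructure running in the DR Region")
    print(choice)
```

Real exam questions need reading in full; the point of the sketch is only that each strategy is tied to one or two signature phrases.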
| From | To | Best Method |
|---|---|---|
| RDS MySQL | Aurora MySQL | Snapshot restore |
| RDS PostgreSQL | Aurora PostgreSQL | Snapshot restore |
| External MySQL | Aurora MySQL | Percona XtraBackup → S3 |
| External PostgreSQL | Aurora PostgreSQL | Backup → S3 → aws_s3 extension |
| Any DB (ongoing) | Any target | DMS with CDC |
| Different engine | Different engine | SCT + DMS |
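
The migration-path table above reduces to a handful of ordered checks. Below is a minimal sketch under stated assumptions: `migration_path` and its parameters are hypothetical, engines are compared by family name (so Aurora MySQL counts as `mysql`), and the order of the checks is one reasonable reading of the table rather than an official AWS rule.

```python
def migration_path(
    source_engine: str,
    target_engine: str,
    *,
    source_on_rds: bool = False,
    target_is_aurora: bool = False,
    ongoing: bool = False,
) -> str:
    """Map a database migration scenario to the method listed in the table."""
    src, dst = source_engine.lower(), target_engine.lower()
    if src != dst:
        # Heterogeneous: convert the schema first, then move the data.
        return "SCT + DMS"
    if ongoing:
        # Continuous replication rather than a one-time move.
        return "DMS with CDC"
    if target_is_aurora:
        if source_on_rds:
            return "Snapshot restore"
        if src == "mysql":
            return "Percona XtraBackup → S3"
        if src == "postgresql":
            return "Backup → S3 → aws_s3 extension"
    # Homogeneous migration with no Aurora-specific shortcut.
    return "DMS only"


if __name__ == "__main__":
    print(migration_path("oracle", "postgresql", target_is_aurora=True))
    print(migration_path("mysql", "mysql", source_on_rds=True, target_is_aurora=True))
```

Checking for differing engines first mirrors the exam heuristic: once engines differ, SCT is needed regardless of where the source runs.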
| Legacy | Modern Replacement |
|---|---|
| CloudEndure Migration | AWS MGN |
| AWS SMS (Server Migration) | AWS MGN |
| CloudEndure Disaster Recovery | AWS DRS |
| Question Contains | → Instant Answer |
|---|---|
| “Lift-and-shift” / “rehost” | MGN |
| “Database migration” | DMS |
| “Different DB engines” / “heterogeneous” | SCT + DMS |
| “Same engine, different platform” | DMS only (no SCT) |
| “Schema conversion” | SCT |
| “RDS → Aurora, cost-effective” | Snapshot restore |
| “External MySQL → Aurora” | S3 (Percona XtraBackup) |
| “Continuous DB replication” / “CDC” | DMS |
| “DR, continuous block replication” | DRS |
| “Cheapest DR” | Backup & Restore |
| “Critical infrastructure running” | Pilot Light |
| “Everything running at minimum” | Warm Standby |
| “Lowest RTO, regardless of cost” | Multi-Site |
| “Active-active DR” | Multi-Site |
| “Centralized backup automation” | AWS Backup |
| “Prevent backup deletion by root” | Backup Vault Lock (WORM) |
| “Move files on-prem ↔ AWS” | DataSync |
| “Transfer 200 TB quickly” | Snowball |
| “Discover on-prem servers for migration” | Application Discovery Service |
| “Build business case for migration” | Migration Evaluator |
| “Migration planning and tracking” | Migration Hub |
| “Pre-built migration templates (SAP, SQL Server)” | Migration Hub Orchestrator |
| “Chaos engineering” / “fault injection” | FIS |
| “Extend VMware to AWS” / “Relocate VMware” | VMware Cloud on AWS |
| “Source DB stays available during migration” | DMS |
| “Migrate VMs to EC2” | VM Import/Export or MGN |
| “Replace CloudEndure Migration” | MGN |
| “Replace SMS” | MGN |
| “Replace CloudEndure DR” | DRS |
| “Backup across RDS, DynamoDB, EFS, EBS” | AWS Backup |
| “PITR (Point-in-time Recovery)” | AWS Backup |
| “Minimize data loss” | Optimize RPO |
| “Minimize downtime” | Optimize RTO |
| “Ongoing sync after initial transfer” | DataSync or DMS |
| “Replatform” / “minor optimizations” | DMS (e.g., MySQL → RDS MySQL) |
| “Refactor” / “re-architect” | Serverless / cloud-native rebuild |
□ Is cost the primary concern?
→ Yes = Backup & Restore
→ No = continue
□ Does it mention "critical" infrastructure running?
→ Yes = Pilot Light
→ No = continue
□ Does it say "everything running" at minimum/scaled down?
→ Yes = Warm Standby
→ No = continue
□ Does it say "fastest" / "lowest RTO" / "active-active"?
→ Yes = Multi-Site

□ Are you migrating a DATABASE?
→ Yes: Same engine? → DMS only
→ Yes: Different engine? → SCT + DMS
→ Yes: RDS → Aurora (same family)? → Snapshot
→ No = continue
□ Are you migrating SERVERS / VMs / APPS?
→ For migration (one-time)? → MGN
→ For DR (ongoing failover)? → DRS
□ Are you moving FILES / DATA?
→ On-prem ↔ AWS sync? → DataSync
→ Bulk physical? → Snowball
□ Are you managing BACKUPS?
→ Across AWS services? → AWS Backup

□ Are source and target DB engines DIFFERENT?
→ Yes = SCT + DMS
→ No (same engine) = DMS only, NO SCT

These cut across multiple MASTER SUMMARY sections. Use them when the question doesn’t clearly fit one topic.
What kind of data is moving?
│
├─► DATABASE
│ ├─ Same engine? → DMS only
│ ├─ Different engine? → SCT + DMS
│ ├─ RDS → Aurora (same family)? → Snapshot restore
│ └─ External MySQL → Aurora? → Percona XtraBackup → S3
│
├─► FILES / OBJECTS
│ ├─ Network OK (< 1 week)?
│ │ ├─ One-time / scheduled sync → DataSync
│ │ ├─ Ongoing hybrid access → Storage Gateway
│ │ └─ FTP/SFTP for external users → Transfer Family
│ └─ Network bad (> 1 week)?
│ ├─ < 14 TB → Snowcone
│ └─ > 14 TB → Snowball Edge
│
├─► SERVERS / VMs
│ ├─ Migrate to AWS (one-time) → MGN (lift-and-shift)
│ └─ DR failover/failback → DRS
│
└─► CROSS-REGION / CROSS-ACCOUNT within AWS
├─ S3 → S3 → S3 Replication (CRR/SRR)
├─ S3 → EFS/FSx → DataSync (no agent needed)
├─ RDS/Aurora → Read Replica → promote
├─ DynamoDB → Global Tables
└─ EBS → Snapshots → copy to target region

What needs to happen in real-time?
│
├─► STREAMING DATA (continuous, ordered)
│ ├─ Need ordering + replay? → Kinesis Data Streams
│ ├─ Need delivery to S3/Redshift/OpenSearch? → Kinesis Firehose
│ ├─ Need SQL on streams? → Kinesis Data Analytics (now Managed Service for Apache Flink)
│ └─ Need Apache Kafka compatible? → Amazon MSK
│
├─► EVENT-DRIVEN (discrete events, react)
│ ├─ AWS service state change? → EventBridge
│ ├─ Metric threshold crossed? → CloudWatch Alarm
│ ├─ Message queue (decouple)? → SQS
│ ├─ Fan-out to many? → SNS (or SNS + SQS)
│ └─ Orchestrate steps? → Step Functions
│
├─► LOG PROCESSING
│ ├─ Real-time → CloudWatch Subscription Filters
│ ├─ Near real-time to S3 → Firehose
│ └─ Batch/archive → S3 Export (up to 12h delay)
│
└─► API / REQUEST PROCESSING
├─ Sync (immediate response) → Lambda + API Gateway
├─ Async (fire-and-forget) → Lambda + SQS/SNS
└─ Long-running → Step Functions / ECS tasks

What kind of search/query?
│
├─► FULL-TEXT SEARCH (partial match, any field)
│ └─► OpenSearch
│ Pattern: DynamoDB (storage) + OpenSearch (search)
│
├─► STRUCTURED QUERIES (SQL)
│ ├─ On data in S3? → Athena (serverless, pay-per-query)
│ ├─ On data warehouse? → Redshift
│ ├─ On CloudTrail logs in S3? → Athena
│ └─ On relational data? → RDS / Aurora
│
├─► KEY-VALUE LOOKUP (by primary key)
│ └─► DynamoDB (single-digit ms)
│
├─► LOG SEARCH
│ ├─ CloudWatch Logs → Logs Insights
│ ├─ Custom logs at scale → OpenSearch
│ └─ VPC traffic → VPC Flow Logs + Athena
│
└─► WHO DID WHAT (audit)
  └─► CloudTrail → S3 → Athena

What needs to be faster?
│
├─► CONTENT DELIVERY (static/dynamic to users)
│ ├─ Global users, cacheable → CloudFront
│ ├─ Global users, TCP/UDP (gaming, IoT) → Global Accelerator
│ └─ Specific geo + legal needs → CloudFront + Geo Restriction
│
├─► DATABASE READS
│ ├─ Same queries repeated → ElastiCache (Redis/Memcached)
│ ├─ Read-heavy RDS → Read Replicas (up to 15 for Aurora)
│ ├─ DynamoDB reads → DAX (microsecond cache)
│ └─ Global reads → DynamoDB Global Tables / Aurora Global DB
│
├─► API RESPONSES
│ ├─ API Gateway → enable caching
│ ├─ Lambda cold starts → Provisioned Concurrency
│ └─ Lambda + RDS → RDS Proxy (connection pooling)
│
├─► EC2 LAUNCH / BOOT TIME
│ ├─ Static components → Golden AMI (pre-baked)
│ ├─ Dynamic config → User Data scripts
│ ├─ Both → Hybrid (Golden AMI + User Data)
│ └─ EBS volumes → enable EBS Fast Snapshot Restore
│
└─► NETWORK / DATA TRANSFER
├─ On-prem ↔ AWS → Direct Connect (dedicated)
├─ Backup DX path → Site-to-Site VPN
├─ EC2 ↔ EC2 same AZ → Placement Group (cluster)
└─ HPC storage → FSx for Lustre

What needs securing?
│
├─► DATA AT REST
│ ├─ S3 → SSE-S3, SSE-KMS, or SSE-C
│ ├─ EBS → KMS encryption
│ ├─ RDS/Aurora → KMS (enable at creation)
│ ├─ DynamoDB → KMS (AWS owned or customer managed)
│ └─ Secrets → Secrets Manager (rotation) or SSM Parameter Store
│
├─► DATA IN TRANSIT
│ ├─ HTTPS everywhere → ACM certificates
│ ├─ S3 → bucket policy with aws:SecureTransport
│ └─ VPN / DX → Site-to-Site VPN is IPsec-encrypted; DX is NOT encrypted by default (add VPN over DX or MACsec)
│
├─► NETWORK
│ ├─ Instance level → Security Groups (stateful)
│ ├─ Subnet level → NACLs (stateless)
│ ├─ VPC level → Network Firewall (L3-L7)
│ ├─ Web apps → WAF (L7, CloudFront/ALB/API GW)
│ └─ DDoS → Shield (Standard free, Advanced paid)
│
├─► ACCESS CONTROL
│ ├─ "Who can access AWS resources" → IAM Policies
│ ├─ "Org-wide guardrails" → SCPs
│ ├─ "Cross-account" → Resource Policy or IAM Role
│ ├─ "Temporary credentials" → STS AssumeRole
│ └─ "External identity" → Cognito / SSO (IAM Identity Center)
│
└─► AUDIT / COMPLIANCE
├─ "Who did what" → CloudTrail
├─ "Is it compliant" → Config
├─ "Automated compliance audit" → Audit Manager
└─ "Security findings dashboard" → Security Hub

| Scenario Keywords | → Answer | Topic Area |
|---|---|---|
| “Reduce boot time” + “static + dynamic” | Golden AMI + User Data | EC2/Deployment |
| “Search any field” / “partial text” | OpenSearch (not DynamoDB) | Database |
| “Query S3 data with SQL” | Athena | Database/Analytics |
| “React to S3 upload” | S3 Event → Lambda or EventBridge | Serverless |
| “Decouple microservices” | SQS (or SNS for fan-out) | Messaging |
| “Global low-latency DB” | DynamoDB Global Tables | Database |
| “Global low-latency SQL” | Aurora Global Database | Database |
| “Cache DB queries” (relational) | ElastiCache | Database |
| “Cache DB queries” (DynamoDB) | DAX | Database |
| “Cache API responses” | API Gateway Caching | Serverless |
| “Migrate DB, no downtime” | DMS with CDC | DR/Migration |
| “Move servers to AWS” | MGN | DR/Migration |
| “Multi-account security baseline” | Control Tower + SCPs | Security |
| “Central log analysis” | CloudWatch + Subscription Filters | Monitoring |
| “Cost per project/team” | Cost Allocation Tags | Billing |
| “Prevent action org-wide” | SCP (not Config — Config only detects) | Security |
| “Auto-fix non-compliant” | Config + SSM Automation | Monitoring |
| “Encrypt at rest, auto-rotate key” | KMS with automatic rotation | Security |
| “Share resources cross-account” | AWS RAM | IAM/Networking |
| “DNS failover” | Route 53 Failover routing + Health Check | Route 53 |
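
The data-in-transit branch of the security tree mentions enforcing HTTPS on S3 with a bucket policy that uses `aws:SecureTransport`. A minimal sketch of such a policy, built as a Python dict and printed as JSON, is shown here; the helper name and the `example-bucket` name are placeholders, and the policy would still need to be attached to a real bucket by whatever tooling you use.

```python
import json


def deny_insecure_transport_policy(bucket_name: str) -> dict:
    """Build a bucket policy that denies any S3 request not sent over HTTPS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
                # aws:SecureTransport is "false" when the request used plain HTTP.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


if __name__ == "__main__":
    # "example-bucket" is a placeholder name.
    print(json.dumps(deny_insecure_transport_policy("example-bucket"), indent=2))
```

An explicit Deny is used because it overrides any Allow elsewhere, so plain-HTTP requests are rejected no matter what other policies grant.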
https://www.w3schools.com/aws/aws_quiz.php
https://pages.awscloud.com/NAMER-partner-GC-Partner-Cert-Readiness-Cloud-Practitioner-2024-conf.html
https://www.udemy.com/course/aws-certified-cloud-practitioner-new/
https://media.datacumulus.com/aws-ccp/AWS%20Certified%20Cloud%20Practitioner%20Slides%20v28.pdf