02a-AWS

AWS (Amazon Web Services):

AWS Cloud Computing:

Six advantages of cloud computing:

Problems solved by the cloud:

Well-Architected Framework:
General Guiding Principles:

AWS Cloud Best Practices - Design Principles:

Well-Architected Framework (6 Pillars):

AWS Customer Carbon Footprint Tool: track, measure, review and forecast the Carbon emissions generated from your AWS usage. Helps you meet your own sustainability goals.

AWS Cloud Adoption Framework (AWS CAF) helps you build and then execute a comprehensive plan for your digital transformation through innovative use of AWS.

AWS Cloud Adoption Framework (AWS CAF)
The AWS Cloud Adoption Framework (AWS CAF) leverages AWS experience and best practices to help you digitally transform and accelerate your business outcomes through innovative use of AWS. AWS CAF identifies specific organizational capabilities that underpin successful cloud transformations. These capabilities provide best practice guidance that helps you improve your cloud readiness. Six perspectives:

AWS CAF - Transformation Domains:

AWS Right Sizing: the process of matching instance types and sizes to your workload performance and capacity requirements at the lowest possible cost.

AWS Professional Services & Partner Network:

AWS IQ: quickly find professional help for your AWS projects. Engage and pay AWS Certified third-party experts for on-demand project work. Includes video conferencing, contract management, secure collaboration, and integrated billing.

AWS re:Post: AWS-managed Q&A service.

AWS Managed Services (AMS) provides infrastructure and application support on AWS. It offers a team of AWS experts who manage and operate your infrastructure for security, reliability, and availability.

AWS Global Infrastructure:

AWS Regions: a Region is a physical location around the world where AWS clusters data centers. Each group of logical data centers is called an Availability Zone. Each AWS Region consists of a minimum of three isolated and physically separate AZs within a geographic area.

How to choose an AWS Region:

AWS Availability Zones (AZ): an AZ is one or more discrete data centers with redundant power, networking, and connectivity in an AWS Region. AZs give customers the ability to operate production applications and databases that are more highly available, fault tolerant, and scalable than would be possible from a single data center.

AWS Local Zones: a Local Zone is an extension of an AWS Region where you can run latency-sensitive applications using AWS services such as Amazon Elastic Compute Cloud, Amazon Virtual Private Cloud, Amazon Elastic Block Store, Amazon File Storage, and Amazon Elastic Load Balancing in geographic proximity to end users.

AWS Edge Locations (Points of Presence): an edge location is a site that Amazon CloudFront uses to store cached copies of your content closer to your customers for faster delivery.

AWS Wavelength Zones are infrastructure deployments embedded within telecommunications providers’ data centers at the edge of 5G networks.

AWS Outposts are “server racks” that offer the same AWS infrastructure, services, APIs, and tools, so you can build applications on-premises just as in the cloud. AWS sets up and manages the Outposts racks within your on-premises infrastructure. The customer is responsible for the physical security of the Outposts racks.

Service: AWS offers a broad set of global cloud-based products including compute, storage, database, analytics, networking, machine learning and AI, mobile, developer tools, IoT, security, enterprise applications, and much more.

Tools to access AWS Services:

AWS Shared Responsibility Model:

AWS Identity and Access Management:

IAM (Identity and Access Management) enables you to securely control access to AWS services and resources for your users.
Users are people within your organization and can be grouped. A user doesn’t have to belong to a group, and a user can belong to multiple groups. Groups contain only users, not other groups. The root user is created by default and has complete access to all AWS services and resources.

Actions that can be performed only by the root user:

Policies are JSON documents that define the permissions of users and groups.

IAM Roles for services: a set of permissions attached to an AWS service (for example EC2, Lambda, CloudFormation) so that it can perform actions on your behalf.

IAM Policy is a JSON document that defines permissions.

┌─────────────────────────────────────────────────────────────┐
│                        IAM POLICY                           │
├─────────────────────────────────────────────────────────────┤
│ Version   (Required)  - Policy language version             │
│ Id        (Optional)  - Policy identifier                   │
│ Statement (Required)  - Array of permission blocks          │
│ ├── Sid       (Optional)  - Statement ID                    │
│ ├── Effect    (Required)  - “Allow” or “Deny”               │
│ ├── Principal (Required*) - Who the policy applies to       │
│ ├── Action    (Required)  - What actions are permitted      │
│ ├── Resource  (Required)  - Which resources are affected    │
│ └── Condition (Optional)  - When the policy applies         │
└─────────────────────────────────────────────────────────────┘

* Principal is required only in resource-based policies; identity-based policies do not use it.
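
As a minimal sketch of the structure above, the following Python snippet builds an identity-based policy as a plain dictionary and checks for the required elements. The `validate_policy` helper, the `Sid` value, and the bucket name `example-bucket` are hypothetical and exist only for illustration. This is not an AWS API, and it simplifies the real rules (for example, it ignores `NotAction` and `NotResource`).

```python
import json

REQUIRED_POLICY_KEYS = {"Version", "Statement"}
# Identity-based policies omit Principal, so it is not checked here.
REQUIRED_STATEMENT_KEYS = {"Effect", "Action", "Resource"}


def validate_policy(policy: dict) -> list:
    """Return a list of problems; an empty list means the shape looks valid."""
    problems = []
    for key in sorted(REQUIRED_POLICY_KEYS - set(policy)):
        problems.append(f"missing top-level element: {key}")
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement object is also accepted
        statements = [statements]
    for i, stmt in enumerate(statements):
        for key in sorted(REQUIRED_STATEMENT_KEYS - set(stmt)):
            problems.append(f"statement {i}: missing {key}")
        if "Effect" in stmt and stmt["Effect"] not in ("Allow", "Deny"):
            problems.append(f"statement {i}: Effect must be Allow or Deny")
    return problems


# Hypothetical example policy: read-only access to objects in one bucket.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadExampleBucket",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-bucket/*",
        }
    ],
}

print(validate_policy(policy))
print(json.dumps(policy, indent=2))
```

IAM itself performs the authoritative validation when a policy is created; a local check like this only catches obvious shape mistakes early.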

Policy Types:

Best IAM practices:

IAM Credentials Report (account-level): a report that lists all of your account’s users and the status of their various credentials.

IAM Access Advisor (user-level): shows the service permissions granted to a user and when those services were last accessed. Can be used to revise policies.

AWS Resource Access Manager (AWS RAM) helps you securely share your resources across AWS accounts, within your organization or organizational units (OUs) and with IAM roles and users for supported resource types (Aurora, VPC Subnets, Transit Gateway, Route53, EC2 Dedicated Hosts, License Manager Configurations, etc). Avoid resource duplication.

AWS Service Catalog: a self-service portal for launching a set of authorized products pre-defined by admins.

AWS STS (Security Token Service): enables you to create temporary, limited-privileges credentials to access your AWS resources.

STS API                   | Use Case
--------------------------+------------------------------------------------------------------------------
AssumeRole                | Cross-account access, or same-account role assumption
AssumeRoleWithSAML        | Users logged in with SAML (corporate IdP)
AssumeRoleWithWebIdentity | Users logged in with an IdP (Facebook, Google, OIDC); prefer Cognito instead
GetSessionToken           | MFA for root or IAM user
GetFederationToken        | Temporary credentials for federated user

Session Policies: Optional policy passed when calling AssumeRole — further restricts the role’s permissions for that session only.

Amazon Cognito:

Component      | Purpose
---------------+----------------------------------------------------------------------
User Pools     | User directory for sign-up/sign-in, returns JWT tokens
Identity Pools | Exchange tokens for temporary AWS credentials (access AWS services)

User → Cognito User Pool → JWT Token → Cognito Identity Pool → AWS Credentials → AWS Services

⚠️ Exam trap: User Pools = authentication (who are you?), Identity Pools = authorization (AWS access)

IAM Access Analyzer:

AWS Directory Services:

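Service                  | Users Stored | On-Prem Connection | Use Case
-------------------------+--------------+--------------------+------------------------------------------
AWS Managed Microsoft AD | In AWS       | Two-way trust      | Full AD features, MFA, trust with on-prem
AD Connector             | On-prem only | Proxy (no trust)   | Keep users on-prem, redirect auth
Simple AD                | In AWS       | ❌ Cannot          | Basic AD, standalone, no on-prem
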
┌─────────────────────────────────────────────────────────────────────────────┐
│                       AWS Directory Services                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  1. AWS Managed Microsoft AD (two-way trust)                                │
│                                                                             │
│       ┌──────────┐      trust       ┌──────────────────┐                    │
│  auth │          │◄────────────────►│  AWS Managed AD  │ auth               │
│  ◄────┤ On-Prem  │                  │       [MS]       ├────►               │
│       │    AD    │                  └──────────────────┘                    │
│       └──────────┘                                                          │
│                                                                             │
│  2. AD Connector (proxy only - NO users stored in AWS)                      │
│                                                                             │
│       ┌──────────┐      proxy       ┌──────────────────┐                    │
│       │          │◄────────────────►│   AD Connector   │ auth               │
│       │ On-Prem  │                  │       [⚡]        ├────►               │
│       │    AD    │                  └──────────────────┘                    │
│       └──────────┘                                                          │
│                                                                             │
│  3. Simple AD (standalone - NO on-prem connection)                          │
│                                                                             │
│                                     ┌──────────────────┐                    │
│                          ❌         │    Simple AD     │ auth               │
│              (no on-prem)           │       [DB]       ├────►               │
│                                     └──────────────────┘                    │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

⚠️ Exam trap: AD Connector is just a proxy — it does NOT store users, only redirects authentication to on-prem AD.


AWS Organizations (Global service):

┌─────────────────────────────────────────────────────────────────────┐
│                    Root Organizational Unit (OU)                    │
│  ┌────────────────┐                                                 │
│  │  Management    │  ← Full admin power, SCPs do NOT apply here     │
│  │  Account       │                                                 │
│  └────────────────┘                                                 │
│                                                                     │
│  ┌──────────────────────┐      ┌──────────────────────────────────┐ │
│  │     OU (Dev)         │      │          OU (Prod)               │ │
│  │  ┌────┐  ┌────┐      │      │  ┌────┐  ┌────┐                  │ │
│  │  │Acct│  │Acct│      │      │  │Acct│  │Acct│                  │ │
│  │  └────┘  └────┘      │      │  └────┘  └────┘                  │ │
│  │   Member Accounts    │      │  ┌────────────┐ ┌──────────────┐ │ │
│  └──────────────────────┘      │  │  OU (HR)   │ │ OU (Finance) │ │ │
│                                │  │ ┌──┐ ┌──┐  │ │  ┌──┐ ┌──┐   │ │ │
│                                │  │ │  │ │  │  │ │  │  │ │  │   │ │ │
│                                │  │ └──┘ └──┘  │ │  └──┘ └──┘   │ │ │
│                                │  └────────────┘ └──────────────┘ │ │
│                                └──────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘

Consolidated Billing Benefits:

Multi-Account Strategies:

Service Control Policies (SCP):

⚠️ Exam trap: SCPs don’t affect Management Account — if question asks “restrict ALL accounts”, Management Account is still unrestricted!

⚠️ Exam trap: Service-linked roles are NOT affected by SCPs — they always work!

What SCPs CANNOT do:


AWS Organizations – Tag Policies:


IAM Conditions - restrict API calls based on:

Condition Key              | Purpose                | Example
---------------------------+------------------------+------------------------------------
aws:SourceIp               | Restrict by client IP  | Only allow from corporate IP range
aws:RequestedRegion        | Restrict by region     | Only allow eu-west-1 API calls
ec2:ResourceTag            | Restrict based on tags | Only manage EC2 with tag “Env=Dev”
aws:MultiFactorAuthPresent | Force MFA              | Require MFA for sensitive actions
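
To make the `aws:SourceIp` row concrete, here is a small sketch of how a "deny unless the caller is in the corporate range" condition can be reasoned about locally. The statement, the documentation range `203.0.113.0/24`, and the helper names are assumptions chosen for illustration. IAM does the real evaluation; this only mimics the CIDR match behind the `NotIpAddress` operator.

```python
import ipaddress

# Hypothetical statement: deny everything unless the request comes from the corporate range.
statement = {
    "Effect": "Deny",
    "Action": "*",
    "Resource": "*",
    "Condition": {"NotIpAddress": {"aws:SourceIp": ["203.0.113.0/24"]}},
}


def source_ip_in_ranges(source_ip: str, cidrs: list) -> bool:
    """True if source_ip falls inside at least one of the CIDR ranges."""
    ip = ipaddress.ip_address(source_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in cidrs)


def statement_denies(stmt: dict, source_ip: str) -> bool:
    """The Deny applies when the caller's IP is NOT in any listed range."""
    cidrs = stmt["Condition"]["NotIpAddress"]["aws:SourceIp"]
    return not source_ip_in_ranges(source_ip, cidrs)


print(statement_denies(statement, "203.0.113.25"))  # inside the range: not denied
print(statement_denies(statement, "198.51.100.7"))  # outside the range: denied
```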

⚠️ Exam trap: Fake condition keys! Only these are real:


S3 Bucket Policies vs IAM Policies:

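Aspect        | IAM Policy               | S3 Bucket Policy
--------------+--------------------------+------------------------------------------------
Attached to   | User/Group/Role          | S3 Bucket
Cross-account | Requires role assumption | Direct access via Principal
Use case      | User-centric permissions | Resource-centric, public access, cross-account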

S3 Access Decision Logic:

IAM Policy ALLOWS  +  S3 Bucket Policy ALLOWS  →  ACCESS ✅
IAM Policy ALLOWS  +  S3 Bucket Policy (silent) →  ACCESS ✅
IAM Policy (silent) +  S3 Bucket Policy ALLOWS  →  ACCESS ✅  (if same account)
IAM Policy DENIES   OR  S3 Bucket Policy DENIES →  DENIED ❌

Cross-account: BOTH must explicitly Allow
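
The decision rules above can be sketched as a small Python function. The `Decision` enum and the `s3_access` helper are hypothetical names used only to encode the rules listed in these notes; the sketch does not cover every factor AWS evaluates (SCPs, boundaries, ACLs, and so on).

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"      # explicit Deny statement
    SILENT = "silent"  # policy says nothing about the request


def s3_access(iam: Decision, bucket: Decision, same_account: bool = True) -> bool:
    """Return True if access is granted under the rules listed above."""
    # An explicit Deny in either policy always wins.
    if Decision.DENY in (iam, bucket):
        return False
    if same_account:
        # Same account: one Allow from either policy is enough.
        return Decision.ALLOW in (iam, bucket)
    # Cross-account: both policies must explicitly Allow.
    return iam is Decision.ALLOW and bucket is Decision.ALLOW


print(s3_access(Decision.ALLOW, Decision.SILENT))                      # True
print(s3_access(Decision.SILENT, Decision.ALLOW, same_account=False))  # False
```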

Common S3 Policy Conditions:

Condition                       | Purpose
--------------------------------+----------------------------------
aws:SourceIp                    | Restrict by IP range
aws:SourceVpce                  | Restrict to specific VPC endpoint
aws:SourceVpc                   | Restrict to specific VPC
s3:x-amz-acl                    | Control ACL settings
s3:x-amz-server-side-encryption | Require encryption
aws:SecureTransport             | Require HTTPS (deny HTTP)

Example - Require HTTPS:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::bucket/*",
      "Condition": {
        "Bool": { "aws:SecureTransport": "false" }
      }
    }
  ]
}

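⚠️ Exam trap: In a bucket policy Allow statement, "Principal": "*" and "Principal": {"AWS": "*"} are equivalent. Both grant access to everyone, including anonymous (unauthenticated) requests, unless a Condition narrows the statement.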

⚠️ Exam trap: Cross-account S3 access — bucket policy must explicitly allow the external principal AND the external account needs IAM permissions.

⚠️ Exam trap: S3 ARN patterns matter!


IAM Roles vs Resource-Based Policies (Cross-Account Access):

Two ways to access S3 in another account:

Option 1: Role as Proxy (AssumeRole)
┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│    User      │─────►│    Role      │─────►│   Amazon S3  │
│  Account A   │      │  Account B   │      │  Account B   │
└──────────────┘      └──────────────┘      └──────────────┘
                      (become this role,
                       lose Account A perms)

Option 2: Resource-Based Policy (S3 Bucket Policy)
┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│    User      │─────►│  S3 Bucket   │─────►│   Amazon S3  │
│  Account A   │      │   Policy     │      │  Account B   │
└──────────────┘      └──────────────┘      └──────────────┘
                      (grants access to
                       Account A user directly,
                       keeps Account A perms)
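
Aspect      | Assume Role                   | Resource-Based Policy
------------+-------------------------------+--------------------------------------
Permissions | Give up original, take role’s | Keep original + gain resource access
Use case    | Need full different identity  | Need BOTH source and target access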

Example: User in Account A needs to scan DynamoDB in Account A AND dump to S3 in Account B → Use resource-based policy on S3 (keeps DynamoDB permissions)
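
The DynamoDB-to-S3 example can be modeled with plain sets of actions. This is only an illustrative sketch: the helper names and the action sets are assumptions, and real permissions also depend on resources, conditions, and the other policy layers.

```python
def permissions_after_assume_role(user_perms: set, role_perms: set) -> set:
    """While the role is assumed, only the role's permissions are in effect."""
    return set(role_perms)


def permissions_with_resource_policy(user_perms: set, granted_by_resource: set) -> set:
    """The caller keeps its own permissions and gains what the resource policy grants."""
    return user_perms | granted_by_resource


# Hypothetical scenario: scan DynamoDB in Account A, write to S3 in Account B.
user_perms = {"dynamodb:Scan"}    # identity policy in Account A
role_in_b = {"s3:PutObject"}      # role defined in Account B
bucket_policy = {"s3:PutObject"}  # bucket policy in Account B naming the user
needed = {"dynamodb:Scan", "s3:PutObject"}

print(needed <= permissions_after_assume_role(user_perms, role_in_b))         # False
print(needed <= permissions_with_resource_policy(user_perms, bucket_policy))  # True
```

With the role, the job can write to S3 but can no longer scan the table in the same session; with the bucket policy, both actions are available at once.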

EventBridge Target Permissions:

Target           | Policy Type    | Why
-----------------+----------------+------------------------------------
Lambda           | Resource-based | Lambda can define “who invokes me”
SNS              | Resource-based | SNS can define “who publishes to me”
SQS              | Resource-based | SQS can define “who sends to me”
S3               | Resource-based | S3 can define “who writes to me”
API Gateway      | Resource-based | API GW can define “who calls me”
Kinesis          | IAM Role       | No invoke policy; needs a role
EC2 Auto Scaling | IAM Role       | No invoke policy; needs a role
ECS Task         | IAM Role       | No invoke policy; needs a role
SSM Run Command  | IAM Role       | No invoke policy; needs a role

Memory trick: “Can the target say WHO is allowed to invoke it?”

Memory hook: “SLSS + API GW” = Resource-based (SNS, Lambda, SQS, S3, API Gateway)

Memory hook: “KEES” = IAM Role needed (Kinesis, EC2 Auto Scaling, ECS, SSM)

⚠️ Exam trap: Lambda = resource-based, Kinesis = IAM role. Don’t mix them up!


IAM Permission Boundaries:

┌───────────────────────────────────────────────────────────┐
│                                                           │
│      ┌─────────────┐         ┌─────────────────┐          │
│      │Organizations│         │   Permissions   │          │
│      │    SCP      │         │    Boundary     │          │
│      │      ✓      │    ✓    │       ✓         │          │
│      │         ┌───┴─────────┴───┐             │          │
│      │         │                 │             │          │
│      └─────────┤   Effective     ├─────────────┘          │
│                │   Permissions   │                        │
│      ┌─────────┤       ✓         ├─────────────┐          │
│      │         │                 │             │          │
│      │         └───┬─────────────┘             │          │
│      │  Identity   │         ✓                 │          │
│      │   Policy    │                           │          │
│      │      ✓      │                           │          │
│      └─────────────┘                           │          │
│                                                           │
└───────────────────────────────────────────────────────────┘

Use Cases:

⚠️ Exam trap: Permission Boundaries do NOT apply to groups! Only users and roles.


IAM Policy Evaluation Logic (order matters):

                         ┌─────────────────┐
                         │  Start: DENY    │
                         └────────┬────────┘
                                  ▼
                    ┌─────────────────────────┐
                    │   Explicit Deny?        │──── YES ────► DENY ❌
                    └─────────────┬───────────┘
                                  │ NO
                                  ▼
                    ┌─────────────────────────┐
                    │   In Org with SCP?      │──── NO ─────► Skip to Resource-Based
                    └─────────────┬───────────┘
                                  │ YES
                                  ▼
                    ┌─────────────────────────┐
                    │   SCP Allows?           │──── NO ─────► DENY ❌ (implicit)
                    └─────────────┬───────────┘
                                  │ YES
                                  ▼
                    ┌─────────────────────────┐
                    │   Resource-Based        │──── ALLOW + same account ──► ALLOW ✅
                    │   Policy Allows?        │
                    └─────────────┬───────────┘
                                  │ (continue if cross-account or no resource policy)
                                  ▼
                    ┌─────────────────────────┐
                    │   Identity Policy       │──── NO ─────► DENY ❌ (implicit)
                    │   Allows?               │
                    └─────────────┬───────────┘
                                  │ YES
                                  ▼
                    ┌─────────────────────────┐
                    │   Permission Boundary   │──── NO ─────► DENY ❌ (implicit)
                    │   Allows? (if exists)   │
                    └─────────────┬───────────┘
                                  │ YES
                                  ▼
                    ┌─────────────────────────┐
                    │   Session Policy        │──── NO ─────► DENY ❌ (implicit)
                    │   Allows? (if exists)   │
                    └─────────────┬───────────┘
                                  │ YES
                                  ▼
                         ┌─────────────────┐
                         │   ALLOW ✅      │
                         └─────────────────┘
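
The flowchart above can be sketched as a function that walks the same checks in the same order. `PolicyContext` and `evaluate` are hypothetical names, each field is a pre-computed yes/no answer rather than a real policy document, and the cross-account check is an assumption taken from the "BOTH must explicitly Allow" rule in the S3 section. Treat it as a study aid, not as the full AWS evaluation logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PolicyContext:
    explicit_deny: bool = False                   # any policy contains a matching Deny
    scp_allows: Optional[bool] = None             # None = not in an Organization with SCPs
    resource_policy_allows: bool = False
    same_account: bool = True
    identity_policy_allows: bool = False
    boundary_allows: Optional[bool] = None        # None = no permission boundary set
    session_policy_allows: Optional[bool] = None  # None = no session policy passed


def evaluate(ctx: PolicyContext) -> str:
    if ctx.explicit_deny:
        return "DENY (explicit)"
    if ctx.scp_allows is False:
        return "DENY (implicit: SCP)"
    if ctx.resource_policy_allows and ctx.same_account:
        return "ALLOW"
    # Assumption from the S3 notes: cross-account needs the resource policy too.
    if not ctx.same_account and not ctx.resource_policy_allows:
        return "DENY (implicit: cross-account without resource policy)"
    if not ctx.identity_policy_allows:
        return "DENY (implicit: identity policy)"
    if ctx.boundary_allows is False:
        return "DENY (implicit: permission boundary)"
    if ctx.session_policy_allows is False:
        return "DENY (implicit: session policy)"
    return "ALLOW"


print(evaluate(PolicyContext(identity_policy_allows=True)))
print(evaluate(PolicyContext(identity_policy_allows=True, boundary_allows=False)))
print(evaluate(PolicyContext(identity_policy_allows=True, explicit_deny=True)))
```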

Key Rules:


AWS IAM Identity Center (successor to AWS SSO):

Active Directory Integration:

Option 1: AWS Managed Microsoft AD (out-of-box integration)
┌──────────────────┐              ┌─────────────────────┐
│  IAM Identity    │───connect───►│ AWS Managed         │
│  Center          │              │ Microsoft AD        │
└──────────────────┘              └─────────────────────┘

Option 2: Self-Managed AD (two approaches)
┌──────────────────┐     ┌─────────────────┐    two-way trust    ┌────────────┐
│  IAM Identity    │─────│ AWS Managed     │◄──────────────────►│ On-Prem AD │
│  Center          │     │ Microsoft AD    │                     └────────────┘
└────────┬─────────┘     └─────────────────┘
         │
         │               ┌─────────────────┐       proxy          ┌────────────┐
         └───────────────│  AD Connector   │◄────────────────────►│ On-Prem AD │
                         └─────────────────┘                      └────────────┘

Permission Sets: Collection of IAM Policies assigned to users/groups for AWS access

ABAC: Fine-grained permissions based on user attributes (cost center, title, locale)


AWS Control Tower - multi-account governance:

Guardrails (ongoing governance):

Type       | Implementation | Example
-----------+----------------+-------------------------------------
Preventive | SCPs           | Restrict regions across all accounts
Detective  | AWS Config     | Identify untagged resources

Detective Guardrail Flow:

┌─────────────────────────────────────────────────────────────────────┐
│  AWS Control Tower                                                  │
│  ┌─────────────┐                                                    │
│  │ Guardrail   │ trigger (NON_COMPLIANT)                            │
│  │ (Detective) ├────────────►┌─────┐ notify  ┌───────┐              │
│  │             │             │ SNS │────────►│ Admin │              │
│  │ AWS Config  │             └──┬──┘         └───────┘              │
│  └──────┬──────┘                │                                   │
│         │ monitor               │ invoke                            │
│         ▼                       ▼                                   │
│  ┌────────────────┐       ┌──────────┐                              │
│  │ Member Accounts│◄──────│  Lambda  │ remediate (add tags)         │
│  └────────────────┘       └──────────┘                              │
└─────────────────────────────────────────────────────────────────────┘

⚠️ IAM Exam Traps Summary

Trap                                             | Reality
-------------------------------------------------+------------------------------------------------------------------
“IAM is regional”                                | ❌ IAM is GLOBAL; no region selection
“SCPs restrict Management Account”               | ❌ Management Account has full power always
“SCPs affect service-linked roles”               | ❌ Service-linked roles are NOT affected
“Permission Boundaries work on groups”           | ❌ Only users and roles
“AssumeRole keeps original permissions”          | ❌ You give up original, take role’s
“Resource-based policy requires role assumption” | ❌ No role needed; keep original permissions
“Cognito User Pools give AWS access”             | ❌ User Pools = auth only; Identity Pools = AWS credentials
“S3 GetObject on bucket ARN”                     | ❌ Need bucket/* for object actions, bucket alone = Access Denied
“aws:SourceRegion condition key”                 | ❌ Doesn’t exist; use aws:RequestedRegion
“EventBridge → Kinesis uses resource policy”     | ❌ Kinesis needs IAM Role (no resource-based policy)

🎯 IAM Quick Decision Table

Scenario                                        | → Solution
------------------------------------------------+------------------------------------------
Restrict entire AWS account                     | SCP (at OU or Account level)
Restrict specific user/role (not whole account) | Permission Boundary
Cross-account, keep original permissions        | Resource-based policy
Cross-account, need different identity          | Assume Role
Centralized login for multiple AWS accounts     | IAM Identity Center
External users (millions, mobile/web)           | Cognito
Temporary AWS credentials                       | STS
Corporate IdP integration (SAML)                | IAM Identity Center or AssumeRoleWithSAML
Social login (Google, Facebook)                 | Cognito Identity Pools
Share resources across accounts                 | AWS RAM
Find externally shared resources                | IAM Access Analyzer
Multi-account governance with guardrails        | AWS Control Tower
Standardize tags across Organization            | Tag Policies


🎯 MASTER SUMMARY: IAM & Organizations Exam Guide

Part 1: Core Principles (Understand WHY → Derive WHAT)

Principle 1: Implicit Deny by Default

Everything starts DENIED. You must explicitly Allow. This applies to:

Why? Security principle — if you forget something, it’s denied (safe default).

Derive: If SCP only allows EC2 → everything else is denied. No need to memorize “deny lists.”


Principle 2: Explicit Deny ALWAYS Wins

No matter how many Allows exist, one Deny = blocked.

Why? Prevents privilege escalation — you can always restrict, never override a restriction.

Derive: To block an action, just add Deny anywhere in the chain. Order doesn’t matter for Deny.


Principle 3: Permissions = Intersection, Not Union

Effective permissions = what ALL layers allow together.

SCP ∩ Permission Boundary ∩ Identity Policy = Effective Permissions

Why? Each layer is a guardrail — you can only narrow, never expand beyond any layer.

Derive: If SCP allows S3+EC2, but Identity Policy only allows S3 → only S3 works.
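
The intersection rule maps directly onto set operations. The sketch below uses coarse service names instead of real IAM actions, and `effective_permissions` is a hypothetical helper; layers that are not configured (passed as `None`) are simply skipped.

```python
from typing import Optional


def effective_permissions(
    identity_policy: set,
    scp: Optional[set] = None,
    boundary: Optional[set] = None,
) -> set:
    """Intersect the identity policy with every layer that is actually present."""
    effective = set(identity_policy)
    for layer in (scp, boundary):
        if layer is not None:
            effective &= layer
    return effective


# The example from the notes: SCP allows S3 + EC2, identity policy allows only S3.
print(effective_permissions({"s3"}, scp={"s3", "ec2"}))  # {'s3'}

# Adding a boundary can only narrow the result further, never widen it.
print(effective_permissions({"s3", "ec2", "lambda"}, scp={"s3", "ec2"}, boundary={"ec2"}))  # {'ec2'}
```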


Principle 4: Management Account is Untouchable

SCPs never apply to Management Account. It always has full power.

Why? Someone must be able to fix things if SCPs lock everyone out.

Derive: “Restrict ALL accounts” questions — Management Account is the exception.


Principle 5: Scope Determines Tool

Why? Different granularity needs different tools.

Derive: “Prevent developers from escalating privileges” → Permission Boundary (user-level, not account-level).


Principle 6: Cross-Account = Two Choices

  1. Assume Role → Give up original identity, become the role
  2. Resource-based Policy → Keep original identity + access target

Why? Sometimes you need both source AND target access (e.g., DynamoDB scan → S3 dump).

Derive: “Access DynamoDB in Account A AND S3 in Account B” → Resource-based policy on S3.


Principle 7: Temporary Credentials > Long-term

STS provides temporary, auto-expiring credentials.

Why? Reduces blast radius of compromise — credentials expire.

Derive: Cross-account access, federation, MFA → all use STS behind the scenes.


Principle 8: Authentication ≠ Authorization

Why? Separation of concerns — different systems handle different problems.

Derive: “Millions of mobile users need S3 access” → User Pool (auth) + Identity Pool (AWS creds).


Principle 9: Service-Linked Roles are Special

SCPs don’t affect them. They’re created BY AWS FOR AWS services.

Why? AWS services need guaranteed permissions to function.

Derive: “SCP blocks everything but service still works” → It’s using a service-linked role.


Principle 10: IAM is Global

No region selection. Users, roles, policies exist everywhere.

Why? Identity should be consistent — you’re YOU regardless of region.

Derive: “Create IAM user in us-east-1” → Trick question, IAM has no region.


Part 2: Decision Trees

Cross-Account Access Decision

Need cross-account access?
    │
    ├─► Need BOTH source + target permissions?
    │       │
    │       └─► YES → Resource-based policy
    │
    └─► Need different identity/permissions?
            │
            └─► YES → Assume Role

Restriction Scope Decision

What to restrict?
    │
    ├─► Entire account(s)?
    │       │
    │       └─► SCP (attach to OU or Account)
    │
    ├─► Specific user/role?
    │       │
    │       └─► Permission Boundary
    │
    └─► Specific actions with conditions?
            │
            └─► IAM Policy with Conditions

Identity Provider Decision

Who needs access?
    │
    ├─► Internal employees to AWS accounts?
    │       │
    │       └─► IAM Identity Center
    │
    ├─► External users (millions, mobile/web)?
    │       │
    │       └─► Cognito
    │
    └─► Corporate IdP (SAML)?
            │
            ├─► To AWS Console → IAM Identity Center
            └─► Programmatic → AssumeRoleWithSAML

The “CANNOT” List

What                  | Cannot
----------------------+---------------------------------
SCPs                  | Affect Management Account
SCPs                  | Affect service-linked roles
Permission Boundaries | Apply to groups
Cognito User Pools    | Give AWS credentials directly
Simple AD             | Join with on-premises AD
AD Connector          | Store users (it’s just a proxy)

Part 3: Scenario Pattern Recognition

Pattern: “Restrict ALL member accounts from using a service”

Keywords: all accounts, prevent, organization-wide, block service
Answer: SCP at Root OU level
Why: SCPs cascade down OUs. Root OU = all member accounts (not Management).


Pattern: “Allow developers to create IAM users but prevent privilege escalation”

Keywords: delegate, self-service, prevent escalation, limit what they can create
Answer: Permission Boundary
Why: Boundary limits max permissions of created entities.


Pattern: “User needs to access resources in two accounts simultaneously”

Keywords: scan + dump, read from A write to B, both accounts
Answer: Resource-based policy (on target resource)
Why: AssumeRole would lose access to source account.


Pattern: “Millions of mobile app users need S3 access”

Keywords: mobile, web app, millions, external users, S3/DynamoDB access
Answer: Cognito User Pools + Identity Pools
Why: User Pools authenticate, Identity Pools give temporary AWS credentials.


Pattern: “Corporate employees need SSO to multiple AWS accounts”

Keywords: SSO, single sign-on, multiple accounts, employees, SAML, Active Directory
Answer: IAM Identity Center
Why: Built for this: integrates with AD, manages permission sets across accounts.


Pattern: “Detect untagged resources across organization”

Keywords: detect, compliance, untagged, non-compliant, monitor
Answer: Control Tower Detective Guardrail (uses AWS Config)
Why: Detective = monitoring (not blocking). Uses Config rules.


Pattern: “Prevent creating resources in unapproved regions”

Keywords: prevent, block, region restriction, all accounts
Answer: SCP with aws:RequestedRegion condition (or Control Tower Preventive Guardrail)
Why: Preventive = blocking. SCPs stop the action.


Pattern: “Share VPC subnets across accounts”

Keywords: share, subnets, cross-account, Transit Gateway, avoid duplication
Answer: AWS RAM (Resource Access Manager)
Why: RAM shares resources without duplication.


Pattern: “Find resources shared with external accounts”

Keywords: identify, find, external access, shared externally, audit
Answer: IAM Access Analyzer
Why: Analyzes policies to find external principal access.


Pattern: “Temporary credentials for cross-account access”

Keywords: temporary, cross-account, assume, programmatic
Answer: STS AssumeRole
Why: Returns temporary credentials for the target role.


Pattern: “Require MFA for sensitive operations”

Keywords: MFA, multi-factor, sensitive, delete, critical
Answer: IAM Policy Condition: aws:MultiFactorAuthPresent
Why: Condition key checks MFA status.


Pattern: “Standardize tag format across organization”

Keywords: standardize, tags, enforce, format, organization-wide
Answer: Tag Policies
Why: Define allowed tag keys/values, prevent non-compliant tags.


Pattern: “Connect IAM Identity Center to on-premises AD”

Keywords: on-premises, Active Directory, Identity Center, trust
Answer: Two-way trust with AWS Managed Microsoft AD, OR AD Connector (proxy)
Why: Can’t connect directly; an AWS directory service is needed in between.


Part 4: Quick Reference Tables

SCP vs Permission Boundary vs IAM Policy

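Aspect                      | SCP                 | Permission Boundary          | IAM Policy
----------------------------+---------------------+------------------------------+----------------
Scope                       | Account/OU          | User/Role                    | User/Group/Role
Applies to groups?          | N/A (account level) | ❌ NO                        | ✅ YES
Affects root user?          | ✅ YES              | ❌ NO (root has no boundary) | ❌ NO
Affects Management Account? | ❌ NO               | ✅ YES                       | ✅ YES
Default                     | Implicit Deny       | Implicit Deny                | Implicit Deny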

Directory Services Comparison

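Service                  | Users Stored | On-Prem Connection | Use Case
-------------------------+--------------+--------------------+------------------------
AWS Managed Microsoft AD | In AWS       | Two-way trust      | Need AD features in AWS
AD Connector             | On-prem only | Proxy              | Keep users on-prem
Simple AD                | In AWS       | ❌ Cannot          | Basic AD, no on-prem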

STS API Quick Reference

API                       | When to Use
--------------------------+----------------------------------------
AssumeRole                | Cross-account, same-account role switch
AssumeRoleWithSAML        | Corporate IdP (SAML) login
AssumeRoleWithWebIdentity | Social login (prefer Cognito)
GetSessionToken           | MFA for IAM user
GetFederationToken        | Custom federation

Part 5: Ultimate Instant-Answer Table

Question Contains                       | → Instant Answer
----------------------------------------+----------------------------------------
“restrict ALL member accounts”          | SCP at Root OU
“Management Account” + “restrict”       | ❌ Can’t: SCPs don’t apply
“service-linked role” + “blocked”       | ❌ Can’t: SCPs don’t affect
“prevent privilege escalation”          | Permission Boundary
“groups” + “permission boundary”        | ❌ Not supported
“both accounts” / “source and target”   | Resource-based policy
“give up permissions”                   | Assume Role
“millions of users” / “mobile app”      | Cognito
“User Pools” + “AWS access”             | ❌ Need Identity Pools too
“SSO multiple accounts”                 | IAM Identity Center
“on-premises AD” + “Identity Center”    | AWS Managed AD + trust, or AD Connector
“Simple AD” + “on-prem”                 | ❌ Cannot connect
“share resources” / “avoid duplication” | AWS RAM
“find external access”                  | IAM Access Analyzer
“temporary credentials”                 | STS
“MFA required”                          | aws:MultiFactorAuthPresent condition
“region restriction”                    | aws:RequestedRegion condition or SCP
“tag standardization”                   | Tag Policies
“detect” + “compliance”                 | Detective Guardrail (AWS Config)
“prevent” + “organization-wide”         | Preventive Guardrail (SCP)
“Control Tower” + “remediate”           | Lambda (triggered by SNS from Config)
“landing zone”                          | Control Tower
“SAML” + “programmatic”                 | AssumeRoleWithSAML
“social login”                          | Cognito Identity Pools
“IAM” + “regional”                      | ❌ Trick: IAM is GLOBAL

Part 6: Elimination Checklist

□ Is it about restricting accounts?
  → Yes = Think SCP
  → But Management Account? = SCP won't work

□ Is it about restricting a specific user/role?
  → Yes = Think Permission Boundary
  → Is it a group? = Permission Boundary won't work

□ Is it cross-account access?
  → Need both source + target access? = Resource-based policy
  → Need different identity? = Assume Role

□ Is it millions of external users?
  → Yes = Cognito (not IAM users)
  → Need AWS credentials? = Identity Pools required

□ Is it corporate employees to AWS?
  → Yes = IAM Identity Center

□ Is it about compliance/detection?
  → Yes = Detective Guardrail / AWS Config

□ Is it about blocking/prevention?
  → Yes = Preventive Guardrail / SCP

🏆 The Golden Rules

  1. Implicit Deny — Everything denied until explicitly allowed
  2. Explicit Deny Wins — One Deny overrides all Allows
  3. Intersection Rule — Effective = SCP ∩ Boundary ∩ Policy
  4. Management Account Exception — SCPs don’t touch it
  5. Service-Linked Roles Exception — SCPs don’t affect them
  6. Permission Boundaries ≠ Groups — Only users and roles
  7. AssumeRole = Identity Switch — You become the role, lose original
  8. Resource-Based = Keep Both — Keep original + access target
  9. User Pools ≠ AWS Access — Need Identity Pools for credentials
  10. IAM = Global — No region, ever
  11. SCP Allow = Everything Else Denied — Like IAM, implicit deny default
  12. FullAWSAccess SCP — Default SCP that allows everything (remove carefully!)
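
Rules 1 to 3 can be read as one evaluation order. The following is a minimal sketch, assuming each policy layer is simplified to a plain set of allowed action strings; real IAM evaluation also considers resources, conditions, wildcards, resource-based policies and session policies. The function name `is_allowed` and its parameters are hypothetical.

```python
from typing import Iterable, Optional, Set


def is_allowed(
    action: str,
    identity_allows: Set[str],
    scp_allows: Optional[Set[str]] = None,
    boundary_allows: Optional[Set[str]] = None,
    explicit_denies: Iterable[str] = (),
) -> bool:
    """Toy model of rules 1-3; None means 'this layer is not attached'."""
    # Rule 2: an explicit Deny anywhere wins over every Allow.
    if action in set(explicit_denies):
        return False
    # Rule 3: every attached layer must allow the action (intersection).
    for layer in (scp_allows, boundary_allows, identity_allows):
        if layer is not None and action not in layer:
            # Rule 1: not explicitly allowed by this layer = implicit deny.
            return False
    return True


identity = {"s3:GetObject", "ec2:RunInstances", "iam:CreateUser"}
scp = {"s3:GetObject", "ec2:RunInstances"}
boundary = {"s3:GetObject"}

print(is_allowed("s3:GetObject", identity, scp, boundary))      # True
print(is_allowed("ec2:RunInstances", identity, scp, boundary))  # False
print(is_allowed("s3:GetObject", identity, scp, boundary,
                 explicit_denies=["s3:GetObject"]))             # False
```

The second call fails even though the identity policy and the SCP both allow it, because the permission boundary does not: the effective permission is the intersection of all attached layers.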

Amazon VPC:

VPC (Virtual Private Cloud) is a service that lets you launch AWS resources in a logically isolated virtual network that you define.


CIDR – IPv4:

CIDR (Classless Inter-Domain Routing) — method for allocating IP addresses. Used in Security Groups, NACLs, and all AWS networking.

CIDR | IPs | Use Case
/32 | 1 | Single IP (e.g., SSH from your IP)
/28 | 16 | Smallest VPC/subnet
/27 | 32 | Small subnet
/26 | 64 | Medium subnet
/24 | 256 | Common subnet size
/16 | 65,536 | Largest VPC
/0 | All | 0.0.0.0/0 = open to internet

⚠️ Exam trap: “Need 29 IPs for EC2” → /27 (32 IPs) is NOT enough! AWS reserves 5 IPs per subnet (first 4 + last 1) → 32 - 5 = 27 < 29. Use /26 (64 - 5 = 59 ✓)
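
The arithmetic in this trap can be sketched as a small helper. It assumes only what the note states: a subnet has 2^(32 - prefix) addresses, AWS keeps 5 of them, and subnets range from /28 to /16. The function names are hypothetical.

```python
AWS_RESERVED_PER_SUBNET = 5  # first 4 addresses + last 1, per the note above


def usable_ips(prefix: int) -> int:
    """Addresses left for your resources in a subnet with this prefix length."""
    return 2 ** (32 - prefix) - AWS_RESERVED_PER_SUBNET


def smallest_prefix_for(hosts_needed: int) -> int:
    """Smallest subnet (largest prefix number) between /28 and /16 that fits."""
    for prefix in range(28, 15, -1):  # try /28 first, then grow toward /16
        if usable_ips(prefix) >= hosts_needed:
            return prefix
    raise ValueError("does not fit in a single /16")


print(usable_ips(27))           # 27
print(usable_ips(26))           # 59
print(smallest_prefix_for(29))  # 26
```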

AWS Reserved IPs per subnet (5):
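
As an illustration, the five reserved addresses of a subnet can be listed with Python's standard `ipaddress` module. This is a sketch based on the "first 4 + last 1" rule stated above; the CIDR `10.0.0.0/24` is only an example value and `reserved_addresses` is a hypothetical helper.

```python
import ipaddress
from typing import List


def reserved_addresses(cidr: str) -> List[str]:
    """Return the first four addresses and the last address of a subnet."""
    subnet = ipaddress.ip_network(cidr)
    first_four = [str(subnet.network_address + i) for i in range(4)]
    return first_four + [str(subnet.broadcast_address)]


print(reserved_addresses("10.0.0.0/24"))
# ['10.0.0.0', '10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.255']
```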


IP Addresses:

IPv6 in VPC:

⚠️ Exam trap: “Can’t launch EC2 in subnet” → NOT because of IPv6 (space is huge). It’s because there is no available IPv4 address in the subnet → solution: add a new (secondary) IPv4 CIDR to the VPC and launch into a subnet created from it.


Subnets:

Subnets partition your network inside VPC.


Internet Gateway (IGW):

Internet gateway helps VPC instances connect with the internet (public subnets have a route to IGW).


NAT (Network Address Translation):

NAT allows instances in private subnets to access the internet while remaining private.

Private Subnet EC2 ──► NAT (Public Subnet) ──► IGW ──► Internet
       10.0.0.20          EIP: 12.34.56.78         ▲
                           (translates src IP)      │
                                                    │
                    Response comes back to NAT ◄────┘
                    NAT forwards to 10.0.0.20

NAT Instance (self-managed, outdated but still on exam):

NAT Gateway (AWS-managed):

NAT Gateway HA: Resilient within single AZ only → create one NATGW per AZ for fault tolerance. No cross-AZ failover needed.

NAT Gateway vs NAT Instance:

Feature | NAT Gateway | NAT Instance
Availability | HA within AZ (create in each AZ) | Manual failover (ASG + script)
Bandwidth | Up to 100 Gbps | Depends on instance type
Maintenance | AWS-managed | You manage (patching, OS)
Cost | Per hour + data transferred | Per hour + instance type + network
Public IPv4 | ✅ Yes | ✅ Yes
Private IPv4 | ✅ Yes | ✅ Yes
Security Groups | ❌ No | ✅ Yes
Bastion Host | ❌ No | ✅ Yes
Port Forwarding | ❌ No | ✅ Yes (iptables)

⚠️ Exam trap: “NAT + Security Groups” → NAT Instance (NAT Gateway has NO SGs). “NAT + Bastion Host” → also NAT Instance.

⚠️ Exam trap: “Private instances need internet, managed, HA” → NAT Gateway. NAT Instance is legacy — only pick it if question says “existing NAT Instance” or needs SG/Bastion.


Bastion Host:

Bastion Host = EC2 instance in public subnet used to SSH into private instances.

Users ──SSH (port 22)──► Bastion Host (Public Subnet) ──SSH──► EC2 (Private Subnet)
                         BastionHost-SG                        LinuxInstance-SG
                         Inbound: port 22                      Inbound: port 22
                         from corp CIDR                        from BastionHost-SG

⚠️ Exam trap: “SSH into private EC2” → Bastion Host (or SSM Session Manager for no-SSH approach). NOT NAT Gateway — NAT is for outbound internet only.


Security Groups vs NACLs:

Network ACL (NACL) — firewall at subnet level. Can have ALLOW and DENY rules. Rules only include IP addresses. Automatically applies to all instances in the subnet. STATELESS: Return traffic must be explicitly allowed. Checks packets both ways.

Security Groups — firewall at instance (ENI) level. Can have only ALLOW rules. Rules include IP addresses and other security groups. STATEFUL: Return traffic is automatically allowed.

Inbound port 22 allowed:   Outside ──SSH──► Your EC2  ✅ (response auto-allowed out)
Outbound port 22 allowed:  Your EC2 ──SSH──► Outside  ✅ (response auto-allowed in)

⚠️ Exam trap: “Default NACL” → allows all traffic. “Custom NACL” → denies all by default. Don’t confuse them!

Feature | Security Group | NACL
Level | Instance (ENI) | Subnet
Rules | Allow only | Allow AND Deny
State | Stateful (return auto-allowed) | Stateless (must allow both directions)
Rule Evaluation | All rules evaluated together | Rules processed in order (lowest # first, first match wins)
Association | Manually assigned to instance | Automatically applies to all instances in subnet
Default | Deny all inbound, allow all outbound | Default NACL allows ALL traffic (a custom NACL denies all)

Ephemeral Ports:

Client (11.22.33.44)                          Web Server (55.66.77.88)
    ──► Src Port: 50105, Dest Port: 443 ──►      (fixed port 443)
    ◄── Dest Port: 50105, Src Port: 443 ◄──      (response to ephemeral port)

NACL with Ephemeral Ports — Example (Web → DB):

  Web Subnet (Public)                              DB Subnet (Private)
  ┌──────────────────┐                             ┌─────────────────┐
  │    EC2 (Web)     │                             │ RDS (port 3306) │
  │                  │                             │                 │
  └────────┬─────────┘                             └────────┬────────┘
           │                                                │
        Web-NACL                                         DB-NACL
     ┌───────────────┐                           ┌───────────────┐
     │ OUTBOUND:     │    ── request ──►         │ INBOUND:      │
     │  port 3306    │                           │  port 3306    │
     │  to DB CIDR   │                           │  from Web CIDR│
     │               │                           │               │
     │ INBOUND:      │    ◄── response ──        │ OUTBOUND:     │
     │  port 1024-   │                           │  port 1024-   │
     │  65535         │                           │  65535        │
     │  from DB CIDR │                           │  to Web CIDR  │
     └───────────────┘                           └───────────────┘

4 NACL rules needed (because stateless = each direction, each NACL):

NACL | Direction | Port | CIDR | Why
Web-NACL | Outbound | 3306 | DB Subnet CIDR | Web initiates DB connection
Web-NACL | Inbound | 1024-65535 | DB Subnet CIDR | DB response on ephemeral port
DB-NACL | Inbound | 3306 | Web Subnet CIDR | Accept DB connection from Web
DB-NACL | Outbound | 1024-65535 | Web Subnet CIDR | Send response on ephemeral port

⚠️ Exam trap: With SGs you’d only need 2 rules (allow 3306 each side) — stateful handles the rest. With NACLs you need 4 rules — don’t forget the ephemeral port rules for return traffic!

⚠️ Exam trap: “NACL blocking return traffic” → you forgot to allow ephemeral ports outbound (server side) or inbound (client side). SGs don’t have this problem (stateful).
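
The stateless behaviour can be modelled in a few lines. This is a simplified sketch, not the AWS implementation: it assumes rules match only on peer CIDR and destination port, are evaluated lowest number first with the first match winning, and fall through to an implicit deny. The subnet CIDRs, IPs and class names are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List


@dataclass
class NaclRule:
    number: int      # lower numbers are evaluated first
    cidr: str        # peer CIDR the rule applies to
    port_from: int
    port_to: int
    allow: bool


def nacl_allows(rules: List[NaclRule], peer_ip: str, port: int) -> bool:
    """First matching rule wins; no match falls through to the implicit deny."""
    for rule in sorted(rules, key=lambda r: r.number):
        in_cidr = ip_address(peer_ip) in ip_network(rule.cidr)
        if in_cidr and rule.port_from <= port <= rule.port_to:
            return rule.allow
    return False


WEB_CIDR = "10.0.1.0/24"  # hypothetical web subnet
db_inbound = [NaclRule(100, WEB_CIDR, 3306, 3306, True)]
db_outbound_missing: List[NaclRule] = []  # return rule forgotten
db_outbound_fixed = [NaclRule(100, WEB_CIDR, 1024, 65535, True)]

# Request: web server 10.0.1.10 (ephemeral port 50105) -> DB port 3306
print(nacl_allows(db_inbound, "10.0.1.10", 3306))            # True
# Response: DB -> web server on the ephemeral port
print(nacl_allows(db_outbound_missing, "10.0.1.10", 50105))  # False
print(nacl_allows(db_outbound_fixed, "10.0.1.10", 50105))    # True
```

Nothing in the model remembers that the request was accepted, which is why the response needs its own rule. A security group would track the connection and let the reply through.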


VPC Flow Logs:

VPC Flow Logs capture information about IP traffic going to and from network interfaces in your VPC:

Flow Log Syntax:

version account-id interface-id srcaddr dstaddr srcport dstport protocol packets bytes start end action log-status
2 123456789010 eni-1235b8ca srcIP dstIP 20641 22 6 20 4249 ... ACCEPT OK
2 123456789010 eni-1235b8ca srcIP dstIP 49761 3389 6 20 4249 ... REJECT OK
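
A record in this format can be split into named fields with plain string handling. The sketch below assumes the default space-separated layout with 14 fields, in which protocol (6 = TCP in the sample records) comes right after dstport. The IP addresses and timestamps in the sample line are illustrative values, since the records above elide them.

```python
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]


def parse_flow_log(line: str) -> dict:
    """Map one space-separated flow log record onto the field names."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


record = parse_flow_log(
    "2 123456789010 eni-1235b8ca 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK"
)
print(record["dstport"], record["protocol"], record["action"])  # 22 6 ACCEPT
```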

Key fields:

Troubleshoot SG vs NACL using Flow Logs (ACTION field):

Scenario | Inbound | Outbound | Blocked by
Incoming blocked | REJECT | (none) | NACL or SG
Incoming allowed, response blocked | ACCEPT | REJECT | NACL (SG is stateful → would auto-allow)
Outgoing blocked | (none) | REJECT | NACL or SG
Outgoing allowed, response blocked | REJECT | ACCEPT | NACL

Memory trick: “ACCEPT then REJECT” = always NACL (stateless blocks return traffic). SG would never block return traffic (stateful).
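
The same reasoning as a tiny lookup, assuming you have already paired the flow log ACTION of the initial packet with the ACTION of its reply (the pairing itself is left out). `blocked_by` is a hypothetical helper name.

```python
from typing import Optional


def blocked_by(first_leg: str, return_leg: Optional[str] = None) -> str:
    """first_leg / return_leg are ACTION values for the request and its reply."""
    if first_leg == "REJECT":
        return "NACL or SG"   # either layer can refuse the initial packet
    if first_leg == "ACCEPT" and return_leg == "REJECT":
        return "NACL"         # a stateful SG would have allowed the reply
    return "not blocked"


print(blocked_by("REJECT"))            # NACL or SG
print(blocked_by("ACCEPT", "REJECT"))  # NACL
print(blocked_by("ACCEPT", "ACCEPT"))  # not blocked
```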

Flow Logs Architectures:


VPC Peering:

VPC Peering — privately connect two VPCs using AWS’ network, behave as if same network.

VPC-A ◄──Peering──► VPC-B ◄──Peering──► VPC-C
  │                                        │
  └──────────Peering (A↔C needed!)─────────┘
         (B does NOT relay traffic)

⚠️ Exam trap: “VPC A peers with B, B peers with C, can A talk to C?” → NO! Not transitive. Need separate A↔C peering. If you need many VPCs connected → use Transit Gateway instead.
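
Non-transitivity is easy to see if peering connections are modelled as an unordered set of pairs. This is a toy model with hypothetical names: two VPCs can talk only when a direct pair exists, and no path search through intermediate VPCs is performed.

```python
from typing import FrozenSet, Set


def can_talk(peerings: Set[FrozenSet[str]], a: str, b: str) -> bool:
    """Peering is point-to-point: only a direct a<->b connection counts."""
    return frozenset((a, b)) in peerings


peerings = {frozenset(("VPC-A", "VPC-B")), frozenset(("VPC-B", "VPC-C"))}
print(can_talk(peerings, "VPC-A", "VPC-B"))  # True
print(can_talk(peerings, "VPC-A", "VPC-C"))  # False (B does not relay)

peerings.add(frozenset(("VPC-A", "VPC-C")))  # create the missing A<->C peering
print(can_talk(peerings, "VPC-A", "VPC-C"))  # True
```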


VPC Endpoints:

VPC Endpoints connect to AWS services using private network instead of public internet.

Two types:

Feature | Interface Endpoint | Gateway Endpoint
How | Provisions an ENI (private IP) | Target in Route Table
Security Group | ✅ Must attach | ❌ No
Services | Most AWS services | S3 and DynamoDB only
Cost | $ per hour + $ per GB | Free
Access from on-prem | ✅ (via VPN/DX) | ❌ No
Powered by | AWS PrivateLink | Route Table entry

Option 1 (costly):     Lambda (VPC) ──► NAT GW ──► IGW ──► DynamoDB (public)
Option 2 (free/better): Lambda (VPC) ──► Gateway Endpoint ──► DynamoDB (private)

⚠️ Exam trap: “S3 or DynamoDB access from VPC” → Gateway Endpoint (free, preferred on exam). Interface Endpoint only when access needed from on-premises (VPN/Direct Connect), different VPC, or different region.

⚠️ Exam trap: “Lambda in VPC can’t reach DynamoDB” → either add NAT GW + IGW, or (better) use VPC Gateway Endpoint for DynamoDB.
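
The endpoint choice described in these traps reduces to two questions. This is a sketch of the exam heuristic only, with hypothetical names; it assumes that "from outside the VPC" covers on-premises, another VPC, or another region.

```python
GATEWAY_SERVICES = {"s3", "dynamodb"}  # the only services with Gateway Endpoints


def endpoint_type(service: str, from_outside_vpc: bool = False) -> str:
    """Pick 'gateway' for S3/DynamoDB reached from inside the VPC, else 'interface'."""
    if service.lower() in GATEWAY_SERVICES and not from_outside_vpc:
        return "gateway"
    return "interface"


print(endpoint_type("dynamodb"))                   # gateway
print(endpoint_type("s3", from_outside_vpc=True))  # interface
print(endpoint_type("sqs"))                        # interface
```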

⚠️ Exam trap - “VPC resources access SQS/SNS/KMS privately (no internet)”:


AWS PrivateLink — expose a service in your VPC to other VPCs privately.


Site-to-Site VPN:

Site-to-Site VPN — encrypted connection between on-premises and AWS over the public internet.

Components:

Setup:

AWS VPN CloudHub:

⚠️ Exam trap: “Ping EC2 from on-premises doesn’t work” → check ICMP allowed in SG inbound + Route Propagation enabled.


Direct Connect (DX):

Direct Connect — dedicated private physical connection from on-premises to AWS.

Virtual Interfaces (VIFs):

Corporate DC ──► Customer Router ──► DX Endpoint ──► VGW ──► VPC
                                     (DX Location)
                                      VLAN 1 (Private VIF) ──► EC2
                                      VLAN 2 (Public VIF)  ──► S3, Glacier

Connection Types:

Type | Speed | Details
Dedicated | 1 / 10 / 100 Gbps | Physical port dedicated to you. Request via AWS first
Hosted | 50 Mbps – 10 Gbps | Via AWS Direct Connect Partners. Capacity on demand

Direct Connect Gateway:

Resiliency:

Backup:

⚠️ Exam trap: “Private, dedicated, consistent connection” → Direct Connect. “Encrypted over internet” → Site-to-Site VPN. DX is NOT encrypted by default (add VPN on top for encryption).

⚠️ Exam trap: “Improve connection within days/1 week” → NOT Direct Connect (takes > 1 month). Use Site-to-Site VPN for quick setup. DX is only the answer when time is not a constraint.


AWS Client VPN:

AWS Client VPN — connect end-devices (laptops) to AWS or on-premises via OpenVPN over the internet. Access EC2 using private IP.


Transit Gateway:

Transit Gateway — transitive peering hub for thousands of VPCs and on-premises (hub-and-spoke / star topology).

                    ┌──► VPC-A
                    │
Corporate DC ──► Transit Gateway ──► VPC-B
  (VPN/DX)          │
                    ├──► VPC-C
                    │
                    └──► VPC-D

Without TGW: complex mesh of VPC peering + VPN connections (N² connections)
With TGW: single hub, all spokes connect to it (N connections)
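
To make the growth concrete: a full mesh of N VPCs needs N(N-1)/2 peering connections, which grows on the order of N², while a hub needs one attachment per VPC. A quick sketch with hypothetical function names:

```python
def full_mesh_peerings(n_vpcs: int) -> int:
    """Peering connections needed so every VPC can reach every other VPC."""
    return n_vpcs * (n_vpcs - 1) // 2


def tgw_attachments(n_vpcs: int) -> int:
    """With a hub, each VPC attaches once."""
    return n_vpcs


for n in (4, 10, 100):
    print(n, full_mesh_peerings(n), tgw_attachments(n))
# 4 6 4
# 10 45 10
# 100 4950 100
```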

ECMP (Equal-Cost Multi-Path Routing):

Setup | Throughput
VPN → VGW (1 connection, 2 tunnels) | 1.25 Gbps
VPN → TGW (1 connection, ECMP) | 2.5 Gbps (both tunnels used)
2× VPN → TGW (ECMP) | 5.0 Gbps
3× VPN → TGW (ECMP) | 7.5 Gbps
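
The table follows one formula: with ECMP on a Transit Gateway, throughput is connections × 2 tunnels × 1.25 Gbps, while a VGW stays at 1.25 Gbps. A sketch using only the numbers from the table; the function name is hypothetical.

```python
TUNNEL_GBPS = 1.25           # per-tunnel figure from the table above
TUNNELS_PER_CONNECTION = 2   # each VPN connection has two tunnels


def vpn_throughput_gbps(connections: int, target: str) -> float:
    """Aggregate VPN throughput for a VGW or an ECMP-enabled TGW."""
    if target == "VGW":
        return TUNNEL_GBPS   # capped at 1.25 Gbps per the table
    if target == "TGW":
        return connections * TUNNELS_PER_CONNECTION * TUNNEL_GBPS
    raise ValueError(f"unknown target: {target}")


print(vpn_throughput_gbps(1, "VGW"))  # 1.25
print(vpn_throughput_gbps(1, "TGW"))  # 2.5
print(vpn_throughput_gbps(3, "TGW"))  # 7.5
```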

Share DX across accounts:

⚠️ Exam trap: “Connect many VPCs + on-premises, simplify topology” → Transit Gateway. NOT VPC Peering (not transitive, mesh complexity).

⚠️ Exam trap: “Increase VPN bandwidth to AWS” → multiple VPN connections + Transit Gateway with ECMP. VGW limited to 1.25 Gbps.


VPC Traffic Mirroring:

Traffic Mirroring — capture and inspect network traffic in your VPC.

Source A (ENI) ──┐
                 ├──► Traffic Mirroring ──► NLB ──► ASG (Security Appliances)
Source B (ENI) ──┘    (filter optional)

Egress-only Internet Gateway:

Egress-only IGW — like a NAT Gateway, but for IPv6.

IPv4 outbound: Private EC2 ──► NAT Gateway ──► IGW ──► Internet
IPv6 outbound: EC2 ──► Egress-only IGW ──► Internet  (no inbound initiated)

⚠️ Exam trap: “IPv6 instances need outbound internet but block inbound” → Egress-only IGW (NOT NAT Gateway — NAT is for IPv4 only).


Networking Costs:

Core principle: Ingress is free, egress costs money. Keep traffic inside AWS to minimize costs.

EC2 Data Transfer Costs (per GB):

Traffic Path | Cost
Traffic in to EC2 (ingress) | Free
Same AZ, private IP | Free
Same AZ, public/Elastic IP | $0.02
Cross-AZ, private IP | $0.01
Cross-region | $0.02

Cost optimization tips:

⚠️ Exam trap: “Lowest egress cost” with Direct Connect available

NAT Gateway vs VPC Gateway Endpoint (for S3):

Path | Cost
EC2 → NAT GW → IGW → S3 | $0.045/hr + $0.045/GB + $0.09/GB cross-region
EC2 → Gateway Endpoint → S3 | Free (endpoint) + $0.01/GB same-region

Subnet 1: EC2 ──► NAT GW ──► IGW ──► S3         (costly: ~$0.09/GB)
Subnet 2: EC2 ──► VPC Gateway Endpoint ──► S3    (free endpoint, ~$0.01/GB)

⚠️ Exam trap: “Reduce cost of S3 access from VPC” → Gateway Endpoint (free, no NAT GW charges). Route table entry with pl-id for Amazon S3 → vpce-id.
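
A rough monthly comparison using the rates quoted in this section. This is a back-of-the-envelope sketch, not a pricing reference: it assumes same-region S3 access (so the $0.09/GB cross-region charge does not apply) and a 730-hour month, and actual AWS prices vary by region and over time.

```python
HOURS_PER_MONTH = 730      # assumption: average month length in hours

NAT_HOURLY = 0.045         # $/hr, rate quoted above
NAT_PER_GB = 0.045         # $/GB processed, rate quoted above
ENDPOINT_PER_GB = 0.01     # $/GB same-region transfer, rate quoted above


def nat_gateway_monthly(gb: float) -> float:
    """Hourly charge for the whole month plus per-GB processing."""
    return round(NAT_HOURLY * HOURS_PER_MONTH + NAT_PER_GB * gb, 2)


def gateway_endpoint_monthly(gb: float) -> float:
    """The endpoint itself is free; only the per-GB transfer rate applies."""
    return round(ENDPOINT_PER_GB * gb, 2)


for gb in (100, 1000, 10000):
    print(gb, nat_gateway_monthly(gb), gateway_endpoint_monthly(gb))
```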

S3 Data Transfer Pricing (USA):

Path | Cost/GB
S3 ingress (upload) | Free
S3 → Internet | $0.09
S3 Transfer Acceleration | +$0.04 to $0.08 on top
S3 → CloudFront | Free
CloudFront → Internet | $0.085 (slightly cheaper than S3 direct)
S3 Cross-Region Replication | $0.02

⚠️ Exam trap: “Deliver S3 content to users cheaply” → CloudFront ($0.085/GB vs $0.09/GB direct) + caching + 7x cheaper S3 request pricing.


AWS Network Firewall:

AWS Network Firewall — protect your entire VPC, Layer 3 to Layer 7.

Fine-Grained Controls:

⚠️ Exam trap: “Sophisticated VPC-wide network protection, Layer 3-7, inspect all traffic directions” → AWS Network Firewall. NOT just NACLs/SGs (those are basic). NOT WAF (WAF is Layer 7 HTTP only).



🎯 MASTER SUMMARY: VPC & Networking Exam Guide

Part 1: Core Principles (Understand WHY → Derive WHAT)

Principle 1: Everything Starts with Routing

Traffic doesn’t flow just because resources exist — Route Tables are the backbone. IGW, NAT, VPC Peering, Endpoints — none work without correct route table entries. If connectivity fails, check routes first.

Key insight: Most “can’t connect” troubleshooting answers involve Route Tables, SGs, or NACLs.

Principle 2: Public vs Private = Route to IGW

A subnet is “public” only because its route table has 0.0.0.0/0 → igw-id. There’s no checkbox. No route to IGW = private subnet, regardless of what you call it.
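
Expressed as a check, assuming a route table is simplified to a list of destination/target pairs. The IDs below are made up for illustration, and `is_public_subnet` is a hypothetical helper, not an AWS API.

```python
from typing import Dict, List


def is_public_subnet(routes: List[Dict[str, str]]) -> bool:
    """Public means the route table sends 0.0.0.0/0 to an internet gateway."""
    return any(
        route["destination"] == "0.0.0.0/0" and route["target"].startswith("igw-")
        for route in routes
    )


public_rt = [
    {"destination": "10.0.0.0/16", "target": "local"},
    {"destination": "0.0.0.0/0", "target": "igw-0abc1234"},  # hypothetical ID
]
private_rt = [
    {"destination": "10.0.0.0/16", "target": "local"},
    {"destination": "0.0.0.0/0", "target": "nat-0def5678"},  # hypothetical ID
]

print(is_public_subnet(public_rt))   # True
print(is_public_subnet(private_rt))  # False
```

The second table still has a default route, but it points at a NAT gateway, so the subnet stays private.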

Principle 3: Stateful vs Stateless = The Fundamental Security Split

Derivation: If the exam says “ACCEPT then REJECT in flow logs” → always NACL. SGs never block return traffic.

Principle 4: Private Subnet Internet Access = NAT (IPv4) or Egress-only IGW (IPv6)

Private instances can’t reach internet directly. They need a “translator” in a public subnet:

Principle 5: AWS Services from VPC = Endpoints (Stay Private)

Instead of routing through IGW to reach AWS services, use VPC Endpoints:

Derivation: “Reduce cost of S3 access” or “private access to S3” → Gateway Endpoint.

Principle 6: On-Premises Connectivity = Speed vs Cost vs Time

Three options, each with trade-offs:

Principle 7: Transitivity Doesn’t Exist in VPC Peering

VPC Peering is point-to-point. A↔B and B↔C does NOT mean A↔C. For hub connectivity → Transit Gateway.

Principle 8: Transit Gateway = The Universal Hub

TGW solves three problems: (1) transitive routing, (2) VPN bandwidth scaling (ECMP), (3) sharing DX across accounts. If question mentions “many VPCs” or “simplify network” → TGW.

Principle 9: Network Protection is Layered

Layer 3-4:  NACLs (subnet) → Security Groups (ENI)
Layer 7:    WAF (HTTP/HTTPS only)
Layer 3-7:  AWS Network Firewall (entire VPC, all directions)
Cross-acct: AWS Firewall Manager (centralize rules)

Principle 10: Egress Costs Money, Ingress is Free

All AWS networking pricing follows this: data IN = free, data OUT = costs. Minimize egress by keeping processing inside AWS and using private IPs.


Part 2: Decision Tree (Follow Keywords → Find Answer)

Connectivity Decision Tree

Need to connect to AWS?
│
├─ From on-premises SITE?
│  ├─ Need it NOW (days)? ──► Site-to-Site VPN
│  ├─ Need dedicated/private/consistent? ──► Direct Connect
│  ├─ Need encryption on DX? ──► VPN on top of DX
│  ├─ Multiple sites to connect? ──► VPN CloudHub
│  └─ Need DX to multiple regions? ──► DX Gateway
│
├─ From individual LAPTOP?
│  └─► AWS Client VPN (OpenVPN)
│
├─ VPC to VPC?
│  ├─ Just 2 VPCs? ──► VPC Peering
│  ├─ Many VPCs (hub-and-spoke)? ──► Transit Gateway
│  └─ Expose specific service? ──► PrivateLink (NLB + ENI)
│
└─ VPC to AWS Service (S3, DynamoDB, etc.)?
   ├─ S3 or DynamoDB? ──► Gateway Endpoint (free)
   ├─ Other service? ──► Interface Endpoint
   └─ Need on-prem access too? ──► Interface Endpoint

Security Decision Tree

Need to control traffic?
│
├─ At instance/ENI level? ──► Security Group (ALLOW only, stateful)
├─ At subnet level? ──► NACL (ALLOW + DENY, stateless)
├─ Block specific IPs (Layer 3)? ──► NACL (has DENY rules)
├─ Block HTTP patterns/SQL injection? ──► WAF (Layer 7)
├─ VPC-wide, all directions, L3-L7? ──► AWS Network Firewall
└─ Centralize across accounts? ──► AWS Firewall Manager

The CANNOT List

You CANNOT… | Why
Disable IPv4 in VPC | VPC requires IPv4; IPv6 is optional dual-stack
Use NAT Gateway as Bastion | NAT GW doesn’t support SSH; use NAT Instance
Attach >1 IGW per VPC | 1:1 mapping only
Use Gateway Endpoint from on-prem | Gateway Endpoint = route table only; use Interface Endpoint
Make VPC Peering transitive | Need separate peering per pair, or use TGW
Encrypt DX natively | Add VPN on top for encryption
Set up DX in under 1 month | Lead time >1 month; use VPN for quick setup
Have VPC CIDR larger than /16 | Max VPC size = /16 (65,536 IPs)
Attach SG to NAT Gateway | NAT GW has no SGs; only NAT Instance does
Use VGW for >1.25 Gbps VPN | Need TGW + ECMP for higher throughput

Part 3: Scenario Pattern Recognition

Pattern: “Private instances need internet access, managed, scalable”

Keywords: private subnet, internet, managed, scales
Answer: NAT Gateway
Why: AWS-managed, auto-scales to 100 Gbps, no SG/patching needed


Pattern: “SSH into private EC2 instances”

Keywords: SSH, private subnet, access, developers
Answer: Bastion Host (or SSM Session Manager)
Why: Bastion in public subnet acts as SSH jump box. SG: port 22 from corporate public CIDR


Pattern: “ACCEPT then REJECT in flow logs”

Keywords: flow logs, allowed then blocked, return traffic
Answer: NACL is blocking (not SG)
Why: SGs are stateful, so they never block return traffic. Only NACLs (stateless) do this


Pattern: “Can’t launch EC2 in subnet”

Keywords: launch failure, subnet, no capacity
Answer: No available IPv4 addresses → add new CIDR
Why: IPv6 space is huge; the bottleneck is always IPv4


Pattern: “VPC A peers with B, B peers with C, can A reach C?”

Keywords: VPC peering, transitive, multiple VPCs
Answer: NO, VPC Peering is not transitive
Why: Need A↔C peering, or use Transit Gateway


Pattern: “Connect many VPCs + on-premises, simplify”

Keywords: many VPCs, hub-and-spoke, simplify, on-premises
Answer: Transit Gateway
Why: Single hub, N connections instead of N² mesh


Pattern: “Increase VPN bandwidth beyond 1.25 Gbps”

Keywords: VPN throughput, scale bandwidth, more than 1.25
Answer: Transit Gateway with ECMP + multiple VPN connections
Why: VGW uses only 1 tunnel (1.25 Gbps); TGW uses both (2.5 Gbps) and stacks connections


Pattern: “Private, dedicated, consistent connection to AWS”

Keywords: dedicated, private, consistent, not internet
Answer: Direct Connect
Why: Physical private connection, doesn’t traverse internet


Pattern: “Improve connectivity within days/1 week”

Keywords: quickly, fast setup, days, immediately
Answer: Site-to-Site VPN (NOT Direct Connect)
Why: DX takes >1 month to establish


Pattern: “DX backup, cost-effective”

Keywords: Direct Connect fails, backup, cheap
Answer: Site-to-Site VPN as backup
Why: Second DX is expensive; VPN is cheap and quick


Pattern: “Access S3/DynamoDB from VPC privately”

Keywords: S3, DynamoDB, private access, no internet, reduce cost
Answer: VPC Gateway Endpoint (free)
Why: Free, route table entry, no NAT GW charges


Pattern: “Access AWS service from on-premises via DX/VPN”

Keywords: on-premises, AWS service, private access, VPN, Direct Connect
Answer: VPC Interface Endpoint (not Gateway)
Why: Gateway Endpoints can’t be accessed from on-prem


Pattern: “Connect multiple on-prem sites, backup over internet”

Keywords: multiple sites, hub-and-spoke, VPN, backup
Answer: AWS VPN CloudHub
Why: Multiple VPN connections on same VGW, over public internet


Pattern: “Expose service from one VPC to another privately”

Keywords: expose, service, private, cross-VPC, cross-account
Answer: AWS PrivateLink (NLB + ENI)
Why: No peering, no IGW, no routes needed


Pattern: “IPv6 outbound internet, block inbound”

Keywords: IPv6, outbound, prevent inbound, internet
Answer: Egress-only Internet Gateway
Why: NAT is IPv4 only; Egress-only IGW is the IPv6 equivalent


Pattern: “Capture IP traffic information/metadata”

Keywords: capture, IP traffic, information, logs, metadata
Answer: VPC Flow Logs
Why: Flow Logs = metadata (IPs, ports, action). Traffic Mirroring = full packet capture


Pattern: “Inspect actual network traffic content”

Keywords: inspect, deep packet, content, security appliance
Answer: VPC Traffic Mirroring
Why: Copies actual packets to ENI/NLB for analysis


Pattern: “500 Mbps Direct Connect”

Keywords: 500 Mbps, DX, connection
Answer: Hosted connection
Why: Dedicated = 1/10/100 Gbps only. Anything in between = Hosted


Pattern: “VPC-wide network protection, Layer 3-7”

Keywords: sophisticated, entire VPC, Layer 3-7, all directions
Answer: AWS Network Firewall
Why: NACLs/SGs are basic, WAF is HTTP-only. Network Firewall covers L3-L7 in all directions


Pattern: “DX to VPCs in multiple regions”

Keywords: Direct Connect, multiple regions, VPCs
Answer: Direct Connect Gateway
Why: One DX → DX Gateway → VPCs across regions


Part 4: Quick Reference Tables

On-Premises Connectivity Comparison:

Feature | Site-to-Site VPN | Direct Connect | Client VPN
Speed | Up to 1.25 Gbps (VGW) or more (TGW+ECMP) | 50 Mbps – 100 Gbps | N/A
Path | Public internet | Private physical line | Public internet
Encrypted | ✅ Yes (IPsec) | ❌ No (add VPN on top) | ✅ Yes (OpenVPN)
Setup time | Minutes/hours | >1 month | Minutes
Cost | Low | High | Low
Use case | Quick setup, backup for DX | Large bandwidth, consistent | Individual users
AWS side | VGW or TGW | VGW + DX Location | Client VPN Endpoint
On-prem side | CGW | Customer Router | OpenVPN client

VPC Endpoint Comparison:

Feature | Gateway Endpoint | Interface Endpoint
Services | S3, DynamoDB | Everything else
Cost | Free | $/hr + $/GB
How | Route Table entry | ENI (private IP)
SG | No | Yes
On-prem access | ❌ No | ✅ Yes (via VPN/DX)

Security Layers:

Layer | Tool | Scope | Rules
Instance/ENI | Security Group | Per ENI | Allow only, stateful
Subnet | NACL | Per subnet | Allow + Deny, stateless
HTTP/HTTPS | WAF | CloudFront/ALB/API GW | Web ACL rules
Entire VPC (L3-L7) | Network Firewall | Per VPC | Allow/Drop/Alert
Cross-account | Firewall Manager | Organization | Centralized management

Key Numbers:

What | Value
Max VPCs per region | 5 (soft limit)
Max CIDRs per VPC | 5
VPC CIDR range | /28 (16 IPs) – /16 (65,536 IPs)
Reserved IPs per subnet | 5
VPN throughput (VGW) | 1.25 Gbps
VPN throughput (TGW, 1 conn) | 2.5 Gbps (ECMP)
NAT Gateway bandwidth | Up to 100 Gbps
DX Dedicated speeds | 1 / 10 / 100 Gbps
DX Hosted speeds | 50 Mbps – 10 Gbps
DX setup time | >1 month
Ephemeral ports (Linux) | 32768 – 60999
Ephemeral ports (Windows) | 49152 – 65535

Part 5: Ultimate Instant-Answer Table

Question Contains | Instant Answer
“Private subnet internet IPv4, managed” | NAT Gateway
“NAT + Security Groups” | NAT Instance
“NAT + Bastion Host” | NAT Instance
“SSH into private EC2” | Bastion Host (or SSM)
“Bastion Host SG, which port/CIDR?” | Port 22, company public CIDR
“Default NACL behavior” | Allow ALL traffic
“Custom NACL behavior” | Deny ALL traffic
“ACCEPT then REJECT in flow logs” | NACL blocking (not SG)
“Return traffic blocked” | NACL (stateless)
“Ephemeral ports” | NACL outbound/inbound rules needed
“Top-10 IP addresses in flow logs” | CloudWatch Contributor Insights
“Analyze flow logs with SQL” | S3 + Athena
“VPC Peering transitive?” | NO, need TGW
“Route tables updated one side only” | Update BOTH VPCs
“S3/DynamoDB private access from VPC” | Gateway Endpoint (free)
“AWS service access from on-prem” | Interface Endpoint
“Lambda can’t reach DynamoDB” | VPC Gateway Endpoint
“Expose service cross-VPC privately” | PrivateLink (NLB + ENI)
“Ping EC2 from on-prem fails” | ICMP in SG + Route Propagation
“Multiple on-prem sites, VPN backup” | VPN CloudHub
“Private, dedicated, consistent connection” | Direct Connect
“Encrypted connection over internet” | Site-to-Site VPN
“Improve connection in days/1 week” | VPN (NOT DX, which takes >1 month)
“DX backup, cost-effective” | Site-to-Site VPN
“500 Mbps DX connection” | Hosted (Dedicated = 1/10/100 only)
“DX to multiple regions” | DX Gateway
“Share DX across accounts” | TGW + DX GW + Transit VIF + RAM
“VPN bandwidth >1.25 Gbps” | TGW + ECMP
“Many VPCs + on-prem, simplify” | Transit Gateway
“IP Multicast” | Transit Gateway
“IPv6 outbound, block inbound” | Egress-only IGW
“Can’t launch EC2 in subnet” | IPv4 exhausted → new CIDR
“Reduce S3 access cost from VPC” | Gateway Endpoint (free)
“Capture IP traffic metadata” | VPC Flow Logs
“Deep packet inspection” | VPC Traffic Mirroring
“VPC-wide L3-L7 protection” | AWS Network Firewall
“Centralize firewall rules cross-account” | AWS Firewall Manager
“ALB → EC2 SG, most secure” | Reference ALB’s SG (not CIDR)

Part 6: Elimination Checklist

Connectivity Questions

□ Is it on-premises → AWS?
  → Yes: VPN, DX, or Client VPN
    □ Need it fast (days)?
      → Yes = Site-to-Site VPN
      → No (can wait months) = Direct Connect
    □ Individual user (laptop)?
      → Yes = Client VPN
    □ Multiple on-prem sites?
      → Yes = VPN CloudHub
  → No: VPC-to-VPC or VPC-to-service

□ Is it VPC → VPC?
  → 2 VPCs = VPC Peering
  → Many VPCs = Transit Gateway
  → Expose single service = PrivateLink

□ Is it VPC → AWS Service?
  → S3 or DynamoDB = Gateway Endpoint
  → Anything else = Interface Endpoint
  → Needs on-prem access = Interface Endpoint

Security Questions

□ What layer?
  → L3-L4 per instance = Security Group
  → L3-L4 per subnet = NACL
  → L7 HTTP only = WAF
  → L3-L7 entire VPC = Network Firewall

□ Need DENY rules?
  → Yes = NACL (SGs only have ALLOW)

□ Stateful or stateless matters?
  → "Return traffic blocked" = NACL (stateless)
  → "Ephemeral ports needed" = NACL

Cost Questions

□ Private IP or Public IP?
  → Private = cheaper (free same-AZ, $0.01 cross-AZ)
  → Public = $0.02 even same-AZ

□ S3 access path?
  → NAT GW → IGW = expensive
  → Gateway Endpoint = free

□ Content delivery?
  → S3 direct = $0.09/GB
  → CloudFront = $0.085/GB + caching

🏆 The Golden Rules

  1. Route Tables are the backbone (no routes = no connectivity, regardless of gateways)
  2. SG = stateful, NACL = stateless (derive all firewall behavior from this)
  3. “ACCEPT then REJECT” = always NACL (SGs never block return traffic)
  4. Gateway Endpoint for S3/DynamoDB (free, always preferred on exam)
  5. VPC Peering is NOT transitive (many VPCs → Transit Gateway)
  6. DX takes >1 month (quick fix → VPN, DX backup → VPN)
  7. Dedicated DX = 1/10/100 Gbps (anything else → Hosted)
  8. NAT Gateway has NO Security Groups (SGs/Bastion → NAT Instance)
  9. VPN max 1.25 Gbps via VGW (more → TGW + ECMP)
  10. IPv4 outbound = NAT, IPv6 outbound = Egress-only IGW (don’t mix them)
  11. 5 IPs reserved per subnet (always add 5 to your requirement)
  12. Private IP = cheaper + better performance (always prefer over public)
  13. Ingress = free, egress = costs money (keep processing inside AWS)
  14. Reference SGs in rules (more secure and dynamic than CIDR-based rules)
  15. DX is NOT encrypted (add VPN on top for encryption)

Amazon Route53:

┌──────────┐   example.com?   ┌─────────────┐
│  Client  │ ───────────────→ │  Route 53   │
│          │ ←─────────────── │             │
└────┬─────┘   54.22.33.44    └─────────────┘
     │
     │  54.22.33.44
     ▼
┌─────────────────────────────────────┐
│            AWS Cloud                │
│     ┌──────────────────────┐        │
│     │    EC2 Instance      │        │
│     │  Public IP:          │        │
│     │  54.22.33.44         │        │
│     └──────────────────────┘        │
└─────────────────────────────────────┘

Amazon Route 53 is a managed DNS (Domain Name System) service. DNS is a collection of rules and records that helps clients reach a server through its domain name.

Feature | Details
Type | Highly available, scalable, fully managed Authoritative DNS
Authoritative | You (customer) can update DNS records
Domain Registrar | Yes, can register domains directly
Health Checks | Monitor health of your resources
SLA | 100% availability (only AWS service with this!)
Scope | Global service (not regional)
Why “53”? | Traditional DNS port number

⚠️ Exam trap: Route 53 is a global service — no region selection needed!

DNS Terminologies:

        http://api.www.example.com.
               │   │       │     │ │
               │   │       │     │ └── Root (.)
               │   │       │     └──── TLD (.com, .gov, .org)
               │   │       └────────── SLD (example.com)
               │   └────────────────── Sub Domain (www)
               └────────────────────── Sub Domain (api)
               
        └────────────────────────────┘
            FQDN (Fully Qualified Domain Name)

Term | Description
Domain Registrar | Amazon Route 53, GoDaddy, etc.
DNS Records | A, AAAA, CNAME, NS, etc.
Zone File | Contains DNS records
Name Server | Resolves DNS queries (Authoritative or Non-Authoritative)
TLD | .com, .us, .gov, .org
SLD | amazon.com, google.com
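
The breakdown in the diagram can be reproduced by splitting the name on dots. This is a sketch under a simplifying assumption: the TLD is a single label (such as .com), so multi-label suffixes like .co.uk are not handled. `split_fqdn` is a hypothetical helper.

```python
def split_fqdn(fqdn: str) -> dict:
    """Break a fully qualified domain name into the parts named above."""
    labels = fqdn.rstrip(".").split(".")
    if len(labels) < 2:
        raise ValueError("need at least an SLD and a TLD")
    return {
        "root": ".",
        "tld": "." + labels[-1],
        "sld": ".".join(labels[-2:]),
        "subdomains": labels[:-2][::-1],  # nearest to the SLD first
    }


print(split_fqdn("api.www.example.com."))
# {'root': '.', 'tld': '.com', 'sld': 'example.com', 'subdomains': ['www', 'api']}
```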

DNS Resolution Flow:

┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐
│ Browser │───→│   OS    │───→│   ISP   │───→│  Root   │───→│   TLD   │───→│  Name   │
│  Cache  │    │  Cache  │    │  Cache  │    │ Server  │    │ Server  │    │ Server  │
└─────────┘    └─────────┘    └─────────┘    └─────────┘    └─────────┘    └────┬────┘
                                                                                │
                                              ┌─────────────────────────────────┘
                                              ▼
                                        IP Address
                                     (cached on way back)
  1. Browser cache → OS cache → ISP DNS Resolver
  2. Root Server → “Go ask .com TLD”
  3. TLD Server → “Go ask example.com Name Server”
  4. Name Server → Returns IP address
  5. Caches populated on the way back
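
The flow above can be sketched as a chain of caches in front of an authoritative lookup. This is a toy model: the root → TLD → name server hops are collapsed into one dictionary lookup, TTL expiry is ignored, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

Cache = Tuple[str, Dict[str, str]]  # (label, name -> IP)


def resolve(name: str, caches: List[Cache], authoritative: Dict[str, str]) -> Tuple[str, str]:
    """Return (ip, answered_by); caches are checked in order, a miss falls through."""
    for label, cache in caches:
        if name in cache:
            return cache[name], label
    ip = authoritative[name]   # stands in for root -> TLD -> name server
    for _, cache in caches:
        cache[name] = ip       # caches are populated on the way back
    return ip, "name server"


caches: List[Cache] = [("browser", {}), ("os", {}), ("isp resolver", {})]
zone = {"example.com": "54.22.33.44"}

print(resolve("example.com", caches, zone))  # ('54.22.33.44', 'name server')
print(resolve("example.com", caches, zone))  # ('54.22.33.44', 'browser')
```

The second lookup never leaves the client, which is the effect the TTL on a record controls.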

Route 53 – Records:

Each record contains:

Field | Description
Domain/subdomain Name | e.g., example.com
Record Type | A, AAAA, CNAME, NS, etc.
Value | e.g., 12.34.56.78
Routing Policy | How Route 53 responds to queries
TTL | Time record is cached at DNS Resolvers

Record Type | Must Know | Description
A | ✅ | Maps hostname to IPv4
AAAA | ✅ | Maps hostname to IPv6
CNAME | ✅ | Maps hostname → another hostname (target must have A/AAAA)
NS | ✅ | Name Servers for the Hosted Zone; they control how traffic is routed for the domain
CAA, DS, MX, PTR, SOA, TXT, SPF, SRV | Advanced | Less common record types

⚠️ Exam trap: CNAME cannot be used for Zone Apex (example.com) — only for subdomains (www.example.com). Use Alias for apex!

Route 53 – Hosted Zones:

A container for records that define how to route traffic to a domain and its subdomains.

Type | Access | Example | Use Case
Public | Internet | example.com → 54.22.33.44 | S3, CloudFront, EC2 (Public IP), ALB
Private | Within VPC(s) | api.example.internal → 10.0.0.10 | Internal EC2, RDS, microservices
    PUBLIC HOSTED ZONE                    PRIVATE HOSTED ZONE
    ──────────────────                    ───────────────────
         
    ┌────────┐  example.com?              ┌─────────────────────────────────┐
    │ Client │ ──────────────┐            │              VPC                │
    └────────┘               │            │  ┌─────────────────────────┐    │
         ▲                   ▼            │  │  Private Hosted Zone    │    │
         │            ┌────────────┐      │  └───────────┬─────────────┘    │
         │            │  Public    │      │              │                  │
         └────────────│  Hosted    │      │   ┌──────────┴──────────┐       │
       54.22.33.44    │  Zone      │      │   ▼                     ▼       │
                      └─────┬──────┘      │ api.example     db.example      │
                            │             │ .internal?      .internal?      │
                            ▼             │   │                     │       │
                    ┌───────────────┐     │   ▼                     ▼       │
                    │ S3, CloudFront│     │ 10.0.0.10          10.0.0.35    │
                    │ EC2, ALB      │     │ (EC2)              (RDS)        │
                    └───────────────┘     └─────────────────────────────────┘

Route 53 – TTL (Time To Live):

              myapp.example.com? 
┌────────┐ ─────────────────────→ ┌──────────┐
│ Client │ ←───────────────────── │ Route 53 │
└───┬────┘                        └──────────┘
    │  A 12.34.56.78 (TTL) 
    │
    │  Client caches result for TTL duration
    │
    │        HTTP Request
    └──────────────────────────→ ┌────────────┐
    ←─────────────────────────── │ Web Server │
             HTTP Response       └────────────┘
TTL | Traffic to Route 53 | Record Freshness | Cost | Use Case
High (24 hr) | Less | Possibly outdated | Lower | Stable records
Low (60 sec) | More | Outdated for less time | Higher $$ | Before migrations/changes

⚠️ Exam trap: Changed DNS record but users still go to old IP? → TTL caching! Clients cache until TTL expires.
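
The caching behaviour behind this trap can be sketched with a small client-side cache. Everything here is illustrative: the `TtlCache` class, the 300-second TTL, and the new IP are assumptions, and time is passed in explicitly so the example is deterministic.

```python
class TtlCache:
    """Client-side DNS cache that keeps an answer until its TTL expires."""

    def __init__(self):
        self._entries = {}  # name -> (ip, expires_at)

    def lookup(self, name, now, query_dns, ttl):
        entry = self._entries.get(name)
        if entry and now < entry[1]:
            return entry[0]                   # still cached, DNS is not asked
        ip = query_dns(name)                  # cache miss or expired entry
        self._entries[name] = (ip, now + ttl)
        return ip


records = {"myapp.example.com": "12.34.56.78"}  # hypothetical record
cache = TtlCache()


def query(name):
    return records[name]


before = cache.lookup("myapp.example.com", now=0, query_dns=query, ttl=300)
records["myapp.example.com"] = "98.76.54.32"    # record changed shortly after
stale = cache.lookup("myapp.example.com", now=60, query_dns=query, ttl=300)
fresh = cache.lookup("myapp.example.com", now=300, query_dns=query, ttl=300)
print(before, stale, fresh)
```

At 60 seconds the client still receives the old address; only once the 300-second TTL has elapsed does it query again and see the change.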

Route 53 – CNAME vs Alias:

AWS resources expose ugly hostnames (e.g., lb1-1234.us-east-2.elb.amazonaws.com) — you want myapp.mydomain.com

Feature | CNAME | Alias
Points to | Any hostname | AWS resources only
Zone Apex (root domain) | ❌ NO | ✅ YES
Cost | Standard DNS charges | Free
Health Check |  | ✅ Native
TTL | You set it | Auto-managed by Route 53
Record Type | CNAME | A or AAAA

⚠️ Exam trap: Need to point mydomain.com (root) to an ALB? → Alias (CNAME won’t work!)

Route 53 – Alias Records:

AWS extension to DNS that maps a hostname to an AWS resource. Automatically recognizes IP changes on the target.

┌───────────────────────────────────────────────┐
│  Route 53 Alias Record                        │
│  ┌────────────────────────────────────────┐   │
│  │ Record: example.com                    │   │
│  │ Type: A                                │   │
│  │ Value: MyALB-123456789.us-east-1...    │   │
│  └────────────────────────────────────────┘   │
└───────────┬───────────────────────────────────┘
            │ AWS-Managed (IP changes tracked)
            ▼
    ┌──────────────────┐
    │ Application      │
    │ Load Balancer    │
    │ (MyALB-1234...)  │
    └──────────────────┘
Characteristic | Detail
Works at Zone Apex | ✅ YES (example.com)
Cost | Free (unlike CNAME)
Health Checks | ✅ Native support
TTL | ❌ Not settable (auto-managed)
Record Type | A or AAAA only
Auto IP tracking | ✅ YES (AWS manages)

⚠️ Exam trap: Alias records — you cannot set TTL (Route 53 manages it automatically)

Targets:

⚠️ Exam trap: Cannot use Alias for EC2 DNS names — use regular A record or CNAME instead!
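
As a rough illustration of what an Alias record looks like in practice, the sketch below builds the change batch that the Route 53 ChangeResourceRecordSets API accepts. The values are placeholders: the load balancer hostname is the example from the CNAME vs Alias section, and `ALB_HOSTED_ZONE_ID` is a made-up stand-in for the hosted zone ID AWS publishes for load balancers in that region.

```python
import json

# Placeholders (assumptions, not real values).
ALB_DNS_NAME = "lb1-1234.us-east-2.elb.amazonaws.com"
ALB_HOSTED_ZONE_ID = "ZEXAMPLEALBZONE"

alias_change_batch = {
    "Comment": "Point the zone apex at an ALB with an Alias record",
    "Changes": [
        {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "example.com.",
                "Type": "A",  # Alias records are created as A or AAAA
                "AliasTarget": {
                    "HostedZoneId": ALB_HOSTED_ZONE_ID,
                    "DNSName": ALB_DNS_NAME,
                    "EvaluateTargetHealth": True,
                },
                # No "TTL" and no "ResourceRecords": Route 53 manages both.
            },
        }
    ],
}

print(json.dumps(alias_change_batch, indent=2))
```

With boto3, a dictionary of this shape would be passed as the `ChangeBatch` argument of `change_resource_record_sets`, together with the ID of your own hosted zone.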

Route 53 – Routing Policies:

DNS responds to queries (does NOT route traffic like a load balancer).

Route 53 policies:

Policy | Use Case | Key Feature
Simple | Single resource | Client picks one at random if multiple values are returned
Weighted | Load balancing | Control traffic % distribution
Failover | Active-passive HA | Primary + standby resource
Latency | Multi-region | Routes to lowest latency region
Geolocation | Location-based | Route by user geography
Geoproximity | Resource location bias | Route by resource location + bias
Multivalue | Multiple IPs | Up to 8 random healthy records
IP-based | Client IP routing | Route by CIDR blocks

⚠️ Exam trap: “Routing” in Route 53 ≠ Load Balancer routing. DNS responds to queries; it doesn’t route actual traffic!

Routing Policy – Simple:

SINGLE VALUE                    MULTIPLE VALUES
─────────────                   ──────────────────

    foo.example.com                foo.example.com
         │                              │
         │ A 11.22.33.44                │ A 11.22.33.44
         │                              │ A 55.66.77.88
         │                              │ A 99.11.22.33
         ▼                              ▼
┌─────────────────┐            ┌─────────────────────┐
│  Client         │            │  Client chooses     │
│  Gets 1 value   │            │  a random value     │
└─────────────────┘            └─────────────────────┘

⚠️ Exam trap: Simple policy with multiple values ≠ load balancing! No health checks, no failover.

Routing Policy – Weighted:

⚠️ Exam trap: Weighted ≠ round-robin! It’s percentage-based distribution, not sequential rotation.

⚠️ Exam trap: Weight = 0 stops all traffic to that resource (useful for maintenance)
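
A short sketch of the arithmetic behind weighted routing: each record's share of responses is its weight divided by the sum of all weights, and a record is picked at random in that proportion rather than in rotation. The record names and weights below are made up for illustration.

```python
import random


def traffic_share(weights):
    """Share of DNS responses per record: weight / sum of all weights."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def pick_record(weights, rng=random):
    """Pick one record at random, proportionally to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


weights = {"blue": 90, "green": 10, "maintenance": 0}  # hypothetical records
shares = traffic_share(weights)
print(shares)
print(pick_record(weights))
```

The record with weight 0 has a 0% share, so it is never picked, which matches the maintenance use case above.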

Routing Policy – Latency-based:

⚠️ Exam trap: Latency ≠ Geography! German user may be directed to US if that has lowest latency.

⚠️ Exam trap: “Best user experience” / “minimize response time” → Latency, not Geolocation!
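
The selection rule can be sketched in one line: pick the region with the lowest measured latency for that user, whatever its geography. The latency figures below are invented purely to reproduce the German-user example.

```python
def lowest_latency_region(latency_ms):
    """Return the region with the lowest measured latency for this user."""
    return min(latency_ms, key=latency_ms.get)


# Hypothetical measurements for a user in Germany (made-up numbers).
german_user = {"eu-west-1": 48.0, "us-east-1": 35.0, "ap-southeast-1": 180.0}
chosen = lowest_latency_region(german_user)
print(chosen)
```

Nothing in the function looks at where the user is; only the measurements matter, which is the difference from Geolocation.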

Route 53 – Health Checks:

Multi-region failover architecture:

                    ┌───────────┐
                    │ Route 53  │
                    │ DNS Record│
                    └─────┬─────┘
                          │
            ┌─────────────┴─────────────┐
            ▼                           ▼
       ❤ Health Check              ❤ Health Check
            │                           │
   ┌────────┴────────┐         ┌────────┴────────┐
   │    us-east-1    │         │    eu-west-1    │
   │  ┌───────────┐  │         │  ┌───────────┐  │
   │  │    ALB    │  │         │  │    ALB    │  │
   │  └─────┬─────┘  │         │  └─────┬─────┘  │
   │        ▼        │         │        ▼        │
   │  ┌───────────┐  │         │  ┌───────────┐  │
   │  │    ASG    │  │         │  │    ASG    │  │
   │  └───────────┘  │         │  └───────────┘  │
   └─────────────────┘         └─────────────────┘

How endpoint monitoring works:

  ❤ Health Checker   ❤ Health Checker   ❤ Health Checker
     (us-east-1)        (us-west-1)        (sa-east-1)
          │                  │                  │
          └──────────────────┼──────────────────┘
                             │ HTTP request to /health
                             ▼ 200 code
                    ┌─────────────────┐
                    │    eu-west-1    │
                    │  ┌───────────┐  │
                    │  │    ALB    │──┼── Must allow Route 53
                    │  └─────┬─────┘  │   Health Checker IPs!
                    │        ▼        │
                    │  ┌───────────┐  │
                    │  │  EC2/ASG  │  │
                    │  └───────────┘  │
                    └─────────────────┘

IP ranges: https://ip-ranges.amazonaws.com/ip-ranges.json

Setting | Value
Global health checkers | ~15
Threshold (healthy/unhealthy) | 3 (default)
Interval | 30 sec (10 sec = higher cost)
Protocols | HTTP, HTTPS, TCP
Healthy if | >18% of checkers report healthy
Pass codes | 2xx and 3xx only
Text match | First 5120 bytes of response

⚠️ Exam trap: Must configure firewall/security group to allow Route 53 Health Checker IPs!
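
The pass criteria in the settings table can be sketched as two small functions. This is a simplified model built only from the numbers above (2xx/3xx pass codes, more than 18% of checkers); the function names and sample status codes are assumptions.

```python
def check_passes(status_code):
    """A single HTTP(S) check passes only on a 2xx or 3xx status code."""
    return 200 <= status_code <= 399


def endpoint_is_healthy(checker_status_codes, min_fraction=0.18):
    """The endpoint counts as healthy when MORE than 18% of checkers pass."""
    passed = sum(check_passes(code) for code in checker_status_codes)
    return passed / len(checker_status_codes) > min_fraction


# 15 hypothetical checkers: 3 of 15 pass (20%), then 2 of 15 pass (about 13%).
three_passing = [200] * 3 + [503] * 12
two_passing = [200] * 2 + [503] * 13
print(endpoint_is_healthy(three_passing), endpoint_is_healthy(two_passing))
```

With about 15 checkers, 3 passing (20%) is enough to count as healthy, while 2 passing (about 13%) is not.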

Health Check Type | What It Monitors | Use Case
Endpoint | Application, server, AWS resource | Direct resource monitoring
Calculated | Other health checks | Aggregate multiple checks
CloudWatch Alarm | CW Alarms (DynamoDB throttles, RDS, custom) | Private resources

⚠️ Exam trap: Only 3 health check types! No direct SQS, SNS, or other service monitoring — use CloudWatch Alarm instead.

⚠️ Exam trap: Private resources → use CloudWatch Alarm health checks (HTTP checks can’t reach them!)

Health Checks for Private Resources:

                              ┌─────────────────────────────────┐
                              │              VPC                │
┌─────────────────┐           │  ┌───────────────────────────┐  │
│ Health Checker  │           │  │     Private subnet        │  │
│  (us-east-1)    │           │  │    ┌─────────────┐        │  │
└────────┬────────┘           │  │    │  EC2 (T2)   │        │  │
         │                    │  │    └──────┬──────┘        │  │
         │ ✖ Can't reach!     │  │           │ monitor       │  │
         │                    │  │           ▼               │  │
         │    monitor         │  │    ┌─────────────┐        │  │
         └────────────────────┼──┼───→│ CloudWatch  │        │  │
                              │  │    │   Alarm     │        │  │
                              │  │    └─────────────┘        │  │
                              │  └───────────────────────────┘  │
                              └─────────────────────────────────┘

Routing Policy – Geolocation:

⚠️ Exam trap: Geolocation ≠ Latency! Geolocation = user’s geography; Latency = network performance.

⚠️ Exam trap: “Legal requirement” / “restrict access by country” → Geolocation (not Latency!)

Routing Policy – Geoproximity:

⚠️ Exam trap: Geoproximity requires Traffic Flow (paid feature). Geolocation does NOT!

Routing Policy – IP-based:

       User B                   User A
    (200.5.4.100)           (203.0.113.56)
          │                        │
          └────────────┬───────────┘
                       ▼
                  ┌─────────┐
                  │Route 53 │
                  └────┬────┘
                       │
     ┌─────────────────┴──────────────────┐
     │          CIDR Collection           │
     ├────────────────────────────────────┤
     │ location-1: 203.0.113.0/24         │
     │ location-2: 200.5.4.0/24           │
     └─────────────────┬──────────────────┘
                       │
     ┌─────────────────┴──────────────────┐
     │              Records               │
     ├────────────────────────────────────┤
     │ example.com → 1.2.3.4 (location-1) │
     │ example.com → 5.6.7.8 (location-2) │
     └─────────────────┬──────────────────┘
                       │
               ┌───────┴───────┐
               ▼               ▼
         EC2 (5.6.7.8)   EC2 (1.2.3.4)
           (User B)        (User A)
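
The lookup in the diagram can be reproduced with Python's standard `ipaddress` module. The CIDR blocks and record values are the ones from the diagram; the function name and the `default` fallback are assumptions made for this sketch.

```python
import ipaddress

# CIDR collection and records copied from the diagram above.
CIDR_COLLECTION = {
    "location-1": ipaddress.ip_network("203.0.113.0/24"),
    "location-2": ipaddress.ip_network("200.5.4.0/24"),
}
RECORDS = {"location-1": "1.2.3.4", "location-2": "5.6.7.8"}


def answer_for(client_ip, default=None):
    """Return the record value for the CIDR block that contains client_ip."""
    addr = ipaddress.ip_address(client_ip)
    for location, network in CIDR_COLLECTION.items():
        if addr in network:
            return RECORDS[location]
    return default  # no matching block; what happens then depends on your setup


print(answer_for("203.0.113.56"))  # User A
print(answer_for("200.5.4.100"))   # User B
```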

Routing Policy – Multi-Value:

⚠️ Exam trap: Multi-Value is NOT a substitute for ELB! It’s client-side selection, not load balancing.
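
A minimal sketch of what a Multi-Value answer contains, assuming the rule stated in the policy table (up to 8 random healthy records). The record set is hypothetical, and health is modelled as a plain boolean per IP.

```python
import random


def multivalue_answer(records, max_records=8, rng=random):
    """Return up to eight healthy records, chosen at random."""
    healthy = [ip for ip, is_healthy in records.items() if is_healthy]
    return rng.sample(healthy, k=min(max_records, len(healthy)))


# Hypothetical record set: ten IPs, one of which is failing its health check.
records = {f"10.0.0.{i}": True for i in range(1, 11)}
records["10.0.0.3"] = False
answer = multivalue_answer(records)
print(answer)
```

The client still has to choose one of the returned values itself, which is why this is not a replacement for a load balancer.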

Domain Registrar vs. DNS Service:

Concept | Description
Domain Registrar | Where you buy/register domain (GoDaddy, Amazon Registrar, etc.); annual fee
DNS Service | Where you manage DNS records (can be different from registrar!)

⚠️ Exam trap: Update NS records at the registrar (GoDaddy), not in Route 53! And use Public Hosted Zone for internet-facing domains.

Route 53 – Hybrid DNS:

Route 53 Resolver automatically answers DNS queries for:

Hybrid DNS = Resolving DNS queries between VPC (Route 53 Resolver) and your networks (other DNS Resolvers)

Network Type | Connection
VPC / Peered VPC | Native
On-premises | Direct Connect or AWS VPN

Route 53 – Resolver Endpoints:

Inbound Endpoint — On-premises DNS resolvers can query Route 53 Resolver for AWS resources

                                    ┌─────────────────────────────────────────┐
                                    │               us-east-1                 │
   On-Premises Data Center          │  ┌───────────────────────────────────┐  │
  ┌──────────────────────┐          │  │              VPC                  │  │
  │                      │          │  │     Private Hosted Zone           │  │
  │  ┌────────────────┐  │          │  │       (aws.private)               │  │
  │  │ DNS Resolvers  │  │          │  │  ┌─────────────────────────────┐  │  │
  │  │(onpremise.     │  │          │  │  │     Private Subnet          │  │  │
  │  │  private)      │──┼── DNS Query: app.aws.private? ──────────────→│  │  │
  │  └────────────────┘  │          │  │  │  ┌────────────┐  ┌────────┐ │  │  │
  │         ▲            │          │  │  │  │    EC2     │  │Resolver│ │  │  │
  │         │            │          │  │  │  │(app.aws.   │←─│Inbound │ │  │  │
  │  ┌──────┴───────┐    │          │  │  │  │  private)  │  │Endpoint│ │  │  │
  │  │    Server    │    │          │  │  │  └────────────┘  └───┬────┘ │  │  │
  │  │ (web.onprem  │    │◀═══VPN or DX═══════════════════════════╝     │  │  │
  │  │  .private)   │    │          │  │  └─────────────────────────────┘  │  │
  │  └──────────────┘    │          │  └──────────────┬────────────────────┘  │
  └──────────────────────┘          │                 │ lookup               │
                                    │                 ▼                      │
                                    │           Route 53 Resolver            │
                                    └─────────────────────────────────────────┘
Endpoint | Direction | Use Case
Inbound | On-prem → AWS | On-prem resolves AWS Private Hosted Zone records
Outbound | AWS → On-prem | AWS resources resolve on-premises DNS records

⚠️ Exam trap: Inbound = queries coming IN to AWS. Outbound = queries going OUT from AWS. Think from AWS perspective!

Outbound Endpoint — Route 53 Resolver forwards DNS queries to on-premises DNS Resolvers

                                    ┌─────────────────────────────────────────┐
                                    │               us-east-1                 │
   On-Premises Data Center          │  ┌───────────────────────────────────┐  │
  ┌──────────────────────┐          │  │              VPC                  │  │
  │                      │          │  │     Private Hosted Zone           │  │
  │  ┌────────────────┐  │          │  │       (aws.private)               │  │
  │  │ DNS Resolvers  │  │          │  │  ┌─────────────────────────────┐  │  │
  │  │(onpremise.     │←─┼── DNS Query: web.onpremise.private? ────────│  │  │
  │  │  private)      │  │          │  │  │  ┌────────────┐  ┌────────┐ │  │  │
  │  └────────────────┘  │          │  │  │  │    EC2     │─→│Resolver│ │  │  │
  │         │            │          │  │  │  │(app.aws.   │  │Outbound│ │  │  │
  │         ▼            │          │  │  │  │  private)  │  │Endpoint│─┼──┼──┘
  │  ┌──────────────┐    │          │  │  │  └────────────┘  └────────┘ │  │
  │  │    Server    │    │◀═══VPN or DX═════════════════════════════════╝  │
  │  │ (web.onprem  │    │          │  │  └─────────────────────────────┘  │
  │  │  .private)   │    │          │  └──────────────┬────────────────────┘
  │  └──────────────┘    │          │                 │                    │
  └──────────────────────┘          │                 ▼                    │
                                    │           Route 53 Resolver          │
                                    └──────────────────────────────────────┘

Route 53 – Resolver Rules

Resolver Rules = define how DNS queries are forwarded from Outbound Endpoints

Rule Type | Description
Conditional Forwarding | Forward queries for specific domains to target DNS servers
System | Default rules (auto-created for Private Hosted Zones, VPC DNS)
Recursive | Forward all unmatched queries to Route 53 Resolver

Resolver Rules Example:

Query: db.corp.local           Query: api.example.com
         │                              │
         ▼                              ▼
    ┌──────────────────────────────────────┐
    │         Resolver Rules               │
    │  ┌────────────────────────────────┐  │
    │  │ *.corp.local → 10.0.0.53       │──┼──▶ On-prem DNS
    │  │ *.example.com → System Rule    │──┼──▶ Route 53
    │  │ * (default) → Recursive        │──┼──▶ Public DNS
    │  └────────────────────────────────┘  │
    └──────────────────────────────────────┘

⚠️ Exam trap: “Share DNS resolution across accounts” → Resolver Rules + AWS RAM
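
Rule selection in the example above can be sketched as a most-specific-suffix match: the rule whose domain matches the most labels of the query name applies, with a catch-all at the end. The `RULES` table paraphrases the diagram; the descriptions and function name are illustrative only.

```python
# Rules mirror the example above; "." stands for the catch-all default rule.
RULES = {
    "corp.local": "forward to on-prem DNS at 10.0.0.53",
    "example.com": "system rule, answered by Route 53",
    ".": "recursive, resolved via public DNS",
}


def match_rule(query_name):
    """Pick the rule whose domain is the longest suffix of the query name."""
    labels = query_name.rstrip(".").lower().split(".")
    for start in range(len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in RULES:
            return RULES[candidate]
    return RULES["."]


for name in ("db.corp.local", "api.example.com", "www.wikipedia.org"):
    print(name, "->", match_rule(name))
```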


Route 53 – DNSSEC

DNSSEC = DNS Security Extensions — protects against DNS spoofing/cache poisoning

Feature | Details
Purpose | Cryptographically sign DNS records to verify authenticity
Route 53 support | ✅ DNSSEC signing for public hosted zones
How it works | KSK (Key Signing Key) is backed by a KMS key you manage; Route 53 manages the ZSK (Zone Signing Key)
Chain of trust | Root → TLD → Your domain (DS records link them)

Setup steps:

  1. Enable DNSSEC signing in Route 53
  2. Create KSK (Key Signing Key) in KMS — must be in us-east-1
  3. Establish chain of trust with parent zone (add DS record at registrar)

⚠️ Exam trap: “Prevent DNS spoofing” or “verify DNS response authenticity” → DNSSEC

⚠️ Exam trap: DNSSEC KMS key must be in us-east-1 (like CloudFront certificates)



🎯 MASTER SUMMARY: Route 53 Exam Guide

Part 1: Core Principles (Understand WHY → Derive WHAT)

Principle 1: DNS ≠ Load Balancer

Route 53 responds to DNS queries — it returns IP addresses. It does NOT route actual network traffic.

Principle 2: Alias = AWS’s DNS Superpower

CNAME has limitations (can’t use at Zone Apex, costs money). AWS invented Alias to solve this:

Rule: If pointing to AWS resource → use Alias. If pointing to non-AWS → use CNAME (or A record).

Principle 3: Zone Apex = Root Domain Problem

example.com (no subdomain) = Zone Apex = Root Domain

Record Type | Zone Apex? | Example
CNAME | ❌ NO | Cannot use for example.com
Alias | ✅ YES | Can use for example.com
A Record | ✅ YES | Can use for example.com

DNS standard forbids CNAME at apex. AWS Alias bypasses this limitation.

Principle 4: Health Checks = Failover Enabler

Health checks are the foundation of high availability in Route 53:

Principle 5: TTL = Caching Control

TTL determines how long clients cache DNS responses:

Principle 6: Latency ≠ Geography

Two commonly confused policies:

German user might be routed to US-East if that has lower latency than EU-West.

Principle 7: Resource Policies for Cross-Service/Cross-Account

When another AWS service or another account needs access:

Principle 8: Hybrid DNS = Inbound + Outbound

Think from AWS’s perspective:


Part 2: Decision Tree (Follow Keywords → Find Answer)

Step 1: What type of record do you need?

                    What are you pointing to?
                              │
        ┌─────────────────────┼─────────────────────┐
        ▼                     ▼                     ▼
   IPv4 Address         AWS Resource          Another Hostname
        │                     │                     │
        ▼                     ▼                     ▼
    A Record              Alias A/AAAA          Is it Zone Apex?
                          (FREE!)                   │
                                            ┌───────┴───────┐
                                            ▼               ▼
                                           Yes              No
                                            │               │
                                            ▼               ▼
                                         Alias           CNAME ok
                                       (required)
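
The Step 1 tree translates directly into a small function. The `target_kind` labels ("ipv4", "aws_resource", "hostname") are names chosen for this sketch, not Route 53 terminology.

```python
def choose_record(target_kind, at_zone_apex=False):
    """Follow the Step 1 decision tree.

    target_kind: "ipv4", "aws_resource" or "hostname" (labels chosen for this sketch).
    """
    if target_kind == "ipv4":
        return "A"
    if target_kind == "aws_resource":
        return "Alias (A/AAAA)"
    if target_kind == "hostname":
        # Per the tree: a hostname target at the apex needs Alias, else CNAME is fine.
        return "Alias" if at_zone_apex else "CNAME"
    raise ValueError(f"unknown target kind: {target_kind!r}")


print(choose_record("hostname", at_zone_apex=True))
print(choose_record("hostname", at_zone_apex=False))
```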

Step 2: Which routing policy?

                        What's the requirement?
                              │
    ┌────────────┬────────────┼────────────┬────────────┬────────────┐
    ▼            ▼            ▼            ▼            ▼            ▼
 Single      Traffic %    Best User    User's      Failover     Client IP
 Resource    Control      Experience   Country     HA Setup     Routing
    │            │            │            │            │            │
    ▼            ▼            ▼            ▼            ▼            ▼
 Simple      Weighted     Latency    Geolocation  Failover     IP-based

Step 3: Feature-Based Decision Table

If question mentions… | Answer is…
“Zone Apex” / “root domain” + AWS resource | Alias record
“example.com” (not www.) + ALB/CloudFront | Alias record
“free DNS queries” | Alias record
“minimize response time” / “best user experience” | Latency routing
“legal requirement” / “restrict by country” | Geolocation routing
“content localization by region” | Geolocation routing
“A/B testing” / “canary deployment” | Weighted routing
“traffic percentage” / “gradual migration” | Weighted routing
“active-passive” / “disaster recovery” | Failover routing
“primary and secondary” | Failover routing
“shift traffic between locations” / “bias” | Geoproximity routing
“on-premises resolves AWS domains” | Inbound Resolver Endpoint
“AWS resolves on-premises domains” | Outbound Resolver Endpoint
“DNS spoofing” / “verify authenticity” | DNSSEC
“private resource health check” | CloudWatch Alarm health check
“share DNS rules across accounts” | Resolver Rules + AWS RAM
“users still see old IP after change” | TTL caching issue

The “NOT” Rules (Eliminate Wrong Answers Fast)

Statement | Why It’s Wrong
CNAME for Zone Apex | CNAME cannot be used at root domain
Alias for EC2 DNS name | Alias doesn’t support EC2 DNS; use A/CNAME
Alias with custom TTL | Alias TTL is auto-managed, cannot be set
Health check for private EC2 | Health checkers can’t reach private subnets
Simple policy with health check | Simple routing doesn’t support health checks
Geolocation for best performance | Geolocation = geography, not network performance
Multi-Value replaces ELB | Multi-Value is DNS-level, not true load balancing
Route 53 routes traffic | Route 53 answers DNS queries, doesn’t route traffic

The “CANNOT” List

Cannot… | Instead…
Use CNAME at Zone Apex | Use Alias
Set TTL on Alias records | TTL is auto-managed
Create Alias to EC2 DNS name | Use A record or CNAME
Health check private resources directly | Use CloudWatch Alarm
Use Geoproximity without Traffic Flow | Traffic Flow is required
Have health checks with Simple policy | Use Weighted/Failover/Multi-Value

Part 3: Scenario Pattern Recognition

Pattern: “Point root domain to AWS resource”

Keywords: example.com (no www), Zone Apex, ALB, CloudFront, root domain

Answer: Alias record (A type)

Why: CNAME cannot be used at Zone Apex. Alias can.


Pattern: “Minimize response time / Best user experience”

Keywords: lowest latency, best performance, fastest response, multi-region app

Answer: Latency-based routing

Why: Routes to AWS region with best network performance, regardless of geography.


Pattern: “Restrict access by country / Content localization”

Keywords: country restrictions, content localization, legal requirement, GDPR

Answer: Geolocation routing

Why: Routes based on user’s physical location, not network performance.


Pattern: “Gradual migration / Canary deployment”

Keywords: A/B testing, percentage of traffic, gradual rollout, 10% to new version

Answer: Weighted routing

Why: Control exact percentage of traffic to each resource.


Pattern: “Active-passive / Disaster recovery”

Keywords: primary and secondary, failover, standby, DR site

Answer: Failover routing policy + Health checks

Why: Automatically switches to secondary when primary fails health check.
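
A rough model of the failover behaviour, assuming the default threshold of 3 consecutive failed checks from the health check settings. The class name and the IP addresses (taken from documentation address ranges) are hypothetical, and recovery is simplified: one passing check resets the counter here.

```python
class FailoverRecord:
    """Primary/secondary pair that fails over after repeated failed checks."""

    def __init__(self, primary, secondary, failure_threshold=3):
        self.primary = primary
        self.secondary = secondary
        self.failure_threshold = failure_threshold
        self._consecutive_failures = 0

    def report_check(self, passed):
        """Record one health-check result for the primary."""
        if passed:
            self._consecutive_failures = 0  # simplification: instant recovery
        else:
            self._consecutive_failures += 1

    def answer(self):
        """Primary while healthy; secondary once the failure threshold is hit."""
        if self._consecutive_failures >= self.failure_threshold:
            return self.secondary
        return self.primary


record = FailoverRecord(primary="192.0.2.10", secondary="198.51.100.10")
record.report_check(False)
record.report_check(False)
after_two = record.answer()      # still the primary
record.report_check(False)
after_three = record.answer()    # now the secondary
print(after_two, after_three)
```

Without a health check there is nothing to feed `report_check`, so the answer never changes; that is why failover routing depends on health checks.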


Pattern: “On-premises needs to resolve AWS private domains”

Keywords: hybrid cloud, on-premises DNS, resolve Private Hosted Zone from datacenter

Answer: Inbound Resolver Endpoint

Why: Allows on-prem DNS servers to query Route 53 for AWS resources.


Pattern: “AWS resources need to resolve on-premises domains”

Keywords: EC2 needs to reach on-prem by hostname, resolve corp.local from VPC

Answer: Outbound Resolver Endpoint + Forwarding Rules

Why: Forwards DNS queries from VPC to on-premises DNS servers.


Pattern: “Users still seeing old IP after DNS change”

Keywords: DNS not updating, old IP, change not propagating

Answer: TTL caching issue

Solution: Wait for TTL to expire, or lower TTL before making changes.


Pattern: “Health check for private/internal resource”

Keywords: private subnet, internal EC2, RDS health, can’t reach from internet

Answer: CloudWatch Alarm-based health check

Why: Route 53 health checkers are public — can’t reach private resources.


Pattern: “Prevent DNS spoofing / Verify DNS authenticity”

Keywords: DNS security, cache poisoning, MITM, verify DNS response

Answer: DNSSEC

Remember: KMS key must be in us-east-1.


Pattern: “Share DNS configuration across accounts”

Keywords: multi-account, centralized DNS, share resolver rules

Answer: Resolver Rules + AWS RAM

Why: Resolver Rules can be shared across accounts via Resource Access Manager.


Pattern: “Buy domain elsewhere, use Route 53 for DNS”

Keywords: GoDaddy, third-party registrar, use Route 53

Answer: Create Public Hosted Zone → Update NS records at the registrar

Why: NS records tell the internet where to find your DNS. Update at registrar, not Route 53.


Part 4: Quick Reference Tables

Routing Policy Comparison

Policy | Health Check? | Use Case | Key Feature
Simple | ❌ No | Single resource | Returns all values, client picks
Weighted | ✅ Yes | A/B testing, migration | Traffic % control
Failover | ✅ Yes (required) | DR, active-passive | Primary + secondary
Latency | ✅ Yes | Multi-region apps | Best network performance
Geolocation | ✅ Yes | Country restrictions | User’s physical location
Geoproximity | ✅ Yes | Shift traffic by location | Bias values (-99 to +99)
Multi-Value | ✅ Yes | Multiple healthy IPs | Up to 8 healthy records
IP-based | ✅ Yes | Route by client CIDR | Client IP → location mapping

Record Type Quick Reference

Record | Maps To | Zone Apex? | AWS Extension?
A | IPv4 | ✅ Yes | No
AAAA | IPv6 | ✅ Yes | No
CNAME | Hostname | ❌ No | No
Alias | AWS Resource | ✅ Yes | ✅ Yes (AWS-only)
NS | Name Servers | ✅ Yes | No

Alias Targets (What Can Alias Point To?)

✅ Can Alias To | ❌ Cannot Alias To
ALB, NLB, Classic LB | EC2 DNS name
CloudFront Distribution | Non-AWS resources
API Gateway | RDS endpoint
Elastic Beanstalk | Other CNAMEs
S3 Website Endpoint |
VPC Interface Endpoint |
Global Accelerator |
Another Route 53 record |

Health Check Types

Type | Monitors | Use Case
Endpoint | HTTP/HTTPS/TCP to public IP | Public resources
Calculated | Other health checks (AND/OR) | Aggregate multiple checks
CloudWatch Alarm | CloudWatch metric state | Private resources

Key Numbers to Remember

Item | Value
Hosted Zone cost | $0.50/month
Health check interval | 30 sec (10 sec = extra cost)
Health checkers globally | ~15
Healthy threshold | 3 consecutive
% checkers for healthy | >18%
Multi-Value max records | 8
Weighted max value | 255 (weights range from 0 to 255 and are relative)
Geoproximity bias range | -99 to +99
TTL recommendation before changes | Low (60 sec)

Part 5: Ultimate Instant-Answer Table

Question Contains | → Instant Answer
“Zone Apex” / “root domain” + AWS | Alias record
“example.com to ALB” | Alias record
“free DNS queries to AWS” | Alias record
“CNAME at root” | ❌ Not possible → use Alias
“lowest latency” / “best performance” | Latency routing
“country restriction” / “legal” | Geolocation routing
“localization by region” | Geolocation routing
“A/B test” / “canary” | Weighted routing
“percentage of traffic” | Weighted routing
“active-passive” / “DR” | Failover routing
“primary/secondary” | Failover routing
“shift traffic” / “bias” | Geoproximity routing
“private resource health” | CloudWatch Alarm health check
“on-prem → AWS DNS” | Inbound Resolver Endpoint
“AWS → on-prem DNS” | Outbound Resolver Endpoint
“DNS spoofing” / “DNSSEC” | DNSSEC (KMS key in us-east-1)
“share DNS across accounts” | Resolver Rules + AWS RAM
“old IP still showing” | TTL caching
“GoDaddy + Route 53” | Update NS at registrar
“100% availability SLA” | Route 53 (only AWS service!)

Part 6: Elimination Checklist

When stuck between options, eliminate systematically:

□ Is it Zone Apex (root domain)?
  → Yes = eliminate CNAME, must use Alias or A
  → No = CNAME is acceptable

□ Do they need health checks?
  → Yes = eliminate Simple routing
  → Failover REQUIRES health checks

□ Is it about USER LOCATION?
  → Physical location = Geolocation
  → Network performance = Latency

□ Is the resource PRIVATE?
  → Yes = eliminate direct HTTP health check
  → Use CloudWatch Alarm instead

□ Is it pointing to AWS resource?
  → Yes = prefer Alias (free, auto-tracking)
  → No = use CNAME or A record

□ Do they need traffic PERCENTAGE control?
  → Yes = Weighted routing
  → Just failover = Failover routing

□ Is it HYBRID (on-prem + AWS)?
  → On-prem queries AWS = Inbound Endpoint
  → AWS queries on-prem = Outbound Endpoint

□ Is it about DNS SECURITY?
  → Spoofing/authenticity = DNSSEC
  → KMS key must be in us-east-1

🏆 The Golden Rules

  1. Zone Apex + AWS = Alias (CNAME doesn’t work at root)
  2. Alias is FREE (CNAME costs money)
  3. Alias TTL = auto-managed (you can’t set it)
  4. Latency ≠ Geography (latency = network speed, geolocation = physical location)
  5. Private resources = CloudWatch Alarm (health checkers can’t reach them)
  6. Failover REQUIRES health checks (no health check = no failover)
  7. Route 53 = DNS responses, not traffic routing (it returns IPs, doesn’t route packets)
  8. Third-party registrar = update NS at registrar (not in Route 53)
  9. Inbound = INTO AWS, Outbound = OUT OF AWS (from AWS perspective)
  10. DNSSEC KMS key = us-east-1 (like CloudFront certificates)
  11. Route 53 = 100% SLA (only AWS service with this guarantee)
  12. Weight = 0 stops ALL traffic (useful for maintenance)

AWS CloudFront & Global Accelerator:


AWS Services: Global vs Regional:

Understanding which services are global vs regional is critical for:

Always Global Services (no region selection):

Service | Why Global | Key Implication
IAM | Identity is account-wide | Users, roles, policies work everywhere
Route 53 | DNS is global | Hosted zones accessible from any region
CloudFront | CDN with edge locations | Certs must be in us-east-1
WAF (for CloudFront) | Attached to global CloudFront | WAF rules in us-east-1
Global Accelerator | Anycast IPs, global routing | Entry point is global
AWS Organizations | Multi-account management | SCPs apply across all regions
Artifact | Compliance documents | Account-level access

Regional Services (must select region):

Service | Regional Scope | Cross-Region Options
EC2 | Instances in one region | AMI copy, snapshots
S3 | Bucket in one region | Cross-Region Replication (CRR)
RDS | DB in one region | Read Replicas, snapshots
Lambda | Functions in one region | Deploy to each region
API Gateway | API in one region | Edge-Optimized uses CloudFront
DynamoDB | Table in one region | Global Tables (multi-region)
Aurora | Cluster in one region | Global Database
KMS | Keys in one region | Multi-Region Keys (mrk-)
Secrets Manager | Secrets in one region | Multi-region replication
CloudHSM | HSM in one region | No cross-region replication!
ELB | Load balancer in one region | Use Global Accelerator for global
VPC | Network in one region | VPC Peering, Transit Gateway

Certificate (ACM) Placement Rules:

Scenario | ACM Certificate Region
CloudFront distribution | us-east-1 (always)
Edge-Optimized API Gateway | us-east-1 (uses CloudFront)
Regional API Gateway | Same region as API
ALB/NLB | Same region as load balancer
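
The placement table reduces to a two-branch lookup, sketched below. The scenario labels are names invented for this example; the rule itself is taken from the table above.

```python
def acm_certificate_region(scenario, resource_region=None):
    """Region where the ACM certificate must live, per the table above."""
    always_us_east_1 = {"cloudfront", "api_gateway_edge_optimized"}
    same_region = {"api_gateway_regional", "alb", "nlb"}
    if scenario in always_us_east_1:
        return "us-east-1"
    if scenario in same_region:
        if resource_region is None:
            raise ValueError("resource_region is required for regional resources")
        return resource_region
    raise ValueError(f"scenario not covered by the table: {scenario!r}")


print(acm_certificate_region("cloudfront"))
print(acm_certificate_region("alb", "eu-west-1"))
```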

Memory trick: “Where does TLS terminate?”

Global Services That “Feel” Regional:

Service | Global? | Gotcha
S3 bucket names | Globally unique | But bucket lives in ONE region
Lambda@Edge | Runs at edge | Must be authored in us-east-1
WAF for ALB | Regional | WAF for CloudFront = global (us-east-1)

Cross-Region Capabilities Summary:

Need | Solution
Global static content | S3 + CloudFront
Global API | API Gateway (Edge-Optimized) or Global Accelerator + ALB
Global database (NoSQL) | DynamoDB Global Tables
Global database (SQL) | Aurora Global Database
Global encryption keys | KMS Multi-Region Keys
Global secrets | Secrets Manager replication
Global fixed IPs | Global Accelerator

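⚠️ Exam trap: “CloudHSM multi-region” → No built-in replication. A CloudHSM cluster lives in a single region; the only cross-region path is copying a backup to another region and building a separate cluster from it.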

⚠️ Exam trap: “Same KMS key in two regions” → Possible with Multi-Region Keys (mrk- prefix). Regular keys are regional.

⚠️ Exam trap: “Lambda@Edge in eu-west-1” → Wrong. Lambda@Edge must be created in us-east-1, CloudFront replicates it.


AWS CloudFront is a Content Delivery Network (CDN) that improves read performance by caching content at the edge.

⚠️ Exam trap: CloudFront SSL/TLS certificates must be in us-east-1 (even if origin is in another region)

CloudFront Origins:

Origin Type | Use Case | Notes
S3 Bucket | Distribute files, cache at edge | Secured with OAC (Origin Access Control)
VPC Origin | Private apps in VPC subnets | ALB / NLB / EC2; no public exposure needed
Custom Origin (HTTP) | Any public HTTP backend | S3 static website, custom servers

⚠️ Exam trap: “Restrict S3 access to CloudFront only” → OAC + S3 Bucket Policy
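A minimal sketch of the bucket policy side of "OAC + S3 Bucket Policy": the policy allows only the CloudFront service principal, and only when the request comes from one specific distribution. The bucket name, account ID and distribution ID below are placeholders, and the helper function is hypothetical.

```python
import json


def oac_bucket_policy(bucket: str, account_id: str, distribution_id: str) -> dict:
    """Build a bucket policy that lets one CloudFront distribution read objects."""
    distribution_arn = f"arn:aws:cloudfront::{account_id}:distribution/{distribution_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontServicePrincipalReadOnly",
                "Effect": "Allow",
                # The CloudFront service itself, not an IAM user or role.
                "Principal": {"Service": "cloudfront.amazonaws.com"},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # Restrict access to requests made on behalf of this distribution.
                "Condition": {"StringEquals": {"AWS:SourceArn": distribution_arn}},
            }
        ],
    }


# Placeholder values for illustration only.
policy = oac_bucket_policy("example-bucket", "111122223333", "EDFDVBD6EXAMPLE")
print(json.dumps(policy, indent=2))
```

With this policy in place and public access blocked, direct requests to the bucket URL are denied while requests through the distribution succeed.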

CloudFront with VPC Origin (Private Resources):

                                    ┌─────────────────────────────────────┐
                                    │ VPC                                 │
                                    │  ┌─────────────────────────────┐    │
Users ──▶ CloudFront ──▶ VPC Origin │  │ Private Subnet              │    │
          (Edge)                    │  │  ├─▶ ALB                    │    │
                                    │  │  ├─▶ NLB                    │    │
                                    │  │  └─▶ EC2                    │    │
                                    │  └─────────────────────────────┘    │
                                    └─────────────────────────────────────┘

CloudFront vs S3 Cross-Region Replication:

Feature | CloudFront | S3 CRR
Scope | Global edge network | Per-region setup
Updates | Cached with TTL | Near real-time
Access | Read/Write (upload via CF) | Read-only
Best for | Static content, global availability | Dynamic content, low-latency in few regions

CloudFront Origin Groups (Failover)

                         ┌─────────────────────┐
                         │   Origin Group      │
                         │                     │
Users ──▶ CloudFront ───▶│  Primary: S3 (us-east-1)
                         │      │              │
                         │      ▼ (on error)   │
                         │  Secondary: S3 (eu-west-1)
                         │                     │
                         └─────────────────────┘

⚠️ Exam trap: “CloudFront high availability” or “origin failover” → Origin Groups


CloudFront Cache Invalidations

Cache Invalidation Flow:

Admin ──▶ Invalidate /images/* ──▶ CloudFront ──▶ Edge Locations
                                                   │
                                        ┌──────────┴──────────┐
                                        ▼                     ▼
                                   [Cache]               [Cache]
                                   index.html ✓          index.html ✓
                                   /images/ ✗            /images/ ✗
                                   (invalidated)         (invalidated)

CloudFront Behaviors & Path Patterns

Behaviors = rules that define how CloudFront handles requests for different paths

Setting | Options
Path Pattern | /api/*, /images/*, *.jpg, etc.
Origin | Which origin to route to
Cache Policy | TTL, headers/cookies to cache by
Viewer Protocol | HTTP and HTTPS, HTTPS only, Redirect HTTP→HTTPS
Allowed Methods | GET/HEAD, GET/HEAD/OPTIONS, ALL
Edge Functions | CloudFront Functions, Lambda@Edge

CloudFront Behaviors Example:

Request Path          Behavior              Origin
─────────────────────────────────────────────────────
/api/*            ──▶ API Behavior     ──▶ ALB (no cache)
/images/*         ──▶ Images Behavior  ──▶ S3 (long TTL)
/static/*         ──▶ Static Behavior  ──▶ S3 (long TTL)
/* (everything)   ──▶ Default Behavior ──▶ ALB (short TTL)
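The routing in the example above can be sketched as first-match evaluation over an ordered list, with the default behavior as the fallback. The behavior list mirrors the diagram; `fnmatchcase` is used here as a stand-in for CloudFront's `*` and `?` wildcards (it also understands `[...]` sets, which CloudFront path patterns do not).

```python
from fnmatch import fnmatchcase

# Ordered list: CloudFront checks behaviors in the listed order, first match wins.
BEHAVIORS = [
    ("/api/*", "ALB (no cache)"),
    ("/images/*", "S3 (long TTL)"),
    ("/static/*", "S3 (long TTL)"),
]
# The default behavior always matches and is always evaluated last.
DEFAULT_BEHAVIOR = "ALB (short TTL)"


def route(path: str) -> str:
    """Return the origin/cache setting chosen for a request path."""
    for pattern, target in BEHAVIORS:
        if fnmatchcase(path, pattern):
            return target
    return DEFAULT_BEHAVIOR


for example in ("/api/users", "/images/logo.png", "/index.html"):
    print(example, "->", route(example))
```

Because the first match wins, a broad pattern placed above a narrow one would shadow it, which is why specific patterns are listed first.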

⚠️ Exam traps:


CloudFront Signed URLs & Signed Cookies

Feature | Signed URL | Signed Cookie
Access scope | 1 file per URL | Multiple files (entire path)
Use case | Individual file download | Video streaming, multi-file access
URL change | Yes (unique per file) | No (cookie sent with all requests)

Signed URL vs S3 Pre-Signed URL:

Feature | CloudFront Signed URL | S3 Pre-Signed URL
Access via | CloudFront edge (cached) | Direct to S3
Use when | CloudFront in front of S3 | Direct S3 access needed
Features | Caching, filtering by IP/path/date | Simple, S3-only

⚠️ Exam trap: “Private content via CloudFront” → Signed URL/Cookie
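To make the "filtering by IP/path/date" part concrete, here is a sketch of the custom policy document that a signed URL or signed cookie carries, plus the URL-safe base64 variant CloudFront expects. The actual signing step (RSA signature over the policy with your private key) is deliberately left out; the distribution domain, expiry time and CIDR are placeholders.

```python
import base64
import json
from typing import Optional


def custom_policy(resource_url: str, expires_epoch: int, source_cidr: Optional[str] = None) -> str:
    """Build a CloudFront custom policy as compact JSON (no whitespace)."""
    condition = {"DateLessThan": {"AWS:EpochTime": expires_epoch}}
    if source_cidr is not None:
        condition["IpAddress"] = {"AWS:SourceIp": source_cidr}
    statement = {"Resource": resource_url, "Condition": condition}
    return json.dumps({"Statement": [statement]}, separators=(",", ":"))


def cloudfront_safe_base64(data: bytes) -> str:
    """Base64-encode, then swap the characters that are invalid in URLs."""
    encoded = base64.b64encode(data).decode("ascii")
    return encoded.replace("+", "-").replace("=", "_").replace("/", "~")


# Placeholder values: a wildcard resource covers many files, as with signed cookies.
policy = custom_policy(
    "https://d111111abcdef8.cloudfront.net/videos/*",
    expires_epoch=1767225600,
    source_cidr="203.0.113.0/24",
)
print(policy)
print(cloudfront_safe_base64(policy.encode("utf-8")))
```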


CloudFront Functions vs Lambda@Edge

Both run code at edge locations, but different scale/capabilities:

Feature | CloudFront Functions | Lambda@Edge
Language | JavaScript only | Node.js, Python
Execution time | < 1 ms | 5 s (viewer triggers), 30 s (origin triggers)
Max memory | 2 MB | 128 MB (viewer triggers), up to 10,240 MB (origin triggers)
Scale | Millions req/sec | Thousands req/sec
Triggers | Viewer Request/Response only | Viewer + Origin Request/Response
Network/File access | ❌ No | ✅ Yes
Cost | 1/6th of Lambda@Edge | Higher

CloudFront Request Flow:

Request:   User ──▶ [Viewer Request] ──▶ Cache ──▶ [Origin Request] ──▶ Origin (S3/ALB)
Response:  User ◀── [Viewer Response] ◀── Cache ◀── [Origin Response] ◀── Origin (S3/ALB)

[Viewer Request] / [Viewer Response]:  CloudFront Functions or Lambda@Edge
[Origin Request] / [Origin Response]:  Lambda@Edge only

Use Cases:

Use Case | Best Choice
URL rewrites, header manipulation | CloudFront Functions
A/B testing (simple) | CloudFront Functions
Authentication (JWT validation) | CloudFront Functions
Complex auth (DB lookup) | Lambda@Edge
Image resizing | Lambda@Edge
Call external APIs | Lambda@Edge
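The choice in the table above reduces to three questions: which trigger, whether the code needs the network, and whether it fits in under a millisecond. A minimal sketch follows; the function name, trigger labels and the 1 ms cut-off parameter are assumptions made for illustration.

```python
VIEWER_TRIGGERS = {"viewer-request", "viewer-response"}
ORIGIN_TRIGGERS = {"origin-request", "origin-response"}


def choose_edge_runtime(trigger: str, needs_network: bool = False, expected_ms: float = 0.5) -> str:
    """Pick an edge runtime from the trigger point and the workload's needs."""
    if trigger not in VIEWER_TRIGGERS | ORIGIN_TRIGGERS:
        raise ValueError(f"unknown trigger: {trigger!r}")
    # Origin triggers, network calls and longer-running code all rule out
    # CloudFront Functions.
    if trigger in ORIGIN_TRIGGERS or needs_network or expected_ms >= 1:
        return "Lambda@Edge"
    return "CloudFront Functions"


print(choose_edge_runtime("viewer-request"))                      # header rewrite
print(choose_edge_runtime("viewer-request", needs_network=True))  # auth with DB lookup
print(choose_edge_runtime("origin-response", expected_ms=200))    # image resizing
```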

⚠️ Exam traps:


CloudFront Geo Restriction

⚠️ Exam trap: “Block/allow by country” → Geo Restriction


CloudFront Pricing & Price Classes

Price Class | Regions Included | Cost
All | All regions | Best performance, highest cost
200 | Most regions (excludes South America, Australia/NZ) | Balanced
100 | US, Mexico, Canada, Europe, Israel only | Lowest cost

⚠️ Exam trap: “Reduce CloudFront costs” → use Price Class 100/200 (fewer edge locations)


AWS Global Accelerator

Problem: Global users → public internet → many hops → high latency

Without Global Accelerator (Public Internet):

America ───┐
           │    ┌───┬───┬───┬───┐
Europe ────┼───▶│hop│hop│hop│hop│───▶ Public ALB (India)
           │    └───┴───┴───┴───┘
Australia ─┘         (latency)

Solution: Use AWS internal network via Anycast IPs

How it works:

With Global Accelerator:

Users ──▶ Anycast IP ──▶ Edge Location ──▶ AWS Private Network ──▶ ALB/NLB/EC2
          (static)       (nearest)         (fast, optimized)

Supported Targets: Elastic IP, EC2, ALB, NLB (public or private)

Features:

Feature | Details
Performance | Intelligent routing, lowest latency, fast regional failover
Health Checks | Failover < 1 min for unhealthy endpoints, great for DR
Security | Only 2 IPs to whitelist, DDoS protection via AWS Shield
Caching | No client cache issues (IPs never change)

Endpoint Weights & Traffic Dial:


Global Accelerator vs CloudFront

Feature | CloudFront | Global Accelerator
Content | Cacheable + dynamic content | TCP/UDP applications
Caching | ✅ At edge | ❌ No caching (proxies packets)
Use cases | Images, videos, APIs, websites | Gaming (UDP), IoT (MQTT), VoIP
Static IPs | ❌ No | ✅ 2 Anycast IPs
Failover | TTL-based | < 1 min (health checks)

⚠️ Exam traps:


Global Accelerator vs ELB vs Route 53

Service | Scope | Routing Level | Health Checks | Use Case
ELB (ALB/NLB) | Single region | Layer 4/7 | ✅ Targets | Distribute traffic across instances in 1 region
Route 53 | Global (DNS) | DNS level | ✅ Endpoints | DNS-based routing (latency, geo, failover)
Global Accelerator | Global (network) | Network level | ✅ Endpoints | Fast global routing via AWS backbone

Scenario-Based Selection:

Scenario | Answer | Why
Distribute traffic in 1 region | ELB | Regional load balancing
Route users to nearest region via DNS | Route 53 (latency routing) | DNS resolves to closest endpoint
Instant failover across regions (<1 min) | Global Accelerator | Network-level, no DNS TTL delay
Need static IPs for global app | Global Accelerator | 2 Anycast IPs
Non-HTTP (gaming, IoT, VoIP) | Global Accelerator | TCP/UDP support
Cost-sensitive global routing | Route 53 | Cheaper, but slower failover (DNS TTL)

Failover Speed:

Route 53:         DNS TTL (30s - 5min+) before clients see change
Global Accel:     < 1 minute (health check driven, no DNS caching)
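A back-of-the-envelope sketch of why DNS failover is slower: both approaches need time to detect the failure, but DNS clients additionally keep using the cached answer until the record TTL runs out. The interval, threshold and TTL values below are illustrative assumptions, not service defaults you should rely on.

```python
def detection_seconds(check_interval_s: int, failure_threshold: int) -> int:
    """Time until an endpoint is marked unhealthy (consecutive failed checks)."""
    return check_interval_s * failure_threshold


def dns_failover_seconds(check_interval_s: int, failure_threshold: int, record_ttl_s: int) -> int:
    """Worst case for DNS: detection time plus a full TTL of cached answers."""
    return detection_seconds(check_interval_s, failure_threshold) + record_ttl_s


# Assumed example values.
print("DNS-based:", dns_failover_seconds(30, 3, 60), "s")   # 90 s detection + 60 s TTL
print("Network-level:", detection_seconds(10, 3), "s")      # no client-side DNS cache to wait for
```

Resolvers and clients that ignore or extend TTLs make the DNS figure worse in practice, which is the "5min+" end of the range above.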

⚠️ Exam traps:

⚠️ Exam trap — Blue-green deployment + DNS caching + tight deadline:


CloudFront Field-Level Encryption

⚠️ Exam trap: “Encrypt specific form fields at edge” → Field-Level Encryption


Quick Reference: Service Comparison Matrix

ScenarioCloudFrontGlobal AcceleratorRoute 53ELB
Cache static content
Non-HTTP (gaming, IoT)NLB only
Static IPsNLB only
Fastest failover (<1 min)❌ (TTL)
DNS-based routing
Single region balancing
Edge compute (Lambda)
Origin failover✅ (Origin Groups)
WebSocket supportN/A✅ (ALB)

Decision Tree:

Question | Yes →
Need to cache content at edge? | CloudFront
Non-HTTP protocol (UDP, TCP raw)? | Global Accelerator
Need static IPs for whitelisting? | Global Accelerator (or NLB)
Need <1 min failover globally? | Global Accelerator
DNS-level routing (geo, latency)? | Route 53
Load balance within 1 region only? | ELB
Run code at edge locations? | CloudFront (Functions/Lambda@Edge)


🎯 MASTER SUMMARY: CloudFront & Global Accelerator Exam Guide

Part 1: Core Principles (Understand WHY → Derive WHAT)

Principle 1: Global vs Regional — Know Where Services Live

Understanding which services are global vs regional is fundamental for certificate placement, data residency, and cross-region patterns.

Always Global (no region selection):

Regional with Multi-Region Options:

Regional Only (no cross-region):

Derive: “Where does TLS terminate?” = where certificate must be

Principle 2: CloudFront = Content, Global Accelerator = Connections

Two different problems, two different solutions:

CloudFront caches. Global Accelerator proxies (no caching).

Principle 3: Edge Locations = AWS’s Global Presence

Both services use AWS’s 400+ edge locations worldwide:

Edge = closer to users = lower latency.

Principle 4: Anycast vs Unicast IPs

Global Accelerator gives you 2 static Anycast IPs → users connect to same IPs worldwide, routed to nearest edge.

Principle 5: TTL = Cache Control

CloudFront caches based on TTL (Time To Live):

Origin updates don’t propagate until TTL expires (or you invalidate).

Principle 6: Behaviors = Path-Based Routing

CloudFront Behaviors let you:

Behaviors are evaluated in the order you list them and the first matching path pattern wins, so list more specific patterns first.

Principle 7: Origin Access Control (OAC) = S3 Security

OAC ensures only CloudFront can access your S3 bucket:

OAI (Origin Access Identity) is legacy → use OAC.

Principle 8: Edge Compute = CloudFront Functions vs Lambda@Edge

Two options for running code at edge:

Simple = CloudFront Functions. Complex = Lambda@Edge.

Principle 9: Failover Speed Matters

Different services, different failover speeds:

Need instant failover? → Global Accelerator.


Part 2: Decision Tree (Follow Keywords → Find Answer)

Step 0: Is the service Global or Regional?

Which service?
│
├─► IAM, Route 53, CloudFront, Global Accelerator
│   └─► GLOBAL (no region, but CloudFront certs in us-east-1)
│
├─► DynamoDB, Aurora, KMS, Secrets Manager
│   └─► REGIONAL but has MULTI-REGION options
│
├─► CloudHSM
│   └─► REGIONAL ONLY (no cross-region!)
│
└─► EC2, ELB, VPC, Lambda, API Gateway
    └─► REGIONAL (deploy per region)

Step 1: HTTP or non-HTTP?

                    What protocol?
                          │
            ┌─────────────┴─────────────┐
            ▼                           ▼
      HTTP/HTTPS                   TCP/UDP (non-HTTP)
            │                           │
            ▼                           ▼
    Need caching?              Global Accelerator
            │                   (gaming, IoT, VoIP)
     ┌──────┴──────┐
     ▼             ▼
    Yes            No
     │             │
     ▼             ▼
CloudFront    Global Accelerator
              (if static IPs needed)
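The Step 1 tree can be written as a short function. This is just the diagram transcribed; the function name and return strings are made up for illustration, and the "no caching, no static IPs" branch is not decided by the tree, so the sketch says so rather than guessing.

```python
def choose_global_service(protocol: str, needs_caching: bool, needs_static_ips: bool) -> str:
    """Follow the HTTP vs non-HTTP decision tree."""
    if protocol.lower() not in {"http", "https"}:
        # Gaming, IoT, VoIP: CloudFront cannot carry raw TCP/UDP.
        return "Global Accelerator"
    if needs_caching:
        return "CloudFront"
    if needs_static_ips:
        return "Global Accelerator"
    return "Global Accelerator or neither"


print(choose_global_service("udp", needs_caching=False, needs_static_ips=False))
print(choose_global_service("https", needs_caching=True, needs_static_ips=False))
print(choose_global_service("https", needs_caching=False, needs_static_ips=True))
```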

Step 2: What’s the main requirement?

                    What's the goal?
                          │
    ┌──────────┬──────────┼──────────┬──────────┐
    ▼          ▼          ▼          ▼          ▼
  Cache     Static     Fast      Edge      Block
 Content     IPs     Failover   Compute   Country
    │          │          │          │          │
    ▼          ▼          ▼          ▼          ▼
CloudFront  Global    Global    CloudFront  CloudFront
           Accel     Accel     Functions/   Geo
                               Lambda@Edge  Restriction

Step 3: Feature-Based Decision Table

If question mentions… | Answer is…
“cache at edge” | CloudFront
“static content” / “CDN” | CloudFront
“gaming” / “UDP” | Global Accelerator
“IoT” / “MQTT” | Global Accelerator
“VoIP” / “real-time” | Global Accelerator
“static IP” / “whitelist IP” | Global Accelerator
“fast failover” / “<1 min failover” | Global Accelerator
“origin failover” / “HA for origin” | CloudFront Origin Groups
“restrict S3 to CloudFront only” | OAC + S3 Bucket Policy
“private content” / “authenticated access” | Signed URL/Cookie
“block by country” | Geo Restriction
“different cache per path” | Behaviors
“redirect HTTP to HTTPS” | Viewer Protocol Policy
“force cache refresh” | Invalidation
“run code at edge” | CloudFront Functions or Lambda@Edge
“lightweight edge compute” | CloudFront Functions
“complex edge compute” / “DB lookup” | Lambda@Edge
“encrypt specific fields” | Field-Level Encryption
“reduce CloudFront costs” | Price Class 100/200
“SSL certificate for CloudFront” | ACM in us-east-1

The “NOT” Rules (Eliminate Wrong Answers Fast)

Statement | Why It’s Wrong
Global Accelerator caches content | GA proxies packets, no caching
CloudFront for UDP/gaming | CloudFront = HTTP/HTTPS only
Route 53 for instant failover | Route 53 = DNS TTL delay (not instant)
Security Groups on CloudFront | Can’t attach SGs to CloudFront
OAC for ALB/custom origins | OAC does not apply to ALB or custom HTTP origins; use a secret custom header instead
CloudFront Functions for DB access | No network access; use Lambda@Edge
CloudFront Functions at origin triggers | Viewer triggers only; use Lambda@Edge
Signed URL for multiple files | Use Signed Cookie for multiple files
Global Accelerator for caching | No caching; use CloudFront

The “CANNOT” List

Cannot… | Instead…
Use CloudFront for UDP | Use Global Accelerator
Attach Security Groups to CloudFront | Use Geo Restriction, WAF, or Signed URLs
Use OAI (deprecated) | Use OAC (Origin Access Control)
Run CloudFront Functions at origin | Use Lambda@Edge for origin triggers
Access network in CloudFront Functions | Use Lambda@Edge
Use CloudFront cert from other regions | ACM certificate must be in us-east-1
Get static IPs from CloudFront | Use Global Accelerator for static IPs

Part 3: Scenario Pattern Recognition

Pattern: “Cache static content globally”

Keywords: CDN, cache, static files, images, videos, global distribution

Answer: CloudFront

Why: CloudFront caches at 400+ edge locations. Global Accelerator doesn’t cache.


Pattern: “Gaming / IoT / VoIP application”

Keywords: UDP, TCP, gaming, real-time, MQTT, non-HTTP

Answer: Global Accelerator

Why: CloudFront = HTTP/HTTPS only. Global Accelerator supports any TCP/UDP.


Pattern: “Need static IPs for whitelisting”

Keywords: static IP, firewall whitelist, fixed IP addresses

Answer: Global Accelerator (2 Anycast IPs)

Why: CloudFront uses dynamic IPs. Global Accelerator provides 2 static Anycast IPs.


Pattern: “Fastest possible failover”

Keywords: instant failover, <1 minute, DR, disaster recovery

Answer: Global Accelerator

Why: Route 53 = DNS TTL delay. Global Accelerator = health-check driven, <1 min.


Pattern: “Origin failover for CloudFront”

Keywords: CloudFront HA, origin fails, backup origin

Answer: CloudFront Origin Groups

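Why: Primary + secondary origin. CloudFront fails over automatically when the primary returns one of the HTTP status codes you configured (for example 500, 502, 503, 504, 403 or 404) or cannot be reached.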


Pattern: “Restrict S3 access to CloudFront only”

Keywords: S3 only via CloudFront, prevent direct S3 access, secure S3 origin

Answer: OAC (Origin Access Control) + S3 Bucket Policy

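Why: With OAC, CloudFront signs its requests to S3. The bucket policy allows only the CloudFront service principal acting for your distribution, so direct S3 requests are denied.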


Pattern: “Private content via CloudFront”

Keywords: authenticated users, premium content, temporary access

Answer: Signed URL (1 file) or Signed Cookie (multiple files)

Why: Signed URLs/Cookies include expiration, IP restrictions, trusted signers.


Pattern: “Different settings for different paths”

Keywords: /api/, /images/, path-based, different cache, different origin

Answer: CloudFront Behaviors

Why: Each behavior = path pattern + origin + cache policy + settings.


Pattern: “Block users by country”

Keywords: geo blocking, country restriction, copyright, regional licensing

Answer: CloudFront Geo Restriction

Why: Allowlist or blocklist countries. Based on Geo-IP database.


Pattern: “Run code at edge (simple)”

Keywords: URL rewrite, header manipulation, JWT validation, lightweight

Answer: CloudFront Functions

Why: <1ms execution, JavaScript, millions req/sec, 1/6 cost of Lambda@Edge.


Pattern: “Run code at edge (complex)”

Keywords: database lookup, external API call, image resize, origin trigger

Answer: Lambda@Edge

Why: Up to 5 s (viewer triggers) or 30 s (origin triggers) execution, network access, Node.js/Python, all 4 triggers.


Pattern: “Force cache refresh after update”

Keywords: stale content, cache not updating, force refresh

Answer: CloudFront Invalidation

Why: Bypass TTL, force edge locations to fetch new content from origin.
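A toy model of the behaviour described above: an edge cache keeps serving the old object until its TTL expires, unless a wildcard invalidation removes it first. The class and its methods are purely illustrative and not part of any AWS SDK; time is passed in explicitly so the example is deterministic.

```python
from fnmatch import fnmatchcase


class EdgeCache:
    """Minimal TTL cache for one edge location."""

    def __init__(self, ttl_seconds: int):
        self.ttl = ttl_seconds
        self.store = {}  # path -> (body, expires_at)

    def get(self, path: str, origin: dict, now: int) -> str:
        entry = self.store.get(path)
        if entry is not None and now < entry[1]:
            return entry[0]  # cache hit: origin is not contacted
        body = origin[path]  # cache miss or expired: fetch from origin
        self.store[path] = (body, now + self.ttl)
        return body

    def invalidate(self, pattern: str) -> None:
        """Drop every cached path matching a pattern such as /images/*."""
        for path in [p for p in self.store if fnmatchcase(p, pattern)]:
            del self.store[path]


origin = {"/images/logo.png": "v1"}
cache = EdgeCache(ttl_seconds=3600)
first = cache.get("/images/logo.png", origin, now=0)
origin["/images/logo.png"] = "v2"                       # origin is updated
stale = cache.get("/images/logo.png", origin, now=10)   # still inside the TTL
cache.invalidate("/images/*")
fresh = cache.get("/images/logo.png", origin, now=11)
print(first, stale, fresh)
```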


Pattern: “Reduce CloudFront costs”

Keywords: cost optimization, cheaper CDN, reduce edge locations

Answer: Price Class 100 or 200

Why: Fewer edge locations = lower cost (but potentially higher latency for excluded regions).


Pattern: “Encrypt specific form fields”

Keywords: credit card encryption, PII at edge, field-level security

Answer: CloudFront Field-Level Encryption

Why: Encrypts specific fields at edge → stays encrypted through entire flow.


Pattern: “SSL certificate for CloudFront”

Keywords: HTTPS, custom domain, SSL/TLS certificate

Answer: ACM certificate in us-east-1

Why: CloudFront is global but requires certificates in us-east-1 region.


Part 4: Quick Reference Tables

Global vs Regional Services

Service | Scope | Certificate/Key Location | Cross-Region Option
IAM | Global | N/A | N/A (account-wide)
Route 53 | Global | N/A | N/A (global DNS)
CloudFront | Global | us-east-1 | N/A (already global)
Global Accelerator | Global | N/A | N/A (already global)
API Gateway (Edge) | Regional* | us-east-1 | Uses CloudFront
API Gateway (Regional) | Regional | Same region | Deploy per region
Lambda | Regional | N/A | Deploy per region
Lambda@Edge | Global* | N/A | Author in us-east-1
DynamoDB | Regional | N/A | Global Tables
Aurora | Regional | N/A | Global Database
KMS | Regional | Same region | Multi-Region Keys (mrk-)
CloudHSM | Regional | Same region | None!
Secrets Manager | Regional | N/A | Multi-region replication
S3 | Regional | N/A | Cross-Region Replication
ALB/NLB | Regional | Same region | Use Global Accelerator

*Edge-Optimized API Gateway lives in one region but uses CloudFront for routing

CloudFront vs Global Accelerator

Feature | CloudFront | Global Accelerator
Purpose | Cache content at edge | Route traffic via AWS backbone
Protocols | HTTP/HTTPS only | Any TCP/UDP
Caching | ✅ Yes | ❌ No (proxies packets)
Static IPs | ❌ No | ✅ 2 Anycast IPs
Use cases | Websites, APIs, streaming | Gaming, IoT, VoIP
Failover | Origin Groups (TTL-based) | <1 min (health checks)
Edge compute | ✅ Functions/Lambda@Edge | ❌ No
DDoS protection | ✅ Shield | ✅ Shield

CloudFront Functions vs Lambda@Edge

Feature | CloudFront Functions | Lambda@Edge
Language | JavaScript only | Node.js, Python
Execution time | <1 ms | 5 s (viewer triggers), 30 s (origin triggers)
Memory | 2 MB | 128 MB (viewer triggers), up to 10,240 MB (origin triggers)
Scale | Millions req/sec | Thousands req/sec
Triggers | Viewer only | Viewer + Origin
Network access | ❌ No | ✅ Yes
Cost | 1/6th of Lambda@Edge | Higher

Signed URL vs Signed Cookie vs S3 Pre-Signed URL

Feature | Signed URL | Signed Cookie | S3 Pre-Signed URL
Access scope | 1 file | Multiple files | 1 file
Access via | CloudFront | CloudFront | Direct S3
Use case | Single download | Streaming, multi-file | Direct S3 access
Caching | ✅ Yes | ✅ Yes | ❌ No (S3 direct)

Service Comparison: When to Use What

Scenario | Service
Cache static content globally | CloudFront
Cache + origin failover | CloudFront + Origin Groups
Non-HTTP (gaming, IoT) | Global Accelerator
Static IPs for whitelisting | Global Accelerator
Fastest failover (<1 min) | Global Accelerator
DNS-based routing | Route 53
Single region load balancing | ELB (ALB/NLB)
Edge compute (simple) | CloudFront Functions
Edge compute (complex) | Lambda@Edge

Failover Speed Comparison

Service | Failover Speed | Mechanism
Route 53 | DNS TTL (30s - 5min+) | DNS resolution
Global Accelerator | <1 minute | Health checks, network-level
CloudFront Origin Groups | Immediate on error | Origin error triggers
ELB | Seconds | Target health checks

CloudFront Origin Types

Origin Type | Use Case | Security
S3 Bucket | Static files | OAC + Bucket Policy
S3 Website | Static website | Public bucket or signed URLs
ALB | Dynamic content | Security Group, custom headers
VPC Origin | Private resources | No public exposure needed
Custom HTTP | Any HTTP server | Auth headers, IP whitelist

Key Numbers to Remember

Item | Value
Edge locations | 400+ globally
Global Accelerator static IPs | 2 Anycast IPs
CloudFront Functions execution | <1 ms
Lambda@Edge max execution | 5 s (viewer triggers), 30 s (origin triggers)
CloudFront Functions memory | 2 MB
Lambda@Edge max memory | 128 MB (viewer triggers), 10,240 MB (origin triggers)
ACM certificate region for CloudFront | us-east-1 (required)
Global Accelerator failover | <1 minute

Part 5: Ultimate Instant-Answer Table

Question Contains | → Instant Answer
“global service” / “no region” | IAM, Route 53, CloudFront, Global Accelerator
“multi-region encryption” | KMS Multi-Region Keys (mrk-)
“multi-region database (NoSQL)” | DynamoDB Global Tables
“multi-region database (SQL)” | Aurora Global Database
“CloudHSM multi-region” | IMPOSSIBLE (single-region only)
“Lambda@Edge region” | Author in us-east-1
“Edge-Optimized API cert” | us-east-1
“Regional API cert” | Same region as API
“cache at edge” / “CDN” | CloudFront
“static content globally” | CloudFront
“gaming” / “UDP” / “IoT” | Global Accelerator
“VoIP” / “real-time TCP” | Global Accelerator
“static IP” / “whitelist” | Global Accelerator
“<1 min failover” | Global Accelerator
“origin failover” | CloudFront Origin Groups
“S3 only via CloudFront” | OAC + Bucket Policy
“OAI” | Legacy → use OAC
“private content” (1 file) | Signed URL
“private content” (many files) | Signed Cookie
“block by country” | Geo Restriction
“path-based settings” | Behaviors
“HTTP → HTTPS” | Viewer Protocol Policy
“stale cache” / “force refresh” | Invalidation
“simple edge code” | CloudFront Functions
“complex edge code” / “DB” | Lambda@Edge
“viewer triggers only” | CloudFront Functions (or Lambda@Edge)
“origin triggers” | Lambda@Edge only
“encrypt form fields” | Field-Level Encryption
“reduce CloudFront cost” | Price Class 100/200
“CloudFront SSL cert” | ACM in us-east-1
“no caching, just faster” | Global Accelerator
“DNS routing” | Route 53 (not CloudFront/GA)
“single region LB” | ELB (not CloudFront/GA)

Part 6: Elimination Checklist

When stuck between options, eliminate systematically:

□ Is it HTTP/HTTPS?
  → Yes = CloudFront or Global Accelerator
  → No (UDP, raw TCP) = Global Accelerator only

□ Do they need CACHING?
  → Yes = CloudFront
  → No = Global Accelerator (or neither)

□ Do they need STATIC IPs?
  → Yes = Global Accelerator (or NLB)
  → No = CloudFront is fine

□ What's the FAILOVER requirement?
  → Instant (<1 min) = Global Accelerator
  → DNS-based = Route 53
  → Origin failover = CloudFront Origin Groups

□ Is it about EDGE COMPUTE?
  → Simple (headers, rewrites) = CloudFront Functions
  → Complex (network, DB) = Lambda@Edge
  → Origin triggers = Lambda@Edge only

□ Is it about PRIVATE CONTENT?
  → 1 file = Signed URL
  → Multiple files = Signed Cookie
  → Direct S3 = S3 Pre-Signed URL

□ Is it about S3 ORIGIN SECURITY?
  → Restrict to CloudFront = OAC + Bucket Policy
  → OAI mentioned = legacy, use OAC

□ Is it about COUNTRY RESTRICTION?
  → Block/allow by country = Geo Restriction
  → Not Security Groups (can't attach to CF)

□ What REGION for SSL cert?
  → CloudFront = us-east-1 (always)
  → ALB = same region as ALB

🏆 The Golden Rules

  1. Global services = IAM, Route 53, CloudFront, Global Accelerator — no region selection
  2. CloudHSM = regional ONLY — no cross-region replication (unlike KMS Multi-Region)
  3. CloudFront cert MUST be in us-east-1 — even if origin elsewhere
  4. Lambda@Edge authored in us-east-1 — CloudFront replicates globally
  5. Edge-Optimized API Gateway cert in us-east-1 — uses CloudFront behind scenes
  6. Regional API Gateway cert in same region — no CloudFront involved
  7. CloudFront = caching, Global Accelerator = routing — different purposes
  8. Non-HTTP (gaming, IoT, VoIP) = Global Accelerator — CloudFront is HTTP only
  9. Static IPs = Global Accelerator — 2 Anycast IPs
  10. Fastest failover = Global Accelerator — <1 min, no DNS TTL delay
  11. Origin failover = Origin Groups — primary + secondary origin
  12. OAC replaces OAI — use OAC for S3 origin security
  13. Signed URL = 1 file, Signed Cookie = many files
  14. CloudFront Functions = lightweight, Lambda@Edge = complex
  15. Origin triggers = Lambda@Edge only — CloudFront Functions = viewer only
  16. Price Class = cost control — fewer regions = lower cost
  17. Invalidation = force refresh — bypass TTL for immediate updates
  18. Global Accelerator doesn’t cache — it proxies packets through AWS backbone

AWS EC2 (Elastic Compute Cloud)

EC2 (Elastic Compute Cloud) provides virtual computers (instances) in the cloud.
EC2 consists of:

EC2 configuration options:

EC2 Instance Types:

T/M → General (Typical, Moderate)
C → Compute (CPU)
R → Memory (RAM)
P/G → Accelerated (Processing/GPU)
I/D → Storage (I/O, Disk)

r6i.2xlarge
│││ └─── Size within the instance class
││└────── Additional capabilities (i = Intel)
│└─────── Generation of hardware (6th)
└──────── Family - instance class (R = Memory Optimized)
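The naming scheme above can be split mechanically with a regular expression. This is a sketch for the common `<family><generation><attributes>.<size>` shape only; special names such as the high-memory `u-` types do not follow it, and the `InstanceType` tuple is a helper invented for this example.

```python
import re
from typing import NamedTuple


class InstanceType(NamedTuple):
    family: str       # instance class letters, e.g. "r"
    generation: int   # hardware generation, e.g. 6
    attributes: str   # additional capabilities, e.g. "i" (may be empty)
    size: str         # size within the class, e.g. "2xlarge"


_PATTERN = re.compile(r"^([a-z]+)(\d+)([a-z0-9-]*)\.([a-z0-9-]+)$")


def parse_instance_type(name: str) -> InstanceType:
    """Split an instance type name into family, generation, attributes, size."""
    match = _PATTERN.match(name.lower())
    if match is None:
        raise ValueError(f"unrecognised instance type: {name!r}")
    family, generation, attributes, size = match.groups()
    return InstanceType(family, int(generation), attributes, size)


print(parse_instance_type("r6i.2xlarge"))
print(parse_instance_type("t3.micro"))
```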

EC2 Placement group: control over the EC2 Instance placement strategy. Placement group strategies:

CLUSTER (same rack)        SPREAD (diff racks)       PARTITION (diff racks)
┌─────────────────┐        ┌───┐ ┌───┐ ┌───┐        ┌─────┐ ┌─────┐ ┌─────┐
│ ┌──┐┌──┐┌──┐┌──┐│        │EC2│ │EC2│ │EC2│        │Part1│ │Part2│ │Part3│
│ │  ││  ││  ││  ││        └───┘ └───┘ └───┘        │┌──┐ │ │┌──┐ │ │┌──┐ │
│ └──┘└──┘└──┘└──┘│        Rack1 Rack2 Rack3        ││  │ │ ││  │ │ ││  │ │
└─────────────────┘        (max 7 per AZ)           │└──┘ │ │└──┘ │ │└──┘ │
  10 Gbps, 1 AZ                                     └─────┘ └─────┘ └─────┘
  Low latency              High availability        100s instances (Kafka)

⚠️ Exam trap: Cluster = same rack (low latency, high risk); Spread = different racks (max 7/AZ); Partition = different racks (100s instances, Kafka/Cassandra)

AMI (Amazon Machine Image): a template for launching EC2 instances that captures the OS, configuration and any additional software you installed.

⚠️ Exam trap: AMIs are region-specific. Cannot launch EC2 from AMI in another region — must copy AMI to target region first (creates new AMI ID)

EC2 Image Builder: a service that automates the creation, maintenance, validation and testing of virtual machine or container images. Can be run on a schedule.

Elastic Network Interfaces (ENI): logical component in a VPC that represents a virtual network card. Bound to a specific AZ.

ENI attributes:

NOTE: You can create ENIs independently and attach them on the fly (move them) between EC2 instances for failover.

Security Groups vs NACLs:

Feature | Security Groups | NACLs
Level | Instance (ENI) | Subnet
State | Stateful (return traffic auto-allowed) | Stateless (must allow both directions)
Rules | Allow only | Allow AND Deny
Rule Order | All rules evaluated | Rules processed in order (lowest first)
Default | Deny all inbound, allow all outbound | Allow all (default NACL)
Association | Assigned to instances | Assigned to subnets

┌─────────────────────────────────────────────────────────┐
│                         VPC                             │
│  ┌───────────────────────────────────────────────────┐  │
│  │           Subnet (with NACL)                      │  │
│  │  ┌─────────────────────┐  ┌─────────────────────┐ │  │
│  │  │ Security Group      │  │ Security Group      │ │  │
│  │  │  ┌───────────────┐  │  │  ┌───────────────┐  │ │  │
│  │  │  │      EC2      │  │  │  │      EC2      │  │ │  │
│  │  │  └───────────────┘  │  │  └───────────────┘  │ │  │
│  │  └─────────────────────┘  └─────────────────────┘ │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘

⚠️ Exam trap: Security Groups = stateful (allow inbound → outbound auto-allowed). NACLs = stateless (must explicitly allow both directions)
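A small model of the two evaluation styles for a single inbound packet: NACL rules are checked in rule-number order and the first match decides, while a Security Group allows the packet if any rule matches and has no way to deny. Statefulness (return traffic) is not modelled here, and the rule tuples are a simplified format assumed for this sketch.

```python
from ipaddress import ip_address, ip_network


def nacl_allows(rules, source_ip: str) -> bool:
    """rules: iterable of (rule_number, cidr, action) with action 'allow' or 'deny'."""
    for _number, cidr, action in sorted(rules):  # lowest rule number first
        if ip_address(source_ip) in ip_network(cidr):
            return action == "allow"  # first matching rule decides
    return False  # nothing matched: the final implicit deny applies


def security_group_allows(allowed_cidrs, source_ip: str) -> bool:
    """Security Groups hold allow rules only; any match lets the packet in."""
    return any(ip_address(source_ip) in ip_network(cidr) for cidr in allowed_cidrs)


nacl_rules = [
    (200, "0.0.0.0/0", "allow"),
    (100, "203.0.113.5/32", "deny"),  # evaluated first despite being listed second
]
print(nacl_allows(nacl_rules, "203.0.113.5"))             # blocked by rule 100
print(nacl_allows(nacl_rules, "198.51.100.7"))            # allowed by rule 200
print(security_group_allows(["0.0.0.0/0"], "203.0.113.5"))  # an SG cannot carve out one IP
```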

EC2 Instance Lifecycle:

EC2 Hibernate:

EC2 Hibernate - Requirements & Limits:

⚠️ Exam trap: Root EBS must be encrypted for hibernate. Max 60 days, max 150GB RAM

EC2 Purchasing Options:

Option | Description | Savings | Use Case
On-Demand | Pay by second (Linux/Win) or hour | None | Short-term, unpredictable workloads
Reserved | 1 or 3 year commitment | Up to 72% | Steady-state usage (databases)
Savings Plans | Commit to $/hour for 1 or 3 years | Up to 72% | Flexible across instance types
Spot | Spare capacity that AWS can reclaim (optional max price) | Up to 90% | Fault-tolerant, flexible workloads
Dedicated Hosts | Physical server for your use | - | Compliance, licensing (per-socket)
Dedicated Instances | Hardware dedicated to you | - | Compliance (no server control)
Capacity Reservations | Reserve capacity in specific AZ | None | Ensure availability, no discount

Spot Instances:

⚠️ Exam trap: Spot = cheapest but can be terminated. Use for batch jobs, data analysis, CI/CD, NOT databases

Reserved Instances:

⚠️ Exam trap: Reserved = commit for 1-3 years. Convertible RI = less discount but more flexibility

Connect to EC2:
ssh -i /<path>/<key_pair_name>.pem <instance_user_name>@<instance_public_dns_name/IP>
Example:
ssh -i /home/kali/Downloads/aws.pem ubuntu@51.20.123.211



🎯 MASTER SUMMARY: EC2 Exam Guide

Part 1: Core Principles (Understand WHY → Derive WHAT)

Principle 1: EC2 is Just a Virtual Computer — Everything Else Solves Its Limitations

EC2 alone is just a VM. The entire ecosystem exists to solve its inherent problems:

Deriving answers: When you see a limitation scenario, think: “What problem is this solving?” The answer maps to the appropriate service.


Principle 2: Instance Types = Optimize for the Bottleneck

Every workload has a bottleneck. Match the instance family to it:

Bottleneck | Family | Memory Aid
Nothing specific | T/M | Typical, Moderate (General)
CPU/Processing | C | CPU = Compute
Memory/RAM | R | RAM = Memory
GPU/AI/ML | P/G | Processing/GPU
Disk/IOPS | I/D | I/O, Disk = Storage

Deriving answers: “Processing large datasets in-memory” → Bottleneck is RAM → R family. “Batch processing” → Bottleneck is CPU → C family.


Principle 3: Placement Groups Trade-off = Latency vs. Availability

You can’t have both maximum performance AND maximum availability. Choose your priority:

Priority | Strategy | Trade-off
Lowest latency | Cluster | All in one rack → rack fails = all fail
Maximum isolation | Spread | Different racks → max 7 instances per AZ
Partition isolation + scale | Partition | Different racks → 100s of instances

Deriving answers:


Principle 4: AMI = Snapshot of Everything, But Region-Locked

An AMI is a complete image (OS + software + config). The trade-off: specificity vs. portability.

Deriving answers: “Launch instance in another region from existing AMI” → Must copy first (can’t use original AMI ID)


Principle 5: Security Groups vs NACLs = Instance vs. Subnet, Stateful vs. Stateless

WHY stateful matters: If you allow traffic IN, the response OUT is automatic. You don’t need to think about return traffic.

WHY stateless matters: You must explicitly allow BOTH directions. More control, more work.

Question | SG | NACL
“Where does it apply?” | Instance (ENI) | Subnet
“Can it DENY traffic?” | No (allow only) | Yes
“Do I need to allow return traffic?” | No (stateful) | Yes (stateless)
“Which is evaluated first?” | Rules are combined | Rules processed in order

Deriving answers: “Block specific IP” → NACLs (SGs can’t deny). “Allow port 443 inbound” → Both work, but SGs don’t need outbound rule.


Principle 6: Hibernate = RAM Preserved, But Has Constraints

WHY hibernate exists: Cold boots are slow because RAM is empty. Hibernate saves RAM state.

WHY encryption is required: RAM contains sensitive data — writing it to disk unencrypted = security risk.

Deriving answers:


Principle 7: Purchasing Options = Trade Money for Flexibility

The fundamental trade-off: commitment = savings. More flexibility = more cost.

Most Expensive                                    Cheapest
(Most Flexible)                                   (Least Flexible)
     │                                                  │
     ▼                                                  ▼
On-Demand ─→ Capacity Res ─→ Savings Plans ─→ Reserved ─→ Spot
   0%             0%           up to 72%      up to 72%  up to 90%
          (approximate maximum discount vs. On-Demand)

But Spot has a catch: AWS can take it back. Use only for work that can be interrupted.

The Mental Model:


Principle 8: Dedicated = Compliance, Not Performance

Dedicated Hosts and Dedicated Instances exist for compliance, not speed.

Need | Solution
Per-socket/per-core licensing | Dedicated Host (you see the physical server)
Regulatory: “no shared hardware” | Either works (Dedicated Instance = simpler)
Just want better performance | Neither (use instance optimization instead)

Deriving answers: “Bring your own license” or “socket-based licensing” → Dedicated Host


Principle 9: Spot Instances = Interruptible, 2-Minute Warning

Spot is AWS’s “leftover capacity” at a discount. The trade-off: they can take it back.

Critical rules:

Good for: Batch processing, CI/CD, data analysis, anything that can restart
Bad for: Databases, user-facing apps, anything that can’t handle interruption


Principle 10: EC2 Lifecycle = What Happens to Your Data?

| Action | EBS Root | Instance Store | RAM |
|---|---|---|---|
| Stop | ✅ Preserved | ❌ Lost | ❌ Lost |
| Terminate | ❌ Deleted (default) | ❌ Lost | ❌ Lost |
| Hibernate | ✅ Preserved + RAM saved | ❌ Lost | ✅ Saved to EBS |

Deriving answers: “Data survives restart?” → EBS only. “RAM survives?” → Hibernate only.


Part 2: Decision Tree (Follow Keywords → Find Answer)

Instance Type Selection

What's the bottleneck?
        │
        ├─→ "Nothing specific" ─────────────→ T/M (General Purpose)
        │
        ├─→ "CPU" / "batch" / "compute" ───→ C (Compute)
        │
        ├─→ "RAM" / "in-memory" / "cache" ─→ R (Memory)
        │
        ├─→ "GPU" / "ML" / "AI" ───────────→ P/G (Accelerated)
        │
        └─→ "IOPS" / "database" / "OLTP" ──→ I/D (Storage)

Placement Group Selection

What's the priority?
        │
        ├─→ "Lowest latency" / "10 Gbps" ──→ Cluster
        │
        ├─→ "High availability" / "isolation" ─→ Spread (max 7/AZ)
        │
        └─→ "Kafka" / "Hadoop" / "Cassandra" ─→ Partition

Purchasing Option Selection

Can it be interrupted?
        │
        ├─→ YES ─────────────────────────────→ Spot (90% savings)
        │
        └─→ NO
             │
             └─→ How long do you need it?
                      │
                      ├─→ "Hours/days" ──────→ On-Demand
                      │
                      ├─→ "1-3 years" ───────→ Reserved/Savings Plans
                      │
                      └─→ "Guaranteed capacity, no commit" ─→ Capacity Res
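
The tree above can be written as a small function. This is only a sketch of the notes' decision order; the function name, parameter names, and returned labels are made up for illustration.

```python
def choose_purchasing_option(interruptible, commitment_years=0,
                             needs_reserved_capacity=False):
    """Walk the purchasing decision tree and return the suggested option."""
    if interruptible:
        return "Spot"
    if commitment_years >= 1:
        # A 1-3 year commitment buys the Reserved / Savings Plans discount.
        return "Reserved / Savings Plans"
    if needs_reserved_capacity:
        # Guaranteed capacity without a long-term commitment (no discount).
        return "On-Demand Capacity Reservation"
    return "On-Demand"
```

For example, `choose_purchasing_option(False, commitment_years=3)` returns `"Reserved / Savings Plans"`, and any interruptible workload short-circuits to `"Spot"` regardless of the other arguments.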

The “CANNOT” List

| You CANNOT… | Because… |
|---|---|
| Launch EC2 from AMI in different region | AMIs are region-locked (copy first) |
| Have > 7 instances in Spread placement group (per AZ) | Spread = different rack per instance, racks limited |
| Hibernate with unencrypted root EBS | RAM data written to disk = security risk |
| Hibernate with > 150GB RAM | Storage/write time constraint |
| Hibernate for > 60 days | AWS limitation |
| Block traffic with Security Group | SGs can only ALLOW (use NACLs to deny) |

Part 3: Scenario Pattern Recognition

Pattern: “Need lowest network latency between instances”

Keywords: low latency, 10 Gbps, HPC, tightly coupled
Answer: Cluster placement group
Why: Same rack = same network switch = lowest latency. Trade-off is single point of failure.


Pattern: “Critical application, maximize availability”

Keywords: high availability, fault tolerance, critical, isolated
Answer: Spread placement group
Why: Different racks = different failure domains. Limit: 7 instances per AZ.


Pattern: “Kafka/Hadoop/Cassandra with 100s of instances”

Keywords: Kafka, Hadoop, Cassandra, distributed, large scale, partitions
Answer: Partition placement group
Why: Partition-aware applications distribute replicas across partitions. Scales to 100s.


Pattern: “Processing large in-memory datasets”

Keywords: in-memory, real-time analytics, caching, SAP HANA
Answer: Memory Optimized (R family)
Why: Bottleneck is RAM. R = RAM.


Pattern: “Batch processing, video encoding”

Keywords: batch, transcoding, compute-intensive, scientific modeling
Answer: Compute Optimized (C family)
Why: Bottleneck is CPU. C = CPU.


Pattern: “Cost-effective, fault-tolerant workload”

Keywords: cost-effective, can tolerate interruption, batch, CI/CD
Answer: Spot Instances
Why: Up to 90% savings, but can be interrupted with 2-min warning. OK for resilient workloads.


Pattern: “Steady-state database, long-term”

Keywords: database, steady, 24/7, long-term, predictable
Answer: Reserved Instances
Why: Up to 72% savings for 1-3 year commitment. Databases run continuously.


Pattern: “Bring your own license (BYOL)”

Keywords: BYOL, per-socket, per-core, software license
Answer: Dedicated Host
Why: You need visibility into the physical server (sockets/cores) for licensing.


Pattern: “Reduce startup time, preserve application state”

Keywords: fast boot, preserve RAM, reduce initialization time
Answer: EC2 Hibernate
Why: RAM saved to EBS, no cold boot. Must have encrypted root volume.


Pattern: “Block specific IP address”

Keywords: block IP, deny traffic, blacklist
Answer: NACL (not Security Group)
Why: Security Groups can only ALLOW. NACLs can DENY.


Pattern: “Launch instance in another region from existing AMI”

Keywords: cross-region, AMI, different region
Answer: Copy AMI to target region first
Why: AMIs are region-specific. Cannot use AMI ID from another region.


Pattern: “Compliance requires dedicated hardware, don’t need server visibility”

Keywords: compliance, dedicated, isolated hardware
Answer: Dedicated Instance
Why: Simpler than Dedicated Host when you don’t need socket/core visibility.


Pattern: “Need guaranteed capacity in specific AZ, no long-term commitment”

Keywords: capacity, guarantee, specific AZ, no discount needed
Answer: On-Demand Capacity Reservation
Why: Reserves capacity immediately. No commitment required, but no discount either.


Pattern: “Flexible commitment across instance types”

Keywords: flexible, multiple instance types, Savings Plans
Answer: Compute Savings Plans
Why: Commit $/hour, use across any instance type/region. More flexible than Reserved.


Part 4: Quick Reference Tables

Instance Type Families

| Family | Optimized For | Use Cases | Memory Aid |
|---|---|---|---|
| T, M | Balance | Web servers, small DBs | Typical, Moderate |
| C | CPU | Batch, video encoding | CPU |
| R | RAM | In-memory DBs, caching | RAM |
| P, G | GPU | ML, graphics | Processing, GPU |
| I, D | Disk IOPS | Databases, data warehouses | I/O, Disk |

Placement Group Comparison

| Strategy | Same Rack? | Max Instances | Use Case |
|---|---|---|---|
| Cluster | Yes | No limit | Low latency, HPC |
| Spread | No | 7 per AZ | High availability |
| Partition | No | 100s | Hadoop, Kafka, Cassandra |

Purchasing Options Quick Comparison

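| Option | Savings | Commitment | Interruption? |
|---|---|---|---|
| On-Demand | 0% | None | No |
| Reserved | Up to 72% | 1-3 years | No |
| Savings Plans | Up to 72% | $/hour for 1-3 years | No |
| Spot | Up to 90% | None | YES (2-min warning) |
| Dedicated Host | Varies | Optional | No |
| Capacity Res | 0% | None | No |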

Hibernate Requirements

| Requirement | Limit |
|---|---|
| RAM | < 150 GB |
| Root Volume | EBS, encrypted, large enough |
| Max Duration | 60 days |
| NOT Supported | Dedicated Hosts, bare metal |

Part 5: Ultimate Instant-Answer Table

| Question Contains | → Instant Answer |
|---|---|
| “lowest latency between instances” | Cluster placement group |
| “10 Gbps bandwidth” | Cluster placement group |
| “spread across racks” | Spread or Partition |
| “max 7 instances” | Spread placement group |
| “Kafka, Hadoop, Cassandra” | Partition placement group |
| “in-memory” / “real-time analytics” | R family (Memory) |
| “batch processing” | C family (Compute) |
| “video transcoding” | C family (Compute) |
| “GPU” / “ML training” | P/G family (Accelerated) |
| “high IOPS” / “OLTP” | I/D family (Storage) |
| “90% savings” | Spot Instances |
| “2-minute warning” | Spot Instances |
| “can be interrupted” | Spot Instances |
| “steady-state” + “database” | Reserved Instances |
| “1-3 year commitment” | Reserved Instances |
| “flexible across instance types” | Savings Plans |
| “per-socket licensing” | Dedicated Host |
| “BYOL” | Dedicated Host |
| “compliance + dedicated hardware” | Dedicated Host or Instance |
| “guarantee capacity” + “no commitment” | Capacity Reservation |
| “fast boot” / “preserve RAM” | EC2 Hibernate |
| “reduce startup time” | EC2 Hibernate |
| “encrypted root volume” + “hibernate” | Required for Hibernate |
| “block IP” | NACL (not SG) |
| “stateless” | NACL |
| “stateful” | Security Group |
| “deny traffic” | NACL |
| “allow only” | Security Group |
| “cross-region AMI” | Copy AMI first |
| “AMI different region” | Copy AMI first |
| “automate AMI creation” | EC2 Image Builder |
| “ENI failover” | Move ENI to standby instance |

Part 6: Elimination Checklist

Choosing Instance Type

□ Is the workload CPU-bound?
  → Yes = C family
  → No = continue

□ Does it need lots of RAM?
  → Yes = R family
  → No = continue

□ Does it need GPU?
  → Yes = P/G family
  → No = continue

□ Does it need high disk IOPS?
  → Yes = I/D family
  → No = T/M (General Purpose)

Choosing Purchasing Option

□ Can workload tolerate interruption?
  → Yes = Consider Spot (90% savings)
  → No = continue

□ Is usage predictable for 1-3 years?
  → Yes = Reserved or Savings Plans
  → No = continue

□ Do you need flexibility across instance types?
  → Yes = Savings Plans
  → No = Reserved Instances

□ Short-term, unpredictable?
  → Yes = On-Demand

Choosing Placement Group

□ Need lowest possible latency?
  → Yes = Cluster
  → No = continue

□ Need maximum isolation/availability?
  → Yes = Spread (max 7/AZ)
  → No = continue

□ Running Kafka/Hadoop/Cassandra at scale?
  → Yes = Partition
  → No = No placement group needed

Hibernate Eligibility

□ Is root volume EBS and encrypted?
  → No = Hibernate NOT available
  
□ Is RAM < 150GB?
  → No = Hibernate NOT available
  
□ Is it Dedicated Host or bare metal?
  → Yes = Hibernate NOT available
  
□ All above passed?
  → Hibernate available
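
The eligibility checklist above translates directly into a function that reports what blocks hibernation. A minimal sketch using only the limits stated in these notes; the function and parameter names are hypothetical.

```python
HIBERNATE_MAX_RAM_GB = 150  # limit quoted in these notes


def hibernate_blockers(root_is_ebs, root_encrypted, ram_gb,
                       dedicated_host=False, bare_metal=False):
    """Return the list of reasons hibernation is unavailable (empty = eligible)."""
    blockers = []
    if not (root_is_ebs and root_encrypted):
        blockers.append("root volume must be EBS and encrypted")
    if ram_gb >= HIBERNATE_MAX_RAM_GB:
        blockers.append("RAM must be under 150 GB")
    if dedicated_host or bare_metal:
        blockers.append("Dedicated Hosts and bare metal are not supported")
    return blockers
```

An empty list means every check passed; returning the reasons rather than a bare boolean makes it easier to see which constraint a scenario violates.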

🏆 The Golden Rules

  1. Instance family = bottleneck (C=CPU, R=RAM, I/D=Disk, P/G=GPU)
  2. Cluster = same rack = fastest but risky (one failure = all fail)
  3. Spread = different racks = safest (max 7 instances per AZ)
  4. Partition = different racks + scale (Kafka, Hadoop, Cassandra)
  5. AMIs are region-locked (copy to use in another region)
  6. Security Groups = allow only, stateful (NACLs = allow+deny, stateless)
  7. Hibernate = RAM preserved (needs encrypted EBS, <150GB RAM, max 60 days)
  8. Spot = cheapest but interruptible (90% savings, 2-min warning)
  9. Reserved = commit for savings (72%, 1-3 years)
  10. Dedicated Host = you see the server (for BYOL, socket licensing)
  11. NACL to DENY, SG to ALLOW (SGs can’t block specific IPs)
  12. Savings Plans = flexible Reserved (commit $/hour, not instance type)

Storage:

Amazon EC2 Instance store provides temporary block-level storage for your instance. This storage is located on disks that are physically attached to the host computer. Instance store is ideal for temporary storage of information that changes frequently, such as buffers, caches, scratch data, and other temporary content. It can also be used to store temporary data that you replicate across a fleet of instances, such as a load-balanced pool of web servers.

Amazon EBS (Elastic Block Store) volume is block-level storage on a network drive, not a physical drive: it communicates with the instance over the network, so there is a bit of latency. An EBS volume has a provisioned capacity (size in GB, and IOPS) that can be increased over time (billed by provisioned capacity, not by what is used).

EBS Delete on Termination attribute controls what happens to an EBS volume when its EC2 instance terminates. By default the root EBS volume is deleted (attribute enabled), and any other attached EBS volume is kept (attribute disabled).

EBS Snapshot: a backup of your EBS volume at a point in time. Detaching the volume is not necessary, but recommended. Snapshots consume I/O, so avoid taking them during high traffic. Snapshots can be copied across Regions and used to create volumes in another AZ.

EBS Snapshot Archive: snapshots can be moved to the archive tier (75% cheaper, but restoring from the archive takes 24 to 72 hours).

Recycle Bin: rules that retain deleted snapshots so they can be recovered after an accidental deletion (retention from 1 day to 1 year).

Fast Snapshot Restore (FSR): eliminates latency on first use of an EBS volume created from a snapshot by pre-initializing all data blocks. Without FSR, volumes load data lazily from S3, causing a performance penalty until “warmed up”. Enabled per snapshot per AZ. Expensive: use for critical workloads needing immediate full performance (databases, time-sensitive apps).

EBS Snapshot - Cross-Region & Encryption Flow:

┌─────────┐   snapshot   ┌───────────┐   copy      ┌───────────┐
│   EBS   │ ───────────→ │  Snapshot │ ─────────→  │  Snapshot │
│ (AZ-A)  │              │ (Region A)│  (encrypt)  │ (Region B)│
└─────────┘              └─────┬─────┘             └─────┬─────┘
                               │ restore                 │ restore
                               ▼                         ▼
                         ┌─────────┐               ┌─────────┐
                         │   EBS   │               │   EBS   │
                         │ (AZ-A)  │               │ (AZ-X)  │
                         └─────────┘               └─────────┘

Local EC2 Instance Store is a high-performance hardware disk (better I/O performance than network drives such as EBS volumes). Good for buffer / cache / scratch data / temporary content (risk of data loss if hardware fails). Backups and replication are your responsibility.

⚠️ Exam trap: Instance Store = ephemeral (data lost on stop/terminate). Best I/O performance but no persistence

Instance Store Limitations:

EBS Volume Types

⚠️ Exam trap: gp2 IOPS linked to size (3 IOPS/GB); gp3 IOPS independent — know the difference!
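
The gp2 versus gp3 difference is easiest to see as arithmetic. The sketch below uses only the figures from these notes (3 IOPS per GiB with a floor of 100 and a ceiling of 16,000 for gp2; a flat 3,000 baseline for gp3); the function names are illustrative.

```python
def gp2_baseline_iops(size_gib):
    """gp2: IOPS scale with size (3 per GiB), clamped to 100..16,000."""
    return min(16_000, max(100, 3 * size_gib))


def gp3_baseline_iops(size_gib):
    """gp3: baseline is 3,000 IOPS regardless of volume size."""
    return 3_000
```

A 10 GiB gp2 volume sits on the 100 IOPS floor, a 1,000 GiB volume gets 3,000 IOPS, and the 16,000 ceiling is reached at about 5,334 GiB, whereas a gp3 volume of any of those sizes starts at 3,000.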

Provisioned IOPS (PIOPS) SSD - Highest-performance SSD volume for mission-critical low-latency or high-throughput workloads: system boot volumes, database workloads. Supports EBS Multi-Attach;
- io1: 4 GiB - 16 TiB, up to 64,000 IOPS (linked to size - 50 IOPS per 1 GiB, max IOPS at 1,280 GiB);
- io2 (higher durability - 99.999%): 4 GiB - 16 TiB, up to 64,000 IOPS (linked to size - 500 IOPS per 1 GiB, max IOPS at 128 GiB);
- io2 Block Express (sub-ms latency): 4 GiB - 64 TiB, up to 256,000 IOPS (linked to size - 1,000 IOPS per 1 GiB, max IOPS at 256 GiB);

EBS Multi-Attach: Achieve higher application availability in clustered Linux applications (ex: Teradata) by attaching the same EBS volume to multiple (up to 16) EC2 instances at a time. The instances must be in the same AZ, and only cluster-aware file systems (GFS2, OCFS2; NOT EXT4/XFS) are supported.

| Feature | gp2 | gp3 | io1 | io2 | io2 Block Express |
|---|---|---|---|---|---|
| Type | General Purpose SSD | General Purpose SSD | Provisioned IOPS SSD | Provisioned IOPS SSD | Provisioned IOPS SSD |
| Size | 1 GiB - 16 TiB | 1 GiB - 16 TiB | 4 GiB - 16 TiB | 4 GiB - 16 TiB | 4 GiB - 64 TiB |
| Max IOPS | 16,000 | 16,000 | 64,000* | 64,000* | 256,000 |
| Baseline IOPS | 3 IOPS/GiB (min 100) | 3,000 | Provisioned | Provisioned | Provisioned |
| IOPS:GiB Ratio | 3:1 (linked) | Independent | 50:1 | 500:1 | 1,000:1 |
| Max Throughput | 250 MiB/s | 1,000 MiB/s | 1,000 MiB/s | 1,000 MiB/s | 4,000 MiB/s |
| Durability | 99.8% - 99.9% | 99.8% - 99.9% | 99.8% - 99.9% | 99.999% | 99.999% |
| Latency | Single-digit ms | Single-digit ms | Single-digit ms | Single-digit ms | Sub-millisecond |
| Boot Volume | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| Multi-Attach | ❌ No | ❌ No | ✅ Yes | ✅ Yes | ✅ Yes |
| Use Case | Dev/test, boot volumes | General workloads | Databases, critical apps | Databases, critical apps | Highest performance |

*64,000 IOPS on Nitro instances, 32,000 on others
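
For the Provisioned IOPS types, the IOPS:GiB ratios and Max IOPS values in the table give the most IOPS a volume of a given size can be provisioned with. A small sketch using those table values; the dictionary and function names are illustrative, and the Nitro-only footnote is ignored.

```python
# (max IOPS per GiB, absolute IOPS ceiling), taken from the comparison table
PIOPS_LIMITS = {
    "io1": (50, 64_000),
    "io2": (500, 64_000),
    "io2 Block Express": (1_000, 256_000),
}


def max_provisioned_iops(volume_type, size_gib):
    """Highest IOPS that can be provisioned for a volume of this size."""
    ratio, ceiling = PIOPS_LIMITS[volume_type]
    return min(ceiling, ratio * size_gib)
```

For instance, a 100 GiB io1 volume tops out at 5,000 IOPS, and io1 only reaches its 64,000 ceiling at 1,280 GiB, while io2 reaches the same ceiling at 128 GiB.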

HDD Volume Types (Cannot be boot volumes)

| Feature | st1 (Throughput Optimized) | sc1 (Cold HDD) |
|---|---|---|
| Size | 125 GiB - 16 TiB | 125 GiB - 16 TiB |
| Max Throughput | 500 MiB/s | 250 MiB/s |
| Max IOPS | 500 | 250 |
| Boot Volume | ❌ No | ❌ No |
| Use Case | Big Data, Data Warehouses, Log Processing | Infrequent access, lowest cost |
| Cost | Low | Lowest |

⚠️ Exam trap: HDD (st1/sc1) cannot be boot volumes. Only SSD (gp2/gp3/io1/io2) can boot

EBS Encryption: Fully managed, transparent encryption using KMS (AES-256) with minimal latency impact. Encrypts:

*Encrypt unencrypted EBS volume: Create snapshot → Copy snapshot with encryption enabled → Create volume from encrypted snapshot → Attach to instance.
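
The snapshot, copy, and restore steps above can be sketched against a boto3 EC2 client. This is an illustration under assumptions, not a production script: the `ec2` argument is assumed to be `boto3.client("ec2")` (or any object exposing the same methods), the function name is made up, and error handling and cleanup of the intermediate snapshots are left out.

```python
def encrypt_volume(ec2, volume_id, region, availability_zone, kms_key_id=None):
    """Build an encrypted copy of an unencrypted volume; return the new volume ID."""
    # 1. Snapshot the unencrypted volume and wait for the snapshot to complete.
    snapshot_id = ec2.create_snapshot(VolumeId=volume_id)["SnapshotId"]
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    # 2. Copy the snapshot with encryption enabled (default KMS key if none given).
    copy_args = {
        "SourceSnapshotId": snapshot_id,
        "SourceRegion": region,
        "Encrypted": True,
    }
    if kms_key_id:
        copy_args["KmsKeyId"] = kms_key_id
    encrypted_id = ec2.copy_snapshot(**copy_args)["SnapshotId"]
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[encrypted_id])

    # 3. Create a new volume from the encrypted snapshot.
    new_volume_id = ec2.create_volume(
        SnapshotId=encrypted_id, AvailabilityZone=availability_zone
    )["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[new_volume_id])
    return new_volume_id
```

The last step, attaching the new volume, would be a separate `ec2.attach_volume(...)` call once the old volume is detached; it is kept out of the function because the device name and the stop/detach sequence depend on the instance.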

Amazon EFS (Elastic File System) - managed NFS that can be mounted on many EC2 instances and on-premises (multi-AZ). Highly available, auto-scaling (petabytes, no capacity planning), expensive (~3x gp2 cost, pay-per-use). Use cases: content management, web serving, data sharing, WordPress.

⚠️ Exam trap: EFS = Linux only (POSIX). Performance Mode cannot be changed after creation; Throughput Mode can

EFS Performance & Throughput Modes:

EFS Storage Classes (lifecycle policies move files after N days):

Amazon FSx for Windows File Server is a fully managed, highly reliable and scalable Windows native shared file system based on the SMB protocol and Windows NTFS (integrated with Microsoft Active Directory).

Amazon FSx for Lustre (Linux cluster) is a fully managed, high-performance, scalable file system for High Performance Computing (HPC): machine learning, analytics, video processing and financial modeling.

Instance Store vs EBS vs EFS:

| Feature | Instance Store | EBS | EFS |
|---|---|---|---|
| Type | Block storage (local) | Block storage (network) | File storage (NFS) |
| Instances | 1 | 1 (except io1/io2 Multi-Attach) | 100s across AZs |
| AZ | Locked to instance | Locked to one AZ | Multi-AZ (regional) |
| Persistence | Ephemeral (lost on stop) | Persists independently | Persists independently |
| Performance | Best (hardware attached) | Good, network latency | Good, higher latency |
| OS | Linux & Windows | Linux & Windows | Linux only (POSIX) |
| Cost | Included with instance | Provisioned capacity | ~3x EBS, pay-per-use |
| Use case | Cache, temp data | Boot volumes, databases | Shared files, WordPress |

┌─────────────┐     ┌─────────────┐     ┌─────────────────────────┐
│ Instance    │     │    EBS      │     │          EFS            │
│   Store     │     │  (Network)  │     │     (Multi-AZ NFS)      │
├─────────────┤     ├─────────────┤     ├─────────────────────────┤
│ ┌─────────┐ │     │   ┌─────┐   │     │   ┌───┐ ┌───┐ ┌───┐    │
│ │   EC2   │ │     │   │ EC2 │   │     │   │EC2│ │EC2│ │EC2│    │
│ │ ┌─────┐ │ │     │   └──┬──┘   │     │   └─┬─┘ └─┬─┘ └─┬─┘    │
│ │ │Disk │ │ │     │      │      │     │     └─────┼─────┘      │
│ │ └─────┘ │ │     │   ┌──┴──┐   │     │       ┌───┴───┐        │
│ └─────────┘ │     │   │ EBS │   │     │       │  EFS  │        │
└─────────────┘     │   └─────┘   │     └───────┴───────┴────────┘
  Ephemeral         └─────────────┘         Shared across AZs
  Best I/O            Single AZ             Linux only, pay-per-use

AWS Storage Gateway: a hybrid storage service that bridges on-premises data and cloud data in S3, so on-premises applications can seamlessly use AWS Cloud storage. Use cases: disaster recovery, backup & restore, tiered storage.

Types of Storage Gateway:

        On-Premises                              AWS Cloud
┌─────────────────────────┐              ┌─────────────────────────┐
│                         │              │                         │
│  ┌───────────────────┐  │              │  ┌─────────────────┐    │
│  │   File Gateway    │──┼──────────────┼─→│   S3 / FSx      │    │
│  └───────────────────┘  │   NFS/SMB    │  └─────────────────┘    │
│                         │              │                         │
│  ┌───────────────────┐  │              │  ┌─────────────────┐    │
│  │  Volume Gateway   │──┼──────────────┼─→│  EBS Snapshots  │    │
│  └───────────────────┘  │   iSCSI      │  └─────────────────┘    │
│                         │              │                         │
│  ┌───────────────────┐  │              │  ┌─────────────────┐    │
│  │   Tape Gateway    │──┼──────────────┼─→│ S3 Glacier/Deep │    │
│  └───────────────────┘  │   VTL        │  └─────────────────┘    │
└─────────────────────────┘              └─────────────────────────┘

⚠️ Exam trap: File Gateway = S3/FSx (NFS/SMB); Volume Gateway = EBS snapshots (iSCSI); Tape Gateway = Glacier (VTL)

AWS S3:

S3 (Simple Storage Service) provides object storage through a web service interface — “infinitely scaling” storage.
Amazon S3 lets you store objects (files) in ‘buckets’ (directories).
Amazon S3 offers unlimited storage space. The maximum file size for an object in Amazon S3 is 5 TB.

Use Cases: Backup/storage, Disaster Recovery, Archive, Hybrid Cloud storage, Media hosting, Data lakes & big data analytics, Static websites, Software delivery

Buckets:

⚠️ Exam trap: “Can’t create bucket” + correct IAM permissions → name already taken globally

Naming convention:

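Objects have a Key, which is the full path to the object inside its bucket: in s3://<bucket_name>/<folder_name>/<file-name>, the key is <folder_name>/<file-name>. The max size of an object is 5 TB; a single upload can carry at most 5 GB, so anything larger must use “multi-part upload” (recommended from about 100 MB).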
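
To make the multi-part limits concrete, the sketch below plans how many parts an upload needs. It assumes the documented multipart limits of 10,000 parts and a 5 MiB minimum part size, and treats the 5 TB object limit from these notes as 5 TiB; the function name and the 100 MiB default part size are illustrative choices.

```python
MIB = 1024 ** 2
TIB = 1024 ** 4

MAX_OBJECT_SIZE = 5 * TIB   # assumption: the notes' 5 TB limit in binary units
MIN_PART_SIZE = 5 * MIB     # applies to every part except the last
MAX_PARTS = 10_000


def _ceil_div(a, b):
    return -(-a // b)


def plan_multipart(object_size, part_size=100 * MIB):
    """Return (number_of_parts, part_size) for an object of the given size in bytes."""
    if not 0 < object_size <= MAX_OBJECT_SIZE:
        raise ValueError("object size must be between 1 byte and 5 TiB")
    part_size = max(part_size, MIN_PART_SIZE)
    # Grow the part size if the object would otherwise need more than 10,000 parts.
    part_size = max(part_size, _ceil_div(object_size, MAX_PARTS))
    return _ceil_div(object_size, part_size), part_size
```

A 1 GiB object with the default part size needs 11 parts; for a maximum-size object the part size is raised automatically so the count stays within 10,000.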

S3 Consistency Model:

⚠️ Exam trap: “Overwrite object, immediately read” → S3 always returns the latest version. Old “eventual consistency” behavior is gone. Distractors mentioning “might return previous data” or “might return new data” are wrong.

Amazon S3 Versioning protects against unintended deletes. It is enabled at the bucket level.

Amazon S3 Replication:

┌───────────┐
│ S3 Bucket │ (eu-west-1)
└─────┬─────┘
      │ asynchronous
      │ replication
      ▼
┌───────────┐
│ S3 Bucket │ (us-east-2)
└───────────┘

S3 Security:

S3 Access Scenarios:

| Scenario | Use |
|---|---|
| IAM User → S3 | IAM Policy attached to user |
| EC2 Instance → S3 | IAM Role attached to EC2 |
| Cross-Account → S3 | Bucket Policy (resource-based) |
| Public/Anonymous → S3 | Bucket Policy with Principal: "*" |

1. IAM User Access          2. EC2 Instance Access       3. Cross-Account Access
   ┌──────────┐                ┌──────────┐                 ┌──────────┐
   │IAM Policy│                │ IAM Role │                 │  Bucket  │
   └────┬─────┘                └────┬─────┘                 │  Policy  │
        │                           │                       └────┬─────┘
   ┌────▼─────┐                ┌────▼─────┐                      ▼
   │ IAM User │───────────────▶│   EC2    │─────────────▶  ┌───────────┐
   └──────────┘                └──────────┘                │ S3 Bucket │
        │                                                  └───────────┘
        ▼                                                        ▲
   ┌──────────┐                                            ┌─────┴─────┐
   │ S3 Bucket│              4. Public Access              │ IAM User  │
   └──────────┘                 ┌──────────┐               │Other Acct │
                                │  Bucket  │               └───────────┘
                                │  Policy  │
                                │Principal:│
                                │   "*"    │
                                └────┬─────┘
                                     ▼
                           ┌───────────────────┐
                           │ Anonymous Visitor │───▶ S3 Bucket
                           └───────────────────┘

Bucket Policy (JSON): Resources, Effect (Allow/Deny), Actions (API calls), Principal (account/user)

{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "PublicRead",
    "Effect": "Allow",
    "Principal": "*",
    "Action": ["s3:GetObject"],
    "Resource": ["arn:aws:s3:::examplebucket/*"]
  }]
}

Block Public Access — prevent data leaks:

Access granted if: (IAM permissions ALLOW it OR resource policy ALLOWS it) AND no explicit DENY

⚠️ Exam trap: Bucket policy ALLOWS but user can’t access → check for explicit DENY in IAM policy (DENY always wins)
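
The rule "an allow from either policy, unless anything explicitly denies" can be modeled with a short evaluator. This is a simplified sketch for same-account access only: it ignores principals, conditions, and the case-insensitivity of real action names, and the function names are made up for illustration.

```python
from fnmatch import fnmatchcase


def _matches(patterns, value):
    """True if value matches any pattern ('*' wildcards allowed)."""
    if isinstance(patterns, str):
        patterns = [patterns]
    return any(fnmatchcase(value, pattern) for pattern in patterns)


def is_allowed(statements, action, resource):
    """Evaluate IAM-policy and bucket-policy statements together."""
    allowed = False
    for statement in statements:
        if _matches(statement["Action"], action) and _matches(statement["Resource"], resource):
            if statement["Effect"] == "Deny":
                return False   # explicit DENY always wins
            allowed = True     # remember the ALLOW, keep looking for a DENY
    return allowed             # no matching statement: implicit deny
```

Passing the bucket policy's allow statement together with an IAM deny statement for the same action returns `False`, which is exactly the exam-trap situation described above.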

Encryption: encrypt objects using encryption keys

S3 Static Website Hosting:

S3 Durability & Availability:

S3 Storage Classes:

Move between classes manually or using S3 Lifecycle configurations

⚠️ Exam traps - Storage Classes:

S3 Storage Classes Comparison:

| Class | Avail. | AZs | Min Duration | Retrieval | Use Case |
|---|---|---|---|---|---|
| Standard | 99.99% | ≥3 | None | Instant, free | Frequently accessed |
| Intelligent-Tiering | 99.9% | ≥3 | None | Instant, free | Unknown access patterns |
| Standard-IA | 99.9% | ≥3 | 30 days | Instant, per GB | Infrequent but rapid access |
| One Zone-IA | 99.5% | 1 | 30 days | Instant, per GB | Secondary backups, recreatable |
| Glacier Instant | 99.9% | ≥3 | 90 days | ms, per GB | Once/quarter access |
| Glacier Flexible | 99.99% | ≥3 | 90 days | 1-5 min / 3-5 hr / 5-12 hr | Archive, flexible retrieval |
| Glacier Deep Archive | 99.99% | ≥3 | 180 days | 12 hr / 48 hr | Long-term archive |
| Express One Zone | 99.95% | 1 | None | <10ms | AI/ML, HPC, low-latency |

Durability: 99.999999999% (11 9’s) for ALL classes

⚠️ Exam trap: Lifecycle transition timing must respect minimum storage duration
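
The minimum storage durations in the comparison table matter because deleting or transitioning an object early is still billed for the full minimum. A small sketch of that arithmetic, using the table's durations keyed by the S3 API storage class names; the function name is illustrative.

```python
# Minimum storage duration in days, from the comparison table above
MIN_STORAGE_DAYS = {
    "STANDARD": 0,
    "INTELLIGENT_TIERING": 0,
    "STANDARD_IA": 30,
    "ONEZONE_IA": 30,
    "GLACIER_IR": 90,      # Glacier Instant Retrieval
    "GLACIER": 90,         # Glacier Flexible Retrieval
    "DEEP_ARCHIVE": 180,
}


def billable_days(storage_class, days_stored):
    """Days charged for an object that left the class after `days_stored` days."""
    return max(days_stored, MIN_STORAGE_DAYS[storage_class])
```

For example, an object moved out of Glacier Flexible Retrieval after 10 days is still charged for 90, which is why a lifecycle rule that transitions too early can cost more than it saves.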

S3 Performance:

⚠️ Exam traps:

S3 Batch Operations:

⚠️ Exam trap: “Encrypt existing objects” / “change encryption on all files” → S3 Batch Operations

S3 Inventory ──▶ Athena (filter) ──▶ S3 Batch Operations ──▶ Processed Objects
     │                                      ▲
     └── Objects List Report                │
                                   User: operation + params

S3 Lifecycle Rules:

⚠️ Exam trap:

Storage Class Transitions (allowed paths):

Standard ──┬──▶ Standard-IA ──┬──▶ Intelligent-Tiering ──┬──▶ One Zone-IA
           │                  │                          │
           │                  ▼                          ▼
           ├──▶ Glacier Instant ◀────────────────────────┤
           │                                             │
           ▼                                             ▼
      Glacier Flexible ◀─────────────────────────────────┤
           │                                             │
           ▼                                             ▼
      Glacier Deep Archive ◀─────────────────────────────┘

(All classes can transition DOWN, never UP)

Lifecycle Scenarios:

| Scenario | Solution |
|---|---|
| Thumbnails recreatable, needed 60 days, then delete | One Zone-IA + expire after 60 days |
| Source images: immediate access 60 days, then 6hr retrieval OK | Standard → Glacier after 60 days |
| Recover deleted objects immediately for 30 days, then 48hr OK for 365 days | Versioning + noncurrent → Standard-IA → Glacier Deep Archive |

S3 Analytics - Storage Class Analysis:

⚠️ Exam trap: “Optimal days to transition” / “Lifecycle recommendations” → S3 Analytics (not Inventory!)

S3 Requester Pays:

S3 Event Notifications:

           ┌──▶ SNS
           │
S3 Events ─┼──▶ SQS
           │
           └──▶ Lambda

S3 Event Notifications with EventBridge:

⚠️ Exam trap: “Get notified on object upload” → Event Notifications (NOT Access Logs, Analytics, or Select)


S3 Storage Lens

Overview:

                    ┌─ Organization
                    ├─ Accounts
S3 Storage Lens ───▶├─ Regions        ───▶ Aggregate ───▶ Dashboard ───▶ ┌─ Summary Insights
   (Configure)      └─ Buckets                           (Analyze)       ├─ Data Protection
                                                                         └─ Cost Efficiency
                                                                            (Optimize)

Default Dashboard:

Metrics Categories:

| Category | Key Metrics | Use Cases |
|---|---|---|
| Summary | StorageBytes, ObjectCount | Identify fastest-growing or unused buckets |
| Cost-Optimization | NonCurrentVersionStorageBytes, IncompleteMultipartUploadStorageBytes | Find incomplete multipart uploads >7 days, transition candidates |
| Data-Protection | VersioningEnabledBucketCount, MFADeleteEnabledBucketCount, SSEKMSEnabledBucketCount | Audit data protection best practices |
| Access-Management | ObjectOwnershipBucketOwnerEnforcedBucketCount | Check Object Ownership settings |
| Event | EventNotificationEnabledBucketCount | Identify buckets with Event Notifications |
| Performance | TransferAccelerationEnabledBucketCount | Find buckets with Transfer Acceleration |
| Activity | AllRequests, GetRequests, PutRequests, BytesDownloaded | Understand storage request patterns |
| Status Code | 200OKStatusCount, 403ForbiddenErrorCount, 404NotFoundErrorCount | Monitor HTTP response distribution |

Free vs Paid:

| Feature | Free | Advanced (Paid) |
|---|---|---|
| Metrics | ~28 usage metrics | + Activity, Cost Optimization, Data Protection, Status Code |
| Retention | 14 days | 15 months |
| CloudWatch Publishing | ❌ | ✅ |
| Prefix Aggregation | ❌ | ✅ |

S3 Encryption

Encryption Methods:

| Method | Key Management | Header | Notes |
|---|---|---|---|
| SSE-S3 | AWS-managed | "x-amz-server-side-encryption": "AES256" | Default for new buckets, AES-256 |
| SSE-KMS | AWS KMS | "x-amz-server-side-encryption": "aws:kms" | Audit via CloudTrail, KMS quota limits |
| DSSE-KMS | AWS KMS (double) | "x-amz-server-side-encryption": "aws:kms:dsse" | Two layers of encryption, compliance |
| SSE-C | Customer-managed (outside AWS) | Key sent in the HTTP headers of every request | HTTPS required, S3 doesn’t store key |
| Client-Side | Customer encrypts before upload | N/A | Full control, use S3 Encryption Library |

⚠️ Exam trap: “Customer manages keys” + “never store keys in AWS” → SSE-C or Client-Side

⚠️ Exam trap: “Keys in AWS OK” + “control rotation policy” → SSE-KMS

⚠️ Exam trap: “Encrypt all objects by default” → Do nothing (SSE-S3 is automatic since Jan 2023)

Encryption Evaluation Order:

  1. Bucket Policy evaluated first (can deny/require specific encryption)
  2. Default Encryption applied if no encryption header in request

SSE-S3 (Server-Side Encryption with S3-Managed Keys):

User ──── HTTP(S) + Header ────▶ ┌─────────────────────────────────┐
          (upload object)        │           Amazon S3             │
                                 │  Object + S3 Owned Key          │
                                 │         ↓                       │
                                 │    [Encryption]                 │
                                 │         ↓                       │
                                 │    S3 Bucket (encrypted)        │
                                 └─────────────────────────────────┘

SSE-KMS Limitation:

SSE-KMS (Server-Side Encryption with KMS Keys):

User ──── HTTP(S) + Header ────▶ ┌─────────────────────────────────┐
          (upload object)        │           Amazon S3             │
                                 │  Object + KMS Key (API call)    │
                                 │         ↓                       │
                                 │    [Encryption]                 │
                                 │         ↓                       │
                                 │    S3 Bucket (encrypted)        │
                                 └─────────────────────────────────┘
                                            ▲
              ┌─────────┐                   │ API call
              │ KMS Key │───────────────────┘
              └─────────┘
        (GenerateDataKey / Decrypt)

⚠️ Exam trap: “High-throughput S3” + “encryption” → SSE-S3 (not SSE-KMS!)

SSE-C (Server-Side Encryption with Customer-Provided Keys):

User ──── HTTPS ONLY ──────────▶ ┌─────────────────────────────────┐
     (object + key in header)    │           Amazon S3             │
                                 │  Object + Client-Provided Key   │
                                 │         ↓                       │
                                 │    [Encryption]                 │
                                 │         ↓                       │
                                 │    S3 Bucket (encrypted)        │
                                 └─────────────────────────────────┘
                                 (S3 discards key after use)

Client-Side Encryption:

┌──────┐   ┌────────────┐   ┌──────────────┐            ┌───────────┐
│ File │ + │ Client Key │ → │ [Encryption] │ → HTTP(S) → │ S3 Bucket │
└──────┘   └────────────┘   │ (client-side)│            │(encrypted)│
                            └──────────────┘            └───────────┘
           (Customer manages keys + encryption cycle)

Force Encryption in Transit (HTTPS):

{
  "Effect": "Deny",
  "Principal": "*",
  "Action": "s3:GetObject",
  "Resource": "arn:aws:s3:::my-bucket/*",
  "Condition": {
    "Bool": { "aws:SecureTransport": "false" }
  }
}
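
The statement above is a fragment; a bucket policy needs it wrapped in a `Version` / `Statement` document. The sketch below builds such a document in Python. Note that it deliberately widens the statement beyond the example above, denying every S3 action (`s3:*`) on both the bucket and its objects rather than only `s3:GetObject`; the function name and the `Sid` are illustrative.

```python
import json


def https_only_policy(bucket_name):
    """Bucket policy that denies any S3 request not made over HTTPS."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{bucket_name}",      # bucket-level actions
                f"arn:aws:s3:::{bucket_name}/*",    # object-level actions
            ],
            # aws:SecureTransport is "false" when the request used plain HTTP
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }],
    }


print(json.dumps(https_only_policy("my-bucket"), indent=2))
```

The printed JSON can be pasted into the bucket policy editor; keeping the original single-action form is also valid if only downloads need to be protected.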

S3 CORS (Cross-Origin Resource Sharing)

CORS Flow (Preflight Request):

┌────────────────┐                           ┌────────────────┐
│  Web Server    │                           │  Web Server    │
│   (Origin)     │                           │ (Cross-Origin) │
│ example.com    │                           │  other.com     │
└───────┬────────┘                           └───────▲────────┘
        │                                            │
        │ HTTPS Request                              │
        ▼                                            │
   ┌─────────────┐    1. OPTIONS (Preflight)         │
   │ Web Browser │──────────────────────────────────▶│
   │             │    Host: other.com                │
   │             │    Origin: example.com            │
   │             │◀──────────────────────────────────│
   │             │    2. Preflight Response          │
   │             │    Access-Control-Allow-Origin:   │
   │             │      https://example.com          │
   │             │    Access-Control-Allow-Methods:  │
   │             │      GET, PUT, DELETE             │
   │             │──────────────────────────────────▶│
   └─────────────┘    3. GET / (actual request)      │
                      Host: other.com                │
                      Origin: example.com            │

S3 CORS:

S3 CORS Example (Static Website with Assets in Different Bucket):

┌─────────────┐   1. GET /index.html                    ┌─────────────────────┐
│ Web Browser │────────────────────────────────────────▶│ S3: my-bucket-html  │
│             │◀────────────────────────────────────────│ (Static Website)    │
│             │   index.html                            │ Origin bucket       │
│             │                                         └─────────────────────┘
│             │   2. GET /images/coffee.jpg
│             │      Host: my-bucket-assets.s3-website...
│             │      Origin: my-bucket-html.s3-website...
│             │────────────────────────────────────────▶┌─────────────────────┐
│             │◀────────────────────────────────────────│ S3: my-bucket-assets│
│             │   Access-Control-Allow-Origin:          │ (Static Website)    │
└─────────────┘     my-bucket-html.s3-website...        │ Cross-origin bucket │
                                                        │ ← CORS config here  │
                                                        └─────────────────────┘

⚠️ Exam trap: CORS errors on S3 → configure CORS on the target bucket (the one being requested), not the origin
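
To make the "configure CORS on the target bucket" rule concrete, here is a rough sketch of a CORS configuration for the assets bucket, plus a simplified version of the check a browser performs. The helper names and the origin `https://example.com` are assumptions for illustration; the dictionary follows the `CORSRules` shape used by the S3 CORS configuration.

```python
def cors_config(allowed_origin: str) -> dict:
    """CORS configuration to attach to the bucket that serves the assets."""
    return {
        "CORSRules": [
            {
                "AllowedOrigins": [allowed_origin],
                "AllowedMethods": ["GET"],
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": 3000,
            }
        ]
    }

def origin_allowed(config: dict, origin: str, method: str) -> bool:
    """Simplified check: does any rule match both the origin and the method?"""
    for rule in config["CORSRules"]:
        origins = rule["AllowedOrigins"]
        if ("*" in origins or origin in origins) and method in rule["AllowedMethods"]:
            return True
    return False

# The configuration lives on the cross-origin (assets) bucket, not the HTML bucket.
assets_cors = cors_config("https://example.com")
print(origin_allowed(assets_cors, "https://example.com", "GET"))
```

Real browsers also evaluate request headers and preflight caching; this sketch only covers origin and method matching.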


S3 MFA Delete

Requires an MFA code before critical S3 operations, such as permanently deleting an object version or suspending versioning on the bucket


S3 Access Logs

⚠️ Exam trap: Never set logging bucket = monitored bucket → creates infinite loop, bucket grows exponentially

⚠️ Exam trap: “Audit who accessed/tried to access S3” → S3 Access Logs + Athena


S3 Pre-Signed URLs

Expiration:

Method | Default | Max
S3 Console | - | 720 min (12 hours)
AWS CLI | 3600 sec (1 hour) | 604800 sec (168 hours / 7 days)
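
The limits in the table can be captured in a small validation helper. This is a sketch only: `validate_expiry` and the constant names are hypothetical, and the numbers are taken from the table above. With the AWS CLI, the expiry is passed as `aws s3 presign <S3Uri> --expires-in <seconds>`.

```python
CLI_DEFAULT_SECONDS = 3600          # 1 hour
CLI_MAX_SECONDS = 604800            # 7 days
CONSOLE_MAX_SECONDS = 720 * 60      # 12 hours

def validate_expiry(seconds: int, via_console: bool = False) -> int:
    """Return the expiry unchanged if it is within the limit, else raise."""
    limit = CONSOLE_MAX_SECONDS if via_console else CLI_MAX_SECONDS
    if not 1 <= seconds <= limit:
        raise ValueError(f"expiry must be between 1 and {limit} seconds")
    return seconds

print(validate_expiry(CLI_DEFAULT_SECONDS))
```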

Use Cases:


S3 Access Points

S3 Access Points:

Users (Finance) ───▶ Finance Access Point ───┐
                     (R/W to /finance/*)     │
                                             ▼
Users (Sales) ─────▶ Sales Access Point ────▶ S3 Bucket ◀── Simple
                     (R/W to /sales/*)       │               Bucket
                                             │               Policy
Users (Analytics) ─▶ Analytics Access Point ─┘
                     (R to entire bucket)

VPC Origin Access Points:

VPC Origin:

┌─────────────────────────────────────────────────────────────────────┐
│ VPC                                                                 │
│  EC2 ──▶ VPC Endpoint ──▶ Access Point (VPC Origin) ──▶ S3 Bucket  │
│          (Endpoint       (Access Point                 (Bucket     │
│           Policy)          Policy)                      Policy)    │
└─────────────────────────────────────────────────────────────────────┘

S3 Object Lambda

S3 Object Lambda:

                                    ┌─────────────────────────────────────┐
E-Commerce App ──▶ Original Object ─┤ S3 Access Point ──▶ S3 Bucket       │
                                    │                                     │
Analytics App ───▶ Redacted Object ─┤ Object Lambda AP ──▶ Redacting λ ───┤
                                    │                                     │
Marketing App ───▶ Enriched Object ─┤ Object Lambda AP ──▶ Enriching λ ◀──┼── Customer DB
                                    └─────────────────────────────────────┘

Use Cases:

⚠️ Exam trap: “Transform/redact data before retrieval” → S3 Object Lambda


S3 WORM Protection (Vault Lock & Object Lock)

Feature | Glacier Vault Lock | S3 Object Lock
Applies to | Glacier Vaults only | Any S3 storage class
Requires Versioning | ❌ No | ✅ Yes
Lock Level | Entire vault | Per object version
Policy Immutable | Yes (after lock) | Depends on mode

S3 Object Lock Modes:

Mode | Who Can Delete? | Change Settings? | Use Case
Compliance | No one (including root) | ❌ No | Regulatory requirements
Governance | Special permission users | ✅ Yes (with permission) | Internal policies

Lock Reversal & Override Details:

Lock Type | Can Shorten? | Can Remove? | Can Delete Object? | Who Can Override?
Compliance Retention | ❌ Never | ❌ Never | ❌ Until expires | No one (wait for expiry)
Governance Retention | ✅ With permission | ✅ With permission | ✅ With permission | Users with s3:BypassGovernanceRetention + header x-amz-bypass-governance-retention:true
Legal Hold | N/A | ✅ With permission | ❌ While active | Users with s3:PutObjectLegalHold permission
Vault Lock | ❌ Never | ❌ Never | ❌ Per policy | No one (delete vault to remove; loses all data)

Object Lock Features:

⚠️ Exam traps:



🎯 MASTER SUMMARY: S3 Exam Guide

Part 1: Core Principles (Understand WHY → Derive WHAT)

Principle 1: S3 is Object Storage, Not File System

S3 stores objects (files) in buckets (containers). There are no real directories — just keys with slashes.

Principle 2: Durability vs Availability

Two different concepts:

All storage classes have the same durability; availability differs.

Principle 3: Access Control Hierarchy

Access granted if: (IAM allows OR Resource policy allows) AND no explicit DENY

Who’s accessing? | Use…
IAM User in same account | IAM Policy
EC2/Lambda | IAM Role
Cross-account | Bucket Policy
Public/Anonymous | Bucket Policy with Principal: "*"

DENY always wins — if any policy denies, access is denied.
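
The rule above can be written as a one-line boolean expression. A minimal sketch, assuming the three inputs have already been evaluated (the function name is hypothetical, and this mirrors the simplified same-account rule in these notes rather than the full IAM evaluation logic):

```python
def is_access_granted(iam_allows: bool, resource_policy_allows: bool,
                      explicit_deny: bool) -> bool:
    """(IAM allows OR resource policy allows) AND no explicit deny."""
    if explicit_deny:
        return False  # an explicit deny overrides every allow
    return iam_allows or resource_policy_allows

# An explicit deny wins even when both policies allow the request.
print(is_access_granted(True, True, explicit_deny=True))
```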

Principle 4: Encryption is Automatic (Since Jan 2023)

Principle 5: Lifecycle = Move or Delete, Batch = Transform

Two different tools for different jobs:

Lifecycle cannot encrypt. Batch Operations can.

Principle 6: Replication ≠ Backup

Principle 7: Performance = Prefixes

S3 scales per prefix:

Principle 8: WORM = Write Once Read Many

Two mechanisms:

Compliance mode = NO ONE can delete (not even root or AWS Support)


Part 2: Decision Tree (Follow Keywords → Find Answer)

Step 1: What storage class?

                    What's the access pattern?
                              │
    ┌─────────────┬───────────┼───────────┬─────────────┬─────────────┐
    ▼             ▼           ▼           ▼             ▼             ▼
 Frequent     Unknown/     Infrequent  Archive      Archive       Lowest
 Access       Changing     Access      (instant)    (flexible)    Latency
    │             │           │           │             │             │
    ▼             ▼           ▼           ▼             ▼             ▼
Standard   Intelligent-  Standard-IA  Glacier      Glacier       Express
           Tiering       or One Zone  Instant      Flexible/     One Zone
                                                   Deep Archive

Step 2: Encryption decision

                    Who manages the keys?
                              │
        ┌─────────────────────┼─────────────────────┐
        ▼                     ▼                     ▼
   AWS manages           You control          Keys never in AWS
   (no work)             (audit/rotate)           │
        │                     │           ┌───────┴───────┐
        ▼                     ▼           ▼               ▼
     SSE-S3              SSE-KMS      SSE-C         Client-Side
   (default)           (CloudTrail)  (key in       (encrypt before
                                     header)        upload)
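
The encryption decision tree can be sketched as a small function. The parameter names are assumptions chosen to mirror the three questions in the diagram:

```python
def choose_encryption(keys_in_aws: bool,
                      need_key_control: bool = False,
                      encrypt_before_upload: bool = False) -> str:
    """Walk the decision tree above and return the encryption option."""
    if keys_in_aws:
        # Audit trail and rotation control point to KMS; otherwise the default.
        return "SSE-KMS" if need_key_control else "SSE-S3"
    # Keys never stored in AWS: either S3 encrypts with a key sent per request,
    # or the client encrypts before uploading.
    return "Client-Side" if encrypt_before_upload else "SSE-C"

print(choose_encryption(keys_in_aws=True))
```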

Step 3: Feature-Based Decision Table

If question mentions… | Answer is…
“unknown access pattern” | Intelligent-Tiering
“millisecond retrieval from archive” | Glacier Instant
“cheapest archive” / “rarely accessed” | Glacier Deep Archive
“lowest latency” / “AI/ML training” | Express One Zone
“recreatable data” + single AZ OK | One Zone-IA
“encrypt existing objects” | S3 Batch Operations
“transition to cheaper storage” | Lifecycle Rules
“delete old versions” | Lifecycle Expiration Actions
“delete incomplete multipart uploads” | Lifecycle Expiration Actions
“audit object access” | S3 Access Logs + Athena
“customer manages keys outside AWS” | SSE-C or Client-Side
“high throughput + encryption” | SSE-S3 (not KMS: quota limits)
“prevent deletion for X years” | Object Lock (Compliance)
“allow admin override” | Object Lock (Governance)
“cross-account access” | Bucket Policy
“generate temporary download link” | Pre-Signed URL
“different access per team/prefix” | S3 Access Points
“transform data before retrieval” | S3 Object Lambda
“read first X bytes” / “file header” | Byte-Range Fetch
“large file + unreliable network” | Multi-Part Upload
“faster uploads over long distance” | Transfer Acceleration
“analyze storage costs” | S3 Storage Lens
“lifecycle recommendations” | S3 Analytics
“replicate existing objects” | S3 Batch Replication
“CORS error” | Configure CORS on target bucket

The “NOT” Rules (Eliminate Wrong Answers Fast)

Statement | Why It’s Wrong
Lifecycle Rules encrypt objects | Lifecycle = transition/delete only, not encrypt
SSE-KMS for high-throughput | SSE-KMS has API quota limits; use SSE-S3
Replication for existing objects | Only new objects; use Batch Replication for existing
Access Logs for real-time alerts | Access Logs = audit, not notifications; use Event Notifications
CloudTrail for data access patterns | CloudTrail logs management API calls by default (object-level data events are opt-in); use Access Logs
Object Lock without versioning | Versioning is required before enabling Object Lock
Compliance mode with admin override | Compliance = no one can override; use Governance for admin override
Glacier Flexible for instant access | Flexible = hours; use Glacier Instant for milliseconds
Standard-IA for archive | IA = infrequent access, not archive; use Glacier for archive

The “CANNOT” List

Cannot… | Instead…
Create bucket with existing name | Names are globally unique; choose a different name
Encrypt with Lifecycle Rules | Use Batch Operations for encryption
Shorten Compliance retention | Wait for expiry (truly immutable)
Delete in Compliance mode | No one can, not even root or AWS Support
Replicate to bucket without versioning | Enable versioning on both buckets
Chain replications (A→B→C) | Set up direct replication from A to C
Use SSE-C without HTTPS | HTTPS is mandatory for SSE-C
Set Object Lock on bucket without versioning | Enable versioning first

Part 3: Scenario Pattern Recognition

Pattern: “Unknown or changing access patterns”

Keywords: unpredictable access, varies over time, don’t know access frequency

Answer: Intelligent-Tiering

Why: Auto-moves objects between tiers, no retrieval fees, small monitoring fee.


Pattern: “Archive with occasional instant access”

Keywords: archive, quarterly access, millisecond retrieval, compliance archive

Answer: Glacier Instant Retrieval

Why: Archive pricing + instant access. Glacier Flexible = hours, not milliseconds.


Pattern: “Cheapest possible storage”

Keywords: rarely accessed, years of retention, 12+ hour retrieval OK

Answer: Glacier Deep Archive

Why: Cheapest class, 12-48 hour retrieval. Use Standard/Bulk retrieval.


Pattern: “Encrypt existing objects”

Keywords: encrypt all current files, change encryption, bulk encrypt

Answer: S3 Batch Operations

Why: Lifecycle Rules can’t encrypt. CRR creates copies. Batch Operations modifies in-place.


Pattern: “Delete old versions / incomplete uploads”

Keywords: reduce costs, clean up, delete versions older than X days, incomplete multipart

Answer: Lifecycle Expiration Actions

Why: Transition = move to cheaper class. Expiration = delete permanently.
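
As an illustration of expiration actions, the sketch below builds a lifecycle configuration that deletes noncurrent versions and aborts incomplete multipart uploads. The helper name, rule ID, and day counts are assumptions; the dictionary follows the structure of an S3 lifecycle configuration (`Rules`, `NoncurrentVersionExpiration`, `AbortIncompleteMultipartUpload`).

```python
def cleanup_lifecycle(noncurrent_days: int = 30, abort_upload_days: int = 7) -> dict:
    """Lifecycle configuration with two expiration-style cleanup actions."""
    return {
        "Rules": [
            {
                "ID": "cleanup-old-versions-and-uploads",  # illustrative ID
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix = whole bucket
                # Permanently delete versions N days after they become noncurrent
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
                # Remove parts of uploads that were never completed
                "AbortIncompleteMultipartUpload": {
                    "DaysAfterInitiation": abort_upload_days
                },
            }
        ]
    }

config = cleanup_lifecycle()
print(config["Rules"][0]["ID"])
```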


Pattern: “Audit who accessed objects”

Keywords: audit access, security analysis, who accessed, access attempts

Answer: S3 Access Logs + Amazon Athena

Why: Access Logs capture all requests (including denied). Athena queries logs with SQL.


Pattern: “Customer manages encryption keys”

Keywords: customer-managed keys, keys not stored in AWS, full key control

Answer: SSE-C (if encryption in S3) or Client-Side (if encryption before upload)

Why: SSE-S3/SSE-KMS store keys in AWS. SSE-C/Client-Side = keys never stored in AWS.


Pattern: “Prevent anyone from deleting for X years”

Keywords: regulatory compliance, immutable, prevent deletion, WORM, SEC 17a-4

Answer: Object Lock in Compliance mode

Why: Compliance mode = truly immutable. No one (root, admin, AWS) can delete until retention expires.


Pattern: “Prevent deletion but allow admin override”

Keywords: internal policy, admin can override, flexible protection

Answer: Object Lock in Governance mode

Why: Users with s3:BypassGovernanceRetention permission can override. Compliance mode has no override.


Pattern: “Generate temporary download link”

Keywords: temporary access, time-limited URL, download link for logged-in users

Answer: Pre-Signed URL

Why: User inherits permissions of URL generator. Expires after set time (max 7 days via CLI).


Pattern: “Different teams need different bucket access”

Keywords: multiple teams, different prefixes, simplify access management

Answer: S3 Access Points

Why: Each Access Point has own policy, simplifies per-team access vs complex bucket policy.


Pattern: “Transform/redact data before retrieval”

Keywords: redact PII, convert format, resize images, enrich data on-the-fly

Answer: S3 Object Lambda

Why: Lambda transforms during GET request. No data duplication, no extra storage.


Pattern: “Faster uploads over long distances”

Keywords: global users, long-distance upload, slow uploads

Answer: S3 Transfer Acceleration

Why: Uses CloudFront edge locations. Combine with Multi-Part for large files.


Pattern: “Large file upload with unreliable network”

Keywords: large files, unstable connection, retry on failure

Answer: Multi-Part Upload

Why: Parallel upload, retry only failed parts. Required for >5GB files.


Pattern: “Read only beginning of file”

Keywords: file header, first N bytes, metadata extraction

Answer: Byte-Range Fetch

Why: Request specific byte ranges. Efficient for partial data retrieval.
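
A byte-range fetch is an ordinary GET with an HTTP `Range` header. A short sketch of how the header values could be built (both helper names are hypothetical):

```python
def first_bytes_range(n: int) -> dict:
    """Header that requests only the first n bytes of an object."""
    if n <= 0:
        raise ValueError("n must be positive")
    return {"Range": f"bytes=0-{n - 1}"}  # byte ranges are inclusive

def split_ranges(size: int, chunk: int) -> list[str]:
    """Inclusive byte ranges that cover an object, e.g. for parallel GETs."""
    return [
        f"bytes={start}-{min(start + chunk, size) - 1}"
        for start in range(0, size, chunk)
    ]

print(first_bytes_range(1024))
print(split_ranges(10, 4))
```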


Pattern: “CORS error when loading from S3”

Keywords: cross-origin, CORS error, browser blocking, different domain

Answer: Configure CORS on the target bucket (the one being requested)

Why: CORS is configured where the data is, not where the request originates.


Pattern: “Replicate existing objects”

Keywords: existing objects, current files, replicate everything

Answer: S3 Batch Replication

Why: Normal replication only copies new objects. Batch Replication handles existing.


Part 4: Quick Reference Tables

Storage Class Comparison

Class | Avail. | AZs | Min Duration | Retrieval | Use Case
Standard | 99.99% | ≥3 | - | Instant | Frequently accessed
Intelligent-Tiering | 99.9% | ≥3 | - | Instant | Unknown patterns
Standard-IA | 99.9% | ≥3 | 30 days | Instant | Infrequent, rapid access
One Zone-IA | 99.5% | 1 | 30 days | Instant | Recreatable data
Glacier Instant | 99.9% | ≥3 | 90 days | ms | Once/quarter access
Glacier Flexible | 99.99% | ≥3 | 90 days | 1min-12hr | Archive, flexible
Glacier Deep Archive | 99.99% | ≥3 | 180 days | 12-48hr | Long-term archive
Express One Zone | 99.95% | 1 | - | <10ms | AI/ML, lowest latency

Encryption Comparison

Method | Keys Managed By | Keys Stored In AWS? | HTTPS Required? | Quota Limits?
SSE-S3 | AWS | ✅ Yes | No | ❌ No
SSE-KMS | Customer (via KMS) | ✅ Yes | No | ✅ Yes (API quota)
DSSE-KMS | Customer (via KMS) | ✅ Yes | No | ✅ Yes
SSE-C | Customer (external) | ❌ No | ✅ Yes (mandatory) | ❌ No
Client-Side | Customer (external) | ❌ No | No | ❌ No

Object Lock Modes

Mode | Who Can Delete? | Shorten Retention? | Override? | Use Case
Compliance | No one | ❌ Never | ❌ Never | Regulatory (SEC, FINRA)
Governance | Special permission | ✅ Yes | ✅ With permission | Internal policies
Legal Hold | No one while active | N/A | ✅ Remove hold | Litigation, investigations
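
The interaction between the two retention modes and a legal hold can be sketched as a single check. This is a simplified model of the table, not an AWS API; the function name and the mode strings `"COMPLIANCE"` / `"GOVERNANCE"` are assumptions for illustration.

```python
from datetime import datetime, timezone
from typing import Optional

def can_delete_version(mode: str, retain_until: Optional[datetime], now: datetime,
                       legal_hold: bool = False,
                       has_bypass_permission: bool = False) -> bool:
    """Return True if an object version could be deleted under the rules above."""
    if legal_hold:
        return False  # a legal hold blocks deletion regardless of retention
    if retain_until is None or now >= retain_until:
        return True   # no retention set, or the retention period has expired
    if mode == "GOVERNANCE":
        return has_bypass_permission  # override only with the bypass permission
    return False      # COMPLIANCE: no one can delete before expiry

now = datetime(2025, 1, 1, tzinfo=timezone.utc)
until = datetime(2030, 1, 1, tzinfo=timezone.utc)
print(can_delete_version("COMPLIANCE", until, now, has_bypass_permission=True))
```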

Performance Limits

Metric | Limit
Requests per prefix (PUT/POST/DELETE) | 3,500/sec
Requests per prefix (GET/HEAD) | 5,500/sec
Single PUT max size | 5 GB
Object max size | 5 TB
Multi-Part Upload max parts | 10,000
Multi-Part Upload min part size | 5 MB (except last)
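
The multipart limits interact: a very large object needs a larger part size to stay under 10,000 parts. A rough sketch of that calculation, assuming binary units (MiB) for the "5 MB" minimum and an arbitrary 8 MiB default part size; `plan_multipart` is a hypothetical helper:

```python
import math

MIB = 1024 ** 2
MIN_PART_SIZE = 5 * MIB   # minimum part size (the last part may be smaller)
MAX_PARTS = 10_000

def plan_multipart(object_size: int, part_size: int = 8 * MIB) -> tuple[int, int]:
    """Return (part_size, part_count) that respects the limits in the table."""
    part_size = max(part_size, MIN_PART_SIZE)
    # Grow the part size if the object would otherwise need more than 10,000 parts.
    part_size = max(part_size, math.ceil(object_size / MAX_PARTS))
    part_count = max(1, math.ceil(object_size / part_size))
    return part_size, part_count

size, count = plan_multipart(100 * MIB)
print(size, count)
```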

Pre-Signed URL Expiration

Method | Default | Maximum
S3 Console | 1-720 minutes | 12 hours
AWS CLI | 3600 seconds | 604800 seconds (7 days)

Key APIs to Remember

API/Header | Purpose
x-amz-server-side-encryption: AES256 | SSE-S3
x-amz-server-side-encryption: aws:kms | SSE-KMS
x-amz-bypass-governance-retention: true | Override Governance mode
aws:SecureTransport | Condition for HTTPS enforcement

Part 5: Ultimate Instant-Answer Table

Question Contains | → Instant Answer
“unknown access pattern” | Intelligent-Tiering
“millisecond from archive” | Glacier Instant
“cheapest archive” | Glacier Deep Archive
“lowest latency” / “single-digit ms” | Express One Zone
“recreatable” + “single AZ OK” | One Zone-IA
“encrypt existing objects” | S3 Batch Operations
“transition to cheaper” | Lifecycle Transition Actions
“delete old versions” | Lifecycle Expiration Actions
“incomplete multipart” | Lifecycle Expiration Actions
“audit access” / “who accessed” | S3 Access Logs + Athena
“keys never in AWS” | SSE-C or Client-Side
“high throughput + encrypt” | SSE-S3 (not KMS!)
“prevent deletion” + “compliance” | Object Lock Compliance
“admin can override” | Object Lock Governance
“temporary link” | Pre-Signed URL
“per-team access” | S3 Access Points
“transform before GET” | S3 Object Lambda
“read first N bytes” | Byte-Range Fetch
“large file upload” | Multi-Part Upload
“faster long-distance” | Transfer Acceleration
“storage cost analysis” | S3 Storage Lens
“lifecycle recommendations” | S3 Analytics
“replicate existing” | S3 Batch Replication
“CORS error” | Configure CORS on target bucket
“cross-account” | Bucket Policy
“event on upload” | Event Notifications (SNS/SQS/Lambda)
“infinite loop” + logging | Logging bucket ≠ monitored bucket
“can’t create bucket” | Name already taken globally
“global but regional” | Bucket created in region
“overwrite + immediate read” | Always latest (strong consistency)
“eventual consistency S3” | ❌ Outdated: S3 has been strongly consistent since 2020

Part 6: Elimination Checklist

When stuck between options, eliminate systematically:

□ Is it about ENCRYPTION?
  → Lifecycle Rules = ❌ Can't encrypt
  → Batch Operations = ✅ Can encrypt existing objects

□ Is it about DELETION PROTECTION?
  → Compliance mode = No one can delete/override
  → Governance mode = Admin can override with permission
  → Legal Hold = Indefinite, removable

□ Is it about ACCESS CONTROL?
  → Same account user = IAM Policy
  → EC2/Lambda = IAM Role
  → Cross-account = Bucket Policy
  → Different teams = Access Points

□ Is it about PERFORMANCE?
  → More throughput = Spread across prefixes
  → Large files = Multi-Part Upload
  → Long distance = Transfer Acceleration
  → Partial data = Byte-Range Fetch

□ Is it about STORAGE CLASS?
  → Unknown pattern = Intelligent-Tiering
  → Infrequent = Standard-IA or One Zone-IA
  → Archive (instant) = Glacier Instant
  → Archive (flexible) = Glacier Flexible
  → Archive (cheapest) = Glacier Deep Archive
  → Lowest latency = Express One Zone

□ Is it about AUDITING?
  → Who accessed objects = S3 Access Logs + Athena
  → API calls to S3 = CloudTrail
  → Storage analysis = S3 Storage Lens or Analytics

□ Is it about REPLICATION?
  → New objects = Standard Replication
  → Existing objects = Batch Replication
  → Versioning required = ✅ On both buckets

🏆 The Golden Rules

  1. SSE-S3 is default — all new objects encrypted automatically (since Jan 2023)
  2. SSE-KMS has limits — for high-throughput, use SSE-S3
  3. Lifecycle cannot encrypt — use Batch Operations for encryption
  4. Compliance mode = truly immutable — no one can delete, not even root
  5. Governance mode = admin override — with special permission
  6. Replication only copies new objects — use Batch Replication for existing
  7. Versioning required for Object Lock — enable versioning first
  8. DENY always wins — explicit deny in any policy blocks access
  9. Bucket names are globally unique — across all accounts, all regions
  10. Prefixes = parallelism — spread across prefixes for more throughput
  11. Multi-Part required >5GB — single PUT limited to 5GB
  12. Access Logs ≠ Monitored bucket — avoid infinite loop
  13. CORS on target bucket — configure where data is, not where request originates
  14. Pre-Signed URL max = 7 days (via CLI) — 12 hours via console
  15. Object Lambda = transform on GET — no data duplication

AWS Snow Family:

AWS Snow Family: highly secure, portable devices that collect and process data at the edge and migrate data into and out of AWS. They address challenges such as:

Data migration and Edge computing:

AWS OpsHub — GUI application to manage Snow Family devices (installed on your computer)

AWS OpsHub Management:

┌─────────────────────────────────────────────────────────┐
│  Your Computer                                          │
│  ┌───────────────────────────────────────────────────┐  │
│  │              AWS OpsHub (GUI)                     │  │
│  │  ┌─────────────┬─────────────┬─────────────────┐  │  │
│  │  │ Unlock &    │ Transfer    │ Launch EC2      │  │  │
│  │  │ Configure   │ Files       │ Manage Storage  │  │  │
│  │  └─────────────┴─────────────┴─────────────────┘  │  │
│  └───────────────────────────────────────────────────┘  │
│         │                                               │
│         │ Local connection (USB/Network)                │
│         ▼                                               │
│  ┌───────────────┐                                      │
│  │ Snow Device   │                                      │
│  │ (Snowcone /   │                                      │
│  │ Snowball Edge)│                                      │
│  └───────────────┘                                      │
└─────────────────────────────────────────────────────────┘

When to Use Snowball

Rule of thumb: If network transfer takes > 1 week → use Snowball

Data Size | 100 Mbps | 1 Gbps | 10 Gbps
10 TB | 12 days | 30 hours | 3 hours
100 TB | 124 days | 12 days | 30 hours
1 PB | 3 years | 124 days | 12 days

Direct Upload vs Snowball:

Direct:    Client ──── www (10Gbit/s) ────▶ S3 Bucket

Snowball:  Client ──▶ Snowball ──▶ [ship] ──▶ AWS ──▶ S3 Bucket
                      (local)                 (import)
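
The transfer-time figures behind the one-week rule of thumb can be approximated with simple arithmetic. In this sketch the `utilization` factor is an assumption: a value of about 0.75 roughly reproduces the table's numbers, but the notes do not state how those numbers were derived.

```python
TB = 10 ** 12  # decimal terabyte, in bytes

def transfer_days(data_bytes: float, link_bits_per_sec: float,
                  utilization: float = 1.0) -> float:
    """Days needed to push data_bytes over a link at the given utilization."""
    seconds = data_bytes * 8 / (link_bits_per_sec * utilization)
    return seconds / 86_400

ideal = transfer_days(100 * TB, 1e9)            # about 9.3 days at full line rate
realistic = transfer_days(100 * TB, 1e9, 0.75)  # about 12.3 days at 75% utilization
use_snowball = realistic > 7                    # rule of thumb: more than one week
print(round(ideal, 1), round(realistic, 1), use_snowball)
```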

Snowball Edge Computing

Edge Computing = process data where it’s created (before sending to cloud)

Device | Use Case
Snowball Edge Storage Optimized | Large data + some compute
Snowball Edge Compute Optimized | Heavy processing (ML, transcoding)

Use cases: Preprocess data, machine learning at edge, media transcoding

⚠️ Exam trap: “Large data + process while in transit” → Snowball Edge (not Snowcone)


Snowball to Glacier

⚠️ Exam trap: Snowball cannot import to Glacier directly

Snowball ──▶ Amazon S3 ──▶ (Lifecycle Policy) ──▶ Amazon Glacier

Amazon FSx

Amazon FSx = Launch 3rd party high-performance file systems on AWS (fully managed)

FSx for Windows File Server

FSx for Lustre

Lustre Deployment Options:

Option | Replication | Performance | Use Case
Scratch | ❌ No (data lost if fails) | 6x faster (200 MBps/TiB) | Short-term processing, cost optimized
Persistent | ✅ Within same AZ | Standard | Long-term processing, sensitive data

FSx Lustre Deployment Options:

Scratch File System:                    Persistent File System:
┌─────────────────────────────┐         ┌─────────────────────────────┐
│ Region                      │         │ Region                      │
│  ┌─────────┐   ┌─────────┐  │         │  ┌─────────┐   ┌─────────┐  │
│  │  AZ 1   │   │  AZ 2   │  │         │  │  AZ 1   │   │  AZ 2   │  │
│  │Compute  │   │Compute  │  │         │  │Compute  │   │Compute  │  │
│  └────┬────┘   └────┬────┘  │         │  └────┬────┘   └────┬────┘  │
│       └─────┬───────┘       │         │       └─────┬───────┘       │
│            ENI              │         │            ENI              │
│             │               │         │             │               │
│        ┌────▼────┐          │         │        ┌────▼────┐          │
│        │  FSx    │──▶ S3    │         │        │  FSx    │──▶ S3    │
│        │(Scratch)│(optional)│         │        │(Persist)│(optional)│
│        └─────────┘          │         │        └─────────┘          │
│     (No replication)        │         │   (Replicated in AZ)        │
└─────────────────────────────┘         └─────────────────────────────┘

FSx for NetApp ONTAP

FSx for OpenZFS

FSx for NetApp ONTAP / OpenZFS - Compatible Clients:

                    ┌─────────────────────────┐
                    │ FSx NetApp ONTAP        │
                    │  (NFS, SMB, iSCSI)      │
                    │ ─────────────────────── │
                    │ FSx OpenZFS             │
                    │  (NFS v3/v4 only)       │
                    └───────────┬─────────────┘
                                │
       ┌────────────────────────┼────────────────────────┐
       ▼                        ▼                        ▼
┌─────────────┐     ┌─────────────────────┐     ┌──────────────┐
│EC2/ECS/EKS  │     │VMware/AppStream/    │     │On-premises   │
│             │     │WorkSpaces           │     │Server        │
└─────────────┘     └─────────────────────┘     └──────────────┘
  Linux/Win/Mac

FSx Comparison:

FSx Type | Protocol | Best For | Key Feature
Windows | SMB, NTFS | Windows workloads | AD integration, Multi-AZ
Lustre | POSIX | HPC, ML, Linux | S3 integration, sub-ms latency
NetApp ONTAP | NFS, SMB, iSCSI | Multi-OS, NAS migration | Auto-scaling, cloning
OpenZFS | NFS | ZFS migration | 1M IOPS, <0.5ms latency, cloning

FSx Use Case Decision Tree:

Scenario | Answer
Windows app needs shared storage + Active Directory | FSx for Windows
HPC cluster needs fast shared storage + read from S3 | FSx for Lustre
ML training with large datasets in S3 | FSx for Lustre
Migrate existing Windows file server to AWS | FSx for Windows
Migrate NetApp/NAS to AWS | FSx for NetApp ONTAP
Need NFS + SMB + iSCSI on same file system | FSx for NetApp ONTAP
Migrate ZFS-based workloads to AWS | FSx for OpenZFS
Need point-in-time cloning for testing | NetApp ONTAP or OpenZFS
Short-term compute job, optimize cost | FSx Lustre Scratch
Long-term processing, data must survive failure | FSx Lustre Persistent

⚠️ Exam traps:


AWS Storage Gateway

Bridge between on-premises and AWS cloud storage

AWS Storage Gateway Overview:

On-Premises                                         AWS Cloud
┌─────────────────────────────────────┐    ┌────────────────────────────────┐
│                                     │    │                                │
│ File Shares ──NFS/SMB──▶ File GW ───┼────┼──▶ S3 (excl. Glacier) ──▶ Glacier
│                         (cache)     │    │                                │
│                                     │    │                                │
│ App Server ──iSCSI────▶ Volume GW ──┼────┼──▶ S3 ──▶ EBS Snapshots       │
│                         (cache)     │    │                                │
│                                     │    │                                │
│ Backup App ──iSCSI VTL─▶ Tape GW ───┼────┼──▶ S3 (Tape Library) ──▶ Glacier
│                         (cache)     │    │                                │
└─────────────────────────────────────┘    └────────────────────────────────┘
              Encryption in Transit (Internet or Direct Connect)

Gateway Type | Protocol | Backend | Use Case
S3 File Gateway | NFS, SMB | S3 (Standard, IA, One Zone, Intelligent) | Access S3 via file protocols, cached locally
FSx File Gateway | SMB | FSx for Windows | Low-latency access to FSx from on-prem
Volume Gateway | iSCSI | S3 + EBS snapshots | Block storage backed by S3
Tape Gateway | iSCSI (VTL) | S3 + Glacier | Replace physical tapes with cloud

S3 File Gateway:

On-Premises                              AWS Cloud
┌────────────────────┐          ┌─────────────────────────────────┐
│ App Server         │          │  S3 Standard / IA / One Zone-IA │
│      │             │   HTTPS  │  S3 Intelligent-Tiering         │
│      ▼             │          │           │                     │
│ S3 File Gateway ───┼──────────┼──────────▶│                     │
│  (NFS or SMB)      │          │           ▼ (Lifecycle Policy)  │
│  (local cache)     │          │      S3 Glacier                 │
└────────────────────┘          └─────────────────────────────────┘
Volume Gateway:

On-Premises                              AWS Cloud
┌────────────────────┐          ┌─────────────────────────────────┐
│ App Server         │   HTTPS  │                                 │
│      │             │          │      S3 Bucket                  │
│      ▼ iSCSI       │          │         │                       │
│ Volume Gateway ────┼──────────┼────────▶│                       │
│  (local cache)     │          │         ▼                       │
└────────────────────┘          │    EBS Snapshots                │
                                └─────────────────────────────────┘
Tape Gateway:

On-Premises                              AWS Cloud
┌────────────────────────┐      ┌─────────────────────────────────┐
│ Backup Server          │      │                                 │
│      │ iSCSI           │HTTPS │  Virtual Tapes ──▶ Archived Tapes
│      ▼                 │      │  (S3)              (Glacier)    │
│ ┌──────────┬─────────┐ │      │                                 │
│ │Media     │Tape     │ │      │                                 │
│ │Changer   │Drive    │─┼──────┼──────────────────────────────▶  │
│ └──────────┴─────────┘ │      │                                 │
│     Tape Gateway       │      │                                 │
└────────────────────────┘      └─────────────────────────────────┘

Volume Gateway Modes:

⚠️ Exam traps:


AWS Transfer Family

⚠️ Exam trap: TLS is NOT a supported protocol

AWS Transfer Family:

                     MS Active Directory / LDAP
                              │ authenticate
                              ▼
Users ──▶ Route 53 ──▶ ┌─────────────────────┐      ┌─────────────┐
(FTP      (optional)   │ Transfer for SFTP   │      │             │
client)                │ Transfer for FTPS   │──────▶  Amazon S3  │
                       │ Transfer for FTP    │      │             │
                       │ (VPC only)          │      │  Amazon EFS │
                       └─────────────────────┘      └─────────────┘
                                    │
                               IAM Role

AWS DataSync

DataSync: On-Premises to AWS

On-Premises                                    AWS Region
┌────────────────────────┐          ┌─────────────────────────────────┐
│                        │          │  AWS Storage Resources          │
│ NFS/SMB Server         │   TLS    │  ┌─────────┬─────────┬────────┐ │
│      │                 │          │  │S3       │S3 IA    │S3      │ │
│      ▼ NFS/SMB         │          │  │Standard │         │One Zone│ │
│ DataSync Agent ────────┼──────────┼─▶├─────────┼─────────┼────────┤ │
│                        │          │  │S3       │S3       │S3 Deep │ │
│ (or Snowcone with      │          │  │Intell.  │Glacier  │Archive │ │
│  agent pre-installed)  │          │  ├─────────┴─────────┴────────┤ │
└────────────────────────┘          │  │    EFS    │    FSx         │ │
                                    │  └───────────┴────────────────┘ │
                                    └─────────────────────────────────┘
DataSync: AWS to AWS (no agent needed)

┌─────────────┐                              ┌─────────────┐
│  Amazon S3  │                              │  Amazon S3  │
├─────────────┤         ┌──────────┐         ├─────────────┤
│  Amazon EFS │◀───────▶│ DataSync │◀───────▶│  Amazon EFS │
├─────────────┤         └──────────┘         ├─────────────┤
│  Amazon FSx │    (copy data + metadata)    │  Amazon FSx │
└─────────────┘                              └─────────────┘

⚠️ Exam traps:


DataSync vs Storage Gateway

Aspect | DataSync | Storage Gateway
Purpose | One-time or scheduled migration/sync | Ongoing hybrid access (bridge)
Direction | On-prem → AWS, AWS → AWS | On-prem ↔ AWS (bidirectional access)
Use case | “Move data to cloud” | “Extend on-prem storage to cloud”
Agent | Yes (on-prem), No (AWS-to-AWS) | VM appliance (always)
Caching | No local cache | Yes, local cache for low latency
Protocol | NFS, SMB, HDFS, S3 API | NFS, SMB, iSCSI

⚠️ Exam trap decision:


Storage Services Comparison

AWS Storage Cloud Native Options:

┌─────────────────┬─────────────────┬─────────────────┐
│     Block       │      File       │     Object      │
├─────────────────┼─────────────────┼─────────────────┤
│  Amazon EBS     │  Amazon EFS     │  Amazon S3      │
│  EC2 Instance   │  Amazon FSx     │  Amazon Glacier │
│  Store          │                 │                 │
└─────────────────┴─────────────────┴─────────────────┘

Service | Type | Use Case
S3 | Object | General object storage
S3 Glacier | Object | Archival
EBS | Block | Single EC2 instance storage
Instance Store | Block | Ephemeral, high IOPS
EFS | File (NFS) | Linux shared file system
FSx Windows | File (SMB) | Windows shared file system
FSx Lustre | File (POSIX) | HPC, ML, Linux
FSx NetApp ONTAP | File (multi) | Multi-OS, NAS migration
FSx OpenZFS | File (NFS) | ZFS migration
Storage Gateway | Hybrid | On-prem ↔ AWS bridge
Transfer Family | Hybrid | FTP/SFTP to S3/EFS
DataSync | Migration | Scheduled sync to AWS
Snow Family | Migration | Physical data transfer

Migration & Hybrid Services Decision Tree

Scenario | Answer
Large data (>1 week to transfer), limited bandwidth | Snowball Edge
Large data + need to process at edge | Snowball Edge Compute Optimized
Small data + limited connectivity + edge compute | Snowcone
One-time migration from on-prem NFS/SMB to S3 | DataSync (with agent)
Scheduled/recurring sync from on-prem to AWS | DataSync
Migrate S3 → EFS or S3 → FSx | DataSync (no agent)
On-prem apps need ongoing NFS/SMB access to S3 | S3 File Gateway
On-prem apps need low-latency access to FSx Windows | FSx File Gateway
On-prem apps need iSCSI block storage backed by S3 | Volume Gateway
Replace physical tape backup with cloud | Tape Gateway
External users upload via FTP/SFTP to S3 | Transfer Family
Import data to Glacier | Snowball → S3 → Lifecycle Policy

⚠️ Key differentiators:

AWS OpsHub is software for managing Snow Family devices.



🎯 MASTER SUMMARY: Storage Migration & Hybrid Services Exam Guide

Part 1: Core Principles (Understand WHY → Derive WHAT)

Principle 1: Network Transfer Time = Decision Point

The fundamental question: How long to transfer over network?

100 TB at 1 Gbps ≈ 9 days at full line rate, closer to 12 days at realistic throughput. Either way it exceeds a week, so Snowball wins.
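
The arithmetic behind this estimate can be sketched in a few lines. The function name `transfer_days` and the `utilization` parameter are illustrative assumptions (not part of any AWS tool), and decimal units are assumed (1 TB = 10^12 bytes).

```python
def transfer_days(data_tb: float, link_gbps: float, utilization: float = 1.0) -> float:
    """Estimate the days needed to move data_tb terabytes over a link_gbps link.

    Assumes decimal units (1 TB = 10**12 bytes, 1 Gbps = 10**9 bits/s).
    utilization is the fraction of the link that is really available
    (a hypothetical knob for this sketch).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    print(f"{transfer_days(100, 1):.1f} days at full line rate")
    print(f"{transfer_days(100, 1, utilization=0.8):.1f} days at 80% utilization")
```

At full line rate the result is roughly 9.3 days; at 80% utilization it is about 11.6 days, which is where the "12 days" rule of thumb comes from. Both are over a week, so the physical-transfer answer holds.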

Principle 2: Migration vs Ongoing Access

Two fundamentally different needs:

“Move to cloud” = migration. “Extend to cloud” = hybrid access.

Principle 3: Protocol Determines Service

What protocol do your applications use?

Protocol | AWS Service
NFS/SMB (file) | Storage Gateway, DataSync, EFS, FSx
iSCSI (block) | Volume Gateway, Tape Gateway
FTP/SFTP/FTPS | Transfer Family
S3 API (object) | Direct S3, DataSync

Principle 4: FSx = Third-Party File Systems on AWS

FSx is NOT a generic file system — it’s specific file system software:

Principle 5: Snowball → S3 First, Then Glacier

Snowball cannot import directly to Glacier.

This is a common exam trap.

Principle 6: Edge Computing = Process Where Data Lives

Snowball Edge isn’t just for transfer — it’s for computing at the edge:

Principle 7: Gateways Have Local Cache

Storage Gateway provides low-latency local access with cloud backing:

Principle 8: DataSync Preserves Metadata

DataSync keeps file permissions and metadata intact:


Part 2: Decision Tree (Follow Keywords → Find Answer)

Step 1: Physical or Network Transfer?

                    Network quality?
                          │
            ┌─────────────┴─────────────┐
            ▼                           ▼
    Good/Adequate                  Limited/Bad
    (< 1 week)                     (> 1 week)
            │                           │
            ▼                           ▼
   DataSync / Direct              Snow Family
                                        │
                              ┌─────────┴─────────┐
                              ▼                   ▼
                         Small data          Large data
                         (< 14 TB)           (up to 80 TB)
                              │                   │
                              ▼                   ▼
                          Snowcone         Snowball Edge
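
The Step 1 tree can be written as a small function, as a memory aid. This is only a sketch: the name `pick_transfer_method` and the exact thresholds (7 days, 14 TB, 80 TB) are taken from the tree and the tables in these notes, not from an AWS API.

```python
def pick_transfer_method(data_tb: float, network_days: float) -> str:
    """Mirror the Step 1 tree: use the network if it fits in a week, else a Snow device."""
    if network_days <= 7:
        return "DataSync / direct transfer"
    if data_tb < 14:
        return "Snowcone"
    if data_tb <= 80:
        return "Snowball Edge"
    # Beyond one device's capacity the notes suggest several devices.
    return "Snowball Edge (multiple devices)"


if __name__ == "__main__":
    for tb, days in [(5, 3), (10, 30), (60, 30), (500, 90)]:
        print(tb, "TB,", days, "days over network ->", pick_transfer_method(tb, days))
```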

Step 2: Migration or Ongoing Access?

                    What's the need?
                          │
        ┌─────────────────┼─────────────────┐
        ▼                 ▼                 ▼
   One-time           Scheduled        Ongoing
   Migration           Sync            Access
        │                 │                 │
        ▼                 ▼                 ▼
   DataSync           DataSync        Storage Gateway
   Snowball                                 │
                              ┌─────────────┴─────────────┐
                              ▼                           ▼
                         File access              Block storage
                         (NFS/SMB)                  (iSCSI)
                              │                           │
                              ▼                           ▼
                    S3 File Gateway              Volume Gateway
                    FSx File Gateway              Tape Gateway

Step 3: Feature-Based Decision Table

If question mentions… | Answer is…
“> 1 week transfer” / “limited bandwidth” | Snowball Edge
“limited network + small data” | Snowcone
“process data at edge” / “edge computing” | Snowball Edge Compute Optimized
“migrate to S3/EFS/FSx” (one-time) | DataSync
“scheduled sync” / “weekly backup to S3” | DataSync
“on-prem NFS access to S3” (ongoing) | S3 File Gateway
“on-prem access to FSx Windows” | FSx File Gateway
“on-prem iSCSI block storage” | Volume Gateway
“replace tape backup” | Tape Gateway
“FTP/SFTP access to S3” | Transfer Family
“Windows file share + AD” | FSx for Windows
“HPC / ML + Linux + S3” | FSx for Lustre
“multi-protocol (NFS + SMB + iSCSI)” | FSx for NetApp ONTAP
“migrate ZFS workloads” | FSx for OpenZFS
“import to Glacier” | Snowball → S3 → Lifecycle
“short-term HPC, cost optimized” | FSx Lustre Scratch
“data must persist + HPC” | FSx Lustre Persistent
“point-in-time cloning” | FSx NetApp ONTAP or OpenZFS

The “NOT” Rules (Eliminate Wrong Answers Fast)

Statement | Why It’s Wrong
Snowball imports to Glacier directly | Must go to S3 first, then Lifecycle to Glacier
DataSync for ongoing hybrid access | DataSync = migration/sync, not ongoing access
Storage Gateway for one-time migration | Storage Gateway = ongoing access, not migration tool
Transfer Family for internal apps | Transfer Family = FTP for external users
TLS as Transfer Family protocol | TLS is an encryption layer, not a file-transfer protocol; use SFTP/FTPS
FSx Lustre for Windows apps | Lustre = Linux/POSIX only
FSx for Windows without AD | Windows File Server integrates with AD
OpenZFS for SMB access | OpenZFS = NFS only
DataSync to EBS | EBS not supported; only S3, EFS, FSx
Snowcone for 50 TB | Snowcone max = 14 TB; use Snowball Edge

⚠️ Exam trap — DataSync over Direct Connect (NFS → EFS):

The “CANNOT” List

Cannot… | Instead…
Import Snowball to Glacier directly | Snowball → S3 → Lifecycle → Glacier
Use DataSync to EBS | Use EBS snapshots or block-level replication
Use Transfer Family with TLS protocol | Use SFTP (SSH-based) or FTPS (FTP over TLS)
Access FSx Lustre from Windows | Use FSx for Windows or NetApp ONTAP
Use Snowcone for >14 TB | Use Snowball Edge (up to 80 TB)
Run EC2 on Snowcone | Limited compute; use Snowball Edge Compute
Use FTP without VPC (Transfer Family) | FTP = VPC only; use SFTP/FTPS for public

Part 3: Scenario Pattern Recognition

Pattern: “Large data + limited/bad network”

Keywords: petabytes, limited bandwidth, weeks to transfer, offline, remote location

Answer: Snowball Edge

Why: Physical transfer bypasses network limitations. > 1 week transfer → Snowball.


Pattern: “Process data at remote location”

Keywords: edge computing, process before upload, ML at edge, trucks, ships, mining

Answer: Snowball Edge Compute Optimized

Why: Run EC2/Lambda locally, process data, then ship to AWS.


Pattern: “Small data + limited connectivity”

Keywords: small dataset, remote, portable, <14 TB

Answer: Snowcone

Why: Smallest Snow device (8-14 TB), portable, has DataSync agent pre-installed.


Pattern: “Migrate on-prem NFS/SMB to S3”

Keywords: migrate, one-time transfer, move to cloud, NFS to S3

Answer: DataSync (with agent)

Why: DataSync = migration tool. Preserves metadata. Scheduled or one-time.


Pattern: “Scheduled backup to S3/EFS/FSx”

Keywords: weekly sync, daily backup, recurring, scheduled

Answer: DataSync

Why: DataSync supports hourly/daily/weekly schedules.


Pattern: “On-prem apps need ongoing access to S3”

Keywords: hybrid, continuous access, extend storage, NFS/SMB to S3

Answer: S3 File Gateway

Why: Storage Gateway = ongoing hybrid access with local cache.


Pattern: “Replace physical tape backup”

Keywords: tape, VTL, virtual tape library, backup to cloud

Answer: Tape Gateway

Why: Presents virtual tapes via iSCSI, stores in S3/Glacier.


Pattern: “External users upload via FTP”

Keywords: FTP, SFTP, file transfer, external partners

Answer: AWS Transfer Family

Why: Managed FTP/SFTP/FTPS service to S3 or EFS.


Pattern: “Windows file share with Active Directory”

Keywords: Windows, SMB, NTFS, Active Directory, DFS

Answer: FSx for Windows File Server

Why: Fully managed Windows file system with AD integration.


Pattern: “HPC / ML with Linux cluster”

Keywords: HPC, high-performance computing, ML training, Linux, Lustre

Answer: FSx for Lustre

Why: Parallel file system, S3 integration, sub-ms latency, 100s GB/s.


Pattern: “Read from S3 as file system for HPC”

Keywords: S3 integration, lazy load, HPC reads from S3

Answer: FSx for Lustre

Why: Can mount S3 as file system, lazy-load data on access.


Pattern: “Multi-protocol (NFS + SMB + iSCSI)”

Keywords: NFS and SMB, multi-OS, migrate NAS

Answer: FSx for NetApp ONTAP

Why: Only FSx that supports all three protocols.


Pattern: “Migrate ZFS workloads to AWS”

Keywords: ZFS, OpenZFS, migrate ZFS

Answer: FSx for OpenZFS

Why: Managed OpenZFS, NFS protocol, snapshots, cloning.


Pattern: “Import data to Glacier”

Keywords: Snowball to Glacier, archive imported data

Answer: Snowball → S3 → S3 Lifecycle Policy → Glacier

Why: Snowball cannot import directly to Glacier.


Pattern: “Short-term HPC job, optimize cost”

Keywords: temporary processing, cost optimized, short-term

Answer: FSx for Lustre (Scratch)

Why: Scratch = no replication, 6x faster, cheaper. Data lost if fails.


Pattern: “Long-term processing, data must survive”

Keywords: persistent, data durability, long-term HPC

Answer: FSx for Lustre (Persistent)

Why: Replicated within AZ, data survives failures.


Part 4: Quick Reference Tables

Snow Family Comparison

Device | Storage | Compute | Use Case
Snowcone | 8-14 TB | 2 vCPU, 4 GB | Small data, portable, DataSync agent
Snowball Edge Storage | 80 TB | 40 vCPU, 80 GB | Large data + some compute
Snowball Edge Compute | 42-80 TB | 104 vCPU, 416 GB | Heavy processing at edge
Snowmobile | 100 PB | - | Discontinued

FSx Comparison

FSx Type | Protocol | OS | Best For
Windows | SMB, NTFS | Windows | Windows apps, AD integration
Lustre | POSIX | Linux | HPC, ML, S3 integration
NetApp ONTAP | NFS, SMB, iSCSI | Multi-OS | NAS migration, multi-protocol
OpenZFS | NFS | Linux/Unix | ZFS migration, cloning

Storage Gateway Types

Gateway Type | Protocol | Backend | Use Case
S3 File Gateway | NFS, SMB | S3 | File access to S3
FSx File Gateway | SMB | FSx Windows | Low-latency FSx access
Volume Gateway | iSCSI | S3 + EBS | Block storage to S3
Tape Gateway | iSCSI (VTL) | S3 + Glacier | Replace physical tapes

DataSync vs Storage Gateway

Aspect | DataSync | Storage Gateway
Purpose | Migration / Sync | Ongoing hybrid access
Use case | “Move to cloud” | “Extend to cloud”
Caching | No | Yes (low latency)
Direction | One-way or scheduled | Bidirectional access
Agent | Yes (on-prem) | VM appliance

Transfer Family Protocols

Protocol | Encryption | Access
SFTP | SSH-based | Public or VPC
FTPS | TLS-based | Public or VPC
FTP | None | VPC only

⚠️ TLS is NOT a file-transfer protocol; it is the encryption layer used BY FTPS
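
The protocol rules above can be encoded as a small lookup, which makes the "FTP = VPC only" and "TLS is not an option" traps easy to check. The names `PROTOCOL_RULES` and `check_endpoint` are hypothetical helpers for these notes, not part of the Transfer Family API, and only the three protocols from the table are modeled.

```python
# Rules as summarized in the Transfer Family protocol table of these notes.
PROTOCOL_RULES = {
    "SFTP": {"encryption": "SSH", "public_endpoint": True},
    "FTPS": {"encryption": "TLS", "public_endpoint": True},
    "FTP": {"encryption": None, "public_endpoint": False},
}


def check_endpoint(protocol: str, endpoint_type: str) -> str:
    """Return 'ok' or a rejection reason for a protocol/endpoint combination."""
    proto = protocol.upper()
    if proto not in PROTOCOL_RULES:
        # "TLS" lands here: it secures FTPS but is not a transfer protocol itself.
        raise ValueError(f"{protocol!r} is not one of the modeled transfer protocols")
    if endpoint_type not in ("PUBLIC", "VPC"):
        raise ValueError("endpoint_type must be 'PUBLIC' or 'VPC'")
    if endpoint_type == "PUBLIC" and not PROTOCOL_RULES[proto]["public_endpoint"]:
        return "rejected: FTP is unencrypted and allowed on VPC endpoints only"
    return "ok"


if __name__ == "__main__":
    print(check_endpoint("SFTP", "PUBLIC"))
    print(check_endpoint("FTP", "PUBLIC"))
```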

Migration Service Selection

Scenario | Service
> 1 week transfer time | Snowball Edge
< 14 TB + limited network | Snowcone
One-time NFS/SMB → S3 migration | DataSync
Scheduled sync to S3/EFS/FSx | DataSync
S3 → EFS or S3 → FSx migration | DataSync (no agent)
Ongoing NFS/SMB access to S3 | S3 File Gateway
FTP/SFTP uploads to S3 | Transfer Family
Replace tape backup | Tape Gateway
iSCSI block storage to S3 | Volume Gateway

Key Numbers to Remember

Item | Value
Snowcone storage | 8 TB HDD / 14 TB SSD
Snowball Edge Storage | 80 TB
Snowball Edge Compute | 42 TB HDD / 28 TB NVMe
DataSync throughput | Up to 10 Gbps per agent
FSx Lustre throughput | 100s GB/s
FSx OpenZFS IOPS | 1,000,000 IOPS
Volume Gateway cache | Local + S3

Part 5: Ultimate Instant-Answer Table

Question Contains | → Instant Answer
“> 1 week transfer” / “bad network” | Snowball Edge
“small data + remote” | Snowcone
“edge computing” / “process at edge” | Snowball Edge Compute
“migrate NFS/SMB to S3” | DataSync
“scheduled sync to AWS” | DataSync
“S3 → EFS” or “S3 → FSx” | DataSync (no agent)
“on-prem NFS access to S3” (ongoing) | S3 File Gateway
“on-prem access to FSx Windows” | FSx File Gateway
“iSCSI block storage to cloud” | Volume Gateway
“replace tape backup” | Tape Gateway
“FTP/SFTP to S3” | Transfer Family
“TLS protocol” | ❌ Wrong; use SFTP/FTPS
“Windows file share + AD” | FSx for Windows
“HPC + Linux + S3” | FSx for Lustre
“multi-protocol (NFS+SMB+iSCSI)” | FSx for NetApp ONTAP
“migrate ZFS” | FSx for OpenZFS
“Snowball → Glacier” | S3 first → Lifecycle
“short-term HPC, cheap” | FSx Lustre Scratch
“HPC data must persist” | FSx Lustre Persistent
“point-in-time cloning” | FSx NetApp ONTAP or OpenZFS
“Snowmobile” | Discontinued; use multiple Snowball

Part 6: Elimination Checklist

When stuck between options, eliminate systematically:

□ Is network transfer > 1 week?
  → Yes = Snow Family (Snowball/Snowcone)
  → No = DataSync or direct transfer

□ Is it MIGRATION or ONGOING ACCESS?
  → Migration = DataSync, Snowball
  → Ongoing = Storage Gateway

□ What PROTOCOL do apps use?
  → NFS/SMB (file) = File Gateway, DataSync, FSx
  → iSCSI (block) = Volume Gateway, Tape Gateway
  → FTP/SFTP = Transfer Family

□ Is it WINDOWS or LINUX?
  → Windows + AD = FSx for Windows
  → Linux + HPC = FSx for Lustre
  → Both = FSx for NetApp ONTAP

□ Do they need EDGE COMPUTING?
  → Yes + small = Snowcone (limited)
  → Yes + heavy = Snowball Edge Compute

□ Is data going to GLACIER?
  → Via Snowball = S3 first → Lifecycle → Glacier
  → Direct = S3 Lifecycle Policy

□ Is it SCHEDULED SYNC?
  → Yes = DataSync (hourly/daily/weekly)
  → No = One-time DataSync or Snowball

□ Do they need LOCAL CACHE?
  → Yes = Storage Gateway
  → No = DataSync or direct

□ Is it for EXTERNAL USERS?
  → FTP/SFTP = Transfer Family
  → Internal apps = Storage Gateway

🏆 The Golden Rules

  1. > 1 week transfer = Snowball — physical beats network
  2. Migration = DataSync, Ongoing = Storage Gateway — different tools for different needs
  3. Snowball can’t import to Glacier directly — S3 first, then Lifecycle
  4. TLS is NOT a Transfer Family protocol: FTPS runs over TLS and SFTP runs over SSH, but “TLS” by itself is not a file-transfer protocol
  5. FTP = VPC only — SFTP/FTPS can be public
  6. FSx for Windows = AD integration — always mention AD for Windows file shares
  7. FSx for Lustre = HPC + Linux + S3 — the HPC file system
  8. FSx NetApp ONTAP = multi-protocol — only one with NFS + SMB + iSCSI
  9. OpenZFS = NFS only — no SMB, no iSCSI
  10. Lustre Scratch = temporary, fast, cheap — data lost on failure
  11. Lustre Persistent = durable — replicated within AZ
  12. Snowcone max = 14 TB — use Snowball Edge for larger
  13. Snowmobile is discontinued — use multiple Snowball Edge instead
  14. DataSync preserves metadata — timestamps, permissions, ownership
  15. Storage Gateway has local cache — low-latency hybrid access
  16. EBS is NOT a DataSync destination — only S3, EFS, FSx

Databases:

        ┌─────┐ ┌─────┐ ┌─────┐
        │User │ │User │ │User │
        └──┬──┘ └──┬──┘ └──┬──┘
           │       │       │
           └───────┼───────┘
                   ▼
        ┌─────────────────────┐
        │    Application      │
        └──────────┬──────────┘
                   │ Read/Write
                   ▼
            ┌────────────┐
            │ Amazon RDS │
            └──────┬─────┘
                   │
                   ▼
     <────────── Storage ──────────>

RDS (Relational Database Service) is a managed relational database service (SQL).

Supported Engines
PostgreSQL, MySQL, MariaDB, Oracle, MS SQL Server, IBM Db2, Aurora
                      ┌───────────────────┐
                      │    Application    │
                      └────────┬─┬────────┘
                        writes ↓ ↑ reads
                           ┌────┴────┐
                           │    M    │  ← Master (writes + reads)
                           └────┬────┘
                    ASYNC       │       ASYNC
               replication ←────┴────→ replication
              ┌─────┴─────┐       ┌─────┴─────┐
              │     R     │       │     R     │  ← Read Replicas
              └─────┬─────┘       └─────┬─────┘
                    ↑ reads             ↑ reads

Read Replicas: Up to 15 replicas, ASYNC replication (eventually consistent), can be cross-AZ/cross-Region.

⚠️ Exam trap: ASYNC = eventual consistency = replication lag
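
A toy model makes the replication lag concrete. This is a sketch of asynchronous replication in general, not of how RDS implements it; the `Primary` and `Replica` classes are hypothetical.

```python
from collections import deque


class Primary:
    def __init__(self):
        self.data = {}
        self.log = deque()  # changes not yet shipped to the replica

    def write(self, key, value):
        # The write is acknowledged here, before any replica has it (ASYNC).
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    def __init__(self):
        self.data = {}

    def apply_from(self, primary):
        """Apply all pending changes; in a real system this runs in the background."""
        applied = 0
        while primary.log:
            key, value = primary.log.popleft()
            self.data[key] = value
            applied += 1
        return applied


if __name__ == "__main__":
    primary, replica = Primary(), Replica()
    primary.write("balance", 100)
    print("replica before catch-up:", replica.data.get("balance"))
    replica.apply_from(primary)
    print("replica after catch-up:", replica.data.get("balance"))
```

A read that hits the replica between the write and the catch-up sees stale data, which is exactly what "eventually consistent" means here.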

Read Replica Network Cost:

┌─────────────────────────────┐      ┌────────────────────────────┐
│ Same Region / Different AZ  │      │        Cross-Region        │
│  us-east-1a    us-east-1b   │      │ us-east-1a    eu-west-1b   │
│   ┌───┐   ASYNC  ┌───┐      │  vs  │   ┌───┐   ASYNC  ┌───┐     │
│   │ M │ ───────→ │ R │      │      │   │ M │ ───────→ │ R │     │
│   └───┘          └───┘      │      │   └───┘          └───┘     │
│    FREE (same region)       │      │     $$$ (cross-region)     │
└─────────────────────────────┘      └────────────────────────────┘

⚠️ Exam trap: Same region replication = FREE, Cross-region = costs $$$

RDS Cross-Region DR Strategy:

RDS Multi-AZ (Disaster Recovery):

              ┌───────────────────┐
              │    Application    │
              └────────┬─┬────────┘
                writes ↓ ↑ reads
    ┌─────────────────────────────────────┐
    │   One DNS name – automatic failover │
    └──────────────────┬──────────────────┘
                       │
         ┌─────────────┴─────────────┐
         ▼                           │
    ┌─────────┐       SYNC      ┌────┴────┐
    │    M    │ ──────────────→ │    S    │
    └─────────┘   replication   └─────────┘
    Master (AZ A)              Standby (AZ B)

⚠️ Exam trap: Multi-AZ = High Availability (failover), Read Replicas = Scalability (read performance)

 READ REPLICA (ASYNC)                MULTI-AZ (SYNC)
 ┌───┐         ┌───┐               ┌───┐         ┌───┐
 │ M │ ──────→ │ R │               │ M │ ──────→ │ S │
 └───┘  async  └───┘               └───┘  sync   └───┘
       (lag OK)                         (no lag!)
"eventually consistent"            "always consistent"

Single-AZ → Multi-AZ Migration (zero downtime):

┌─────────┐   SYNC replication   ┌─────────┐
│    M    │ ──────────────────→  │    S    │
└────┬────┘                      └─────────┘
     │                           Standby DB
     ↓ snapshot
┌──────────┐
│    DB    │
│ snapshot │ ← restore to new AZ
└──────────┘
  1. Click “modify” on DB (no downtime)
  2. Snapshot taken automatically
  3. New standby restored from snapshot in different AZ
  4. SYNC replication established

Use Case: Reporting without impacting production

┌────────────────┐               ┌────────────────┐
│   Production   │               │    Reporting   │
│   Application  │               │   Application  │
└──────┬─┬───────┘               └───────┬────────┘
       ↓ ↑                               ↑ reads
   writes/reads                          │
        │                                │
   ┌────┴────┐   ASYNC replication  ┌────┴────┐
   │    M    │ ───────────────────→ │    R    │
   └─────────┘                      └─────────┘
   RDS Master                      Read Replica

RDS Storage Auto Scaling:

Why RDS over EC2-hosted DB?

RDS Manages For You | You Still Control
OS patching | Database schema
Automated backups (Point in Time Restore) | Application queries
Monitoring dashboards | Security groups
Hardware provisioning | Parameter groups
Read replicas & Multi-AZ setup |
Storage scaling (EBS-backed) |

⚠️ Exam trap: You can’t SSH into RDS instances (except RDS Custom for Oracle/SQL Server).

RDS Custom (Oracle & MS SQL Server only):

        ┌───────┐
        │ User  │
        └───┬───┘
   apply    │    SSH / SSM
   customs  │
            ▼
    ┌───────────────┐
    │ EC2 Instance  │
    ├───────────────┤
    │  Amazon RDS   │  Automation Mode: DISABLED
    └───────────────┘

RDS | RDS Custom
AWS manages OS + DB | Full admin access to OS + DB
No SSH | SSH / SSM Session Manager
No custom patches | Install patches, configure settings

⚠️ Disable Automation Mode before customizing. Take snapshot first!

⚠️ Exam trap: “Full customization of Oracle/SQL Server” + “benefit from AWS services” = RDS Custom

Amazon Aurora is an enterprise-class relational database optimized for the AWS cloud (5x faster than MySQL and 3x faster than PostgreSQL on RDS). It is proprietary AWS technology (not open source). Storage grows automatically. Aurora costs about 20% more than RDS but is more efficient: it helps reduce database costs by cutting unnecessary input/output (I/O) operations while keeping database resources reliable and available.
Amazon Aurora replicates six copies of your data across three Availability Zones and continuously backs up your data to Amazon S3.

Feature | Details
Engines | PostgreSQL, MySQL (compatible drivers)
Performance | 5x MySQL, 3x PostgreSQL on RDS
Storage | Auto-grows 10GB → 128TB
Replicas | Up to 15, <10ms replica lag
Failover | Instantaneous (HA native)
Cost | 20% more than RDS, but more efficient
Durability | 6 copies across 3 AZs, continuous backup to S3

⚠️ Exam trap: “OLTP” + “auto-scaling storage” + “maximum replicas” = Aurora

Aurora High Availability:

       AZ 1           AZ 2           AZ 3
    ┌───┐ ┌───┐    ┌───┐ ┌───┐    ┌───┐ ┌───┐
    │ M │ │ R │    │ R │ │ R │    │ R │ │ R │
    └─┬─┘ └─┬─┘    └─┬─┘ └─┬─┘    └─┬─┘ └─┬─┘
      ↓W    ↑R       ↑R    ↑R       ↑R    ↑R
    ══════════════════════════════════════════
         Shared Storage Volume (100s of volumes)
         Replication + Self Healing + Auto Expanding
    ══════════════════════════════════════════

Aurora Quorum (failure tolerance):

Scenario | Writes | Reads
1 AZ down (2 copies lost) | ✅ Works (4 remaining) | ✅ Works
3 copies lost | ❌ Write outage (needs 4 of 6) | ✅ Works (needs 3 of 6)
4+ copies lost | ❌ Write outage | ❌ Read outage
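
The failure tolerance follows from Aurora's quorum sizes: writes need 4 of the 6 copies and reads need 3 of the 6. A minimal sketch of that rule (the function name `quorum_status` is illustrative):

```python
TOTAL_COPIES = 6   # 2 copies in each of 3 AZs
WRITE_QUORUM = 4   # copies needed to acknowledge a write
READ_QUORUM = 3    # copies needed to serve a read


def quorum_status(copies_lost: int) -> dict:
    """Report whether writes and reads still work after losing some copies."""
    if not 0 <= copies_lost <= TOTAL_COPIES:
        raise ValueError("copies_lost must be between 0 and 6")
    remaining = TOTAL_COPIES - copies_lost
    return {
        "writes": remaining >= WRITE_QUORUM,
        "reads": remaining >= READ_QUORUM,
    }


if __name__ == "__main__":
    for lost in range(TOTAL_COPIES + 1):
        print(lost, "copies lost ->", quorum_status(lost))
```

Losing a whole AZ (2 copies) keeps both quorums; losing a third copy stops writes but not reads.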

Aurora DB Cluster Endpoints:

                    ┌──────────┐
                    │  Client  │
                    └────┬─────┘
           ┌─────────────┴───────────────┐
           ▼                             ▼
┌─────────────────────┐    ┌────────────────────────────┐
│  Writer Endpoint    │    │     Reader Endpoint        │
│ (points to master)  │    │ (load balances to replicas)│
└──────────┬──────────┘    └─────────────┬──────────────┘
           │                    ┌────────┼────────┐
           ▼                    ▼        ▼        ▼
       ┌───────┐           ┌───────┐ ┌───────┐ ┌───────┐
       │   M   │←──────────│   R   │ │   R   │ │   R   │ ← Auto Scaling
       └───┬───┘           └───┬───┘ └───┬───┘ └───┬───┘
           ↓W                  ↑R        ↑R        ↑R
    ════════════════════════════════════════════════════
           Shared Storage (10GB → 128TB auto-expanding)
    ════════════════════════════════════════════════════
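
On the application side, using the two endpoints usually means sending writes to the writer endpoint and plain reads to the reader endpoint. The sketch below shows one naive way to route by statement type; the hostnames are made-up placeholders and `endpoint_for` is a hypothetical helper, not a driver feature.

```python
# Hypothetical endpoint names, for illustration only.
WRITER_ENDPOINT = "mycluster.cluster-abc123.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "mycluster.cluster-ro-abc123.us-east-1.rds.amazonaws.com"

# Statements treated as read-only in this sketch.
READ_PREFIXES = ("SELECT", "SHOW", "EXPLAIN")


def endpoint_for(sql: str) -> str:
    """Pick the reader endpoint for plain reads and the writer for everything else."""
    statement = sql.lstrip().upper()
    if statement.startswith(READ_PREFIXES) and "FOR UPDATE" not in statement:
        return READER_ENDPOINT
    return WRITER_ENDPOINT


if __name__ == "__main__":
    print(endpoint_for("SELECT * FROM orders"))
    print(endpoint_for("INSERT INTO orders VALUES (1)"))
```

Reads that must see a just-committed write should still go to the writer endpoint, because replicas can lag slightly behind.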

⚠️ Exam trap — “Separate reads from writes” in Aurora:

Aurora Features:

Aurora Replicas Auto Scaling:

                         ┌──────────┐
                         │  Client  │
                         └────┬─────┘
              ┌───────────────┴───────────────┐
              ▼                               ▼ Many Requests
   ┌─────────────────────┐       ┌────────────────────────────┐
   │  Writer Endpoint    │       │     Reader Endpoint        │
   └──────────┬──────────┘       └─────────────┬──────────────┘
              │                       ┌────────┼────────┐
              ▼                       ▼        ▼        ▼
          ┌───────┐              ┌───────┐ ┌───────┐ ┌───────┐
          │   M   │              │   R   │ │   R   │ │   R   │ ← Added by
          └───┬───┘   CPU↑  CPU↑ └───┬───┘ └───┬───┘ └───┬───┘   Auto Scaling
              ↓W                     ↑R        ↑R        ↑R
    ════════════════════════════════════════════════════════════
           Shared Storage (10GB → 128TB auto-expanding)
    ════════════════════════════════════════════════════════════

Aurora Custom Endpoints:

                            ┌──────────┐
                            │  Client  │
                            └────┬─────┘
         ┌───────────────────────┼──────────────────────┐
         ▼                       ▼                      ▼
┌─────────────────┐     ┌─────────────────┐    ┌──────────────────┐
│ Writer Endpoint │     │ Reader Endpoint │    │ Custom Endpoint  │
└────────┬────────┘     └────────┬────────┘    │(Analytical Query)│
         │                       │             └────────┬─────────┘
         ▼                       ▼                      ▼
     ┌───────┐          ┌───────┐ ┌───────┐    ┌───────┐ ┌───────┐
     │   M   │          │   R   │ │   R   │    │   R   │ │   R   │
     └───┬───┘          └───────┘ └───────┘    └───────┘ └───────┘
         ↓W             db.r3.large (small)    db.r5.2xlarge (large)
    ════════════════════════════════════════════════════════════════
                      Shared Storage Volume
    ════════════════════════════════════════════════════════════════

Aurora Serverless:

                    ┌──────────┐
                    │  Client  │
                    └────┬─────┘
                         │
           ┌─────────────────────────────┐
           │       Proxy Fleet           │
           │    (managed by Aurora)      │
           └──────────────┬──────────────┘
                ┌────┬────┼────┬────┐
                ▼    ▼    ▼    ▼    ▼
              ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐  ← Auto-scales
              │DB│ │DB│ │DB│ │DB│ │DB│    based on load
              └──┘ └──┘ └──┘ └──┘ └──┘
    ════════════════════════════════════════════
              Shared Storage Volume
    ════════════════════════════════════════════

⚠️ Exam trap: “Dev/test environment” + “unused most of time” + “minimize costs” = Aurora Serverless

Aurora Global Database:

┌──────────────────────────────────────────┐
│        us-east-1 (PRIMARY REGION)        │
│  ┌─────────────┐       ┌─────────────┐   │
│  │ Application │       │   Aurora    │   │
│  │ Read/Write  │ ────→ │   Primary   │   │
│  └─────────────┘       └──────┬──────┘   │
└───────────────────────────────┼──────────┘
                                │ replication
                                │ (<1 second)
┌───────────────────────────────┼──────────┐
│        eu-west-1 (SECONDARY REGION)      │
│  ┌─────────────┐       ┌──────┴──────┐   │
│  │ Application │       │   Aurora    │   │
│  │  Read Only  │ ←──── │  Secondary  │   │
│  └─────────────┘       └─────────────┘   │
└──────────────────────────────────────────┘

Feature | Details
Primary Region | 1 (read/write)
Secondary Regions | Up to 5 (read-only)
Replicas per Region | Up to 16
Replication Lag | <1 second
DR Promotion RTO | <1 minute

⚠️ Exam trap: “Cross-region Disaster Recovery” or “replica in another region” = Aurora Global Database

Aurora Machine Learning:

              ┌─────────────┐
              │ Application │
              └──────┬──────┘
       SQL query     │     query results
  (recommendations?) │  (red shirt, blue...)
                     ▼
              ┌─────────────┐
              │   Aurora    │
              └──────┬──────┘
          data       │        predictions
   (user profile,    │     (red shirt,
    shopping...)     │      blue pants...)
            ┌────────┴────────┐
            ▼                 ▼
     ┌────────────┐    ┌─────────────┐
     │ SageMaker  │    │ Comprehend  │
     │ (any ML)   │    │ (sentiment) │
     └────────────┘    └─────────────┘

Babelfish for Aurora PostgreSQL:

 ┌─────────────────┐            ┌─────────────────┐
 │   Application   │            │   Application   │
 │    SQL Server   │            │    PostgreSQL   │
 │  Client Driver  │            │      Driver     │
 └────────┬────────┘            └────────┬────────┘
          │ T-SQL                        │ PL/pgSQL
          │                              │
          │    ┌────────────────────┐    │
          │    │ Aurora PostgreSQL  │    │
          │    ├─────────┬──────────┤    │
          └───→│Babelfish│PostgreSQL│←───┘
               └─────────┴──────────┘
                         ↑
                      migrate
                         │
                 ┌───────────────┐
                 │     MS SQL    │
                 │     Server    │
                 └───────────────┘

RDS & Aurora Backups:

Feature | RDS | Aurora
Automated Backups | 1-35 days (0 = disable retention) | 1-35 days (cannot disable)
Transaction Logs | Every 5 min | Continuous
Point-in-Time Recovery | Up to 5 min ago | Within retention window
Manual Snapshots (On-Demand) | Unlimited retention | Unlimited retention

⚠️ Exam traps:

RDS & Aurora Restore Options:

Aurora Database Cloning:

CLONING (instant)                    SNAPSHOT/RESTORE (slow)
┌─────────────┐                      ┌─────────────┐
│  Production │                      │  Production │
└──────┬──────┘                      └──────┬──────┘
       │ shared storage                     │ snapshot
       ▼ (no copy!)                         ▼ (copy all!)
┌─────────────┐                      ┌─────────────┐
│    Clone    │                      │   New DB    │
└─────────────┘                      └─────────────┘
  Only new writes                      Full duplicate
  use extra storage                    storage cost

⚠️ Exam trap: “Need production data ASAP” + “read/write tests” = Aurora Cloning (instant)
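
The reason cloning is instant is copy-on-write: the clone starts out pointing at the same pages as the source, and only pages written afterwards take extra space. The toy `Volume` class below illustrates the idea under that assumption; it is not Aurora's storage implementation.

```python
class Volume:
    """Toy copy-on-write volume: shared pages plus a private overlay of changes."""

    def __init__(self, shared=None):
        self.shared = shared if shared is not None else {}  # pages shared with clones
        self.private = {}                                    # pages written since cloning

    def read(self, page_no):
        if page_no in self.private:
            return self.private[page_no]
        return self.shared[page_no]

    def write(self, page_no, data):
        # Writes never touch shared pages, so other volumes are unaffected.
        self.private[page_no] = data

    def clone(self):
        # Freeze the current view as a shared base. Only page references are
        # copied here, never page contents.
        base = {**self.shared, **self.private}
        self.shared, self.private = base, {}
        return Volume(shared=base)

    def extra_pages(self):
        """Pages this volume stores on its own (the extra storage it costs)."""
        return len(self.private)


if __name__ == "__main__":
    production = Volume()
    production.write(1, "orders v1")
    test_copy = production.clone()
    test_copy.write(1, "orders v2 (test change)")
    print(production.read(1), "|", test_copy.read(1), "|", test_copy.extra_pages())
```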

RDS & Aurora Security:

Security Layer | Details
At-rest encryption | AWS KMS, must enable at launch time
In-flight encryption | TLS by default
IAM Authentication | IAM roles instead of username/password
Security Groups | Control network access
Audit Logs | Send to CloudWatch Logs

⚠️ Exam traps:

⚠️ Exam trap - “End-to-end security for data-in-transit to RDS”:

Amazon RDS Proxy:

┌─────────────────────────────────────────────────┐
│                      VPC                        │
│  ┌───────────────────────────────────────────┐  │
│  │         Lambda functions                  │  │
│  │    λ    λ    λ    λ    λ    ...           │  │
│  └───────────────────┬───────────────────────┘  │
│                      │ IAM Authentication       │
│  ┌───────────────────┼───────────────────────┐  │
│  │           Private subnet                  │  │
│  │                   ▼                       │  │
│  │           ┌─────────────┐                 │  │
│  │           │  RDS Proxy  │ ← Connection    │  │
│  │           └──────┬──────┘   Pooling       │  │
│  │                  ▼                        │  │
│  │           ┌─────────────┐                 │  │
│  │           │ RDS / Aurora│                 │  │
│  │           └─────────────┘                 │  │
│  └───────────────────────────────────────────┘  │
└─────────────────────────────────────────────────┘

Feature | Details
Connection Pooling | Reduces DB stress (CPU, RAM, connections)
Failover | Reduces RDS/Aurora failover time by up to 66%
Supports | RDS (MySQL, PostgreSQL, MariaDB, MS SQL), Aurora
Security | IAM Auth, credentials in Secrets Manager
Access | Never publicly accessible (VPC only)

⚠️ Exam trap: “Many EC2s” + “slow reconnection after failover” = RDS Proxy
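
Connection pooling is the core idea: many short-lived callers share a small, fixed set of database connections instead of each opening its own. The `ConnectionPool` class below is a generic sketch of that idea, not RDS Proxy's implementation; `connect` stands in for whatever opens a real connection.

```python
import queue


class ConnectionPool:
    """Hand out a fixed set of connections; callers wait instead of opening new ones."""

    def __init__(self, connect, size):
        self._idle = queue.Queue()
        self.opened = 0
        for _ in range(size):
            self._idle.put(connect())
            self.opened += 1

    def run(self, work, timeout=5.0):
        # Blocks (up to timeout seconds) while every connection is busy.
        conn = self._idle.get(timeout=timeout)
        try:
            return work(conn)
        finally:
            self._idle.put(conn)  # always return the connection to the pool


if __name__ == "__main__":
    pool = ConnectionPool(connect=object, size=3)  # object() stands in for a DB connection
    results = [pool.run(lambda conn: "ok") for _ in range(1000)]
    print(len(results), "requests served with", pool.opened, "connections")
```

A thousand invocations end up sharing three connections, which is why the database sees far less connection churn.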


RDS & Aurora Lambda Integration:

Two Different Ways to Connect Lambda with RDS/Aurora:

Aspect | RDS Event Notifications | Invoke Lambda from RDS/Aurora
Setup | AWS Console (RDS settings) | Inside the database (SQL)
Access to DB Data | ❌ No (metadata only) | ✅ Yes (full data access)
Trigger Source | DB instance events | Data changes (triggers)
Use Case | DB state changes (failover, snapshot) | React to data (new row, update)
Engines | All RDS engines | Aurora MySQL, Aurora PostgreSQL

RDS Event Notifications:

RDS Event Notifications Flow:
RDS Instance ──► RDS Event ──► SNS Topic ──► Lambda
(state change)   Subscription               (no DB data)

Invoke Lambda from RDS/Aurora:

Invoke Lambda from Aurora:
App ──► Aurora ──► Trigger/Stored Proc ──► Lambda ──► External Service
        (data)    (calls Lambda)           (has data)  (notifications, etc)

⚠️ Exam trap: “React to DB failover/snapshot events” → RDS Event Notifications (via SNS).

⚠️ Exam trap: “Process data when inserted/updated” → Invoke Lambda from Aurora (configured in DB).
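
For the event-notification path, the Lambda function receives an SNS envelope and only sees event metadata, never table rows. The handler below is a minimal sketch: the `Records[].Sns.Message` envelope is the standard SNS-to-Lambda shape, while the message keys `Source ID` and `Event Message` are an assumption about the RDS notification body and should be checked against a real event.

```python
import json


def handler(event, context):
    """Lambda entry point for RDS event notifications delivered through SNS."""
    summaries = []
    for record in event.get("Records", []):
        # The SNS message body arrives as a JSON string.
        message = json.loads(record["Sns"]["Message"])
        summaries.append({
            "source": message.get("Source ID"),    # assumed key: DB instance identifier
            "text": message.get("Event Message"),  # assumed key: human-readable event text
        })
    return summaries


if __name__ == "__main__":
    sample = {"Records": [{"Sns": {"Message": json.dumps(
        {"Source ID": "mydb", "Event Message": "Multi-AZ failover completed"})}}]}
    print(handler(sample, None))
```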


Amazon ElastiCache is a managed Redis or Memcached in-memory database service with high performance and low latency.

⚠️ Exam trap: Using ElastiCache requires heavy application code changes

ElastiCache - DB Cache Pattern:

                      ┌───────────────────┐
                      │   ElastiCache     │
           Cache hit  │                   │
         ←────────────│   ┌───────────┐   │
         ─────────────│──→│   Cache   │   │
                      │   └───────────┘   │
┌─────────────┐       └─────────┬─────────┘
│ Application │                 │ Cache miss
└──────┬──────┘                 │
       │                        ▼
       │ Read from DB    ┌─────────────┐
       └────────────────→│  Amazon RDS │
       ←─────────────────└─────────────┘
       │
       └──→ Write to cache

ElastiCache - User Session Store:

        ┌──────┐
        │ User │
        └──┬───┘
           │
     ┌─────┴─────┬────────────┐
     ▼           ▼            ▼
┌─────────┐ ┌─────────┐  ┌─────────┐
│   App   │ │   App   │  │   App   │
└────┬────┘ └────┬────┘  └────┬────┘
     │           │            │
     │  Write    │  Retrieve  │
     │  session  │  session   │
     │           │            │
     └───────────┴────────────┘
                 │
                 ▼
          ┌─────────────┐
          │ ElastiCache │
          └─────────────┘

⚠️ Exam trap: “Users keep logging out” + ALB + Auto Scaling = ElastiCache for sessions
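The session store idea can be sketched with an in-memory stand-in for the shared cache. This is an illustration only: `SessionStore` and its methods are hypothetical names, and a real setup would call ElastiCache through a Redis or Memcached client instead of a Python dict.

```python
import time
import uuid


class SessionStore:
    """In-memory stand-in for a shared cache that holds sessions with a TTL."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be simulated
        self._data = {}      # session_id -> (expires_at, payload)

    def create(self, payload):
        """Store a new session and return its id."""
        session_id = uuid.uuid4().hex
        self._data[session_id] = (self._clock() + self._ttl, payload)
        return session_id

    def get(self, session_id):
        """Return the session payload, or None if missing or expired."""
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, payload = entry
        if self._clock() >= expires_at:
            del self._data[session_id]  # expired: behave like a cache miss
            return None
        return payload
```

Because every app instance talks to the same store, a request that lands on a different instance behind the ALB still finds the session, so the user stays logged in.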

ElastiCache - Redis vs Memcached:

Feature | Redis | Memcached
High Availability | Multi-AZ with Auto-Failover | ❌ No HA
Read Replicas | ✅ Yes (scale reads) | ❌ No
Persistence | ✅ AOF (durable) | ❌ Non-persistent
Backup/Restore | ✅ Yes | Serverless only
Data Structures | Sets, Sorted Sets | Simple key-value
Architecture | Replication | Sharding (multi-node)
Threading | Single-threaded | Multi-threaded

    REDIS (HA + Durability)          MEMCACHED (Sharding)
    ┌───┐  Replication  ┌───┐        ┌───┐    +    ┌───┐
    │ R │ ────────────→ │ R │        │ M │ shards  │ M │
    └───┘               └───┘        └───┘         └───┘

⚠️ Exam trap:

ElastiCache - Security:

┌───────────────────────┐
│  EC2 Security Group   │
│       ┌─────┐         │
│       │ EC2 │ Client  │
│       └──┬──┘         │
└──────────┼────────────┘
           │ SSL encryption
           │ Redis AUTH
           ▼
┌───────────────────────┐
│  Redis Security Group │
│        ┌─────┐        │
│        │Redis│        │
│        └─────┘        │
└───────────────────────┘

Engine | Authentication | Notes
Redis | IAM Authentication | For Redis only
Redis | Redis AUTH | Password/token at cluster creation
Redis | SSL/TLS | In-flight encryption
Memcached | SASL-based | Advanced auth

⚠️ Exam trap:

ElastiCache - Caching Patterns:

   LAZY LOADING                         WRITE THROUGH
   ┌─────────┐                          ┌─────────┐
   │   App   │                          │   App   │
   └────┬────┘                          └────┬────┘
        │ 1. Cache hit? ←───┐                │
        ▼                   │                │ 1. Write to DB
   ┌─────────┐         ┌────┴────┐      ┌────┴────┐
   │  Cache  │         │  Cache  │      │   RDS   │
   └────┬────┘         └─────────┘      └────┬────┘
        │ 2. Miss                            │ 2. Write to cache
        ▼                                    ▼ 
   ┌─────────┐                          ┌─────────┐
   │   RDS   │                          │  Cache  │
   └────┬────┘                          └─────────┘
        │ 3. Write to cache
        ▼
   ┌─────────┐
   │  Cache  │
   └─────────┘

Pattern | Description | Trade-off
Lazy Loading | Cache on read (miss → fetch → cache) | Data can become stale
Write Through | Cache on write (DB + cache updated together) | No stale data, more writes
Session Store | Store temp session data with TTL | Sessions auto-expire
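The lazy loading and write-through patterns can be sketched with plain dicts standing in for the database and the cache. The class and method names here are hypothetical, and this is a minimal illustration of the numbered steps in the diagram, not a production client.

```python
class CachedRepository:
    """Dict-based illustration of lazy loading and write-through caching."""

    def __init__(self, db):
        self.db = db        # dict standing in for the database (e.g. RDS)
        self.cache = {}     # dict standing in for ElastiCache
        self.db_reads = 0   # counts how often the database is actually read

    def get_lazy(self, key):
        """Lazy loading: populate the cache only when a read misses."""
        if key in self.cache:            # 1. cache hit
            return self.cache[key]
        self.db_reads += 1               # 2. miss: read from the database
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value      # 3. write to cache
        return value

    def put_write_through(self, key, value):
        """Write through: update the database and the cache together."""
        self.db[key] = value             # 1. write to DB
        self.cache[key] = value          # 2. write to cache

    def put_db_only(self, key, value):
        """Write that bypasses the cache; lazy-loaded reads may now be stale."""
        self.db[key] = value
```

Calling `put_db_only` after a key has been cached shows the stale-data trade-off of lazy loading, while `put_write_through` keeps both copies in step at the cost of an extra write.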

ElastiCache - Redis Use Case (Gaming Leaderboards):

                ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐       ┌─────────────────────┐
                  ElastiCache Redis         │ Real-time           │
  ┌─────────┐   │                   │       │ Leaderboard         │
  │ Clients │──→   ┌─────┐ ┌─────┐   ──────→│ ┌─────────────┐     │
  └─────────┘   │  │Redis│ │Redis│  │       │ │ 1. Player A │     │
                   └─────┘ └─────┘          │ │ 2. Player B │     │
                │  ┌─────┐          │       │ │ 3. Player C │     │
                   │Redis│                  │ └─────────────┘     │
                │  └─────┘          │       └─────────────────────┘
                └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘

⚠️ Exam trap: “Real-time leaderboard” → Redis Sorted Sets. Ranking is computationally complex without them, and Memcached has no sorted sets.
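To see why sorted sets fit leaderboards, here is a small in-memory stand-in for the behaviour: one score per member, with ranking by score. The `Leaderboard` class is hypothetical. Redis keeps the set ordered as scores are added, while this sketch re-sorts on every call, and the tie-breaking rule by member name is an assumption of the sketch.

```python
class Leaderboard:
    """In-memory illustration of sorted-set semantics (unique members, ordered by score)."""

    def __init__(self):
        self._scores = {}  # member -> score; each member appears only once

    def zadd(self, member, score):
        """Insert a member or overwrite its score."""
        self._scores[member] = score

    def _ranked(self):
        # Highest score first; ties broken by member name so output is deterministic.
        return sorted(self._scores.items(), key=lambda kv: (-kv[1], kv[0]))

    def top(self, n):
        """Return the n best (member, score) pairs."""
        return self._ranked()[:n]

    def rank(self, member):
        """Return the 1-based rank of a member, or None if unknown."""
        for position, (name, _) in enumerate(self._ranked(), start=1):
            if name == member:
                return position
        return None
```

Updating a player's score replaces the old entry rather than adding a duplicate, which is the uniqueness plus ordering guarantee the exam questions rely on.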

DynamoDB is a fully managed, highly available (replicated across 3 AZs) NoSQL (key/value) database that scales to massive workloads with single-digit millisecond latency.
DynamoDB Accelerator (DAX) is a fully managed in-memory cache for DynamoDB (up to 10x performance improvement). Like ElastiCache, but only for DynamoDB.

Amazon Redshift is a fully managed OLAP (Online Analytical Processing) data warehouse for PB-scale analytics.

Redshift Cluster Architecture:

Query (JDBC/ODBC)
       │
       ▼
┌─────────────────────────────┐
│   Amazon Redshift Cluster   │
│  ┌───────────────────────┐  │
│  │     Leader Node       │  │  ← Query planning, results aggregation
│  └───────────┬───────────┘  │
│       ┌──────┼──────┐       │
│       ▼      ▼      ▼       │
│  ┌────────┐┌────────┐┌────────┐
│  │Compute ││Compute ││Compute │  ← Perform queries, send to leader
│  │ Node   ││ Node   ││ Node   │
│  └────────┘└────────┘└────────┘
└─────────────────────────────┘

Redshift Modes:

Mode | Description | Cost Model
Provisioned | Choose instance types upfront | Reserved instances for savings
Serverless | Auto-scales, no management | Pay per use

Loading Data into Redshift:

Method | Description | Best For
Kinesis Firehose | Stream → S3 → Redshift (COPY) | Real-time streaming
S3 COPY command | Bulk load from S3 | Large batch imports
EC2 JDBC driver | Insert via application | Small batches (less efficient)

⚠️ Exam trap: “Load data into Redshift” → Large inserts are MUCH better. Use S3 COPY or Firehose, not row-by-row JDBC inserts.

Enhanced VPC Routing:

⚠️ Exam trap: “COPY/UNLOAD through VPC” or “Redshift traffic stays in VPC” → Enhanced VPC Routing. “Improved VPC Routing” doesn’t exist!

Redshift Spectrum:

Redshift Spectrum:
Query ──► Redshift Cluster ──► Spectrum Nodes (1000s) ──► S3 Bucket
          (Leader + Compute)      (query S3 directly)

Redshift vs Athena:

Aspect | Redshift | Athena
Type | Data warehouse | Query service
Infrastructure | Cluster (Provisioned/Serverless) | Fully serverless
Best for | Complex joins, aggregations, dashboards | Ad-hoc queries on S3
Performance | Faster (indexes, columnar) | Slower (full S3 scan)
Data location | Loaded into Redshift | Stays in S3
Cost model | Cluster time | $5/TB scanned

⚠️ Exam trap: “Faster joins/aggregations” or “BI dashboards on data warehouse” → Redshift. “Serverless ad-hoc S3 queries” → Athena.

Redshift Snapshots & DR:

Snapshot Type | Frequency | Retention
Automated | Every 8 hours or 5 GB | 1-35 days
Manual | On-demand | Until you delete

Cross-Region DR:

Cross-Region Snapshot Copy:
Region A                          Region B
┌─────────────┐   Auto Copy   ┌──────────────┐
│  Redshift   │──────────────►│   Snapshot   │
│  Cluster    │               │   (copied)   │
└──────┬──────┘               └──────┬───────┘
       │ Snapshot                    │ Restore
       ▼                             ▼
┌─────────────┐               ┌──────────────┐
│  Snapshot   │               │  New Cluster │
│  (original) │               │  (DR region) │
└─────────────┘               └──────────────┘

⚠️ Exam trap: “Redshift cross-region DR” → Cross-region snapshot copy. Restore snapshot in target region.

⚠️ Exam trap: “Redshift Global cluster” → Doesn’t exist! Aurora has Global Database, Redshift uses cross-region snapshot copy instead.

⚠️ Exam trap: Redshift vs Athena vs EMR:

Amazon Elastic MapReduce (EMR) = managed Hadoop clusters for big data processing.

EMR Node Types:

Node Type | Purpose | Lifecycle
Master Node | Manage cluster, coordinate, health | Long-running
Core Node | Run tasks + store data | Long-running
Task Node | Run tasks only (no storage) | Usually Spot

EMR Purchasing Options:

Option | Use Case
On-Demand | Reliable, won’t be terminated
Reserved | Cost savings (min 1 year), auto-used if available
Spot | Cheaper, can be terminated (for Task Nodes)

Cluster Types:

⚠️ Exam trap: “Cost-optimize EMR” → Use Spot for Task Nodes (can lose them), Reserved/On-Demand for Master/Core (need reliability).

⚠️ Exam trap: EMR vs Athena vs Redshift:

Amazon Athena = serverless SQL query service to analyze data stored in Amazon S3.

Athena Use Cases:

Athena Architecture:
Users ──► S3 Bucket ──► Amazon Athena ──► Amazon QuickSight
          (data)       (Query & Analyze) (Reporting & Dashboards)

Athena Federated Query:

Federated Query:
                        ┌─► S3 Bucket
                        ├─► ElastiCache
                        ├─► DocumentDB
Amazon Athena ◄─────────┼─► DynamoDB        ◄── Lambda (Data Source Connector)
                        ├─► Redshift
                        ├─► Aurora/RDS
                        ├─► HBase in EMR
                        └─► On-Premises DB

Athena Performance Optimization:

Optimization | Why
Columnar format (Parquet/ORC) | Scan less data → lower cost
Glue ETL | Convert CSV/JSON to Parquet/ORC
Compress data | Smaller scans (gzip, snappy, lz4, zstd)
Partition datasets | Query specific partitions only
Large files (> 128 MB) | Minimize overhead

S3 Partitioning Example:

s3://bucket/table/year=1991/month=1/day=1/data.parquet
                   └── partition columns as virtual columns
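A short sketch of how such partitioned keys could be built and read back. The helper names `partition_key` and `parse_partitions` are hypothetical; the `name=value` path layout is the one shown in the example above.

```python
from datetime import date


def partition_key(table_prefix, day, filename):
    """Build an object key with year/month/day partition folders."""
    return (
        f"{table_prefix}/year={day.year}/month={day.month}/day={day.day}/{filename}"
    )


def parse_partitions(key):
    """Extract the name=value partition columns from an object key."""
    partitions = {}
    for segment in key.split("/"):
        if "=" in segment:
            name, value = segment.split("=", 1)
            partitions[name] = value
    return partitions
```

A query that filters on `year`, `month`, or `day` only needs the objects under the matching folders, which is what keeps the scanned data small.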

⚠️ Exam trap: “Analyze data in S3 using serverless SQL” → Athena. Not Redshift (requires provisioning), not EMR (requires cluster).

⚠️ Exam trap: “Reduce Athena costs” → Parquet/ORC (columnar = scan less). Glue can convert formats.

⚠️ Exam trap: “Query multiple data sources with SQL” → Athena Federated Query (uses Lambda connectors).
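The "Reduce Athena costs" trap is easier to remember with rough numbers. The sketch below uses the $5/TB scanned price quoted in these notes. The column-fraction and compression model is a deliberate simplification, and both function names are hypothetical.

```python
PRICE_PER_TB_USD = 5.0  # per-TB-scanned price quoted in these notes


def athena_query_cost(tb_scanned, price_per_tb=PRICE_PER_TB_USD):
    """Cost of one query given the amount of data it scans, in TB."""
    return tb_scanned * price_per_tb


def columnar_tb_scanned(total_tb, columns_read, total_columns, compression_ratio=1.0):
    """Rough estimate: only the requested columns are read, then compression shrinks them.

    Assumes all columns are the same size, which real datasets rarely are.
    """
    return total_tb * (columns_read / total_columns) / compression_ratio
```

Under these assumptions, a query over 1 TB of row-based CSV costs about $5, while reading 3 of 30 equally sized columns from a columnar copy scans roughly a tenth of the data and costs about $0.50, before any compression savings.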

Amazon QuickSight = serverless ML-powered BI service for interactive dashboards.

⚠️ Exam trap: “Column-level security” services:

QuickSight Use Cases:

QuickSight Data Sources:

Source Type | Examples
AWS Services | RDS, Aurora, Redshift, Athena, S3, OpenSearch, Timestream
On-Premises | Databases via JDBC (Teradata)
SaaS | Salesforce, Jira
File Imports | XLSX, CSV, JSON, TSV, ELF/CLF (log formats)

QuickSight Integrations:
┌─────────────────────────────────────────────────────────┐
│                   Amazon QuickSight                     │
└────────────────────────┬────────────────────────────────┘
                         │
    ┌────────────────────┼────────────────────────────────┐
    │                    │                                │
    ▼                    ▼                                ▼
AWS Services      On-Premises/SaaS                  File Imports
RDS, Aurora,      Teradata (JDBC),                  XLSX, CSV,
Redshift, Athena, Salesforce, Jira                  JSON, TSV,
S3, OpenSearch,                                     Log files
Timestream

QuickSight Users & Sharing:

⚠️ Exam trap: “BI dashboards from multiple AWS sources” → QuickSight. Integrates with Athena, Redshift, RDS, S3, etc.

⚠️ Exam trap: QuickSight users/groups ≠ IAM. They are QuickSight-specific identities.

Amazon OpenSearch Service (successor to Amazon Elasticsearch Service) = managed search and analytics engine.

OpenSearch Ingestion Patterns:

Source | Path | Latency
Kinesis Data Streams | → Firehose → Lambda (transform) → OpenSearch | Near real-time
Kinesis Data Streams | → Lambda → OpenSearch | Real-time
CloudWatch Logs | → Subscription Filter → Lambda → OpenSearch | Real-time
CloudWatch Logs | → Subscription Filter → Firehose → OpenSearch | Near real-time
DynamoDB | → DynamoDB Streams → Lambda → OpenSearch | Real-time

DynamoDB + OpenSearch Pattern:

CRUD ──► DynamoDB ──► DynamoDB Stream ──► Lambda ──► OpenSearch
              │                                          │
              │                                          │
              └─── API to retrieve items ◄── App ──► API to search items ───┘

⚠️ Exam trap: “Search any field” or “partial text match” or “full-text search” → OpenSearch. DynamoDB only queries by primary key or indexes.
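The Lambda step in the DynamoDB + OpenSearch pattern mostly translates stream records into index or delete operations. A minimal sketch of that translation follows. It assumes the stream is configured to include new images, handles only string, number, and boolean attributes, and returns plain dicts rather than calling an OpenSearch client. The function names and the action dict layout are hypothetical.

```python
def _plain(attr):
    """Convert one DynamoDB-typed attribute, e.g. {"S": "x"} or {"N": "1"}, to a Python value."""
    (type_tag, value), = attr.items()
    if type_tag == "S":
        return value
    if type_tag == "N":
        return float(value) if "." in value else int(value)
    if type_tag == "BOOL":
        return value
    raise ValueError(f"unsupported attribute type: {type_tag}")


def to_index_actions(event, index="items"):
    """Turn DynamoDB Streams records into index/delete actions for a search index."""
    actions = []
    for record in event["Records"]:
        keys = record["dynamodb"]["Keys"]
        # Join key attributes in a stable order to form the document id.
        doc_id = "|".join(str(_plain(v)) for _, v in sorted(keys.items()))
        if record["eventName"] == "REMOVE":
            actions.append({"op": "delete", "index": index, "id": doc_id})
        else:  # INSERT or MODIFY: index the latest version of the item
            image = record["dynamodb"]["NewImage"]
            doc = {name: _plain(attr) for name, attr in image.items()}
            actions.append({"op": "index", "index": index, "id": doc_id, "doc": doc})
    return actions
```

The app then searches in OpenSearch to find matching ids and fetches the full items from DynamoDB, as in the diagram.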

⚠️ Exam trap: “Real-time” vs “Near real-time” ingestion:

DocumentDB is a fully managed, proprietary document database (NoSQL) service that supports MongoDB workloads and is highly available across 3 AZs. It automatically grows and scales to workloads with millions of requests per second.

⚠️ Exam trap: DynamoDB vs DocumentDB — “MongoDB migration” doesn’t always mean DocumentDB!

Requirement | Answer
MongoDB compatibility + no code changes | DocumentDB
Serverless + Global Tables + no server management | DynamoDB

Key decision point:

DocumentDB requires provisioned instances (not truly serverless), but preserves MongoDB compatibility.

Amazon Neptune is a fully managed graph database. It is typically used for graph datasets such as social networks, knowledge graphs (Wikipedia), recommendation engines, and fraud detection. Highly available across 3 AZs, with up to 15 read replicas.

⚠️ Exam trap: Graph queries = Neptune. Classic example:

Neptune Use Cases: Social networks, recommendation engines, fraud detection, knowledge graphs

Amazon Timestream is a fully managed, fast, scalable, serverless time series database. Built-in time series analytics functions help you identify patterns in your data in near real-time.

Timestream Use Cases: IoT sensors (temperature, humidity, pressure), application metrics, DevOps monitoring, industrial telemetry

⚠️ Exam trap: “Thousands of sensors” + “readings per second” + “fast analytics” = Timestream

Amazon Keyspaces (for Apache Cassandra) is a fully managed, serverless, Cassandra-compatible database. Highly available and scalable with no servers to manage.

⚠️ Exam trap: Cassandra migration → Keyspaces (not DynamoDB!)

Amazon QLDB (Quantum Ledger Database) is a fully managed, serverless, highly available ledger database (a ledger is a book recording financial transactions). Unlike Amazon Managed Blockchain, it has no decentralization component.

Amazon Managed Blockchain is a managed blockchain service for joining public blockchain networks or creating your own scalable private network, without the need for a trusted central authority. Compatible with Hyperledger Fabric and Ethereum.

AWS Glue = fully serverless managed ETL (Extract, Transform, Load) service.

Glue Components:

Component | Purpose
Glue Data Crawler | Scans data sources, writes metadata to Data Catalog
Glue Data Catalog | Central metadata repository (databases, tables)
Glue ETL Jobs | Transform and load data
Glue Job Bookmarks | Prevent re-processing old data
Glue DataBrew | Clean/normalize data with pre-built transformations
Glue Studio | GUI to create, run, monitor ETL jobs
Glue Streaming ETL | Real-time ETL (Spark Streaming) for Kinesis, Kafka, MSK

Glue Data Catalog Architecture:

Data Sources                     Glue Data Catalog              Consumers
┌─────────────┐                 ┌─────────────────┐
│ Amazon S3   │                 │   Databases     │            ┌─────────────┐
│ Amazon RDS  │──► Glue ───────►│   Tables        │──────────►│ Athena      │
│ DynamoDB    │   Crawler       │   (Metadata)    │            │ Redshift    │
│ JDBC        │  (writes        └─────────────────┘            │ EMR         │
└─────────────┘   metadata)            ▲                       └─────────────┘
                                       │
                                Glue ETL Jobs

Glue ETL Pattern — Convert to Parquet:

S3 Put ──► Input S3 ──► Glue ETL ──► Output S3 ──► Athena
           (CSV)        (transform)   (Parquet)    (analyze)
              │
              ▼
        S3 Event ──► Lambda ──► Trigger Glue Job
                (or EventBridge)

Common Glue Use Cases:

⚠️ Exam trap: “Convert CSV to Parquet for Athena” → Glue ETL. Glue can be triggered by S3 events via Lambda or EventBridge.

⚠️ Exam trap: “Centralized metadata catalog” or “data discovery” → Glue Data Catalog. Used by Athena, Redshift Spectrum, EMR.

⚠️ Exam trap: “Streaming ETL” → Glue Streaming ETL (Spark Streaming). Compatible with Kinesis, Kafka, MSK.

⚠️ Exam trap: “Prevent re-processing old data” or “incremental ETL” → Glue Job Bookmarks. Tracks what’s already processed, only processes new data.
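The idea behind job bookmarks can be illustrated with a tiny incremental run that remembers how far it got. This is a conceptual sketch only, not how Glue stores or applies bookmarks internally. The function name and the `(key, last_modified)` tuples are hypothetical.

```python
def run_incremental(objects, bookmark):
    """Process only objects newer than the bookmark and return the updated bookmark.

    objects:  list of (key, last_modified) tuples
    bookmark: last_modified value of the newest object already processed, or None
    """
    new = [obj for obj in objects if bookmark is None or obj[1] > bookmark]
    # Process in arrival order; here "processing" just records the key.
    processed = [key for key, _ in sorted(new, key=lambda obj: obj[1])]
    # Advance the bookmark only if something new was seen.
    new_bookmark = max((ts for _, ts in new), default=bookmark)
    return processed, new_bookmark
```

Each run picks up only what arrived since the previous one, and a run with no new data leaves the bookmark unchanged and does no work.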

AWS Lake Formation = fully managed service to set up a data lake in days.

Lake Formation Features:

Feature | Description
Source Blueprints | Pre-built connectors for S3, RDS, Aurora, on-premises DBs
ETL and Data Prep | Transform and prepare data
Data Catalog | Central metadata repository
Fine-grained Access Control | Row-level and Column-level security
Security Settings | Centralized permissions management

Lake Formation Architecture:

Data Sources                    Lake Formation              Consumers
┌─────────────┐              ┌────────────────────┐
│ Amazon S3   │              │ • Source Crawlers  │       ┌─────────────┐
│ RDS/Aurora  │──► ingest ──►│ • ETL & Data Prep  │──────►│ Athena      │
│ On-Premises │              │ • Data Catalog     │       │ Redshift    │
│ (SQL/NoSQL) │              │ • Access Control   │       │ EMR/Spark   │
└─────────────┘              │   (row/column)     │       └─────────────┘
                             └─────────┬──────────┘              │
                                       │                         ▼
                               ┌───────▼───────┐              Users
                               │   Data Lake   │
                               │ (stored in S3)│
                               └───────────────┘

Lake Formation vs Glue:

Aspect | Glue | Lake Formation
Focus | ETL + Data Catalog | Complete data lake management
Security | Basic IAM | Fine-grained (row/column-level)
Scope | ETL jobs | End-to-end data lake
Built on | - | AWS Glue

⚠️ Exam trap: “Data lake” + “fine-grained access control” or “row/column-level security” → Lake Formation. Not just Glue (Glue = ETL only, no fine-grained permissions).

⚠️ Exam trap: “Centralized permissions for data lake” → Lake Formation. Manages access across Athena, Redshift, EMR in one place.

Amazon MSK (Managed Streaming for Apache Kafka) = fully managed Apache Kafka on AWS.

MSK Architecture:

Producers               MSK Cluster                    Consumers
(Kinesis, IoT,    ┌─────────────────────────┐
 RDS, etc.)       │     ┌──────────┐        │    ┌──────────────────┐
       │          │     │ Broker 1 │◄──┐    │    │ Kinesis Data     │
       ▼          │     └──────────┘   │    │    │ Analytics (Flink)│
┌──────────┐      │          │    replication  ──►│ Glue Streaming   │
│ Your     │──────┼──►┌──────────┐   │    │    │ Lambda           │
│ Code     │      │   │ Broker 2 │◄──┤    │    │ EC2/ECS/EKS      │
└──────────┘      │   └──────────┘   │    │    └──────────────────┘
                  │          │       │    │
                  │     ┌──────────┐ │    │
                  │     │ Broker 3 │◄┘    │
                  │     └──────────┘      │
                  └─────────────────────────┘

Kinesis Data Streams vs Amazon MSK:

Aspect | Kinesis Data Streams | Amazon MSK
Message size | 1 MB limit | 1 MB default, configurable to 10 MB
Data structure | Shards | Kafka Topics with Partitions
Scaling | Shard splitting & merging | Can only add partitions
In-flight encryption | TLS only | PLAINTEXT or TLS
At-rest encryption | KMS | KMS
Retention | 1-365 days | Unlimited (EBS)

MSK Consumers:

Amazon Managed Service for Apache Flink (previously: Kinesis Data Analytics for Apache Flink)

Flink Sources:
Kinesis Data Streams ──┐
                       ├──► Amazon Managed Service ──► (destinations)
Amazon MSK ────────────┘    for Apache Flink

⚠️ Exam trap: “Kafka on AWS” or “migrate Kafka” → Amazon MSK. Kinesis is AWS-native, MSK is Kafka-compatible.

⚠️ Exam trap: “Message > 1 MB” streaming → MSK (configurable up to 10 MB). Kinesis = hard 1 MB limit.

⚠️ Exam trap: “Apache Flink” or “real-time stream analytics” → Amazon Managed Service for Apache Flink. Note: Flink does NOT read from Firehose!

⚠️ Exam trap: Kinesis vs MSK decision:


Big Data Ingestion Pipeline (Serverless)

Requirements: Real-time collection → Transform → SQL query → Reports in S3 → Warehouse + Dashboards

IoT Devices
    │
    ▼ (real-time)
┌─────────────────┐     Every 1 min    ┌─────────────┐
│ Kinesis Data    │───────────────────►│ Ingestion   │
│ Streams         │                    │ Bucket (S3) │
└─────────────────┘                    └──────┬──────┘
         │                                    │
    ┌────┴────┐                          (optional)
    ▼         │                               │
┌─────────┐   │                          ┌────▼────┐
│ Kinesis │   │                          │   SQS   │
│ Firehose│◄──┘                          └────┬────┘
└────┬────┘                                   │
     │                                   ┌────▼────┐    Pull data
     │ Lambda                            │ Lambda  │◄──────────┐
     │ (transform)                       └────┬────┘           │
     ▼                                        │                │
                                         ┌────▼────┐     ┌─────┴─────┐
                                         │ Athena  │────►│ Reporting │
                                         │ (SQL)   │     │ Bucket    │
                                         └─────────┘     └─────┬─────┘
                                                               │
                                              ┌────────────────┼────────────────┐
                                              ▼                ▼                ▼
                                        QuickSight      Redshift          (other BI)
                                        (dashboards)    Serverless

Pipeline Components:

Stage | Service | Why
Ingest real-time | Kinesis Data Streams | Real-time data collection
Buffer + Deliver | Kinesis Firehose | Near real-time delivery to S3 (1 min)
Transform | Lambda + Firehose | Data transformations during delivery
Store | S3 (Ingestion Bucket) | Durable storage, triggers events
Decouple | SQS (optional) | Buffer between S3 and processing
Query | Athena | Serverless SQL on S3
Output | S3 (Reporting Bucket) | Query results storage
Visualize | QuickSight / Redshift | Dashboards and analytics
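The "Transform" stage is a Lambda that Firehose calls with a batch of base64-encoded records and that returns one result per `recordId`. A minimal sketch is shown here. The function name and the `processed` flag it adds are hypothetical, and records are assumed to be JSON objects.

```python
import base64
import json


def firehose_transform_handler(event, context=None):
    """Decode each record, apply a small transformation, and re-encode it."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            payload["processed"] = True  # hypothetical transformation
            # A trailing newline keeps one JSON object per line in the delivered file.
            encoded = base64.b64encode(
                (json.dumps(payload) + "\n").encode("utf-8")
            ).decode("utf-8")
            output.append(
                {"recordId": record["recordId"], "result": "Ok", "data": encoded}
            )
        except (ValueError, KeyError, TypeError):
            # Records that cannot be parsed are marked as failed and passed back unchanged.
            output.append(
                {
                    "recordId": record["recordId"],
                    "result": "ProcessingFailed",
                    "data": record["data"],
                }
            )
    return {"records": output}
```

Writing newline-delimited JSON to the ingestion bucket keeps the objects easy to query from Athena later in the pipeline.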

Key Points:

⚠️ Exam trap: “Serverless” + “real-time ingestion” + “SQL query” + “dashboards” → This full pipeline. Know each component’s role!


Database Selection Guide:

Need | Use
SQL, ACID, complex queries | RDS / Aurora
Key-value, massive scale, single-digit ms | DynamoDB
Key-value, large objects (100MB+ files) | S3
Caching, sessions, leaderboards | ElastiCache (Redis/Memcached)
Data warehouse, analytics (PB scale) | Redshift
Graph relationships (social, fraud) | Neptune
Time series (IoT, metrics) | Timestream
Document store (MongoDB compatible) | DocumentDB
Immutable ledger (financial) | QLDB
ETL / Data catalog | Glue

⚠️ Exam traps:

⚠️ Exam trap - “In-memory + caching SQL queries + HIPAA”:



🎯 MASTER SUMMARY: Database Exam Guide

Part 1: Core Principles (Understand WHY → Derive WHAT)

Principle 1: RDBMS vs NoSQL = Structure vs Flexibility

RDBMS (RDS/Aurora): Structured data, complex joins, ACID transactions, fixed schema

NoSQL (DynamoDB): Flexible schema, massive scale, key-value access, millisecond latency

Rule: Need JOINs or transactions? → RDS/Aurora. Need scale + flexibility? → DynamoDB.

Principle 2: Read Replica = ASYNC, Multi-AZ = SYNC

This is THE most tested concept:

Key insight: Multi-AZ standby is for failover ONLY. It cannot be read from.

Principle 3: Aurora = RDS with Superpowers

Aurora is AWS’s cloud-optimized relational DB. Same concept as RDS, but:

If question mentions RDS + wants better performance/features → think Aurora.

Principle 4: Caching = Code Changes Required (Except DAX)

Cache | Code Changes? | Works With
ElastiCache | ✅ Required | Any application
DAX | ❌ Not required | DynamoDB only

DAX uses the same DynamoDB API. ElastiCache requires application modifications.

Principle 5: Encryption = Launch Time Decision

You cannot encrypt an existing unencrypted database directly. Solution: Snapshot → Restore as encrypted → Switch applications

Same applies: Master not encrypted → Replicas CANNOT be encrypted.

Principle 6: Cross-Region = Different Services, Different Behaviors

Service | Cross-Region Feature | Behavior
RDS | Read Replica (cross-region) | Manual promotion, costs $$$
Aurora | Global Database | <1s replication, <1min failover
DynamoDB | Global Tables | Active-active (writes anywhere!)

Key insight: Only DynamoDB Global Tables allows writes in multiple regions.

Principle 7: Restore = NEW Database

Restoring from backup/snapshot ALWAYS creates a new database instance:

Exception: Aurora Backtrack → in-place rewind (no new DB created).

Principle 8: Right Tool for the Data Type

Match the data type to the purpose-built database:


Part 2: Decision Tree (Follow Keywords → Find Answer)

Step 1: What type of data/workload?

                        What's the requirement?
                              │
    ┌─────────────┬───────────┼───────────┬─────────────┬────────────┐
    ▼             ▼           ▼           ▼             ▼            ▼
 SQL + Joins   Key-Value   Caching    Analytics    Specialized   Big Objects
    │             │           │           │             │            │
    ▼             ▼           ▼           ▼             ▼            ▼
 RDS/Aurora   DynamoDB   ElastiCache  Redshift     See Step 3      S3
                           /DAX       /Athena

Step 2: Which RDS/Aurora variant?

                    Need relational database?
                              │
        ┌─────────────────────┼─────────────────────┐
        ▼                     ▼                     ▼
   Standard needs      Cloud-optimized        Full OS access?
        │                     │                     │
        ▼                     ▼                     ▼
      RDS               Aurora                RDS Custom
        │                     │             (Oracle/SQL only)
        │                     │
        ▼                     ▼
   Cross-region?        Unpredictable
        │               workload?
        ▼                     │
   Read Replica              ▼
   (manual failover)   Aurora Serverless

Step 3: Specialized Database Selection

If the data is… | Use…
Graph relationships (social, fraud) | Neptune
Time series (IoT, metrics, logs) | Timestream
Immutable ledger (financial, compliance) | QLDB
MongoDB-compatible JSON documents | DocumentDB
Cassandra-compatible wide-column | Keyspaces
Free-text search | OpenSearch
Blockchain (decentralized) | Managed Blockchain

Feature-Based Decision Table

If question mentions… | Answer is…
“Users don’t see updated data” + Read Replica | Expected behavior (ASYNC lag)
“Analytics slowing production” | Offload to Read Replica
“Cross-region disaster recovery” + Aurora | Aurora Global Database
“Dev/test” + “unused most of time” | Aurora Serverless
“Production data ASAP” + “read/write tests” | Aurora Cloning
“Full OS customization” + Oracle/SQL Server | RDS Custom
“Lambda” + “DB connections” + “failover” | RDS Proxy
“Users keep logging out” + Auto Scaling | ElastiCache (sessions)
“Real-time leaderboard” + “ranked” | Redis Sorted Sets
“DynamoDB” + “microsecond reads” | DAX
“Multi-region active-active writes” | DynamoDB Global Tables
“Social network” + “relationships” | Neptune
“IoT” + “time-series” | Timestream
“Immutable” + “financial audit” | QLDB

Kinesis Family Decision Tree

What do you need to do with streaming data?
                    │
    ┌───────────────┼───────────────┬───────────────────┐
    ▼               ▼               ▼                   ▼
 INGEST         DELIVER          ANALYZE            KAFKA
 (collect)      (to S3/etc)      (real-time)        (compatible)
    │               │               │                   │
    ▼               ▼               ▼                   ▼
 Kinesis        Kinesis         Kinesis Data        Amazon
 Data Streams   Firehose        Analytics/Flink     MSK

Service | Purpose | Key Feature
Kinesis Data Streams | Ingest real-time data | Custom consumers, 1-365 day retention
Kinesis Firehose | Deliver to destinations | Near real-time (1 min buffer), auto-scaling
Kinesis Data Analytics | Real-time analytics | Apache Flink, SQL on streams
Amazon MSK | Managed Kafka | Kafka-compatible, 10 MB messages, unlimited retention

The “CANNOT” List

Cannot… | Instead…
Read from Multi-AZ standby | Use Read Replica for read scaling
Write to Read Replica | Promote it first (breaks replication)
Encrypt existing DB directly | Snapshot → Restore as encrypted
Use IAM Auth with Oracle/SQL Server | Only MySQL, PostgreSQL, MariaDB
Use Backtrack on RDS | Aurora-only feature
Use DAX with non-DynamoDB | Use ElastiCache instead
Use ElastiCache without code changes | Use DAX for DynamoDB (same API)
Cross-region failover with Multi-AZ | Multi-AZ = same region only
Use “Redshift Global cluster” | Doesn’t exist! Use cross-region snapshot copy
Read from Firehose with Flink | Flink reads from Streams or MSK only

Part 3: Scenario Pattern Recognition

Pattern: “OLTP with auto-scaling storage”

Keywords: OLTP, auto-scaling, maximum replicas, transactional

Answer: Aurora

Why: OLTP = relational (not NoSQL). Aurora has auto-scaling storage (10GB→128TB) + 15 replicas. RDS storage requires manual provisioning.


Pattern: “Analytics queries slowing down production”

Keywords: reporting, analytics, BI tools, production performance

Answer: Create Read Replica for analytics workload

Why: Read Replicas are ASYNC, so heavy queries won’t affect the master.


Pattern: “Users don’t see updated data immediately”

Keywords: stale data, eventually consistent, lag, Read Replica

Answer: This is expected behavior (ASYNC replication)

Why: Read Replica uses ASYNC replication. If strong consistency needed → read from master.


Pattern: “Cross-region disaster recovery with fast failover”

Keywords: cross-region, DR, RTO <1 minute, Aurora

Answer: Aurora Global Database

Why: <1 second replication, <1 minute RTO. RDS cross-region Read Replica = manual promotion.


Pattern: “Dev/test environment, unused most of the time”

Keywords: development, testing, intermittent, unpredictable, minimize costs

Answer: Aurora Serverless

Why: Scales to zero, pay per second. Provisioned = pay even when idle.


Pattern: “Need production data immediately for testing”

Keywords: clone production, read/write tests, staging environment, fast copy

Answer: Aurora Cloning (instant copy-on-write)

Why: Snapshot/restore = slow (copies all data). Read Replica = read-only. Cloning = instant + writable.


Pattern: “Full OS access for Oracle/SQL Server”

Keywords: customize OS, install patches, SSH access, Oracle/SQL Server

Answer: RDS Custom

Why: Standard RDS = no SSH. EC2 = no AWS management. RDS Custom = both.


Pattern: “Key-value store for large files”

Keywords: key-value, large files, 100MB, store files, durable storage

Answer: S3 (NOT DynamoDB!)

Why: S3 IS a key-value store (key = path, value = object). DynamoDB has 400KB item limit. For files 100MB+ → S3.


Pattern: “Lambda functions + database connections + slow failover”

Keywords: Lambda, connection pooling, many connections, failover time

Answer: RDS Proxy

Why: Connection pooling reduces DB load. 66% faster failover. Works great with Lambda.


Pattern: “Users keep getting logged out across instances”

Keywords: sessions, logged out, ALB, Auto Scaling, stateless

Answer: ElastiCache (session store) or DynamoDB with TTL

Why: Sessions stored in shared cache → any instance can retrieve. NOT sticky sessions (uneven load).


Pattern: “Real-time gaming leaderboard with rankings”

Keywords: leaderboard, ranking, sorted scores, real-time

Answer: Redis Sorted Sets

Why: Redis Sorted Sets guarantee uniqueness + ordering. Memcached has no sorted sets.


Pattern: “DynamoDB with microsecond read latency”

Keywords: DynamoDB, faster reads, microsecond, cache

Answer: DAX (DynamoDB Accelerator)

Why: 10x faster reads, no code changes (same API). ElastiCache = different API.


Pattern: “Multi-region active-active with writes anywhere”

Keywords: active-active, write to any region, global users

Answer: DynamoDB Global Tables

Why: Aurora Global = read-only replicas. Only DynamoDB Global Tables = writes in any region.


Pattern: “MongoDB migration + no code changes”

Keywords: MongoDB, migrate, no code changes, same drivers, existing application

Answer: DocumentDB

Why: DocumentDB is MongoDB-compatible (same API/drivers). Application code works unchanged. Note: NOT RDS — there’s no “RDS for MongoDB”!


Pattern: “MongoDB migration + serverless + global”

Keywords: MongoDB, NoSQL, serverless, global, no server management

Answer: DynamoDB (NOT DocumentDB!)

Why: DocumentDB requires provisioned instances (not serverless). DynamoDB = truly serverless + Global Tables. “MongoDB” in question is a distractor — focus on requirements.


Pattern: “Social network with friend relationships”

Keywords: friends of friends, social graph, relationships, connections, likes, multi-hop queries

Answer: Neptune (Graph database)

Why: Graph databases are optimized for relationship traversals. Example: “likes on posts by friends of Mike” = multi-hop graph query. RDS would need complex JOINs; DynamoDB can’t do JOINs at all.


Pattern: “Cassandra migration to AWS”

Keywords: Cassandra, migrate, CQL, wide-column, no code changes

Answer: Amazon Keyspaces

Why: Keyspaces is Cassandra-compatible (CQL). Existing Cassandra code works unchanged. Fully managed, serverless, highly available.


Pattern: “IoT sensors with readings over time”

Keywords: IoT, sensors, time-series, metrics, trends, readings per second, temperature, humidity, pressure, fast analytics, predict

Answer: Timestream

Why: Purpose-built for time-series data. 1000x faster + 1/10th cost vs relational. Built-in analytics functions for pattern detection.


Pattern: “Financial transactions with immutable audit trail”

Keywords: immutable, ledger, financial, compliance, audit, cannot modify

Answer: QLDB

Why: Cryptographically verifiable history. Note: QLDB ≠ blockchain (centralized, no decentralization).


Pattern: “Encrypt existing unencrypted database”

Keywords: encrypt, existing database, unencrypted, enable encryption

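Answer: Snapshot → copy snapshot with encryption enabled → restore from the encrypted copy

Why: Encryption cannot be enabled on an existing unencrypted DB instance. You must create a new encrypted DB from an encrypted copy of its snapshot.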


Pattern: “Full-text search on DynamoDB data”

Keywords: search any field, partial match, full-text search, DynamoDB + search

Answer: DynamoDB + OpenSearch

Why: DynamoDB only queries by primary key/indexes. OpenSearch enables full-text search. Use DynamoDB Streams → Lambda → OpenSearch to sync data.


Pattern: “Search and visualize CloudWatch logs in dashboards”

Keywords: logs, search, CloudWatch, real-time, analytics, dashboards

Answer: CloudWatch Logs → OpenSearch (via Lambda or Firehose)

Why: OpenSearch provides search + OpenSearch Dashboards for visualization. Lambda = real-time, Firehose = near real-time.


Pattern: “Serverless SQL on logs in S3”

Keywords: logs in S3, serverless, quick analysis, SQL, ad-hoc

Answer: Amazon Athena

Why: Athena = serverless SQL directly on S3. No infrastructure to manage. Pay $5/TB scanned.
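A rough sketch of what "pay per TB scanned" means in practice. It uses the $5/TB figure quoted above; treating 1 TB as 1024^4 bytes and ignoring any per-query minimum are simplifying assumptions, and the function name is hypothetical.

```python
TB = 1024 ** 4          # assumed definition of a terabyte for this estimate
PRICE_PER_TB = 5.00     # USD, the $5/TB figure quoted above


def athena_query_cost(bytes_scanned):
    """Estimated cost in USD of one query that scans `bytes_scanned` bytes."""
    return bytes_scanned / TB * PRICE_PER_TB


# Hypothetical comparison: a full scan of 2 TB of raw logs versus a query
# that only has to read 200 GB of the same data (for example after
# converting to a columnar format and selecting a few columns).
full_scan = athena_query_cost(2 * TB)
reduced_scan = athena_query_cost(200 * 1024 ** 3)
print(f"full scan: ${full_scan:.2f}, reduced scan: ${reduced_scan:.2f}")
```

Because the bill tracks bytes scanned rather than query time, anything that shrinks the scan (columnar formats, compression, partitioning) lowers the cost directly.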


Pattern: “Columnar analytics + BI dashboards”

Keywords: data warehouse, OLAP, columnar, analytics, QuickSight, Tableau, dashboards

Answer: Amazon Redshift + QuickSight

Why: Redshift = OLAP data warehouse (columnar storage). QuickSight = native BI integration for dashboards.


Pattern: “Convert file format for Athena”

Keywords: convert JSON/CSV to Parquet, optimize Athena, reduce costs

Answer: AWS Glue ETL

Why: Glue transforms data formats. Parquet = columnar = Athena scans less = cheaper.


Pattern: “Real-time analytics on streaming data”

Keywords: real-time analytics, stream processing, Kinesis, SQL on streams

Answer: Kinesis Data Analytics (Amazon Managed Service for Apache Flink)

Why: Flink processes streams in real-time. Reads from Kinesis Data Streams or MSK. NOT Firehose!


Pattern: “Migrate Kafka with no code changes”

Keywords: Apache Kafka, migrate, Kafka-compatible, existing application

Answer: Amazon MSK

Why: MSK = managed Kafka. Same APIs, no code changes. Kinesis requires code changes.


Pattern: “Redshift cross-region disaster recovery”

Keywords: Redshift, cross-region, DR, disaster recovery

Answer: Cross-region snapshot copy

Why: Enable automated snapshots + configure cross-region copy. Restore in DR region. “Redshift Global” doesn’t exist!


Part 4: Quick Reference Tables

RDS vs Aurora Comparison

Feature | RDS | Aurora
Engines | PostgreSQL, MySQL, MariaDB, Oracle, SQL Server, Db2 | PostgreSQL, MySQL
Performance | Standard | 5x MySQL, 3x PostgreSQL
Storage | EBS-backed, auto-scaling | 6 copies across 3 AZs, 128 TB max
Read Replicas | Up to 15 | Up to 15, <10 ms lag
Failover | Slower | <30 seconds
Backtrack | ❌ No | ✅ Yes (in-place rewind)
Serverless | ❌ No | ✅ Yes
Global Database | Read Replica only | <1 s replication, <1 min RTO
Cloning | Snapshot only | Instant copy-on-write

Replication & HA Quick Reference

Feature | Read Replica | Multi-AZ | Aurora Global
Purpose | Read scaling | HA/Failover | Cross-region DR
Replication | ASYNC | SYNC | ASYNC (<1 sec)
Serve reads? | ✅ Yes | ❌ No | ✅ Yes
Auto failover? | ❌ Manual | ✅ Auto | ✅ Auto (<1 min)
Cross-region? | ✅ Yes | ❌ No | ✅ Yes

Caching Options

Feature | ElastiCache Redis | ElastiCache Memcached | DAX
Works with | Any app | Any app | DynamoDB only
Code changes | ✅ Required | ✅ Required | ❌ Not required
HA | Multi-AZ + failover | ❌ No | Multi-AZ
Persistence | ✅ AOF | ❌ No | N/A
Sorted Sets | ✅ Yes | ❌ No | N/A

NoSQL Database Selection

Service | Data Model | Compatible With | Use Case
DynamoDB | Key-value | - | Serverless, sessions
DocumentDB | Document | MongoDB | MongoDB workloads
Keyspaces | Wide-column | Cassandra | Cassandra workloads
Neptune | Graph | Gremlin, SPARQL | Social, fraud
Timestream | Time series | SQL | IoT, metrics
QLDB | Ledger | SQL | Immutable audit

Key Numbers to Remember

Item | Value
Read Replicas max | 15
Aurora storage max | 128 TB
Aurora copies | 6 across 3 AZs
Aurora failover | <30 seconds
Aurora Global replication | <1 second
Aurora Global RTO | <1 minute
DynamoDB item size limit | 400 KB
S3 object size max | 5 TB
Automated backup retention | 1-35 days
Manual snapshot retention | Unlimited
RDS Proxy failover improvement | 66% faster

Part 5: Ultimate Instant-Answer Table

Question Contains → Instant Answer
SQL + Joins + Transactions → RDS / Aurora
“OLTP” + “auto-scaling storage” → Aurora
“5x MySQL performance” → Aurora
“cross-region” + “RTO <1 min” → Aurora Global
“intermittent workload” → Aurora Serverless
“in-place rewind” → Aurora Backtrack
“clone production instantly” → Aurora Cloning
“OS access” + Oracle/SQL Server → RDS Custom
“analytics slowing production” → Read Replica
“can’t read from standby” → Multi-AZ (expected)
“Lambda + connections” → RDS Proxy
“66% faster failover” → RDS Proxy
“key-value” + “large files” (MB+) → S3 (not DynamoDB!)
“serverless NoSQL” → DynamoDB
“serverless” + “global” + NoSQL → DynamoDB (not DocumentDB)
“microsecond DynamoDB reads” → DAX
“active-active writes” → DynamoDB Global Tables
“sessions across instances” → ElastiCache / DynamoDB TTL
“leaderboard + rankings” → Redis Sorted Sets
“HA + persistence cache” → Redis
“MongoDB” + “no code changes” → DocumentDB
“MongoDB compatible” (same drivers) → DocumentDB
“MongoDB” + “serverless” + “global” → DynamoDB (trap!)
“RDS for MongoDB” → Doesn’t exist! (trap)
“graph + relationships” → Neptune
“social network analysis” → Neptune
“friends of friends” queries → Neptune
“likes on posts by friends” → Neptune
“fraud detection patterns” → Neptune
“IoT + time-series” → Timestream
“sensors” + “readings per second” → Timestream
“temperature/humidity/pressure” → Timestream
“immutable financial ledger” → QLDB
“Cassandra compatible” → Keyspaces
“free-text search” → OpenSearch
“partial match” + “any field” → OpenSearch
“search DynamoDB data” → DynamoDB + OpenSearch
“logs to dashboards” → CloudWatch → OpenSearch
“ETL + data catalog” → Glue
“convert CSV to Parquet” → Glue ETL
“centralized metadata” → Glue Data Catalog
“streaming ETL” → Glue Streaming ETL
“prevent re-processing” → Glue Job Bookmarks
“serverless SQL on S3” → Athena
“PB-scale analytics” → Redshift
“BI dashboards” → QuickSight
“visualizations from Athena/Redshift” → QuickSight
“embeddable analytics” → QuickSight
“data lake” → Lake Formation
“row/column-level security” → Lake Formation (data lake) or QuickSight Enterprise (dashboards)
“centralized data lake permissions” → Lake Formation
“column-level security” + “dashboards” → QuickSight Enterprise
“Kafka on AWS” → Amazon MSK
“migrate Kafka” → Amazon MSK
“message > 1 MB streaming” → MSK (up to 10 MB)
“unlimited stream retention” → MSK
“Apache Flink” → Managed Service for Apache Flink
“real-time stream analytics” → Managed Service for Apache Flink
“Flink + Kinesis or MSK” → Managed Service for Apache Flink
“logs in S3” + “quick analysis” → Athena
“OLAP” + “columnar” + “warehouse” → Redshift
“Redshift Global cluster” → Doesn’t exist! (trap)
“Redshift cross-region DR” → Cross-region snapshot copy
“COPY/UNLOAD through VPC” → Enhanced VPC Routing
“Spark/Hive/Presto” + “big data” → EMR
“open source big data frameworks” → EMR
“deliver to S3/Redshift” + “near real-time” → Kinesis Firehose
“ingest real-time data” → Kinesis Data Streams

Part 6: Elimination Checklist

When stuck between options, eliminate systematically:

□ Is it about READING from standby?
  → Multi-AZ standby can't serve reads
  → Use Read Replica for read scaling

□ Is it CROSS-REGION?
  → Multi-AZ = same region only (eliminate it)
  → Aurora Global or DynamoDB Global Tables

□ Does it need WRITES in multiple regions?
  → Aurora Global = read-only replicas (eliminate it)
  → DynamoDB Global Tables = active-active writes

□ Is it about CACHING without code changes?
  → ElastiCache requires code changes (eliminate it)
  → DAX works with DynamoDB API (no changes)

□ Does it mention ORACLE or SQL SERVER customization?
  → Standard RDS = no SSH (eliminate it)
  → RDS Custom allows full access

□ Is it asking for INSTANT clone?
  → RDS Snapshot = slow (eliminate it)
  → Aurora Cloning = instant

□ Is it GRAPH data?
  → DynamoDB/RDS = complex (eliminate them)
  → Neptune is purpose-built

□ Is it TIME SERIES?
  → DynamoDB/RDS = not optimized (eliminate them)
  → Timestream is purpose-built

□ Is it IMMUTABLE ledger?
  → DynamoDB = mutable (eliminate it)
  → QLDB is immutable

🏆 The Golden Rules

  1. ASYNC = Eventually Consistent (Read Replica lag is expected behavior)
  2. SYNC = Always Consistent (Multi-AZ, but can’t read from standby)
  3. Multi-AZ = HA, Read Replica = Scaling (different purposes!)
  4. Aurora = RDS++ (same engines, better everything)
  5. DAX = no code changes, ElastiCache = code changes required
  6. Redis = HA + features, Memcached = simple + sharding
  7. Aurora Global = read replicas, DynamoDB Global = write anywhere
  8. Backtrack, Cloning, Serverless = Aurora-only features
  9. RDS Custom = Oracle/SQL Server customization only
  10. RDS Proxy = Lambda + 66% faster failover
  11. Restore = NEW database (never overwrites existing)
  12. Encryption at launch (later = snapshot → restore encrypted)
  13. 35 days automated backup, unlimited manual snapshots
  14. QLDB ≠ Blockchain (QLDB is centralized)
  15. Athena = serverless SQL on S3 ($5/TB, use Parquet for cost savings)
  16. Redshift = OLAP, Athena = ad-hoc (Redshift faster for complex joins/BI)
  17. OpenSearch = full-text search (complement DynamoDB for search capabilities)
  18. EMR = Hadoop/Spark big data (Spot for Task Nodes, Reserved for Master/Core)
  19. QuickSight = BI dashboards (SPICE for in-memory, users ≠ IAM)
  20. Glue = serverless ETL + Data Catalog (CSV→Parquet, metadata for Athena/Redshift)
  21. Lake Formation = data lake + fine-grained security (row/column-level, built on Glue)
  22. MSK = managed Kafka (alternative to Kinesis for Kafka workloads, larger messages, unlimited retention)
  23. Flink reads from Kinesis or MSK (NOT Firehose; real-time stream processing)

Amazon Elastic Container Service (ECS):

Amazon Elastic Container Service (Amazon ECS) is a fully managed container orchestration service that helps you easily deploy, manage, and scale containerized applications. Amazon ECS integrates with Application Load Balancer (ALB).

Types of provisioning:

Amazon Elastic Container Registry (ECR) is a public or private registry to store container images, so they can be run by ECS.

The ECS task execution role defines the permissions of the ECS agent (and container instance), e.g.:

AWS Lambda and Batch:

AWS Lambda is a serverless, autoscaled, event-driven service that runs virtual functions on demand. It supports many programming languages.

Amazon API Gateway is a fully managed, serverless, and scalable service that lets developers easily create, publish, maintain, and monitor APIs. It supports RESTful APIs and WebSocket APIs.

AWS Batch is fully managed batch processing at any scale. Batch dynamically launches EC2 instances or Spot Instances. Batch jobs are defined as Docker images and run on ECS. Unlike AWS Lambda, AWS Batch has no time limit, is not restricted to specific runtimes as long as the job is packaged in a Docker container, and relies on EBS or instance store for disk space.

Serverless Overview:

Serverless = paradigm where developers don’t manage servers — just deploy code/functions.

AWS Serverless Services:

Service | Type
AWS Lambda | Compute (FaaS)
DynamoDB | Database (NoSQL)
Aurora Serverless | Database (SQL)
API Gateway | API management
S3 | Object storage
SNS & SQS | Messaging
Kinesis Data Firehose | Streaming
Step Functions | Workflow orchestration
Fargate | Serverless containers
Cognito | Authentication
CloudFront | CDN

AWS Lambda:

Lambda vs EC2:

Aspect | Lambda | EC2
Management | Virtual functions, no servers | Virtual servers to manage
Duration | Limited by time (15 min max) | Continuously running
Execution | On-demand, event-driven | Always on
Scaling | Automatic | Manual intervention
RAM/CPU | Limited (up to 10 GB RAM) | Choose instance type

Lambda Benefits:

⚠️ Exam trap: “Which service has NO built-in caching?” → Lambda. Lambda is stateless by design. API Gateway has response caching, DynamoDB has DAX. Lambda needs external cache (ElastiCache, DAX).

Lambda Language Support:

Runtime | Languages
Native | Node.js, Python, Java, C#/.NET, PowerShell, Ruby
Custom Runtime API | Rust, Golang (community-supported)
Container Image | Any language (must implement Lambda Runtime API)

⚠️ Exam trap: Lambda Container Image ≠ arbitrary Docker. Must implement Lambda Runtime API. For arbitrary Docker → ECS/Fargate.

Lambda Integrations (Main ones):

Lambda Use Cases:

  1. Serverless CRON Job: EventBridge (every 1 hour) → Lambda
  2. Serverless Thumbnail Creation: S3 (new image) → Lambda → creates thumbnail → S3 + DynamoDB metadata
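The thumbnail flow can be sketched as a Lambda handler that reads the S3 event notification. This is a minimal sketch, assuming the standard `Records[].s3.bucket.name` / `Records[].s3.object.key` event layout; the `thumbnails/` prefix and the `thumbnail_key` helper are hypothetical, and the actual image resizing and DynamoDB write are left as a comment.

```python
from urllib.parse import unquote_plus


def thumbnail_key(key):
    """Hypothetical naming rule: store thumbnails under a 'thumbnails/' prefix."""
    return "thumbnails/" + key.rsplit("/", 1)[-1]


def handler(event, context):
    """Collect (bucket, source key, thumbnail key) for each uploaded object."""
    jobs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        jobs.append((bucket, key, thumbnail_key(key)))
        # A real function would now download the image, resize it, upload
        # the thumbnail to S3, and write metadata to DynamoDB.
    return jobs


sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "photos"},
                "object": {"key": "uploads/beach+day.jpg"}}}
    ]
}
print(handler(sample_event, None))
```

If the thumbnail is written back to the same bucket, the trigger should be scoped (for example by prefix) so the function does not re-trigger itself on its own output.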

Lambda SnapStart:

SnapStart Enabled:          SnapStart Disabled:
invoke                      invoke
  ↓                          ↓
Lambda (pre-initialized)    Lambda
  ↓                          ↓ Init
Invoke                      Invoke
  ↓                          ↓
Shutdown                    Shutdown

⚠️ Exam trap: “Reduce Lambda cold start” → SnapStart (or Provisioned Concurrency).

Lambda Concurrency:

Type | Purpose | Cold Start | Cost
Unreserved | Default pool | Yes | Pay per use
Reserved | Guarantee capacity for function | Yes | Pay per use
Provisioned | Pre-warm instances | No | Pay for provisioned + invocations

Throttling Behavior:

Lambda Concurrency Issue Example:

Many users → ALB → Lambda (1000 executions) ✓
Few users → API Gateway → Lambda → THROTTLE! ❌
SDK/CLI → Lambda → THROTTLE! ❌

Without reserved concurrency, one source can consume all capacity.

⚠️ Exam trap: “Lambda throttling from one service” → Use Reserved Concurrency to limit/isolate capacity per function.

⚠️ Exam trap: The 1000 concurrent limit is shared across ALL functions in the account/region. One busy function can starve others → use Reserved Concurrency to isolate.
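The shared-pool behavior can be modeled with a few lines of arithmetic. This is a simplified sketch, not how Lambda is implemented: it assumes the 1000 default limit quoted above, treats reserved concurrency as both a guarantee and a cap, and uses hypothetical function names.

```python
ACCOUNT_LIMIT = 1000  # default regional concurrency limit quoted above


def unreserved_pool(reserved, account_limit=ACCOUNT_LIMIT):
    """Concurrency left for all functions without a reservation."""
    return account_limit - sum(reserved.values())


def is_throttled(function, in_flight, reserved, account_limit=ACCOUNT_LIMIT):
    """True if one more invocation of `function` would exceed its ceiling.

    `in_flight` maps function name -> executions currently running.
    `reserved` maps function name -> reserved concurrency.
    """
    if function in reserved:
        # Reserved concurrency is both a guarantee and a cap for the function.
        return in_flight.get(function, 0) >= reserved[function]
    shared_in_use = sum(n for name, n in in_flight.items() if name not in reserved)
    return shared_in_use >= unreserved_pool(reserved, account_limit)


# Without reservations, a busy function behind the ALB starves the API function.
print(is_throttled("api-fn", {"alb-fn": 1000}, reserved={}))
# Capping the busy function at 800 leaves 200 for everything else.
print(is_throttled("api-fn", {"alb-fn": 800}, reserved={"alb-fn": 800}))
```

The second call shows the isolation effect: the capped function throttles itself at 800 instead of draining the pool that other functions depend on.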


Amazon API Gateway:

Amazon API Gateway = fully managed, serverless service to create, publish, maintain, and monitor APIs.


Lambda Layers & Destinations:

Lambda Layers:

⚠️ Exam trap: “Share code/libraries between Lambda functions” → Lambda Layers. Not copying code into each function.

Lambda Destinations:

⚠️ Exam trap: “Route Lambda async result on success” → Destinations. DLQ only handles failures.


AWS Batch:

AWS Batch = fully managed batch processing at any scale.

AWS Batch Architecture:

Job Queue ──► Compute Environment ──► Job Execution
                     │
        ┌────────────┼────────────┐
        ▼            ▼            ▼
    On-Demand     Spot         Fargate
      EC2         EC2       (serverless)

Batch Components:

Component | Description
Job Definition | How to run: Docker image, vCPU, memory, IAM role
Job Queue | Where jobs wait; priority-based
Compute Environment | Managed EC2/Spot/Fargate instances

AWS Batch Use Cases:

Lambda vs Batch:

Aspect | Lambda | AWS Batch
Time limit | 15 minutes | No limit
RAM | 10 GB max | Up to 100s of GB
Disk | 10 GB /tmp | EBS volumes (TBs)
Runtime | Limited languages | Any Docker image
Invocation | Event-driven, sync/async | Job queue, scheduled
Scaling | Instant (1000 concurrent) | Launches instances (minutes)
Pricing | Per request + duration | Per EC2/Spot/Fargate time
Use case | Short, event-driven | Long-running batch jobs

When to Choose Batch over Lambda:

Scenario | Why Batch
Job > 15 min | Lambda hard limit
Needs > 10 GB RAM | Lambda hard limit
Needs > 10 GB disk | Lambda hard limit
GPU required | Lambda has no GPU
Large file processing | EBS storage available
Cost optimization | Spot instances (up to 90% savings)
Complex dependencies | Full Docker flexibility

⚠️ Exam trap: “Batch job > 15 minutes” or “needs Docker flexibility” or “> 10 GB memory/disk” → AWS Batch. “Event-driven, quick tasks” → Lambda.

⚠️ Exam trap: “Cost-optimize long-running batch jobs” → AWS Batch with Spot Instances. Up to 90% savings vs On-Demand. Lambda has no Spot option.

⚠️ Exam trap: “Serverless batch processing” → AWS Batch on Fargate (no EC2 to manage). Still not Lambda if > 15 min.
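The disqualifier logic above reduces to a few comparisons. A minimal sketch, using only the limits quoted in these notes; the function name and parameters are illustrative.

```python
LAMBDA_MAX_MINUTES = 15
LAMBDA_MAX_MEMORY_GB = 10
LAMBDA_MAX_DISK_GB = 10


def choose_compute(minutes, memory_gb, disk_gb=0.5, needs_gpu=False):
    """Return 'Lambda' if the job fits Lambda's hard limits, else 'AWS Batch'."""
    fits_lambda = (
        minutes <= LAMBDA_MAX_MINUTES
        and memory_gb <= LAMBDA_MAX_MEMORY_GB
        and disk_gb <= LAMBDA_MAX_DISK_GB
        and not needs_gpu
    )
    return "Lambda" if fits_lambda else "AWS Batch"


print(choose_compute(minutes=2, memory_gb=1))                  # short event-driven task
print(choose_compute(minutes=45, memory_gb=4))                 # exceeds the 15 min limit
print(choose_compute(minutes=5, memory_gb=2, needs_gpu=True))  # GPU rules Lambda out
```

Any single exceeded limit is enough to rule Lambda out; cost and Docker flexibility are secondary considerations once a hard limit is hit.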


Lambda Limits & Pricing:

Lambda Pricing:

Component | Free Tier | After Free Tier
Requests | 1M requests/month | $0.20 per 1M requests
Duration | 400K GB-seconds/month | $0.0000166667 per GB-second (about $1.00 per 60K GB-seconds)

Invocation vs Duration:

Metric | What It Is | Depends On | Cost Impact
Invocation | 1 call = 1 request | Number of triggers | $0.20 per 1M
Duration | Time function runs | Code complexity, I/O | GB-seconds

Cost = (Invocations × $0.20/1M) + (GB-seconds × $1.00/60K)
        └── count only ──┘         └── complexity matters ─┘
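A worked version of the cost formula, as a sketch. The per-GB-second rate is an assumption (the published x86 list price of $0.0000166667); rates vary by region and architecture, and the function name is hypothetical.

```python
PRICE_PER_MILLION_REQUESTS = 0.20   # USD
PRICE_PER_GB_SECOND = 0.0000166667  # USD, assumed x86 list price
FREE_REQUESTS = 1_000_000           # free tier per month
FREE_GB_SECONDS = 400_000           # free tier per month


def lambda_monthly_cost(invocations, avg_duration_s, memory_mb):
    """Estimate a month's Lambda bill in USD after the free tier."""
    # GB-seconds = number of calls x seconds per call x memory in GB
    gb_seconds = invocations * avg_duration_s * (memory_mb / 1024)
    billable_requests = max(0, invocations - FREE_REQUESTS)
    billable_gb_seconds = max(0.0, gb_seconds - FREE_GB_SECONDS)
    request_cost = billable_requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    duration_cost = billable_gb_seconds * PRICE_PER_GB_SECOND
    return request_cost + duration_cost


# 3M invocations per month, 0.5 s each, 512 MB configured memory
print(f"${lambda_monthly_cost(3_000_000, 0.5, 512):.2f}")
```

Doubling the configured memory doubles the GB-seconds for the same run time, so memory size affects the duration term even when the code is unchanged.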

Example: Simple vs complex function

Duration examples:

Lambda Limits (per region):

Limit | Value
Memory | 128 MB – 10 GB (1 MB increments)
Max execution time | 900 seconds (15 minutes)
Environment variables | 4 KB
/tmp disk | 512 MB – 10 GB
Concurrency | 1000 (can increase via support ticket)
Deployment (zip) | 50 MB compressed
Deployment (uncompressed) | 250 MB (code + dependencies)

⚠️ Exam trap: Lambda limit questions — know the key numbers: 15 min timeout, 10 GB RAM, 1000 concurrency, 250 MB uncompressed.

⚠️ Exam trap — Lambda Disqualifiers: If question mentions ANY of these → Lambda is WRONG answer:

Lambda vs Alternatives Decision:

Scenario | Best Choice | Why
Event-driven, < 15 min | Lambda | Instant scaling, pay per use
Batch job > 15 min | AWS Batch | No time limit, Docker
Long-running + cost optimize | AWS Batch + Spot | 90% savings
Containers, always running | ECS/Fargate | Long-running services
GPU, HPC, ML training | AWS Batch | GPU instances available

⚠️ Exam trap: “Long job + retry + can pause/resume days later” → SQS + AWS Batch or SQS + EC2. SQS retains messages up to 14 days. SNS has no retention (push and forget). Lambda has 15 min limit.

⚠️ Exam trap: Default Lambda timeout = 3 seconds. “Timeout error after 3 seconds” = default wasn’t changed, but if job needs > 15 min, Lambda is wrong choice entirely.

Cold Starts & Provisioned Concurrency:

Solution | Cold Start? | Cost
Default | Yes | Pay per use
SnapStart | No (for Java/Python/.NET) | No extra cost for Java; snapshot caching and restore charges for Python/.NET
Provisioned Concurrency | No | Pay for provisioned capacity

⚠️ Exam trap: “Eliminate cold starts” → Provisioned Concurrency or SnapStart. SnapStart is limited to Java/Python/.NET and carries no extra cost only for Java.


Lambda in VPC:

Default Lambda Deployment:

                          ┌─────────────────────────────────┐
                          │          AWS Cloud              │
   Internet ◄────────────►│  Lambda ──────► DynamoDB   ✓    │
   (Public)               │    │                            │
                          │    │   ┌──────────────────────┐ │
                          │    └──►│ VPC & Private Subnet │ │
                          │        │  Private RDS    ✗    │ │
                          │        └──────────────────────┘ │
                          └─────────────────────────────────┘

Lambda in VPC Configuration:

⚠️ Exam trap: “Lambda access RDS in private subnet” → Must configure Lambda in VPC. Default Lambda cannot reach private resources.

⚠️ Exam trap: “Lambda can read DynamoDB but can’t write to SQS” → IAM Role missing permissions (needs sqs:SendMessage). Not security groups — SQS is accessed via API, not network. SQS doesn’t have security groups.


Edge Functions (CloudFront):

Customization at the Edge:

Two Types:

  1. CloudFront Functions — lightweight, JavaScript only
  2. Lambda@Edge — more powerful, Node.js/Python

Use Cases:

CloudFront Functions vs Lambda@Edge:

Aspect | CloudFront Functions | Lambda@Edge
Runtime | JavaScript only | Node.js, Python
Scale | Millions req/sec | Thousands req/sec
Triggers | Viewer Request/Response only | Viewer + Origin Request/Response
Max Execution | < 1 ms | 5 s (viewer triggers), 30 s (origin triggers)
Max Memory | 2 MB | 128 MB – 10 GB
Package Size | 10 KB | 1 MB – 50 MB
Network Access | No | Yes
File System Access | No | Yes
Request Body Access | No | Yes
Pricing | Free tier, 1/6 price of @Edge | No free tier, per request + duration
Managed In | CloudFront console | Lambda (us-east-1 only)

CloudFront Request/Response Flow:

User ──► Viewer Request ──► Origin Request ──► Origin
                │                   │
                │                   │
         CloudFront Func      Lambda@Edge
         or Lambda@Edge       only

Origin ──► Origin Response ──► Viewer Response ──► User
                │                    │
          Lambda@Edge          CloudFront Func
          only                 or Lambda@Edge

When to Use Which:

Use Case | Best Choice
Cache key normalization | CloudFront Functions
Header manipulation | CloudFront Functions
URL rewrites/redirects | CloudFront Functions
JWT validation (simple) | CloudFront Functions
Needs AWS SDK | Lambda@Edge
Access request body | Lambda@Edge
External API calls | Lambda@Edge
Complex processing | Lambda@Edge

⚠️ Exam trap: “Millions of requests, simple manipulation” → CloudFront Functions. “Need network/file access or origin manipulation” → Lambda@Edge.

⚠️ Exam trap: “Authenticate at CloudFront Edge” or “auth before reaching origin” → Lambda@Edge (or CloudFront Functions for simple JWT). Not API Gateway — it lives in one region, not at edge.

⚠️ Exam trap: Lambda@Edge must be authored in us-east-1 — CloudFront replicates globally.


📌 RDS & Aurora Lambda Integration — see Database section above for details (RDS Event Notifications vs Invoke Lambda from Aurora).


Amazon DynamoDB:

Overview:

⚠️ Exam trap: “Provision EC2 for DynamoDB” = False. DynamoDB is serverless — no servers/instances. Unlike RDS where you choose instance type.

DynamoDB Basics:

Concept | Details
Structure | Tables → Items (rows) → Attributes (columns)
Primary Key | Must be decided at creation time
Items | Infinite number per table, max 400 KB per item
Schema | Flexible: attributes can be added over time

DynamoDB Indexes:

Index Type | When Created | Key | Separate Throughput
LSI (Local Secondary Index) | Table creation only | Same partition key, different sort key | No (uses table’s)
GSI (Global Secondary Index) | Anytime | Different partition key | Yes (own RCU/WCU)

⚠️ Exam trap: “Query by different attribute” → add GSI. “Alternative sort key, same partition” → LSI (must define at creation).
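The LSI/GSI distinction is easiest to see in a table definition. The sketch below only builds a dictionary in the shape of the DynamoDB CreateTable request; it does not call AWS. The table name, attribute names, and index names are all hypothetical.

```python
def orders_table_definition():
    """Build keyword arguments in the shape used by DynamoDB CreateTable."""
    return {
        "TableName": "Orders",  # hypothetical table
        "BillingMode": "PAY_PER_REQUEST",
        "AttributeDefinitions": [
            {"AttributeName": "customer_id", "AttributeType": "S"},
            {"AttributeName": "order_date", "AttributeType": "S"},
            {"AttributeName": "amount", "AttributeType": "N"},
            {"AttributeName": "status", "AttributeType": "S"},
        ],
        # Base table: partition key + sort key
        "KeySchema": [
            {"AttributeName": "customer_id", "KeyType": "HASH"},
            {"AttributeName": "order_date", "KeyType": "RANGE"},
        ],
        # LSI: same partition key, alternative sort key; creation time only
        "LocalSecondaryIndexes": [{
            "IndexName": "by_amount",
            "KeySchema": [
                {"AttributeName": "customer_id", "KeyType": "HASH"},
                {"AttributeName": "amount", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }],
        # GSI: different partition key; can also be added after creation
        "GlobalSecondaryIndexes": [{
            "IndexName": "by_status",
            "KeySchema": [
                {"AttributeName": "status", "KeyType": "HASH"},
                {"AttributeName": "order_date", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }],
    }


definition = orders_table_definition()
print(definition["LocalSecondaryIndexes"][0]["KeySchema"][0])
print(definition["GlobalSecondaryIndexes"][0]["KeySchema"][0])
```

With boto3 installed and credentials configured, a dictionary like this could be passed as `client.create_table(**orders_table_definition())`; that call is not made here.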

Data Types:

⚠️ Exam trap: “Schema must rapidly evolve” or “flexible schema” → DynamoDB (NoSQL). RDS requires schema migrations.

Read/Write Capacity Modes:

Mode | Capacity Planning | Pricing | Best For
Provisioned (default) | Specify RCU/WCU upfront | Pay for provisioned | Predictable workloads
On-Demand | Automatic, instant scaling | 2-3x more expensive | Unpredictable, steep spikes

⚠️ Exam trap: “Load increases from thousands to millions in < 1 minute” or “unpredictable steep spikes” → On-Demand Mode. Provisioned auto-scaling is too slow for sudden bursts.

⚠️ Exam trap: “Cost-effective” + mixed workloads → Match mode to pattern:


DynamoDB Accelerator (DAX):

What is DAX?

DAX Architecture:
┌─────────────┐
│ Application │
└──────┬──────┘
       ▼
┌─────────────────────────┐
│     DAX Cluster         │
│ ┌─────┐ ┌─────┐ ┌─────┐ │
│ │Cache│ │Cache│ │Cache│ │
│ └─────┘ └─────┘ └─────┘ │
└──────────┬──────────────┘
           ▼
┌─────────────────────────┐
│   Amazon DynamoDB       │
│   ┌───┐ ┌───┐ ┌───┐     │
│   │Tbl│ │Tbl│ │Tbl│     │
│   └───┘ └───┘ └───┘     │
└─────────────────────────┘

DAX vs ElastiCache:

Use Case | Solution
Cache individual objects, Query/Scan results | DAX
Store aggregation results (computed data) | ElastiCache

Application
    │
    ├── Aggregation Results ──────► ElastiCache
    │
    └── Individual objects ───────► DAX ──► DynamoDB
        Query & Scan cache

⚠️ Exam trap: “Cache DynamoDB reads” → DAX. “Store computed/aggregated results” → ElastiCache.

⚠️ Exam trap: “ProvisionedThroughputExceededException” + “hot keys/popular items” → DAX. Caches hot keys, offloads reads, prevents throughput errors. Increasing RCU alone won’t fix hot partition problem.

⚠️ Exam trap: “Migrate to Aurora/RDS” vs “Add DAX” → Choose DAX. Migration = dev effort, downtime risk, loses serverless benefits. DAX = no code changes, immediate fix, stays serverless.


DynamoDB Streams:

What are Streams?

Use Cases:

⚠️ Exam trap: “React to DynamoDB changes” (e.g., “send email when user signs up”) → DynamoDB Streams + Lambda. Never poll/scan — use event-driven streams.
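The “send email when user signs up” case can be sketched as a Lambda handler over stream records. This assumes the stream is configured to include new images and that user items carry a hypothetical `email` string attribute; the actual send step is left as a comment.

```python
def handler(event, context):
    """Return the email address of every newly inserted user item."""
    new_user_emails = []
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue  # ignore MODIFY and REMOVE events
        # NewImage uses DynamoDB's typed JSON, e.g. {"email": {"S": "..."}}
        new_image = record["dynamodb"].get("NewImage", {})
        email = new_image.get("email", {}).get("S")
        if email:
            new_user_emails.append(email)
            # A real function would send the welcome email here (e.g. via SES).
    return new_user_emails


sample_event = {
    "Records": [
        {"eventName": "INSERT",
         "dynamodb": {"NewImage": {"email": {"S": "ann@example.com"}}}},
        {"eventName": "MODIFY",
         "dynamodb": {"NewImage": {"email": {"S": "bob@example.com"}}}},
    ]
}
print(handler(sample_event, None))
```

Filtering on `eventName` keeps the function from emailing users again every time their item is updated.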

DynamoDB Streams Architecture:

App ──► Table ──► DynamoDB Streams ──┬──► Lambda/KCL ──► SNS (notifications)
                         │           │                  ──► DDB Table (filtering)
                         │           │
                         ▼           │
                  Kinesis Data   ────┴──► Kinesis Firehose ──► S3 (archiving)
                  Streams                                  ──► Redshift (analytics)
                                                           ──► OpenSearch (indexing)

DynamoDB Streams vs Kinesis Data Streams:

Feature | DynamoDB Streams | Kinesis Data Streams
Retention | 24 hours | Up to 1 year
Consumers | Limited (2 simultaneous) | High # of consumers
Processing | Lambda Triggers, KCL Adapter | Lambda, Analytics, Firehose, Glue
Ordering | Per-item ordered | Per-shard ordered
Cost | Included (no extra charge) | Pay for shards

When to Use Which:

Scenario | Best Choice
Simple Lambda trigger on DDB changes | DynamoDB Streams
Need > 2 consumers reading same stream | Kinesis Data Streams
Retention > 24 hours needed | Kinesis Data Streams
Archive to S3/Redshift/OpenSearch | Kinesis Data Streams → Firehose
Real-time analytics on changes | Kinesis Data Streams → Analytics
Just trigger notifications/updates | DynamoDB Streams → Lambda

⚠️ Exam trap: “Multiple consumers” or “long retention” or “replay” or “analytics pipeline” or “GB/sec real-time” → Kinesis Data Streams. “Simple Lambda trigger” → DynamoDB Streams. SQS/SNS have no replay.


DynamoDB Global Tables:

What are Global Tables?

                    GLOBAL TABLE
    ┌─────────────────────────────────────────┐
    │                                         │
    │   ┌──────────┐  two-way  ┌──────────┐   │
    │   │  Table   │◄─────────►│  Table   │   │
    │   │US-EAST-1 │replication│ AP-SE-2  │   │
    │   └──────────┘           └──────────┘   │
    │    Read+Write            Read+Write     │
    └─────────────────────────────────────────┘

Key Points:

Global Tables vs RDS Read Replicas:

Aspect | DynamoDB Global Tables | RDS Read Replicas
Write | Any region (active-active) | Primary only
Read | Any region | Any replica
Replication | Two-way (bi-directional) | One-way (primary → replica)
Use case | Global apps, DR | Read scaling

⚠️ Exam trap: “Low latency global access to DynamoDB” → Global Tables. Requires Streams enabled (Streams provide the changelog for replication). Not DAX (caching), not Backups (recovery), not “Versioning” (doesn’t exist).

⚠️ Exam trap: Global Tables = active-active (write anywhere). RDS Read Replicas = active-passive (write to primary only).


DynamoDB Additional Features:

Time To Live (TTL):

⚠️ Exam trap: “Web session handling” + “auto-expire” → DynamoDB with TTL. Sessions stored in DynamoDB, TTL auto-cleans expired sessions.
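A session item with a TTL attribute can be sketched as follows. DynamoDB TTL expects a Number attribute holding an epoch timestamp in seconds; the attribute name `expires_at`, the 30-minute default, and both helper names are assumptions for illustration.

```python
import time
import uuid


def new_session_item(user_id, ttl_seconds=1800, now=None):
    """Build a session item whose `expires_at` attribute can drive TTL."""
    now = int(time.time()) if now is None else int(now)
    return {
        "session_id": str(uuid.uuid4()),
        "user_id": user_id,
        # TTL attributes must be a Number holding epoch time in seconds.
        "expires_at": now + ttl_seconds,
    }


def is_expired(item, now=None):
    """TTL deletion is not instantaneous, so readers should still check expiry."""
    now = int(time.time()) if now is None else int(now)
    return item["expires_at"] <= now


session = new_session_item("user-42")
print(session["expires_at"] - int(time.time()) <= 1800, is_expired(session))
```

The explicit `is_expired` check matters because expired items can remain readable for a while before the background TTL process removes them.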

Backups:

Type | Details
PITR (Point-in-Time Recovery) | Last 35 days, continuous, creates new table
On-Demand | Manual, long-term retention, no performance impact
AWS Backup | Cross-region copy support

S3 Integration:

Operation | Details
Export to S3 | Requires PITR, last 35 days, DynamoDB JSON or ION format, no RCU consumed
Import from S3 | CSV/JSON/ION, creates new table, no write capacity consumed

⚠️ Exam trap: “Export DynamoDB for analytics” → Export to S3 (native feature). Not Lambda — Export uses PITR backup, no RCU, no code. Transfer Family/DataSync are for files, not databases.


Amazon API Gateway:

Overview:

Features:

Integrations:

Integration Type | Use Case | Example
Lambda | Serverless backend | REST API → Lambda
HTTP | Existing HTTP endpoints | On-prem API, ALB
AWS Service | Direct AWS API exposure | Start Step Function, post to SQS

⚠️ Exam trap: “Serverless REST API” → API Gateway + Lambda. Why others fail:

API Gateway → Kinesis Data Streams Example:

Client ──► API Gateway ──► Kinesis Data ──► Kinesis Data ──► S3
           (requests)      Streams          Firehose         (.json files)

Endpoint Types:

Type | Description | CloudFront
Edge-Optimized (default) | Global clients, routed via CloudFront edge | Built-in
Regional | Same-region clients | Optional (manual)
Private | VPC only, via Interface Endpoint (ENI) | N/A

⚠️ Exam trap: “Edge-Optimized API Gateway lives in all regions” = False. Requests route through global CloudFront edges, but API Gateway itself stays in ONE region.

Security:

Method | Use Case
IAM Roles | Internal applications
Cognito | External users (mobile apps)
Custom Authorizer | Your own auth logic (Lambda)

API Gateway Limits:

Limit | Value
Throttling | 10,000 req/sec (account level, can increase)
Burst | 5,000 concurrent requests
Timeout | 29 seconds max (Lambda can run 15 min, but API GW times out at 29s)
Payload | 10 MB max

⚠️ Exam trap: “API Gateway timeout” = 29 seconds (not Lambda’s 15 min). Long-running → use async pattern (API GW → SQS → Lambda).
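One variant of the async pattern, sketched as a small Lambda proxy handler that accepts the request, hands it to a queue, and replies with 202 straight away. API Gateway can also integrate with SQS directly without this function. `make_submit_handler` and the `enqueue` callable are hypothetical; `enqueue` stands in for a thin wrapper around an SQS send.

```python
import json
import uuid


def make_submit_handler(enqueue):
    """Build an API Gateway proxy handler that hands work to a queue.

    `enqueue` is a hypothetical callable (for example a thin wrapper around
    an SQS send) that accepts a JSON string.
    """
    def handler(event, context):
        job_id = str(uuid.uuid4())
        payload = json.loads(event.get("body") or "{}")
        enqueue(json.dumps({"job_id": job_id, "payload": payload}))
        # Reply immediately; a separate consumer does the slow work.
        return {
            "statusCode": 202,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"job_id": job_id}),
        }
    return handler


queued = []  # stand-in for the queue in this sketch
submit = make_submit_handler(queued.append)
response = submit({"body": '{"report": "monthly"}'}, None)
print(response["statusCode"], len(queued))
```

The client gets a job ID well inside the 29-second window and can poll or be notified later, while the queue consumer is free to take as long as its own limits allow.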

HTTPS/Certificates:

⚠️ Exam trap: “API Gateway + global users” → Edge-Optimized. Certificate must be in us-east-1.


AWS Step Functions:

Overview:

Use Cases:

Step Functions Workflow Types:

Type | Duration | Execution | Pricing | Use Case
Standard | Up to 1 year | Exactly-once | Per state transition | Long-running, audit
Express | Up to 5 min | At-least-once | Per execution + duration | High-volume, short

⚠️ Exam trap: “Serverless workflow” + “human approval” → Step Functions. Only service with built-in human approval feature.

⚠️ Exam trap: “High-volume, short-lived workflows” → Express Workflows. “Long-running, exactly-once” → Standard Workflows.


Amazon Cognito:

Overview:

Cognito vs IAM:

Aspect | Cognito | IAM
Users | Hundreds/thousands/millions | Handful (employees, services)
Type | External users (customers) | Internal users (admins, devs)
Scale | Web/mobile app users | AWS account management
Federation | SAML, social (Google, FB) | SAML, OIDC (for roles)

⚠️ Exam trap keywords → Cognito:

Two Components:

Component | Purpose | Key Feature
User Pools (CUP) | Authentication (sign-in) | Serverless user database
Identity Pools | Authorization (AWS credentials) | Temporary AWS access

Cognito User Pools (CUP):

Features:

⚠️ Exam trap: “Easiest/best way to add authentication” to serverless app → Cognito User Pools. Not DynamoDB/S3 + KMS (DIY auth = complex), not Secrets Manager (for app secrets, not user auth).

Integrations:

      [CUP + API Gateway]                      [CUP + ALB]

      Cognito User Pools                    Cognito User Pools   
   (authenticate, get token)                  (authenticate)
              ▲                                     ▲ 
              │                                     │
              ▼                                     ▼
User ──► API Gateway ──► Lambda        User ─────► ALB ──► Target Group
     (REST API + token)                       (authenticate)
  (evaluate Cognito token)

⚠️ Exam trap: CUP integrates with API Gateway and ALB for authentication.


Cognito Identity Pools (Federated Identity):

Purpose:

Identity Sources:

Cognito Identity Pools Flow:

Web/Mobile App ──► Identity Provider ──► Cognito Identity Pools ──► AWS Services
                  (Google, Facebook,      (validate, exchange     (S3, DynamoDB)
                   SAML, CUP)             for AWS credentials)
                                                │
                                          IAM policies define
                                          what user can access

Key Points:

⚠️ Exam trap: “Mobile app needs direct access to S3/DynamoDB” → Cognito Identity Pools (provides temporary AWS credentials).

⚠️ Exam trap: “Per-user personal space in S3” → Cognito Identity Pools + IAM policy variables. Not IAM users (doesn’t scale), not public bucket (no security).
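The per-user space relies on an IAM policy variable that resolves to the caller's Cognito identity ID. The sketch below builds such a policy as a Python dictionary; the bucket name and the function name are hypothetical, and the policy would be attached to the role that the Identity Pool hands to authenticated users.

```python
import json


def per_user_s3_policy(bucket):
    """IAM policy limiting each Cognito identity to its own S3 prefix."""
    identity = "${cognito-identity.amazonaws.com:sub}"  # IAM policy variable
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is allowed only under the caller's own prefix.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {
                    "StringLike": {"s3:prefix": [f"{identity}/*"]}
                },
            },
            {
                # Reads and writes are allowed only on objects in that prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{identity}/*"],
            },
        ],
    }


print(json.dumps(per_user_s3_policy("my-app-user-files"), indent=2))
```

One policy covers every user because the variable is substituted at request time, which is why this scales where one IAM user per customer would not.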

⚠️ Exam trap: User Pools = WHO you are (authentication). Identity Pools = WHAT you can access (authorization/credentials).


Serverless Architecture Use Case: Mobile App

Requirements → Solution Mapping:

Requirement | AWS Solution
REST API with HTTPS | API Gateway
Serverless architecture | Lambda, DynamoDB, Cognito, S3
Users interact with own S3 folder | Cognito Identity Pools (per-user IAM policy)
Managed serverless authentication | Cognito User Pools
Mostly reads, some writes | DAX (caching layer for read throughput)
Database scales, high read throughput | DynamoDB + DAX

Complete Architecture:

                                    ┌──────────┐
                   Store/retrieve   │    S3    │
                   files ──────────►│ (files)  │
                        │           └──────────┘
                   Permissions
                   (Cognito)
                        │
┌────────────┐    REST HTTPS    ┌─────────────┐      ┌────────┐     ┌─────┐     ┌──────────┐
│   Mobile   │◄────────────────►│ API Gateway │─────►│ Lambda │────►│ DAX │────►│ DynamoDB │
│   Client   │                  │  (caching)  │      │        │     │cache│     │          │
└────────────┘                  └──────┬──────┘      └────────┘     └─────┘     └──────────┘
       │                               │
       │ authenticate                  │ verify auth
       ▼                               ▼
                              ┌─────────────────┐
                              │ Amazon Cognito  │
                              │ (User Pools +   │
                              │  Identity Pools)│
                              └─────────────────┘

Why each service:

⚠️ Exam trap: “Read-heavy workload” + “DynamoDB” → add DAX for caching. “Per-user S3 access” → Cognito Identity Pools.


Serverless Architecture Use Case: Global Website

Requirements → Solution Mapping:

Requirement | AWS Solution
Scale globally | CloudFront (CDN, edge locations)
Rarely written, often read | DynamoDB + DAX (caching)
Static files | S3 + CloudFront
Dynamic REST API | API Gateway + Lambda
Caching where possible | CloudFront (static) + DAX (DB reads)
Welcome email on signup | DynamoDB Streams + Lambda + SES
Thumbnail on photo upload | S3 trigger + Lambda

Architecture Overview:

STATIC CONTENT (Global):
                                     OAC: Origin Access Control
Client ◄───────► CloudFront ◄──────────────────► S3 (static files)
            (edge locations)                     Bucket policy: only CloudFront

DYNAMIC API:
Client ◄──REST──► API Gateway ──► Lambda ──► DAX ──► DynamoDB
                                              cache

PHOTO UPLOAD + THUMBNAIL:
Client ──► CloudFront ──► S3 (photos) ──► Lambda (trigger) ──► S3 (thumbnails)
      (Transfer Acceleration)    OAC              │
                                                  ▼ optional
                                              SQS / SNS

WELCOME EMAIL:
DynamoDB ──► DynamoDB Streams ──► Lambda ──► SES (send email)
(new user)                        (trigger)

Key Patterns:

Pattern | Implementation
Static hosting | S3 + CloudFront + OAC (bucket only allows CloudFront)
Global distribution | CloudFront edge locations
Read-heavy DB | DynamoDB + DAX caching
Event-driven processing | S3 trigger → Lambda (thumbnails)
React to DB changes | DynamoDB Streams → Lambda (welcome email)
Fast uploads | CloudFront + S3 Transfer Acceleration

OAC (Origin Access Control):

⚠️ Exam trap: “Static website + global” → S3 + CloudFront. “Secure S3 from direct access” → OAC (Origin Access Control).

⚠️ Exam trap: “Generate thumbnail on upload” → S3 event → Lambda. “Welcome email on signup” → DynamoDB Streams → Lambda → SES.
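
As a rough illustration of the S3 event → Lambda pattern, the sketch below parses an S3 event notification and works out where each thumbnail would be written. It deliberately stops short of resizing or uploading anything; the `thumbnails/` prefix and the helper name `thumbnail_jobs` are assumptions made for this example.

```python
from urllib.parse import unquote_plus

THUMBNAIL_PREFIX = "thumbnails/"  # hypothetical destination prefix


def thumbnail_jobs(event: dict) -> list:
    """Return (bucket, source_key, thumbnail_key) for each uploaded object."""
    jobs = []
    for record in event.get("Records", []):
        # Only react to object-creation events such as ObjectCreated:Put.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(THUMBNAIL_PREFIX):
            continue  # skip our own output to avoid a trigger loop
        jobs.append((bucket, key, THUMBNAIL_PREFIX + key))
    return jobs


def handler(event, context):
    """Lambda entry point; the actual resize and upload are left out."""
    return {"processed": len(thumbnail_jobs(event))}
```

Writing thumbnails to a separate bucket (as in the diagram above) or a separate prefix keeps the function from being re-triggered by its own output.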

Summary — Serverless Website Key Points:

Component | Purpose
CloudFront + S3 | Static content distribution
API Gateway + Lambda | Serverless REST API (public, no Cognito needed)
DynamoDB Global Tables | Global data serving (alternative: Aurora Global)
DynamoDB Streams → Lambda | React to DB changes (new user → welcome email)
Lambda + SES | Serverless email sending (Lambda needs IAM role for SES)
S3 Events | Trigger SQS / SNS / Lambda on upload

⚠️ Exam trap: “Public API” → no Cognito needed, just API Gateway + Lambda. “Global database” → DynamoDB Global Tables or Aurora Global Database.


Microservices Architecture

Why Microservices?

Communication Patterns:

Pattern | Services | Use Case
Synchronous | API Gateway, Load Balancers | Direct request/response
Asynchronous | SQS, Kinesis, SNS, Lambda triggers (S3) | Decoupled, event-driven

Architecture Example:

                          Route 53 (DNS)
                               │
           ┌───────────────────┼───────────────────┐
           ▼                   ▼                   ▼
   service1.example.com  service2.example.com  service3.example.com
           │                   │                   │
           ▼                   ▼                   ▼
    ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
    │     ELB     │     │ API Gateway │     │     ELB     │
    └──────┬──────┘     └──────┬──────┘     └──────┬──────┘
           ▼                   ▼                   ▼
    ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
    │     ECS     │     │   Lambda    │     │ EC2 + ASG   │
    └──────┬──────┘     └──────┬──────┘     └──────┬──────┘
           ▼                   ▼                   ▼
    ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
    │  DynamoDB   │     │ ElastiCache │     │     RDS     │
    └─────────────┘     └─────────────┘     └─────────────┘

Microservices Challenges:

Challenge | Description
Repeated overhead | Creating each new microservice requires setup
Server utilization | Hard to optimize density across services
Version complexity | Running multiple versions simultaneously
Client SDK proliferation | Clients need to integrate with many services

How Serverless Helps:

Challenge | Serverless Solution
Overhead | API Gateway + Lambda = minimal setup
Scaling | Automatic scaling, pay per usage
Environments | Clone API, reproduce environments easily
Client SDKs | Generate SDK through Swagger/OpenAPI integration

⚠️ Exam trap: “Reduce microservices overhead” → API Gateway + Lambda. “Generate client SDK” → API Gateway + Swagger/OpenAPI.


Software Updates Offloading

Problem:

Solution: Add CloudFront

BEFORE (expensive):
Users ──► EC2 (ASG) ──► distributes updates
          scales up    high CPU, network cost

AFTER (optimized):
                                     ┌────────────────────────────────┐
                                     │      Auto Scaling group        │
                                     │  ┌────────────────────────┐    │
                                     │  │   Availability Zone 1  │    │
                                     │  │   ┌────┐    ┌────┐     │    │
                                     │  │   │ M5 │    │ M5 │     │    │
Users ──► CloudFront ──► ALB ────────┼──┤   └────┘    └────┘     │    │──► EFS
          (edge cache)   (AZ 1-3)    │  ├────────────────────────┤    │  (shared storage)
          handles load               │  │   Availability Zone 2  │    │
                                     │  │   ┌────┐    ┌────┐     │    │
                                     │  │   │ M5 │    │ M5 │     │    │
                                     │  │   └────┘    └────┘     │    │
                                     │  ├────────────────────────┤    │
                                     │  │   Availability Zone 3  │    │
                                     │  │   ┌────┐               │    │
                                     │  │   │ M5 │               │    │
                                     │  │   └────┘               │    │
                                     │  └────────────────────────┘    │
                                     └────────────────────────────────┘

Why CloudFront Works:

Benefit | Explanation
No architecture changes | Just add CloudFront in front
Edge caching | Software files cached globally
Static content | Update files don’t change = perfect for CDN
EC2 not serverless, CloudFront is | CloudFront scales automatically
Cost savings | Less ASG scaling, less EC2, less bandwidth

EFS for Multi-AZ Shared Storage:

⚠️ Exam trap: “Reduce EC2 load for static file distribution” + “no architecture changes” → CloudFront. Works with existing EC2, caches at edge, reduces origin load. ALB has no caching feature.

⚠️ Exam trap: “Multi-AZ EC2 + shared filesystem” → EFS. Not EBS (single AZ), not S3 (object storage, not filesystem).



🎯 MASTER SUMMARY: Serverless Exam Guide

Part 1: Core Principles (Understand WHY → Derive WHAT)

Principle 1: Serverless = No Server Management, Not “No Servers”

Serverless means AWS manages infrastructure. You don’t provision, patch, or scale servers.

Derive: If question asks “provision instance” for DynamoDB/Lambda → Wrong answer.

Principle 2: Lambda Has Hard Limits — Know the Disqualifiers

Lambda is NOT suitable for every workload. Hard limits exist:

Derive: Any job > 15 min, > 10 GB RAM/disk, or needs GPU → Lambda is wrong. AWS Batch is the natural alternative for batch workloads.
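
The disqualifier check can be written down as a small function. This is only a sketch that encodes the limits quoted in these notes; the `Workload` type and `lambda_disqualifiers` are names invented for the example.

```python
from dataclasses import dataclass

# Limits as listed in these notes.
MAX_TIMEOUT_S = 15 * 60       # 15 minutes
MAX_MEMORY_MB = 10 * 1024     # 10 GB
MAX_TMP_DISK_MB = 10 * 1024   # 10 GB of /tmp


@dataclass
class Workload:
    duration_s: int
    memory_mb: int
    tmp_disk_mb: int = 512
    needs_gpu: bool = False


def lambda_disqualifiers(w: Workload) -> list:
    """Return the reasons a workload cannot run on Lambda (empty list = it fits)."""
    reasons = []
    if w.duration_s > MAX_TIMEOUT_S:
        reasons.append("runs longer than 15 minutes")
    if w.memory_mb > MAX_MEMORY_MB:
        reasons.append("needs more than 10 GB of RAM")
    if w.tmp_disk_mb > MAX_TMP_DISK_MB:
        reasons.append("needs more than 10 GB of disk")
    if w.needs_gpu:
        reasons.append("needs a GPU")
    return reasons
```

For example, a 25-minute video encode yields one disqualifier (duration), which is enough to rule Lambda out and point towards AWS Batch, ECS, or EC2.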

Principle 3: Stateless vs Stateful — Lambda Has No Memory

Lambda functions are stateless. Each invocation is independent.

Derive: “Cache results between invocations” → needs external cache (DAX, ElastiCache).

Principle 4: IAM for APIs, Security Groups for Networks

Derive: “Lambda can’t write to SQS” → IAM role issue, not security groups. SQS has no SG.

Principle 5: Sync vs Async — Changes Everything

Pattern | Behavior | Retry | Use Case
Sync | Wait for response | No auto-retry | API calls, user-facing
Async | Fire and forget | Auto-retry (2 retries by default; event kept up to 6 hrs) | Background jobs, events

Derive: “Retry on failure” → Async invocation or SQS. “Immediate response” → Sync.
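
To make the retry difference concrete, here is a small simulation of asynchronous invocation: one initial attempt plus two retries (Lambda's default for async events), after which the event is handed to a failure destination. The function `invoke_async` and the `dead_letters` list are stand-ins invented for this sketch, and it omits the delay the real service applies between attempts.

```python
def invoke_async(fn, event, max_retries=2, dead_letters=None):
    """Simulate async invocation: one attempt plus up to max_retries retries.

    `dead_letters` is a plain list standing in for a DLQ or on-failure
    destination; it receives the event once all attempts have failed.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return {"ok": True, "attempts": attempts, "result": fn(event)}
        except Exception as exc:
            if attempts > max_retries:
                if dead_letters is not None:
                    dead_letters.append(event)
                return {"ok": False, "attempts": attempts, "error": str(exc)}


def invoke_sync(fn, event):
    """Simulate sync invocation: the caller sees the error and must retry itself."""
    return fn(event)
```

With a synchronous call the exception simply propagates to the caller, which is why "retry on failure" questions point to async invocation or SQS.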

Principle 6: Retention Determines Service Choice

Service | Retention | Replay
SNS | None (push & forget) |
SQS | 14 days max | ❌ (deleted after read)
Kinesis | 1-365 days | ✅ Multiple consumers
DynamoDB Streams | 24 hours |

Derive: “Pause for a day, resume later” → SQS. “Replay events” → Kinesis. “Multiple consumers” → Kinesis.

Principle 7: Edge vs Region — Where Code Runs

Derive: “Auth at edge before reaching origin” → Lambda@Edge. “Millions req/sec simple” → CloudFront Functions.

Principle 8: Caching Layers Stack

User → CloudFront (edge cache) → API Gateway (response cache) → Lambda → DAX → DynamoDB

Each layer reduces load on the next. Know which service provides which cache.
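
Each of these layers behaves, roughly, like a read-through cache with a time-to-live in front of the next layer. The class below is a generic sketch of that idea, not DAX's or CloudFront's actual implementation; the 300-second default mirrors the 5-minute DAX TTL quoted in these notes, and the injectable `clock` exists only to make the example testable.

```python
import time


class ReadThroughCache:
    """Serve repeated reads from memory and fall through to `load` on a miss."""

    def __init__(self, load, ttl_seconds=300.0, clock=time.monotonic):
        self._load = load          # function that reads from the next layer
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # key -> (expires_at, value)
        self.misses = 0            # how often the next layer was hit

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # cache hit: next layer is not touched
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

Stacking two such caches (one wrapping the other's `get`) gives the layered effect in the chain above: a hit in the outer layer never reaches the inner one.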

Principle 9: Event-Driven = Streams + Triggers

React to changes without polling:

Derive: “Send email when user signs up” → DynamoDB Streams + Lambda + SES. Not polling.
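
A handler for the signup case might look like the sketch below. It assumes the stream is configured to include new images and that the user item has a string attribute called `email` (a hypothetical name); `send_email` is an injected callable standing in for the SES call, so the example runs without AWS access.

```python
def new_user_emails(event: dict) -> list:
    """Collect e-mail addresses from INSERT records in a DynamoDB Streams event."""
    emails = []
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue  # ignore MODIFY and REMOVE records
        image = record.get("dynamodb", {}).get("NewImage", {})
        # Stream images use DynamoDB's typed JSON, e.g. {"S": "value"} for strings.
        email = image.get("email", {}).get("S")
        if email:
            emails.append(email)
    return emails


def handler(event, context, send_email=print):
    """Lambda entry point; `send_email` is a stand-in for sending via SES."""
    recipients = new_user_emails(event)
    for address in recipients:
        send_email(address)
    return {"sent": len(recipients)}
```

Because Lambda is invoked with batches of stream records, nothing has to poll the table for new users.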

Principle 10: Authentication vs Authorization

Cognito Component | What It Does
User Pools | WHO you are (authentication, tokens)
Identity Pools | WHAT you can access (temporary AWS creds)

Derive: “Mobile app login” → User Pools. “Direct S3 access from mobile” → Identity Pools.


Part 2: Decision Tree (Follow Keywords → Find Answer)

"Serverless" mentioned?
├── REST API → API Gateway + Lambda
├── Database → DynamoDB (NoSQL) or Aurora Serverless (SQL)
├── Workflow → Step Functions
├── Auth → Cognito
└── Containers → Fargate

"Long-running job" (> 15 min)?
├── Batch workload → AWS Batch (Docker, Spot, no time limit)
├── Always-on service → ECS/Fargate
├── Custom/legacy → EC2
└── < 15 min → Lambda OK

"Cold start" problem?
├── Java/Python/.NET → SnapStart (no extra charge for Java; Python/.NET are billed for snapshot caching and restore)
└── All languages → Provisioned Concurrency (costs $)

"Cache DynamoDB reads"?
├── Individual objects → DAX
└── Aggregated/computed → ElastiCache

"Global distribution"?
├── Static content → S3 + CloudFront
├── Dynamic API → API Gateway (Edge-Optimized) + Lambda
└── Database → DynamoDB Global Tables or Aurora Global

"React to changes"?
├── S3 upload → S3 Event → Lambda
├── DynamoDB insert → DynamoDB Streams → Lambda
└── Multiple consumers/replay → Kinesis

"Per-user S3 folders"?
└── Cognito Identity Pools + IAM policy variables

The CANNOT List:

What | Why
Lambda > 15 min | Hard limit
Lambda > 10 GB RAM | Hard limit
Lambda > 10 GB disk | Hard limit
Lambda GPU | Not supported → AWS Batch
Lambda arbitrary Docker | Must implement Runtime API
API Gateway > 29 sec | Timeout limit
DynamoDB change LSI after creation | LSI defined at table creation
SNS message replay | No retention
SQS message replay | Deleted after processing
ALB caching | ALB has no cache
SQS security groups | API-based, no network access control

Part 3: Scenario Pattern Recognition

Pattern: “Video encoding takes 25+ minutes”

Keywords: video, encoding, > 15 min, long-running
Answer: SQS + EC2 (or ECS/Batch)
Why: Lambda max 15 min. SQS provides retry + retention up to 14 days.


Pattern: “Send welcome email when user signs up”

Keywords: welcome email, new user, react to signup
Answer: DynamoDB Streams → Lambda → SES
Why: Event-driven. Streams capture new items, Lambda sends email via SES.


Pattern: “Thumbnail generation on image upload”

Keywords: thumbnail, upload, S3, image processing
Answer: S3 Event → Lambda → S3 (thumbnails)
Why: S3 triggers Lambda on PutObject. Lambda processes and saves.


Pattern: “Reduce EC2 load for static file distribution”

Keywords: static files, reduce load, no architecture changes
Answer: CloudFront (CDN)
Why: Edge caching, no origin changes needed. ALB has no caching.


Pattern: “Mobile app needs direct S3/DynamoDB access”

Keywords: mobile, direct access, temporary credentials
Answer: Cognito Identity Pools
Why: Provides temporary AWS credentials with IAM policies.


Pattern: “Per-user personal folder in S3”

Keywords: per-user, personal space, S3 folders
Answer: Cognito Identity Pools + IAM policy variables
Why: ${cognito-identity.amazonaws.com:sub} in policy restricts to user’s folder.


Pattern: “Read-heavy workload with DynamoDB”

Keywords: read-heavy, DynamoDB, cache, hot keys
Answer: DAX
Why: In-memory cache, microsecond latency, no code changes.


Pattern: “ProvisionedThroughputExceededException on hot keys”

Keywords: throughput exceeded, hot partition, popular items
Answer: DAX
Why: Caches hot keys, offloads reads. RCU increase alone doesn’t fix hot partition.


Pattern: “Unpredictable, steep traffic spikes (0 to millions)”

Keywords: unpredictable, millions, instant scaling, spikes
Answer: DynamoDB On-Demand mode
Why: Instant scaling. Provisioned auto-scaling is gradual.


Pattern: “Global low-latency database access”

Keywords: global, multi-region, low latency, DynamoDB
Answer: DynamoDB Global Tables
Why: Active-active replication. Requires Streams enabled.


Pattern: “Human approval in workflow”

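Keywords: human approval, manual step, workflow
Answer: Step Functions
Why: Supports human approval steps (the workflow pauses until someone approves or rejects). It is the expected answer for approval steps in a serverless workflow.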


Pattern: “Generate client SDK for API”

Keywords: client SDK, API, mobile/web developers
Answer: API Gateway + Swagger/OpenAPI
Why: API Gateway generates SDKs from OpenAPI specs.


Pattern: “Authenticate at CloudFront edge”

Keywords: edge authentication, before origin, CDN auth
Answer: Lambda@Edge
Why: Runs at edge, can validate JWT/tokens before hitting origin.


Pattern: “Multiple consumers need same stream data”

Keywords: multiple consumers, replay, analytics pipeline
Answer: Kinesis Data Streams
Why: Multiple consumers, 1-365 day retention, replay capability.


Pattern: “Millions requests/sec, simple header manipulation”

Keywords: millions, simple, headers, URL rewrite
Answer: CloudFront Functions
Why: Sub-millisecond, JavaScript only, cheaper than Lambda@Edge.


Pattern: “Long-running batch job, cost optimization”

Keywords: batch, long-running, hours, cost-effective, Spot
Answer: AWS Batch with Spot Instances
Why: No time limit, Docker flexibility, up to 90% savings with Spot.


Pattern: “Video/media transcoding at scale”

Keywords: video, transcoding, encoding, media processing
Answer: AWS Batch (or Elastic Transcoder/MediaConvert)
Why: Variable duration (could be hours), Docker flexibility, Spot for cost.


Part 4: Quick Reference Tables

Lambda Limits:

Limit | Value
Timeout | 15 min (900 sec)
RAM | 128 MB - 10 GB
/tmp disk | 512 MB - 10 GB
Deployment (zip) | 50 MB compressed, 250 MB uncompressed
Concurrency | 1000 default (regional)
Layers | 5 per function

AWS Batch Capabilities (vs Lambda):

Capability | AWS Batch | Lambda
Time limit | Unlimited | 15 min
RAM | 100s of GB | 10 GB
Disk | EBS (TBs) | 10 GB
GPU | ✅ Yes | ❌ No
Spot pricing | ✅ Yes (up to 90% savings) | ❌ No
Docker | Any image | Runtime API required
Startup time | Minutes | Milliseconds

API Gateway Limits:

Limit | Value
Timeout | 29 seconds
Throttle | 10,000 req/sec (account)
Payload | 10 MB

DynamoDB Numbers:

Metric | Value
Item size max | 400 KB
Streams retention | 24 hours
On-Demand cost | 2-3x Provisioned
DAX TTL default | 5 minutes
PITR window | 35 days

Retention Comparison:

Service | Retention
SNS | 0 (immediate delivery)
SQS | 1 min - 14 days
Kinesis | 1 - 365 days
DynamoDB Streams | 24 hours

Part 5: Ultimate Instant-Answer Table

Question Contains | → Instant Answer
“Serverless REST API” | API Gateway + Lambda
“Job > 15 minutes” | NOT Lambda → EC2/ECS/Batch
“Cold start” | SnapStart or Provisioned Concurrency
“Cache DynamoDB” | DAX
“Cache aggregated results” | ElastiCache
“React to DynamoDB changes” | DynamoDB Streams + Lambda
“React to S3 upload” | S3 Event + Lambda
“Global static website” | S3 + CloudFront
“Global DynamoDB” | Global Tables (needs Streams)
“Send email serverless” | Lambda + SES
“Per-user S3 folders” | Cognito Identity Pools
“Mobile app auth” | Cognito User Pools
“Mobile direct AWS access” | Cognito Identity Pools
“Workflow with human approval” | Step Functions
“Generate client SDK” | API Gateway + Swagger
“Edge authentication” | Lambda@Edge
“Millions req/sec simple” | CloudFront Functions
“Multiple stream consumers” | Kinesis Data Streams
“Replay events” | Kinesis Data Streams
“Pause/resume days later” | SQS (14 day retention)
“Reduce EC2 load, no changes” | CloudFront
“Share code between Lambdas” | Lambda Layers
“Route Lambda success/failure” | Lambda Destinations
“High-volume short workflows” | Step Functions Express
“Long audit workflows” | Step Functions Standard
“Query by different attribute” | DynamoDB GSI
“Steep instant scaling” | DynamoDB On-Demand
“Predictable steady load” | DynamoDB Provisioned
“Lambda timeout 3 sec” | Default not changed
“Lambda can’t reach RDS” | Configure Lambda in VPC
“Lambda can’t write to SQS” | IAM Role missing permissions
“Long-running batch job” | AWS Batch
“Cost-optimize batch processing” | AWS Batch + Spot
“GPU required” | AWS Batch (not Lambda)
“> 10 GB RAM/disk” | AWS Batch (not Lambda)
“Video/media transcoding” | AWS Batch or MediaConvert
“ETL, data processing hours” | AWS Batch

Part 6: Elimination Checklist

□ Does it need > 15 min execution?
  → Yes = Eliminate Lambda → AWS Batch preferred for batch jobs
  → No = Lambda possible

□ Does it need > 10 GB RAM or disk?
  → Yes = Eliminate Lambda → AWS Batch
  → No = Lambda possible

□ Does it need GPU?
  → Yes = Eliminate Lambda → AWS Batch (GPU instances)

□ Is it "serverless REST API"?
  → API Gateway + Lambda (not ALB+EC2)

□ Does it mention "cache"?
  → DynamoDB reads = DAX
  → Aggregated data = ElastiCache
  → Static content = CloudFront
  → API responses = API Gateway caching
  
□ Does it mention "global"?
  → Static = CloudFront
  → Database = Global Tables / Aurora Global
  
□ Does it need "replay" or "multiple consumers"?
  → Kinesis (not SQS/SNS)

□ Does it mention "edge"?
  → Simple/fast = CloudFront Functions
  → Complex/network = Lambda@Edge

□ "Security group" for SQS/SNS/DynamoDB?
  → Wrong answer (API services, not network)

□ "Provision instance" for DynamoDB/Lambda?
  → Wrong answer (serverless)

🏆 The Golden Rules

  1. 15/10/10 Rule — Lambda max: 15 min, 10 GB RAM, 10 GB disk → AWS Batch if exceeded
  2. 29 sec API Gateway — API Gateway times out before Lambda (use async for long jobs)
  3. DAX for reads, ElastiCache for compute — know which cache layer
  4. Streams enable replication — Global Tables require DynamoDB Streams
  5. User Pools = Auth, Identity Pools = Creds — Cognito split
  6. Edge-Optimized cert in us-east-1 — API Gateway + CloudFront
  7. Lambda@Edge authored in us-east-1 — replicated globally
  8. SNS = no retention, SQS = 14 days, Kinesis = 365 days
  9. IAM for APIs, SG for networks — SQS/SNS/DynamoDB = no security groups
  10. CloudFront + existing EC2 = no refactor — just add CDN in front
  11. On-Demand = 2-3x cost but instant scale — DynamoDB mode trade-off
  12. SnapStart = cold start fix for Java/Python/.NET only (no extra charge on Java; Python/.NET are billed for snapshot caching and restore)
  13. Step Functions = human approval steps (the expected answer for approvals in a serverless workflow)
  14. S3 + CloudFront + OAC — secure static hosting pattern
  15. DynamoDB Streams → Lambda → SES — serverless email pattern
  16. AWS Batch = Lambda alternative — when limits exceeded (time, RAM, disk, GPU)
  17. Batch + Spot = 90% savings — for cost-optimized batch processing

Amazon Lightsail:

Amazon Lightsail is a simplified alternative to the core AWS services, used for simple web applications (templates for LAMP, Nginx, MEAN, Node.js, and others), websites (templates for WordPress, Magento, Plesk, Joomla), and dev/test environments. It offers high availability but no auto-scaling and only limited AWS integrations.

Pricing:

Pricing Models in AWS:

Examples of spending categories:

Free services & free tier in AWS:

EC2 Instances Purchasing Options:

EC2 Image Builder: you pay only for the underlying resources.

EBS Storage billed:

EFS (Elastic File System):

S3 Pricing:

ECS pricing:

Lambda pricing:

Snowball Family Pricing: AWS Snowball offers significantly discounted pricing (up to 62%) for 1-year usage and 3-year usage commitments for Edge compute use cases.

Database pricing - RDS:

CloudFront pricing:

Billing and Costing Tools:

Pricing Calculator: estimate the cost for your solution architecture.

AWS Billing Dashboard: home page for an overview of your AWS cloud financial management data and to help you make faster and more informed decisions. AWS Free Tier Dashboard: tracking AWS Free Tier usage.

Cost Allocation Tags: use cost allocation tags to track AWS costs on a detailed level.

Tagging and Resource Groups:

Cost and Usage Reports: lists AWS usage for each service category used by an account and its IAM users in hourly or daily line items, as well as any tags that customer activated for cost allocation purposes, including additional metadata about AWS services, pricing and reservations.

Cost Explorer: visualize, understand, and manage your AWS costs and usage over time. Create custom reports that analyze cost and usage data.

Billing Alarms in CloudWatch: a simple alarm on actual cost (not projected cost), based on the billing metric data stored in CloudWatch.

Create billing alert for free tier (Details):

  1. (Change region to N. Virginia) [ Alert preferences ] > [ Edit ] > Receive CloudWatch billing alerts [x] > [ Save ];
    https://us-east-1.console.aws.amazon.com/billing/home#/preferences
  2. Open the CloudWatch console » Alarms > All alarms > Create alarm > Select metric > Billing > Total Estimated Charge.
    https://console.aws.amazon.com/cloudwatch/

AWS Budgets: set custom budgets to track your costs and usage, and respond quickly to alerts received from email or SNS notifications if you exceed your threshold.

AWS Cost Anomaly Detection: continuously monitors your cost and usage using ML to detect unusual spending. It learns your unique, historic spend patterns to detect one-time cost spikes and/or continuous cost increases; there is no need to define thresholds because ML derives them. Monitor by: AWS services, member accounts, cost allocation tags, or cost categories. Sends an anomaly detection report with root-cause analysis. Get notified with individual alerts or a daily/weekly summary via SNS.

AWS Service Quotas: create CloudWatch alarms to get notified when you’re close to a service quota threshold. Request a quota increase from AWS Service Quotas or shut down resources before the limit is reached.

AWS Trusted Advisor: analyzes your AWS accounts and provides recommendations in 6 categories:

AWS Compute Optimizer: uses ML to analyze the configurations of existing resources and their CloudWatch utilization metrics, and helps you choose optimal configurations and right-size your workloads (over/under provisioned). Supports: EC2 Instances, EC2 Auto Scaling Groups, EBS volumes, Lambda functions. Recommendations can be exported to S3.

⚠️ Exam trap — Cost Explorer vs Compute Optimizer:

AWS Support Plans Pricing:

AWS Basic Support (free):

AWS Developer Support Plan:

AWS Business Support Plan (24/7):

AWS Enterprise On-Ramp Support Plan (24/7):

AWS Enterprise Support Plan (24/7):



🎯 MASTER SUMMARY: Billing, Costing & Support Exam Guide

Part 1: Core Principles (Understand WHY → Derive WHAT)

Principle 1: Cost Tools Differ by WHEN You Need Them

Principle 2: Alerts Have Different Triggers

Principle 3: Trusted Advisor = 6 Categories, Tier-Gated

Basic and Developer plans get the 7 core checks only. Full checks require a Business or Enterprise support plan. Categories: Cost optimization, Performance, Security, Fault tolerance, Service limits, Operational Excellence.

Principle 4: Support Plans Are a Spectrum

Basic (free) → Developer → Business → Enterprise On-Ramp → Enterprise. Key differentiators: response time, TAM access, Trusted Advisor access. Business = first plan with 24/7 phone/chat + full Trusted Advisor + API access.

Principle 5: Tags Drive Cost Visibility

Cost Allocation Tags → track costs per project/team/environment. Resource Groups → view resources sharing common tags. Without tags, you can’t do granular cost analysis.


Part 2: Decision Trees

What cost question are you answering?
│
├─ "Estimate cost BEFORE building" → Pricing Calculator
├─ "Visualize/analyze PAST costs" → Cost Explorer
├─ "Set budget ALERTS" → AWS Budgets
├─ "Detect UNUSUAL spending (ML)" → Cost Anomaly Detection
├─ "Detailed cost AUDIT report" → Cost and Usage Reports
├─ "Right-size resources" → Compute Optimizer
└─ "Check best practices" → Trusted Advisor

Which Support Plan?
│
├─ "Just documentation + forums" → Basic (free)
├─ "Email support, business hours" → Developer
├─ "24/7 phone + full Trusted Advisor" → Business
├─ "Pool of TAMs, <30 min critical" → Enterprise On-Ramp
└─ "Designated TAM, <15 min critical" → Enterprise
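
The support-plan tree can be read as "pick the cheapest plan that satisfies every requirement". A minimal sketch of that reading follows; the function name and its parameters are invented for this example, and it encodes only the distinctions listed in these notes.

```python
def minimum_support_plan(email_support=False, phone_24x7=False,
                         full_trusted_advisor=False, tam=None):
    """Return the lowest plan meeting the stated needs.

    tam: None, "pool" (shared pool of TAMs) or "designated".
    """
    if tam == "designated":
        return "Enterprise"
    if tam == "pool":
        return "Enterprise On-Ramp"
    if tam is not None:
        raise ValueError("tam must be None, 'pool' or 'designated'")
    if phone_24x7 or full_trusted_advisor:
        return "Business"
    if email_support:
        return "Developer"
    return "Basic"
```

Checking the most demanding requirement first mirrors how the tiers nest: each higher plan includes what the lower ones offer.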

Part 3: Scenario Pattern Recognition

Pattern: “Detect unexpected cost spikes without setting thresholds”

Keywords: unusual spending, ML, automatic detection
Answer: AWS Cost Anomaly Detection
Why: ML learns spending patterns, so no manual thresholds are needed. Sends root-cause analysis via SNS.


Pattern: “Get alerted when cost exceeds $X”

Keywords: budget, threshold, alert, notification
Answer: AWS Budgets
Why: Budgets support cost/usage/reservation thresholds with email/SNS alerts.


Pattern: “Forecast next 12 months of AWS spending”

Keywords: forecast, predict, future cost
Answer: Cost Explorer (forecast feature)


Pattern: “Right-size EC2/Lambda/EBS resources”

Keywords: over-provisioned, under-utilized, right-size
Answer: AWS Compute Optimizer


Pattern: “Need 24/7 phone support + full Trusted Advisor”

Keywords: production workloads, 24/7, phone support
Answer: Business Support Plan (minimum for this)


Pattern: “Need a designated TAM”

Keywords: TAM, Technical Account Manager, designated
Answer: Enterprise Support Plan (On-Ramp has a pool, not designated)


Part 4: Quick Reference Tables

Tool | Purpose | Trigger
Pricing Calculator | Estimate cost before building | Manual
Cost Explorer | Visualize past costs, forecast 12 months | On-demand
AWS Budgets | Alert when approaching/exceeding threshold | Threshold-based
Cost Anomaly Detection | ML detects unusual spending | Automatic (ML)
Cost & Usage Reports | Detailed line-item audit | Scheduled
Compute Optimizer | Right-size recommendations | ML analysis
Trusted Advisor | Best practice checks (6 categories) | Continuous

Support Plan | Response (Critical) | Trusted Advisor | TAM
Basic | - | 7 core checks | -
Developer | 12h (business hrs) | 7 core checks | -
Business | <1h | Full + API | -
Enterprise On-Ramp | <30 min | Full + API | Pool
Enterprise | <15 min | Full + API | Designated

Part 5: Ultimate Instant-Answer Table

Question Contains | → Instant Answer
“Estimate cost before building” | Pricing Calculator
“Visualize past costs” | Cost Explorer
“Forecast future spending” | Cost Explorer (12 mo)
“Set budget alert” | AWS Budgets
“Detect unusual spending (ML)” | Cost Anomaly Detection
“No thresholds, automatic detection” | Cost Anomaly Detection
“Detailed cost audit per service” | Cost & Usage Reports
“Right-size EC2/EBS/Lambda” | Compute Optimizer
“Best practices check” | Trusted Advisor
“Track costs by project/team” | Cost Allocation Tags
“24/7 phone support” | Business plan (minimum)
“Full Trusted Advisor + API” | Business plan (minimum)
“Designated TAM” | Enterprise plan
“Pool of TAMs” | Enterprise On-Ramp
“<15 min response critical” | Enterprise plan
“Service quota approaching limit” | Service Quotas
“Stop dev instances after hours” | Instance Scheduler

Part 6: Elimination Checklist

□ Is it about ESTIMATING cost before building?
  → Yes = Pricing Calculator
  → No = analyzing existing costs

□ Is it about DETECTING unusual spending automatically?
  → Yes + no thresholds = Cost Anomaly Detection (ML)
  → Yes + specific threshold = AWS Budgets

□ Is it about VISUALIZING past costs or FORECASTING?
  → Visualize/forecast = Cost Explorer
  → Detailed line-item audit = Cost & Usage Reports

□ Do they need 24/7 PHONE support?
  → Yes = Business plan (minimum)
  → Email only = Developer plan

□ Do they need a TAM?
  → Designated = Enterprise
  → Pool = Enterprise On-Ramp
  → None = Business or lower

□ Do they need FULL Trusted Advisor?
  → Yes = Business plan (minimum)
  → 7 core checks only = Basic/Developer

□ Is it about RIGHT-SIZING resources?
  → Yes = Compute Optimizer
  → Cost visualization = Cost Explorer (different!)

□ Is it about TRACKING costs per project/team?
  → Yes = Cost Allocation Tags first
  → No tags = can't do granular tracking

🏆 The Golden Rules

  1. Pricing Calculator = before, Cost Explorer = after (estimate vs analyze)
  2. Budgets = you set threshold, Anomaly Detection = ML finds it (manual vs automatic)
  3. Business plan = minimum for 24/7 + full Trusted Advisor (common exam question)
  4. Enterprise = designated TAM, On-Ramp = pool of TAMs (key distinction)
  5. Tags first, then allocate costs (no tags = no granular cost tracking)
  6. Compute Optimizer ≠ Cost Explorer (right-size vs visualize)
  7. CloudWatch Billing Alarm = simple actual cost only (Budgets is more powerful)

Scalability and High Availability

Scalability means that an application or system can handle greater loads by adapting.

High Availability: the ability to survive the loss of a data center (disaster), achieved by running the application or system in at least two AZs.

Fault-tolerant systems emphasize maintaining continuous operation during unexpected failures, while high-availability infrastructures prioritize keeping services up and running despite scheduled maintenance or potential bottlenecks.

Scalability vs Elasticity vs Agility:

Elastic Load Balancer (ELB) - managed load balancer that automatically distributes incoming application traffic across multiple resources, such as Amazon EC2 instances.

Load Balancer Flows:

ALB Flow (Layer 7):
                         ┌─────────────────┐
    Internet             │       ALB       │
   ─────────────────────►│  SSL Termination│
    HTTPS :443           └────────┬────────┘
                                  │ HTTP :80
                    ┌─────────────┼─────────────┐
                    ▼             ▼             ▼
              ┌─────────┐   ┌─────────┐   ┌─────────┐
              │  EC2    │   │  EC2    │   │  EC2    │
              └─────────┘   └─────────┘   └─────────┘
                        Target Group

ALB Routing Rules:
              ┌──────────────────────────────────────┐
              │     ALB Listener :443 (HTTPS+SNI)    │
              └──────────────────┬───────────────────┘
          ┌──────────────────────┼──────────────────────┐
          │ /api/*               │ /images/*            │ default
          ▼                      ▼                      ▼
   ┌──────────────┐      ┌──────────────┐      ┌──────────────┐
   │  API Servers │      │  S3/Lambda   │      │  Web Servers │
   └──────────────┘      └──────────────┘      └──────────────┘
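
The routing diagram above amounts to "first matching rule wins, otherwise use the default action". The sketch below imitates that with Python's `fnmatchcase`; it is an approximation (for instance, `fnmatchcase` also understands `[...]` character sets), and the target group names are made up for the example.

```python
from fnmatch import fnmatchcase

# Rules in priority order: (path pattern, target group name). Names are hypothetical.
RULES = [
    ("/api/*", "api-servers"),
    ("/images/*", "image-handlers"),
]
DEFAULT_TARGET = "web-servers"


def route(path, rules=RULES, default=DEFAULT_TARGET):
    """Return the target group for a request path; the first matching rule wins."""
    for pattern, target_group in rules:
        # Case-sensitive wildcard match; '*' also spans '/' characters.
        if fnmatchcase(path, pattern):
            return target_group
    return default
```

For instance, `route("/api/users")` picks the API servers, while `route("/index.html")` falls through to the default web servers.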

NLB with Static IP (Layer 4):
   Client (needs static IP for firewall whitelist)
                    │
                    ▼
         ┌─────────────────────┐
         │        NLB          │
         │  Elastic IP: 1.2.3.4│  ◄── Static IP per AZ
         │     (Layer 4)       │
         └──────────┬──────────┘
                    │ TCP passthrough
                    │ (Client IP preserved)
                    ▼
            ┌───────────────┐
            │  Target Group │
            └───────────────┘


GLB for Security Appliances:
                    ┌──────────┐
   Traffic ────────►│   GLB    │
                    │(Layer 3) │
                    └────┬─────┘
                         │ GENEVE :6081
                         ▼
              ┌─────────────────────┐
              │  Security Appliance │
              │  (Firewall/IDS/IPS) │
              │     Inspect ───────►│──► Allow/Block
              └─────────────────────┘
                         │
                         ▼
                   Your Application

Types of load balancers:

Feature | ALB | NLB | GLB | CLB
Layer | 7 (HTTP/S) | 4 (TCP/UDP) | 3 (IP) | 4 & 7
Use Case | Web apps, microservices | Ultra-low latency, static IP | Firewalls, IDS/IPS | Legacy (deprecated)
Performance | Moderate | Millions req/sec | High throughput | Moderate
Static IP | ❌ DNS only | ✅ Elastic IP per AZ | ❌ | ❌
SNI (multi-cert) | ✅ | ✅ | N/A | ❌
Cross-Zone Default | ✅ Enabled (free) | ❌ Disabled (paid) | ❌ Disabled (paid) | ❌ Disabled (free)
Hostname | XXX.region.elb.amazonaws.com | XXX.region.elb.amazonaws.com | XXX.region.elb.amazonaws.com | Fixed hostname

Target Group Support:

Target Type | ALB | NLB | GLB | CLB
EC2 Instances
IP Addresses (private)
Lambda Functions | ✅ (HTTP→JSON)
ALB
ECS Tasks

When to Use:

Scenario | Choose
HTTP routing (path/host/headers/query string) | ALB
WebSockets, HTTP/2 | ALB
Containers with dynamic ports | ALB
Need static/Elastic IP (IP whitelisting) | NLB
Millions req/sec, ultra-low latency | NLB
TCP/UDP non-HTTP traffic | NLB
3rd party security appliances | GLB
Deep packet inspection | GLB

Details:

⚠️ Exam trap: ELB target registration — instance ID vs IP address

Register By | Routing Behavior | Use Case
Instance ID | Routes to primary private IP on primary ENI | Default, simplest
IP Address | Routes to the specific IP you chose | Multiple IPs per instance, non-EC2 targets (on-prem, containers)

Load Balancer Details:

Feature | CLB | ALB | NLB | GLB
Layer | 4 & 7 (deprecated) | 7 (HTTP/S) | 4 (TCP/UDP) | 3 (IP)
Use Case | Legacy | Microservices, containers | Ultra-low latency, static IP | Firewalls, IDS/IPS
Routing | Basic | Path/host/headers/query | - | -
Target Groups | EC2 only | EC2, ECS, Lambda, IPs | EC2, IPs, ALB | EC2, IPs
Static IP | ❌ | ❌ DNS only | ✅ Elastic IP/AZ | ❌
Health Checks | TCP, HTTP | HTTP, HTTPS | TCP, HTTP, HTTPS | TCP, HTTP, HTTPS
Dynamic Port Mapping
Client Info | Preserved | X-Forwarded-* headers | Preserved | Preserved
Protocol | TCP/HTTP | HTTP/HTTPS | TCP/UDP | GENEVE (port 6081)

Sticky Sessions (Session Affinity): client always redirected to same instance behind load balancer.

Cookie Types:

Cross-Zone Load Balancing: distributes traffic evenly across all registered instances in all AZs (not just per-node).

Load Balancer | Default | Inter-AZ Data Charges
ALB | ✅ Enabled | No charges
NLB & GLB | ❌ Disabled | Charges if enabled
CLB | ❌ Disabled | No charges
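
The effect of cross-zone load balancing is easiest to see with uneven AZs. The sketch below is a simplified model, assuming one load balancer node per AZ and client traffic split evenly across those nodes; the function name and AZ labels are hypothetical:

```python
def per_instance_share(instances_per_az: dict[str, int], cross_zone: bool) -> dict[str, float]:
    """Fraction of total traffic that each instance in an AZ receives.

    Assumes traffic is split evenly across load balancer nodes, one per AZ.
    """
    total_instances = sum(instances_per_az.values())
    az_count = len(instances_per_az)
    shares = {}
    for az, count in instances_per_az.items():
        if cross_zone:
            # Every node spreads its traffic over all instances in all AZs.
            shares[az] = 1 / total_instances
        else:
            # Each node only reaches the instances in its own AZ.
            shares[az] = (1 / az_count) / count
    return shares
```

With 2 instances in one AZ and 8 in another, every instance gets 10% of traffic when cross-zone is enabled. Without it, the 2 instances in the small AZ each carry 25% while the other 8 carry 6.25% each.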

SSL/TLS: encrypts traffic in transit (in-flight encryption) between clients and load balancer.

Load Balancer - SSL Certificates:

HTTP → HTTPS Redirect:

⚠️ Exam trap: DNS cannot redirect HTTP→HTTPS (DNS only resolves names to IPs, no protocol handling).

SNI (Server Name Indication): allows multiple SSL certs on one server (multiple websites).

⚠️ Exam traps:

Load Balancer | SSL Certificates | SNI Support
CLB | 1 only (need multiple CLBs for multiple domains) | ❌ No
ALB | Multiple (via multiple listeners) | ✅ Yes
NLB | Multiple (via multiple listeners) | ✅ Yes

Connection Draining / Deregistration Delay: time to complete in-flight requests while instance is de-registering or unhealthy.

Auto Scaling Group (ASG): ensures optimal capacity by automatically scaling EC2 instances.

Launch Template vs Launch Configuration:

Feature | Launch Configuration (legacy) | Launch Template (recommended)
Multiple instance types | ❌ Single type only | ✅ Multiple types
Mixed On-Demand + Spot | ❌ | ✅
Versioning | ❌ | ✅
Capacity Reservations | ❌ | ✅
Status | Legacy (deprecated) | Recommended

⚠️ Exam trap: “Mix On-Demand + Spot across multiple instance types in ASG” → Launch Template only. Launch Configuration supports single instance type, single purchase option. AWS recommends Launch Templates for all new ASGs.

ASG + ALB Integration:
┌─────────────────────────────────────────────────┐
│            Auto Scaling Group                   │
│  ┌───────┐   ┌───────┐   ┌───────┐             │
│  │  EC2  │   │  EC2  │   │  EC2  │   ...       │
│  └───┬───┘   └───┬───┘   └───┬───┘             │
│      └───────────┴───────────┘                 │
│              ▲ Health Checks                    │
│  ALB ────────┘                                  │
│                                                 │
│  Scale Out ◄── CloudWatch Alarm (CPU>70%)      │
│  Scale In  ◄── CloudWatch Alarm (CPU<30%)      │
└─────────────────────────────────────────────────┘

ASG Health Check Types:

Type | What it checks | Status
EC2 | Instance running (hardware/hypervisor) | Always on
ELB | App responds on health endpoint | Optional (additive)

Unhealthy instance behavior: ASG terminates instance → launches new one.

⚠️ Exam trap: ASG never “restarts the app” or “detaches and leaves running” - always terminates + replaces.

Auto Scaling Groups - Capacity characteristics:

Auto Scaling Groups - Scaling Strategies:

Manual Scaling: update the size of an ASG manually.

Dynamic Scaling: respond to changing demand.
- Simple / Step Scaling: threshold-based, you define the actions.
  Example: CPU > 70% → add 2 units; CPU < 30% → remove 1 unit.
- Target Tracking Scaling: “keep metric at X”; the ASG adjusts capacity automatically (like a thermostat).
  Example targets: an average of 1000 connections per instance, 70% CPU, 50 requests per target.
- Scheduled Scaling: time-based, for predictable patterns.
  Example: scale to 10 instances every Monday 9am, scale down Friday 6pm.
- Predictive Scaling: ML-based and proactive; uses machine learning to predict future traffic ahead of time.
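
The difference between step scaling and target tracking can be sketched with a little arithmetic. This is an illustrative model only: `step_scaling` encodes the CPU example above, and `target_tracking` uses a simple proportional estimate, which is an assumption and not the exact algorithm the service runs. The minimum size of 1 is also assumed.

```python
import math


def step_scaling(capacity: int, cpu: float) -> int:
    """Threshold rules from the example: above 70% add 2, below 30% remove 1."""
    if cpu > 70:
        return capacity + 2
    if cpu < 30:
        return max(capacity - 1, 1)  # assumed minimum group size of 1
    return capacity


def target_tracking(capacity: int, metric: float, target: float) -> int:
    """Proportional estimate of the capacity that brings the metric back to target."""
    return max(math.ceil(capacity * metric / target), 1)
```

With 4 instances at 90% CPU and a 70% target, the proportional estimate asks for 6 instances, whereas the step rule always adds exactly 2 regardless of how far the metric overshoots.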

Custom Metrics for Scaling:

⚠️ Exam trap: “Detailed Monitoring” only increases EC2 metric frequency (1min vs 5min) - does NOT add new metric types. For app-specific metrics like “DB requests/min” → Custom Metric required.

Decoupling Services (Microservices Approach):

Why decouple? Synchronous communication can be problematic with sudden traffic spikes.

Application Communication Patterns:

1) Synchronous (app-to-app):        2) Asynchronous (app-to-queue-to-app):

┌─────────┐      ┌──────────┐       ┌─────────┐    ┌───────┐    ┌──────────┐
│ Buying  │◀────▶│ Shipping │       │ Buying  │───▶│ Queue │───▶│ Shipping │
│ Service │      │ Service  │       │ Service │    │       │    │ Service  │
└─────────┘      └──────────┘       └─────────┘    └───────┘    └──────────┘
   (tight coupling)                    (decoupled, scales independently)

Decoupling services:

These services scale independently from your application!

⚠️ Exam trap - “Services that buffer or throttle traffic spikes”:


Amazon SQS (Simple Queue Service) is a fully managed, serverless messaging service used to decouple applications. It lets software components send, store, and receive messages without losing them and without requiring the other components to be available. An application sends messages into a queue; a consumer (a user or service) retrieves a message from the queue, processes it, and then deletes it from the queue.

Amazon SNS is a fully managed, serverless, publish/subscribe notification service. Using Amazon SNS topics, a publisher publishes messages to subscribers:

Amazon MQ is a managed message broker service for RabbitMQ and ActiveMQ.

Amazon Kinesis is a managed service to collect, process and analyze real-time streaming data at any scale.


Amazon SQS – Standard Queue

• Oldest offering (over 10 years old)
• Fully managed service, used to decouple applications
• Attributes:
• Unlimited throughput, unlimited number of messages in queue
• Default retention of messages: 4 days, maximum of 14 days
• Low latency (<10 ms on publish and receive)
• Limitation of 256KB per message sent
• Can have duplicate messages (at least once delivery, occasionally)
• Can have out of order messages (best effort ordering)

SQS Queue - Multiple Producers & Consumers:

┌──────────┐                              ┌──────────┐
│ Producer │──┐                     ┌────▶│ Consumer │
└──────────┘  │                     │     └──────────┘
┌──────────┐  │    ┌───────────┐    │     ┌──────────┐
│ Producer │──┼───▶│ SQS Queue │────┼────▶│ Consumer │
└──────────┘  │    └───────────┘    │     └──────────┘
┌──────────┐  │   (Send messages)   │     ┌──────────┐
│ Producer │──┘                     └────▶│ Consumer │
└──────────┘                              └──────────┘
                                    (Poll messages)

Producing Messages:

Consuming Messages:

SQS Message Flow:

                Poll/Receive              Process
SQS Queue ─────────────────▶ Consumer ─────────────▶ RDS
    ▲                            │
    │         DeleteMessage      │
    └────────────────────────────┘

Multiple Consumers (Horizontal Scaling):

SQS with Auto Scaling Group:

                  SendMessage                    ReceiveMessages
┌────────────┐        │        ┌───────────┐         │        ┌────────────┐
│ Front-end  │────────┼───────▶│ SQS Queue │─────────┼───────▶│ Back-end   │
│ Web App    │        │        │(infinitely│         │        │ Processing │
│  (ASG)     │        │        │ scalable) │         │        │   (ASG)    │
└────────────┘        │        └─────┬─────┘         │        └────────────┘
                                     │
                    ┌────────────────┴────────────────┐
                    ▼                                 ▼
          CloudWatch Metric                   CloudWatch Alarm
    (ApproximateNumberOfMessages)                    │
                                                     ▼
                                              Scale ASG

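⚠️ Exam trap: “Scale consumers based on queue depth” → use a CloudWatch Alarm on queue depth (the ApproximateNumberOfMessages queue attribute, which CloudWatch publishes as the ApproximateNumberOfMessagesVisible metric)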
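
Queue depth alone does not say how many consumers are needed; it has to be related to how fast one consumer works. A minimal sketch of a backlog-per-instance calculation, assuming you know the average processing time per message and the latency you are willing to accept (both parameters and the function name are illustrative):

```python
import math


def desired_consumers(queue_depth: int, avg_processing_s: float, acceptable_latency_s: float) -> int:
    """Consumers needed so each instance's backlog clears within the latency target."""
    # How many messages one instance can hold in backlog and still meet the target.
    acceptable_backlog_per_instance = acceptable_latency_s / avg_processing_s
    return max(math.ceil(queue_depth / acceptable_backlog_per_instance), 1)
```

For example, with 1000 visible messages, 0.5 s per message, and a 10 s latency target, each instance can tolerate a backlog of 20 messages, so the group needs 50 consumers.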

At-Least-Once Delivery: SQS prioritizes never losing messages over exactly-once delivery. Duplicates can occur when:

  1. Consumer takes too long → visibility timeout expires → message reappears → another consumer gets it
  2. Internal replication timing (SQS stores across multiple servers)

Solution: Make consumers idempotent (processing same message twice = same result) or use FIFO queue (exactly-once)

⚠️ Exam trap: “Prevent duplicate processing in SQS” → use FIFO queue or idempotent consumers
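
An idempotent consumer can be sketched by remembering which message IDs were already handled. The class below is a hypothetical in-memory illustration; a real consumer would need a durable store for the processed IDs (an assumption, since the notes do not name one):

```python
class IdempotentConsumer:
    """Skips messages whose ID has already been processed."""

    def __init__(self) -> None:
        self.processed_ids: set[str] = set()
        self.balance = 0

    def handle(self, message_id: str, amount: int) -> bool:
        """Apply the message once; return False for a duplicate delivery."""
        if message_id in self.processed_ids:
            return False
        self.balance += amount
        self.processed_ids.add(message_id)
        return True
```

Delivering the same message twice leaves `balance` unchanged after the first call, which is the "processing the same message twice gives the same result" property described above.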



Amazon SQS – Security

Security Layer | Options
In-flight encryption | HTTPS API
At-rest encryption | KMS keys (SSE-KMS)
Client-side encryption | Customer manages encrypt/decrypt
Access Controls | IAM policies for SQS API access
SQS Access Policies | Resource-based (like S3 bucket policies)

SQS Access Policies use cases:

⚠️ Exam trap: “Allow S3 to send notifications to SQS” → SQS Access Policy (not IAM policy)
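
A resource-based policy of this kind is attached to the queue itself and names the S3 service as the principal. The sketch below builds such a policy document as a Python dict; the helper name and both ARNs are placeholders, and the condition restricting the source bucket is a common hardening step rather than something stated in these notes:

```python
import json


def s3_to_sqs_policy(queue_arn: str, bucket_arn: str) -> dict:
    """Resource-based policy letting the S3 service send messages to one queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "s3.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
                # Only events originating from this bucket are accepted.
                "Condition": {"ArnLike": {"aws:SourceArn": bucket_arn}},
            }
        ],
    }


policy = s3_to_sqs_policy(
    "arn:aws:sqs:us-east-1:111122223333:example-queue",  # placeholder ARN
    "arn:aws:s3:::example-bucket",  # placeholder ARN
)
print(json.dumps(policy, indent=2))
```

The key point is that the permission lives on the queue (the resource), not on an IAM identity, because the caller is another AWS service.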


Amazon SQS – Long Polling

Polling Type | Behavior | API Calls | Cost
Short Polling | Returns immediately (even if empty) | High | Higher (pay per request!)
Long Polling | Waits up to 20 sec for messages | Low | Lower

Why Long Polling saves $$$: SQS pricing = per 1 million requests. Short polling = constant empty responses = wasted money.

Enable Long Polling:

⚠️ Exam trap: “Reduce SQS costs” or “reduce empty responses” → enable Long Polling
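
The saving is simple arithmetic on request counts. A rough sketch, assuming a single consumer polling an empty queue all day, with short polling retried once per second (the retry interval is an assumption about the client loop, not an SQS setting):

```python
def empty_receive_calls_per_day(wait_time_seconds: int, short_poll_interval_s: int = 1) -> int:
    """ReceiveMessage calls one consumer makes per day against an empty queue.

    wait_time_seconds=0 models short polling, retried every short_poll_interval_s.
    Values from 1 to 20 model long polling, where each call blocks for the
    full wait before returning empty.
    """
    seconds_per_day = 24 * 60 * 60
    interval = wait_time_seconds if wait_time_seconds > 0 else short_poll_interval_s
    return seconds_per_day // interval
```

Under these assumptions, short polling issues 86,400 billable requests per day, while long polling with a 20-second wait issues 4,320.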


Amazon SQS – Message Visibility Timeout

Visibility Timeout Timeline:

  ReceiveMessage     ReceiveMessage     ReceiveMessage        ReceiveMessage
      │                   │                  │                      │
      ▼                   ▼                  ▼                      ▼
──────┬───────────────────────────────────────┬────────────────────────▶ Time
      │         Visibility Timeout            │
      │◄─────────────────────────────────────▶│
      │                                       │
   Message              Not returned          Message returned
   returned            (invisible)              (again!)

Visibility Timeout | Risk
Too low (seconds) | Duplicates (message reappears before processing done)
Too high (hours) | Long wait if consumer crashes (re-processing delayed)

Extend timeout: Call ChangeMessageVisibility API to get more time

⚠️ Exam trap: “Consumer needs more time” → use ChangeMessageVisibility API
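
The timeline above can be modeled with a small in-memory queue. This is a simulation for intuition only, not the SQS API: the class and method names are hypothetical, and time is passed in explicitly as seconds so the behavior is deterministic.

```python
from typing import Optional


class SimulatedQueue:
    """In-memory model of message visibility; time is passed in explicitly."""

    def __init__(self, visibility_timeout: int) -> None:
        self.visibility_timeout = visibility_timeout
        self.messages: dict[str, int] = {}  # message id -> time it becomes visible

    def send(self, message_id: str) -> None:
        self.messages[message_id] = 0

    def receive(self, now: int) -> Optional[str]:
        """Return a visible message and hide it for the visibility timeout."""
        for message_id, visible_at in self.messages.items():
            if visible_at <= now:
                self.messages[message_id] = now + self.visibility_timeout
                return message_id
        return None

    def change_visibility(self, message_id: str, now: int, timeout: int) -> None:
        """Restart the invisibility window, in the spirit of ChangeMessageVisibility."""
        self.messages[message_id] = now + timeout

    def delete(self, message_id: str) -> None:
        self.messages.pop(message_id, None)
```

With a 30-second timeout, a message received at t=0 is invisible at t=10 and returned again at t=30 unless it was deleted. Calling `change_visibility` before the window ends pushes that reappearance later, which is how a slow consumer avoids a duplicate delivery.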


Amazon SQS – FIFO Queue

FIFO Queue Flow:

┌──────────┐   Send messages    ┌─────────────┐   Poll messages   ┌──────────┐
│ Producer │───────────────────▶│ FIFO Queue  │──────────────────▶│ Consumer │
└──────────┘   [4][3][2][1]     │   ▶|||◀     │   [4][3][2][1]    └──────────┘
                                └─────────────┘
                              (same order in/out)

Feature | Standard Queue | FIFO Queue
Throughput | Unlimited | 300 msg/s (3000 with batching)
Ordering | Best effort | Guaranteed (by Message Group ID)
Delivery | At-least-once | Exactly-once (Deduplication ID)
Duplicates | Possible | Removed via Deduplication ID

FIFO Required Parameters:

Parameter | Type | Purpose | Example
Message Group ID | Tag | Groups messages for ordering. Same group = processed in order | customer_123 or order_456
Message Deduplication ID | Token | Prevents duplicates. Same ID within 5 min = rejected | txn_789 or hash of message body

Use Cases:

⚠️ Exam trap: “Need ordering” → FIFO queue. “Need exactly-once” → FIFO queue with Deduplication ID
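
How the two FIFO parameters interact can be sketched with a toy queue. This is an in-memory illustration with hypothetical names, not the SQS API; it returns `False` for a message dropped as a duplicate, and it keeps one ordered list per message group:

```python
DEDUP_WINDOW_S = 5 * 60  # 5-minute deduplication interval from the table above


class SimulatedFifoQueue:
    def __init__(self) -> None:
        self.seen: dict[str, float] = {}  # deduplication id -> time accepted
        self.groups: dict[str, list[str]] = {}  # message group id -> ordered bodies

    def send(self, group_id: str, dedup_id: str, body: str, now: float) -> bool:
        """Enqueue the message unless the same dedup id arrived within the window."""
        accepted_at = self.seen.get(dedup_id)
        if accepted_at is not None and now - accepted_at < DEDUP_WINDOW_S:
            return False  # duplicate inside the window: not enqueued
        self.seen[dedup_id] = now
        self.groups.setdefault(group_id, []).append(body)
        return True
```

Messages that share a group ID come out in the order they were accepted, and a retry with the same deduplication ID inside five minutes does not create a second copy.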


SQS as a Buffer to Database Writes

Problem: Direct writes to DB under heavy load → transactions lost

Without SQS (transactions may be lost):

requests ───▶ ┌─────────────┐ ─── Insert ───▶ ┌─────────────┐
              │ Application │   transactions   │   RDS /     │
              │   (ASG)     │ ────────────────▶│   Aurora /  │
              └─────────────┘                  │   DynamoDB  │
                    │                          └─────────────┘
              (overwhelmed)
With SQS Buffer (no data loss):

              ┌─────────────┐                  ┌─────────────┐
requests ───▶ │ Enqueue App │  SendMessage     │ Dequeue App │   insert
              │    (ASG)    │ ───────────────▶ │    (ASG)    │ ──────────▶ DB
              └─────────────┘    ┌───────┐     └─────────────┘
                                 │  SQS  │
                                 │Queue  │
                                 │(buffer)│
                                 └───────┘
                            (infinitely scalable)

Use case: Protect database from write spikes, decouple producers from consumers

⚠️ Exam trap: “Database overwhelmed by writes” → use SQS as buffer



Amazon SNS – Simple Notification Service

Pub/Sub model: One message to many receivers (vs SQS point-to-point)

Direct Integration vs Pub/Sub:

Direct (tight coupling):              Pub/Sub (decoupled):

┌─────────┐ ──▶ Email                 ┌─────────┐         ┌─────────────┐
│ Buying  │ ──▶ Fraud Service         │ Buying  │──▶ SNS ─┼──▶ Email    │
│ Service │ ──▶ Shipping              │ Service │   Topic │──▶ Fraud    │
│         │ ──▶ SQS Queue             └─────────┘         │──▶ Shipping │
└─────────┘                           (1 publish)         │──▶ SQS Queue│
(4 integrations to maintain)                              └─────────────┘
                                                         (add subscribers easily)

SNS Limits:

Subscribers: SQS, Lambda, Kinesis Data Firehose, Email, SMS, HTTP(S) endpoints

AWS Services → SNS (built-in integrations):

┌─────────────────────────────────────────────┐
│ CloudWatch Alarms  │  AWS Budgets  │ Lambda │      publish
│ ASG (Notifications)│  S3 (Events)  │ DynamoDB│ ───────────▶  SNS
│ CloudFormation     │  AWS DMS      │ RDS    │
│ (State Changes)    │ (New Replica) │ Events │
└─────────────────────────────────────────────┘

Amazon SNS – How to Publish

Method | Use Case | Steps
Topic Publish (SDK) | Standard notifications | Create topic → Create subscription(s) → Publish
Direct Publish (Mobile SDK) | Mobile push | Create platform app → Create endpoint → Publish

Mobile Push Platforms: Google FCM (formerly GCM), Apple APNs, Amazon ADM

⚠️ Exam trap: “Send notification to multiple services at once” → SNS (not SQS)


SNS + SQS: Fan Out Pattern

Push once to SNS, receive in all SQS queues that are subscribers

SNS + SQS Fan Out:

┌─────────┐         ┌───────────┐         ┌───────────┐         ┌─────────────┐
│ Buying  │────────▶│ SNS Topic │────────▶│ SQS Queue │────────▶│ Fraud       │
│ Service │         └─────┬─────┘         └───────────┘         │ Service     │
└─────────┘               │                                     └─────────────┘
                          │               ┌───────────┐         ┌─────────────┐
                          └──────────────▶│ SQS Queue │────────▶│ Shipping    │
                                          └───────────┘         │ Service     │
                                                                └─────────────┘

Benefits:

Required: SQS queue access policy must allow SNS to write
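
The fan-out pattern can be sketched in memory: one publish, one independent copy per subscribed queue. The `Topic` class below is a hypothetical stand-in for an SNS topic with SQS subscribers, using `deque` objects as the queues:

```python
from collections import deque


class Topic:
    """Minimal pub/sub topic: every subscribed queue gets its own copy."""

    def __init__(self) -> None:
        self.subscribers: dict[str, deque] = {}

    def subscribe(self, name: str) -> deque:
        """Create a queue for the named subscriber and return it."""
        queue: deque = deque()
        self.subscribers[name] = queue
        return queue

    def publish(self, message: str) -> None:
        """Deliver one copy of the message to every subscribed queue."""
        for queue in self.subscribers.values():
            queue.append(message)
```

Because each service drains its own queue, the fraud service consuming a message has no effect on the copy waiting for the shipping service.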


SNS + SQS FIFO: Fan Out with Ordering

Need fan out + ordering + deduplication? Use SNS FIFO + SQS FIFO

SNS FIFO + SQS FIFO Fan Out:

┌─────────┐         ┌────────────┐         ┌────────────┐         ┌─────────┐
│ Buying  │────────▶│ SNS FIFO   │────────▶│ SQS FIFO   │────────▶│ Fraud   │
│ Service │         │ Topic      │         │ Queue      │         │ Service │
└─────────┘         └─────┬──────┘         └────────────┘         └─────────┘
                          │                ┌────────────┐         ┌─────────┐
                          └───────────────▶│ SQS FIFO   │────────▶│Shipping │
                                           │ Queue      │         │ Service │
                                           └────────────┘         └─────────┘

SNS FIFO Topic:


SNS – Message Filtering

JSON policy to filter messages per subscription (subscribers only get what they need)

SNS Message Filtering:

                    Message:
                    Order: 1036
┌─────────┐         Product: Pencil        ┌─────────────────────────────────────┐
│ Buying  │──────▶  State: Placed  ──────▶ │           SNS Topic                 │
│ Service │                                └──────────────┬──────────────────────┘
└─────────┘                                               │
                                                          │
                    ┌─────────────────────────────────────┼─────────────────────┐
                    │                                     │                     │
            Filter: State=Placed              Filter: State=Cancelled    No Filter
                    │                                     │                     │
                    ▼                                     ▼                     ▼
            ┌───────────────┐                     ┌───────────────┐     ┌───────────────┐
            │ SQS (Placed)  │                     │ SQS (Cancelled)│    │ SQS (All)     │
            └───────────────┘                     └───────────────┘     └───────────────┘

No filter policy = receives ALL messages

⚠️ Exam trap: “Route different message types to different queues” → SNS Filter Policy
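
The matching logic behind the diagram can be sketched as follows. This is a simplified model that handles only exact string matches on message attributes; real filter policies support more operators, and the function name is hypothetical:

```python
from typing import Optional


def matches(filter_policy: Optional[dict], attributes: dict) -> bool:
    """True if the subscription should receive a message with these attributes."""
    if not filter_policy:
        return True  # no filter policy: the subscription receives everything
    # Every key in the policy must match one of its allowed values.
    return all(
        attributes.get(key) in allowed_values
        for key, allowed_values in filter_policy.items()
    )
```

For the order in the diagram, `matches({"State": ["Placed"]}, {"State": "Placed"})` is true, the `Cancelled` filter rejects it, and a subscription with no policy accepts it.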


Application: S3 Events to Multiple Queues

Problem: S3 allows only one event rule per combination of event type + prefix

Solution: S3 → SNS → Fan out to multiple SQS queues

S3 Events Fan Out:

┌───────────┐   events   ┌───────────┐   fan-out   ┌───────────┐
│ S3 Object │───────────▶│ SNS Topic │────────────▶│ SQS Queue │
│ Created   │            └─────┬─────┘             └───────────┘
└───────────┘                  │                   ┌───────────┐
                               ├──────────────────▶│ SQS Queue │
                               │                   └───────────┘
                               │                   ┌───────────┐
                               └──────────────────▶│ Lambda    │
                                                   └───────────┘

⚠️ Exam trap: “S3 event to multiple destinations” → S3 → SNS → Fan out


Application: SNS to S3 via Kinesis Data Firehose

SNS can send to Kinesis Data Firehose → then to any KDF destination

SNS → Kinesis Data Firehose → S3:

┌─────────┐         ┌───────────┐         ┌─────────────────┐         ┌────────┐
│ Buying  │────────▶│ SNS Topic │────────▶│ Kinesis Data    │────────▶│   S3   │
│ Service │         └───────────┘         │ Firehose        │         └────────┘
└─────────┘                               └─────────────────┘
                                          (or any KDF destination)

Amazon SNS – Security

Security Layer | Options
In-flight encryption | HTTPS API
At-rest encryption | KMS keys
Client-side encryption | Customer manages encrypt/decrypt
Access Controls | IAM policies for SNS API access
SNS Access Policies | Resource-based (like S3 bucket policies)

SNS Access Policies use cases:

⚠️ Exam trap: “Allow S3 to publish to SNS” → SNS Access Policy (not IAM policy)


SQS vs SNS vs Kinesis

Feature | SQS | SNS | Kinesis
Model | Pull (consumers poll) | Push (to subscribers) | Pull (standard) / Push (enhanced fan-out)
Data persistence | Deleted after consumed | Not persisted (lost if not delivered) | Retained up to 365 days
Replay capability | No | No | Yes
Consumers/Subscribers | Unlimited workers | 12.5M subscribers, 100K topics | 2 MB/s per shard shared by all consumers (standard), 2 MB/s per shard per consumer (enhanced)
Throughput | No provisioning needed | No provisioning needed | Provisioned or On-demand
Ordering | FIFO queues only | FIFO topics (for SQS FIFO) | Per shard (by partition key)
Delay | Individual message delay | No | No
Use case | Decouple apps, buffer | Fan-out notifications | Real-time big data, analytics, ETL

Amazon Kinesis Data Streams

Collect and store streaming data in real-time

Kinesis Data Streams Flow:

┌─────────────────┐                                      ┌──────────────────┐
│ Click Streams   │                                      │ Application      │
│ IoT Devices     │──┐    ┌──────────────────────┐   ┌──▶│ Lambda           │
│ Metrics & Logs  │  │    │ Kinesis Data Streams │   │   │ Data Firehose    │
└─────────────────┘  │    │  ┌────┬────┬────┐    │   │   │ Apache Flink     │
                     ├───▶│  │Shard│Shard│Shard│  │───┘   └──────────────────┘
┌──────────────────┐ │    │  └────┴────┴────┘    │              Consumers
│ Producers:       │ │    └──────────────────────┘
│ - Applications   │─┘
│ - Kinesis Agent  │
└──────────────────┘
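
Ordering in Kinesis is per shard because every record with the same partition key is routed to the same shard. The sketch below illustrates that routing: the partition key is hashed with MD5 to a 128-bit integer, and the shard whose hash key range contains that value receives the record. The equal-width ranges and the function name are assumptions made for illustration.

```python
import hashlib


def shard_for(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose hash key range contains the key's MD5 hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")  # 128-bit integer
    range_size = 2**128 // shard_count  # assumes equal-width hash key ranges
    return min(hash_value // range_size, shard_count - 1)
```

Because the mapping is deterministic, all records for one key (for example one device or one customer) stay in order within a single shard, while different keys spread across shards.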

Key Features:

Libraries:


Kinesis Data Streams – Capacity Modes

Mode | Provisioning | Throughput | Scaling | Pricing
Provisioned | Choose # of shards | 1 MB/s in, 2 MB/s out per shard | Manual | Per shard/hour
On-Demand | Automatic | Default 4 MB/s in | Auto (based on last 30 days peak) | Per stream/hour + data in/out

Switching modes: Console or CLI, no downtime, but limited to 2 switches per 24 hours

ProvisionedThroughputExceeded: Add more shards or switch to On-Demand mode
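
"Add more shards" becomes a small calculation once the per-shard limits from the table are applied. A minimal sketch using only the 1 MB/s in and 2 MB/s out figures above; it ignores any per-record limits, and the function name is hypothetical:

```python
import math


def shards_needed(ingest_mb_per_s: float, egress_mb_per_s: float) -> int:
    """Provisioned shards required for the given write and read throughput."""
    by_ingest = math.ceil(ingest_mb_per_s / 1)  # 1 MB/s in per shard
    by_egress = math.ceil(egress_mb_per_s / 2)  # 2 MB/s out per shard
    return max(by_ingest, by_egress, 1)
```

A stream that ingests 5 MB/s and is read at 6 MB/s needs 5 shards (ingest is the constraint), while one that ingests 2 MB/s but is read at 10 MB/s also needs 5 (reads are the constraint).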

⚠️ Exam trap: “Unpredictable traffic spikes in Kinesis” → On-demand mode

⚠️ Exam trap: “ProvisionedThroughputExceeded in Kinesis” → Add shards or use On-Demand mode

⚠️ Exam trap: Why NOT “SQS as buffer to Kinesis”? It seems logical (SQS absorbs any spike and buffers for Kinesis), but it adds latency (no longer real-time) and complexity, and the bottleneck just MOVES to wherever SQS writes into Kinesis. Solution: scale Kinesis directly (add shards) instead of working around it.

⚠️ Exam trap: “Need to replay streaming data” → Kinesis Data Streams (not Firehose, not SQS)


Amazon Data Firehose

Load streaming data into destinations (fully managed, no code)

Data Firehose Flow:

┌─────────────────┐                                     ┌─────────────────────┐
│ Producers:      │                                     │ AWS Destinations:   │
│ - Kinesis Streams│     ┌─────────────────────┐        │ - S3               │
│ - CloudWatch    │     │                     │        │ - Redshift         │
│ - AWS IoT       │────▶│  Data Firehose      │───────▶│ - OpenSearch       │
│ - SNS           │     │  (batch writes)     │        ├─────────────────────┤
│ - SDK/Agent     │     │       │             │        │ 3rd Party:         │
└─────────────────┘     │       ▼             │        │ - Splunk, Datadog  │
                        │  Lambda Transform   │        │ - MongoDB, NewRelic│
    Record up to 1MB    └─────────────────────┘        ├─────────────────────┤
                              │                        │ Custom: HTTP endpoint│
                              ▼                        └─────────────────────┘
                        S3 Backup Bucket
                        (all or failed data)

Note: SQS is NOT a Firehose producer (SQS → Firehose requires Lambda in between)

Key Features:


Kinesis Data Streams vs Data Firehose

Feature | Kinesis Data Streams | Data Firehose
Purpose | Streaming data collection | Load data to destinations
Management | Producer/Consumer code needed | Fully managed
Latency | Real-time (~200ms) | Near real-time (buffering)
Scaling | Provisioned / On-Demand | Automatic
Data Storage | Up to 365 days | No storage
Replay | ✅ Yes | ❌ No
Destinations | Custom consumers | S3, Redshift, OpenSearch, 3rd party, HTTP
Data Transformation | ❌ No (raw data) | ✅ Yes (Lambda, format conversion)

⚠️ Exam trap: “Real-time streaming” → Kinesis Data Streams. “Near real-time” → Data Firehose
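
The "near real-time" label comes from buffering: records are held until a size or age limit is reached and then written as one batch. The class below is a toy model of that behavior with hypothetical names and explicit timestamps; the thresholds are parameters, not Firehose defaults.

```python
from typing import Optional


class BufferedDelivery:
    """Collects records and flushes a batch when a size or time limit is hit."""

    def __init__(self, max_bytes: int, max_age_s: float) -> None:
        self.max_bytes = max_bytes
        self.max_age_s = max_age_s
        self.buffer: list[bytes] = []
        self.buffered_bytes = 0
        self.oldest_at: Optional[float] = None
        self.delivered: list[list[bytes]] = []  # batches written to the destination

    def put(self, record: bytes, now: float) -> None:
        if not self.buffer:
            self.oldest_at = now
        self.buffer.append(record)
        self.buffered_bytes += len(record)
        self.tick(now)

    def tick(self, now: float) -> None:
        """Flush if the buffer is full or its oldest record is too old."""
        if not self.buffer:
            return
        too_big = self.buffered_bytes >= self.max_bytes
        too_old = now - self.oldest_at >= self.max_age_s
        if too_big or too_old:
            self.delivered.append(self.buffer)
            self.buffer, self.buffered_bytes, self.oldest_at = [], 0, None
```

A lone record sits in the buffer until the age limit expires, which is exactly the delay that makes this pattern unsuitable when sub-second latency is required.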

⚠️ Exam trap: “Transform data while streaming to S3” → Data Firehose (only service with built-in transformation)

⚠️ Exam trap: “Load streaming data directly to S3” → Data Firehose (not Kinesis Data Streams)


Amazon MQ

Managed message broker for RabbitMQ and ActiveMQ (migration path for on-prem apps)

When to use Amazon MQ vs SQS/SNS:

Feature | SQS/SNS | Amazon MQ
Protocols | AWS proprietary | MQTT, AMQP, STOMP, OpenWire, WSS
Scaling | Serverless, unlimited | Runs on servers, limited scaling
Use case | New cloud-native apps | Migrate existing on-prem apps
Features | Queue (SQS) OR Topic (SNS) | Both queue AND topic features

Amazon MQ Supported Protocols:

⚠️ Exam trap: “Migrate on-prem app using MQTT/AMQP/STOMP” → Amazon MQ (SQS/SNS don’t support these protocols)

Amazon MQ High Availability (Multi-AZ):

                          Region (us-east-1)
                    ┌─────────────────────────────────┐
                    │      AZ (us-east-1a)            │
                    │    ┌─────────────────┐          │
           ┌───────▶│    │ ACTIVE Broker   │◀────┐    │
           │        │    └─────────────────┘     │    │
           │        ├────────────────────────────┼────┤
┌────────┐ │        │      AZ (us-east-1b)       │    │
│ Client │─┤        │    ┌─────────────────┐     │    │    ┌─────────┐
└────────┘ │        │    │ STANDBY Broker  │◀────┼────┼───▶│ Amazon  │
           │        │    └─────────────────┘     │    │    │ EFS     │
           └───────▶│         (failover)         │    │    │(storage)│
                    └─────────────────────────────────┘    └─────────┘

High Availability:

⚠️ Exam trap: “Migrate on-prem RabbitMQ/ActiveMQ to AWS” → Amazon MQ (not SQS/SNS)

⚠️ Exam trap: “SNS FIFO topic subscribers” → SQS queues only (Standard or FIFO)



🎯 MASTER SUMMARY: Messaging Services Exam Guide

Part 1: Core Principles (Understand WHY → Derive WHAT)

Principle 1: Decoupling = Scaling Independence

The entire point of messaging services is breaking tight coupling. When you see:

SQS is the universal buffer. Infinite throughput, never loses messages. When anything is overwhelmed (database, API, service), put SQS in front.

Principle 2: Push vs Pull = SNS vs SQS

Two fundamental patterns:

Fan Out = SNS + SQS combined. SNS pushes to multiple SQS queues. Each queue processes independently. Best of both worlds.

Principle 3: Persistence = Can You Replay?

Service | Stores Data? | Replay? | Retention
SQS | Until consumed | ❌ No | 4 days default, 14 max
SNS | Never | ❌ No | None (deliver or lose)
Kinesis Streams | Up to 365 days | ✅ Yes | 1 day default, 365 max
Firehose | Never | ❌ No | None (pass-through)

If they need to reprocess old data → Kinesis Data Streams is the ONLY option.

Principle 4: Real-time vs Near Real-time

Firehose is “lazy Kinesis” — easier but slower. If latency matters, use Streams.

Principle 5: FIFO = Ordering + Exactly-Once

Standard queues/topics are fast but messy (duplicates possible, order not guaranteed).

FIFO queues/topics trade throughput (300 msg/s) for guarantees:

Principle 6: Access Control = Who’s Calling?

Cross-account? Other AWS service (S3, SNS)? → Resource-based policy.

Principle 7: Protocol = Cloud-Native vs Legacy

SQS/SNS = AWS proprietary SDKs. Great for new apps. Amazon MQ = Open protocols (MQTT, AMQP, STOMP). For migrating existing apps.

Keyword “migrate” + “existing broker” + “no code changes” = Amazon MQ

Principle 8: Scaling Strategy

Service | How to Scale
SQS | Automatic (just add consumers)
SNS | Automatic
Kinesis | Add shards (Provisioned) or use On-Demand
Firehose | Automatic

ProvisionedThroughputExceeded = add shards or switch to On-Demand.


Part 2: Decision Tree (Follow Keywords → Find Answer)

Step 1: What’s the communication pattern?

                        What's the pattern?
                              │
        ┌─────────────────────┼─────────────────────┐
        ▼                     ▼                     ▼
   One-to-One            One-to-Many         Continuous Stream
        │                     │                     │
        ▼                     ▼                     ▼
      SQS                   SNS              Need replay?
                       (or Fan Out)               │
                                          ┌───────┴───────┐
                                          ▼               ▼
                                         Yes              No
                                          │               │
                                          ▼               ▼
                                    Kinesis DS      Firehose
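
The tree above translates directly into a small lookup function. The pattern labels (`"one-to-one"`, `"one-to-many"`, `"stream"`) and the function name are hypothetical, chosen only to mirror the three branches:

```python
def choose_service(pattern: str, need_replay: bool = False) -> str:
    """Map a communication pattern to the service named in the decision tree."""
    if pattern == "one-to-one":
        return "SQS"
    if pattern == "one-to-many":
        return "SNS (or SNS + SQS fan-out)"
    if pattern == "stream":
        return "Kinesis Data Streams" if need_replay else "Data Firehose"
    raise ValueError(f"unknown pattern: {pattern!r}")
```

For instance, `choose_service("stream", need_replay=True)` returns Kinesis Data Streams, matching the rule that replay is what separates Streams from Firehose.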

Step 2: Feature-Based Decision Table

If question mentions… | Answer is…
“ordering” or “sequence” | SQS/SNS FIFO
“exactly-once” or “no duplicates” | FIFO + Deduplication ID
“replay” or “reprocess” | Kinesis Data Streams
“transform while streaming” | Data Firehose + Lambda
“load directly to S3/Redshift” | Data Firehose
“multiple destinations from one event” | SNS Fan Out
“filter messages per subscriber” | SNS Filter Policy
“MQTT/AMQP/STOMP protocol” | Amazon MQ
“cross-account access” | Resource-based policy (SQS/SNS Access Policy)
“reduce costs” + SQS | Long Polling
“database overwhelmed” | SQS as buffer
“unpredictable traffic” + Kinesis | On-Demand mode
“ProvisionedThroughputExceeded” | Add shards or On-Demand
“consumer needs more time” | ChangeMessageVisibility API
“scale based on queue depth” | CloudWatch Alarm on queue depth (ApproximateNumberOfMessages attribute, ApproximateNumberOfMessagesVisible metric)

The “NOT” Rules (Eliminate Wrong Answers Fast)

Statement | Why It’s Wrong
SNS for replay | SNS does NOT persist messages
SQS pushes to consumers | SQS is pull-only (consumers poll)
Kinesis Streams transforms data | Streams = raw data only, Firehose transforms
Firehose for replay | Firehose does NOT store (pass-through)
SQS/SNS with MQTT | Use Amazon MQ for MQTT/AMQP/STOMP
Lambda subscribes to SNS FIFO | SNS FIFO → SQS queues ONLY
SQS as Firehose producer | SQS needs Lambda to feed Firehose
Kinesis Streams loads to S3 directly | Use Firehose for S3/Redshift loading

Part 3: Scenario Pattern Recognition

Pattern: “Decouple / Buffer / Protect Backend”

Keywords: overwhelmed, spikes, lost transactions, protect database, decouple

Answer: SQS as buffer

requests ──▶ [Front-end ASG] ──▶ [SQS Queue] ──▶ [Back-end ASG] ──▶ Database
                                 (absorbs spike)   (processes safely)

Pattern: “One Event → Multiple Actions”

Keywords: notify multiple services, fan out, broadcast, S3 event to multiple queues

Answer: SNS (or SNS + SQS Fan Out)

[Producer] ──▶ [SNS Topic] ──┬──▶ [SQS Queue 1] ──▶ Service A
                             ├──▶ [SQS Queue 2] ──▶ Service B
                             └──▶ [Lambda]      ──▶ Service C

Pattern: “Need to Replay / Reprocess Data”

Keywords: replay, reprocess, audit trail, re-analyze, multiple consumers read same data

Answer: Kinesis Data Streams

Why: Only service that stores data (up to 365 days) and allows multiple reads.


Pattern: “Stream Data to S3/Redshift/OpenSearch”

Keywords: load streaming data, store in S3, analytics destination, transform while streaming

Answer: Data Firehose

[Any Source] ──▶ [Firehose] ──▶ (optional Lambda) ──▶ S3/Redshift/OpenSearch

Pattern: “Real-time Processing Required”

Keywords: real-time, sub-second, immediate, ~200ms, IoT, clickstream

Answer: Kinesis Data Streams (NOT Firehose — it buffers)


Pattern: “Order Matters / No Duplicates”

Keywords: ordering, sequence, exactly-once, financial transactions, no duplicates

Answer: FIFO queue/topic

Remember: Queue name ends with .fifo. Throughput = 300-3000 msg/s.


Pattern: “Migrate Existing Message Broker”

Keywords: migrate, existing application, RabbitMQ, ActiveMQ, MQTT, AMQP, no code changes

Answer: Amazon MQ

Why: Supports open protocols. SQS/SNS require AWS SDK = code changes.


Pattern: “Reduce Costs / Empty Responses”

Keywords: reduce API calls, empty responses, cost optimization, SQS

Answer: Long Polling (set WaitTimeSeconds up to 20 sec)
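
A rough sketch of why long polling cuts cost, assuming a consumer that polls an idle queue in a loop. `WaitTimeSeconds` and `MaxNumberOfMessages` are real ReceiveMessage parameters; the queue URL and the one-call-per-second short-polling rate are illustrative assumptions.

```python
import math


def receive_params(queue_url: str, wait_seconds: int = 20) -> dict:
    """Keyword arguments for an SQS ReceiveMessage call.

    WaitTimeSeconds must be between 0 and 20; 0 means short polling.
    """
    if not 0 <= wait_seconds <= 20:
        raise ValueError("WaitTimeSeconds must be between 0 and 20")
    return {
        "QueueUrl": queue_url,
        "MaxNumberOfMessages": 10,  # maximum allowed per call
        "WaitTimeSeconds": wait_seconds,
    }


def empty_receives_per_hour(seconds_per_call: float) -> int:
    """ReceiveMessage calls made in one hour against an idle queue."""
    return math.ceil(3600 / seconds_per_call)


# Hypothetical queue URL, for illustration only.
params = receive_params("https://sqs.us-east-1.amazonaws.com/123456789012/demo-queue")
short_poll_calls = empty_receives_per_hour(1)   # assumed: one short poll per second
long_poll_calls = empty_receives_per_hour(20)   # each long poll waits the full 20 s
print(params["WaitTimeSeconds"], short_poll_calls, long_poll_calls)
```

Under these assumptions the idle consumer makes 180 billable calls per hour instead of 3600, which is the saving the exam keyword "empty responses" points at.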


Pattern: “Consumer Processing Takes Too Long”

Keywords: timeout, visibility, need more time, duplicate processing

Answer: Increase Visibility Timeout or call ChangeMessageVisibility API


Pattern: “Unpredictable Traffic in Kinesis”

Keywords: unpredictable, variable load, spikes, promotional campaign

Answer: On-Demand mode (auto-scales based on last 30 days peak)


Pattern: “ProvisionedThroughputExceeded Error”

Keywords: throughput exceeded, throttling, Kinesis errors

Answer: Add more shards OR switch to On-Demand
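
As a back-of-the-envelope aid for the "add more shards" answer, the sketch below estimates a provisioned shard count from the per-shard limits (1 MB/s or 1,000 records/s in, 2 MB/s out). The function name and the sample workload numbers are assumptions made for illustration.

```python
import math


def shards_needed(write_mb_per_s: float, records_per_s: float, read_mb_per_s: float) -> int:
    """Smallest shard count that satisfies all three per-shard limits."""
    by_write_bytes = math.ceil(write_mb_per_s / 1.0)       # 1 MB/s in per shard
    by_write_records = math.ceil(records_per_s / 1000.0)   # 1,000 records/s in per shard
    by_read_bytes = math.ceil(read_mb_per_s / 2.0)         # 2 MB/s out per shard
    return max(1, by_write_bytes, by_write_records, by_read_bytes)


# Assumed workload: 4.5 MB/s in, 3,000 records/s, 9 MB/s read by consumers.
print(shards_needed(4.5, 3000, 9.0))
```

If the stream has fewer shards than this estimate, producers are likely to see ProvisionedThroughputExceeded; when the load cannot be predicted at all, On-Demand mode avoids the calculation.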


Pattern: “Cross-Account / Allow Other Service”

Keywords: cross-account, allow S3 to write, allow SNS to write

Answer: Resource-based policy (SQS/SNS Access Policy)
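
A minimal sketch of such a resource-based policy, here an SQS access policy that lets one SNS topic send messages to the queue. The ARNs and the `Sid` are hypothetical; the overall shape (service principal plus an `aws:SourceArn` condition) is the commonly used pattern.

```python
import json


def sns_to_sqs_policy(queue_arn: str, topic_arn: str) -> dict:
    """Access policy allowing a single SNS topic to call sqs:SendMessage."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowTopicToSend",  # hypothetical statement ID
                "Effect": "Allow",
                "Principal": {"Service": "sns.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
                # Restrict the grant to one topic rather than all of SNS.
                "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
            }
        ],
    }


# Hypothetical ARNs, for illustration only.
policy = sns_to_sqs_policy(
    "arn:aws:sqs:us-east-1:123456789012:orders-queue",
    "arn:aws:sns:us-east-1:123456789012:orders-topic",
)
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The policy is attached to the queue (the resource), not to an IAM user or role, which is why it works for other services and other accounts.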


Pattern: “Route Different Message Types”

Keywords: filter, route by attribute, different processing per type

Answer: SNS Filter Policy (JSON policy per subscription)
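
To make the idea concrete, here is a simplified model of per-subscription filtering. It only handles exact string matching on message attributes; real SNS filter policies support more operators (prefix, numeric ranges, anything-but). Subscription names and attribute values are made up for the example.

```python
def attributes_match(filter_policy: dict, attributes: dict) -> bool:
    """True if every policy key is present with one of the allowed values.

    An empty policy matches everything, like a subscription with no filter.
    """
    return all(attributes.get(key) in allowed for key, allowed in filter_policy.items())


# Hypothetical subscriptions on one topic, each with its own filter policy.
subscriptions = {
    "refund-queue": {"order_type": ["refund", "cancel"]},
    "eu-purchases-queue": {"order_type": ["purchase"], "region": ["eu"]},
    "audit-queue": {},  # no filter: receives every message
}


def route(attributes: dict) -> list:
    """Names of the subscriptions that would receive the message."""
    return [name for name, policy in subscriptions.items() if attributes_match(policy, attributes)]


print(route({"order_type": "refund", "region": "us"}))
print(route({"order_type": "purchase", "region": "eu"}))
```

Each subscriber states what it wants, so the publisher keeps sending to a single topic and does not need to know about routing.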


Pattern: “S3 Event to Multiple Destinations”

Keywords: S3 notification, multiple queues, multiple Lambda

Answer: S3 → SNS → Fan Out (S3 allows only one rule per event+prefix combo)


Part 4: Quick Reference Tables

Service Comparison At-a-Glance

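Feature | SQS | SNS | Kinesis Streams | Firehose | Amazon MQ
Model | Pull | Push | Pull/Push | Push | Pull/Push
Throughput | Unlimited | Unlimited | Per shard | Auto | Limited
Ordering | FIFO only | FIFO only | Per shard | No | Yes
Persistence | Until deleted (max 14 days) | No | Up to 365 days | No | Yes
Replay | ❌ No | ❌ No | ✅ Yes | ❌ No | ❌ No
Transform | ❌ No | ❌ No | ❌ No | ✅ Yes (Lambda) | ❌ No
Protocols | AWS SDK | AWS SDK | AWS SDK | AWS SDK | MQTT/AMQP/STOMP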

Throughput Numbers

Service | Throughput
SQS Standard | Unlimited
SQS FIFO | 300 msg/s (3000 batched)
SNS Standard | Unlimited (12.5M subscribers/topic)
SNS FIFO | 300 msg/s (3000 batched)
Kinesis Provisioned | 1 MB/s in, 2 MB/s out per shard
Kinesis On-Demand | Auto (default 4 MB/s in)
Data Firehose | Auto-scales

Message Size Limits

Service | Max Message/Record Size
SQS | 256 KB
SNS | 256 KB
Kinesis | 1 MB
Firehose | 1 MB

Key APIs to Remember

API | Service | Purpose
SendMessage | SQS | Send message to queue
ReceiveMessage | SQS | Poll messages (up to 10)
DeleteMessage | SQS | Remove processed message
ChangeMessageVisibility | SQS | Extend processing time
Publish | SNS | Send to topic
PutRecord / PutRecords | Kinesis | Send to stream

Part 5: Ultimate Instant-Answer Table

Question Contains | Instant Answer
“replay” / “reprocess” | Kinesis Data Streams
“fan out” / “multiple destinations” | SNS (+ SQS for persistence)
“buffer” / “overwhelmed” / “protect DB” | SQS
“MQTT” / “AMQP” / “migrate broker” | Amazon MQ
“ordering” / “sequence” / “exactly-once” | FIFO
“transform while streaming” | Data Firehose
“load to S3” (from stream) | Data Firehose
“real-time” + streaming | Kinesis Data Streams
“near real-time” + streaming | Data Firehose
“reduce SQS costs” | Long Polling
“empty responses” | Long Polling
“unpredictable Kinesis traffic” | On-Demand mode
“ProvisionedThroughputExceeded” | Add shards / On-Demand
“cross-account” / “allow S3/SNS” | Resource-based policy
“filter per subscriber” | SNS Filter Policy
“S3 event multiple destinations” | S3 → SNS → Fan Out
“consumer needs more time” | ChangeMessageVisibility
“scale on queue depth” | CloudWatch metric ApproximateNumberOfMessagesVisible
“no code changes” + “migrate” | Amazon MQ
“RabbitMQ/ActiveMQ to AWS” | Amazon MQ

Part 6: Elimination Checklist

When stuck between options, eliminate systematically:

□ Do they need REPLAY?
  → No = eliminate Kinesis Data Streams
  → Yes = Kinesis Data Streams is likely answer

□ Do they need PUSH to multiple?
  → No = eliminate SNS
  → Yes = SNS or Fan Out pattern

□ Do they need ORDERING?
  → No = eliminate FIFO options
  → Yes = must be FIFO

□ Do they need REAL-TIME?
  → No = Firehose acceptable
  → Yes = must be Kinesis Data Streams

□ Do they mention OPEN PROTOCOLS (MQTT/AMQP)?
  → No = eliminate Amazon MQ
  → Yes = Amazon MQ is likely answer

□ Do they need DATA TRANSFORMATION?
  → No = Kinesis Streams acceptable
  → Yes = must be Firehose (with Lambda)

□ Is it CROSS-ACCOUNT or OTHER SERVICE access?
  → No = IAM policy
  → Yes = Resource-based policy

🏆 The Golden Rules

  1. Replay = Kinesis Data Streams (only option)
  2. Fan Out = SNS (optionally + SQS)
  3. Buffer = SQS (unlimited queue depth, messages retained up to 14 days)
  4. Open Protocols = Amazon MQ (migrate without code changes)
  5. Order/Exactly-Once = FIFO (trade throughput for guarantees)
  6. Stream to S3 = Firehose (not Kinesis Streams directly)
  7. Transform = Firehose (only streaming service with built-in transform)
  8. Real-time = Kinesis Streams (Firehose buffers = near real-time)
  9. Other Account/Service = Resource Policy (not IAM)
  10. Reduce SQS cost = Long Polling (always)

Solution Architecture

Stateless Web App Evolution: WhatIsTheTime.com A simple app that returns current time — no database needed.

Growth Steps:

Step | Architecture | Problem Solved | New Problem
1 | EC2 + Public IP | Works! | IP changes on restart
2 | EC2 + Elastic IP | Static IP | Single point of failure, no scaling
3 | EC2 + Route 53 (A record) | DNS-based, no Elastic IP needed | Still single instance
4 | ELB + multiple EC2 | Horizontal scaling, health checks | Manual instance management
5 | ELB + ASG | Auto-scaling, self-healing | Single AZ failure risk
6 | ELB + ASG + Multi-AZ | High availability across AZs | ✅ Production ready!
       ┌──────────┐
       │ Route 53 │ Alias Record
       │   DNS    │ api.whatisthetime.com
       └────┬─────┘
            │
            ▼
       ┌─────────┐      AZ 1-3
       │   ELB   │◄─── Health Checks
       │Multi-AZ │     + Multi-AZ
       └────┬────┘
            │
    ┌───────┼───────┐
    ▼       ▼       ▼
  ┌───┐   ┌───┐   ┌───┐
  │M5 │   │M5 │   │M5 │  ◄── Auto Scaling Group
  │AZ1│   │AZ2│   │AZ3│      (spans 3 AZs)
  └───┘   └───┘   └───┘

Key Concepts Covered:

⚠️ Exam trap - Cost Optimization with ASG:

Stateful Web App Evolution: MyClothes.com (Session State) E-commerce app with shopping cart — needs to maintain user state across requests.

The Problem: With multiple EC2 instances behind ELB, user may hit different server each request → loses shopping cart!

Growth Steps:

Step | Solution | How It Works | Trade-off
1 | ELB Sticky Sessions | Cookie ties user to same EC2 | Instance failure = lost cart
2 | User Cookies | Store cart in browser cookie | Limited size, security risk
3 | ElastiCache (Sessions) | Store session in Redis/Memcached | Sub-ms latency, shared state
4 | DynamoDB (Sessions) | Alternative to ElastiCache | Serverless, auto-scaling
5 | RDS (User Data) | Persist user details, addresses | Need read replicas for scale
6 | ElastiCache (Caching) | Cache RDS queries | Reduce DB load
7 | Multi-AZ Everything | RDS + ElastiCache Multi-AZ | ✅ Production ready!
                        ┌──────────┐
                        │ Route 53 │
                        └────┬─────┘
                             │
    ┌────────────────────────┴──────────────────────┐
    │                    Multi-AZ                   │
    │  ┌─────────┐                                  │
    │  │   ELB   │◄── Open HTTP/HTTPS to 0.0.0.0/0  │
    │  └────┬────┘                                  │
    │       │ Restrict to ELB SG only               │
    │  ┌────┴────┬─────────┐     Auto Scaling Group │
    │  ▼         ▼         ▼                        │
    │┌────┐    ┌────┐    ┌────┐                     │
    ││ M5 │    │ M5 │    │ M5 │  AZ1, AZ2, AZ3      │
    │└──┬─┘    └─┬──┘    └──┬─┘                     │
    └──┼─────────┼──────────┼───────────────────────┘
       │         │          │
       │  Restrict to EC2 SG only
       ▼         ▼          ▼
  ┌─────────┐        ┌─────────┐
  │Elasti-  │        │   RDS   │
  │Cache    │        │Multi-AZ │
  │(sessions│        │+Replicas│
  │+caching)│        └─────────┘
  └─────────┘

3-Tier Security (SG Chaining):

Layer | Security Group Rule
ELB | Inbound: HTTP/HTTPS from 0.0.0.0/0
EC2 | Inbound: Only from ELB SG
RDS/ElastiCache | Inbound: Only from EC2 SG
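
The chaining can be sketched as ingress rules in the `IpPermissions` shape accepted by the EC2 AuthorizeSecurityGroupIngress API. The security group IDs and the application, MySQL, and Redis ports are assumptions chosen for illustration; the point is that only the ELB layer references a CIDR range, while the inner layers reference the previous layer's security group.

```python
def from_anywhere(port: int) -> dict:
    """Ingress rule open to the whole internet on one TCP port."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
    }


def from_group(port: int, source_sg_id: str) -> dict:
    """Ingress rule that only admits traffic from members of another SG."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [{"GroupId": source_sg_id}],
    }


# Hypothetical security group IDs.
ELB_SG, EC2_SG, DATA_SG = "sg-elb-example", "sg-ec2-example", "sg-data-example"

ingress_rules = {
    ELB_SG: [from_anywhere(80), from_anywhere(443)],
    EC2_SG: [from_group(80, ELB_SG)],              # assumed app port 80
    DATA_SG: [from_group(3306, EC2_SG),            # assumed MySQL port
              from_group(6379, EC2_SG)],           # assumed Redis port
}

for group_id, rules in ingress_rules.items():
    print(group_id, [rule["FromPort"] for rule in rules])
```

Because the inner rules name a security group rather than IP addresses, instances added by the Auto Scaling Group are covered automatically.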

Key Concepts:

⚠️ Exam trap - Stateless Session Storage:

Storage | Stateless? | Why
ElastiCache | ✅ Yes | Shared across all EC2s
RDS/DynamoDB | ✅ Yes | Shared across all EC2s
HTTP Cookies | ✅ Yes | Client carries state
EBS | ❌ No | Single AZ, single EC2 only

EBS makes app stateful — user hitting different EC2 loses session!


Typical 3-Tier Web App Architecture Reference diagram showing production-ready AWS web app with all components:

                              ┌──────────┐
                              │ Route 53 │
                              └────┬─────┘
                                   │
┌──────────────────────────────────┴──────────────────────────────────────┐
│ PUBLIC SUBNET                                                           │
│    ┌─────────────────────────────────────────────────────────────────┐  │
│    │                     ELB (Multi-AZ)                              │  │
│    │               ◄─ Open HTTP/HTTPS to 0.0.0.0/0                   │  │
│    └─────────────────────────┬───────────────────────────────────────┘  │
└──────────────────────────────┼──────────────────────────────────────────┘
                               │
┌──────────────────────────────┴──────────────────────────────────────────┐
│ PRIVATE SUBNET            Auto Scaling Group                            │
│         ┌─────────────┬─────────────┬─────────────┐                     │
│         │   ┌─────┐   │   ┌─────┐   │   ┌─────┐   │                     │
│         │   │ M5  │   │   │ M5  │   │   │ M5  │   │                     │
│         │   │ AZ1 │   │   │ AZ2 │   │   │ AZ3 │   │                     │
│         │   └──┬──┘   │   └──┬──┘   │   └──┬──┘   │                     │
│         └─────────────┴─────────────┴─────────────┘                     │
└─────────────────┼───────────┼───────────┼───────────────────────────────┘
                  │           │           │
┌─────────────────┴───────────┴───────────┴───────────────────────────────┐
│ DATA SUBNET                                                             │
│    ┌─────────────────────┐        ┌─────────────────────┐               │
│    │     ElastiCache     │        │      Amazon RDS     │               │
│    │   ─────────────     │        │   ─────────────     │               │
│    │  Session storage    │        │  Read/write data    │               │
│    │  + Query cache      │        │  (Multi-AZ)         │               │
│    └─────────────────────┘        └─────────────────────┘               │
└─────────────────────────────────────────────────────────────────────────┘
Subnet | Contains | Access
Public | ELB | Open to internet (0.0.0.0/0)
Private | EC2 (ASG) | Only from ELB SG
Data | RDS, ElastiCache | Only from EC2 SG

Stateful Web App Evolution: MyWordPress.com (Shared File Storage) Scalable WordPress with image uploads and MySQL database.

The Problem: Images uploaded to one EC2 won’t be visible from other EC2 instances!

Growth Steps:

Step | Solution | Problem Solved | Limitation
1 | Single EC2 + EBS | Simple, works | Single AZ, no scaling
2 | Multi EC2 + EBS each | Scaling | Images not shared across instances!
3 | Multi EC2 + EFS | Shared storage across AZs | ✅ All instances see all images
4 | Aurora MySQL | Multi-AZ + Read Replicas built-in | ✅ Production ready!

EBS vs EFS for Distributed Apps:

Storage | Scope | Use Case
EBS | Single EC2 in single AZ | Single instance apps
EFS | Shared across EC2s + AZs | Distributed apps (WordPress, CMS)
       ┌──────────┐
       │ Route 53 │
       └────┬─────┘
            │
       ┌────┴────┐
       │   ELB   │ Multi-AZ
       └────┬────┘
            │
    ┌───────┴───────┐
    ▼               ▼
┌────────┐     ┌────────┐
│   M5   │     │   M5   │
│  AZ 1  │     │  AZ 2  │
└───┬────┘     └───┬────┘
    │   ENI        │   ENI
    │              │
    └──────┬───────┘
           │
           ▼
       ┌───────┐
       │  EFS  │ ◄── Shared storage
       │       │     (images visible
       └───────┘      from all EC2s)

Key Concepts:

⚠️ Exam trap: “Shared file storage across multiple EC2 instances” → EFS (not EBS!)

⚠️ Exam trap - Software updates on 100s of EC2s:

Instantiating Applications Quickly Launching a full stack (EC2, EBS, RDS) can be slow — install apps, configure, insert data. Use these strategies to speed up:

Golden AMI = AMI standardized through configuration, consistent security patching, and hardening. Contains pre-approved agents for logging, security, and performance monitoring. In Beanstalk, you can specify a custom AMI instead of the standard platform AMI to improve provisioning times.

Resource | Fast Launch Strategy | What It Does
EC2 | Golden AMI | Pre-baked image with OS, apps, dependencies
EC2 | User Data | Bootstrap script for dynamic config at launch
EC2 | Hybrid | Golden AMI + User Data (Elastic Beanstalk approach)
RDS | Restore from Snapshot | DB with schemas + data ready instantly
EBS | Restore from Snapshot | Pre-formatted disk with data

Golden AMI vs User Data:

Approach | Speed | Flexibility | Use Case
Golden AMI | ⚡ Fastest | Low (requires rebuild) | Stable configs, rarely change
User Data | Slower | High (scripts) | Dynamic config, secrets
Hybrid | Balanced | Medium | Best of both worlds

⚠️ Exam trap - “Speed up EC2 launch / scale-out”:

⚠️ Exam trap — EC2 User Data facts:

⚠️ Exam trap - “Static + dynamic installation, reduce boot time”:


Elastic Beanstalk

Developer-centric view of deploying apps on AWS — just upload code, Beanstalk handles the rest.

Feature | Details
What it manages | EC2, ASG, ELB, RDS, CloudWatch, etc.
Your responsibility | Application code only
Control | Full control over configuration if needed
Cost | Free (pay only for underlying resources)

Workflow:

┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Create    │───→│   Upload    │───→│   Launch    │───→│   Manage    │
│ Application │    │   Version   │    │ Environment │    │ Environment │
└─────────────┘    └──────┬──────┘    └─────────────┘    └──────┬──────┘
                          │                                     │
                          │◄────── deploy new version ──────────┘
                          │
                          └──────── update version ─────────────→

Components:

Component | Description
Application | Container for environments, versions, configs
Application Version | Iteration of your code (stored in S3)
Environment | AWS resources running ONE version at a time
Environment Tier | Web Server or Worker

Environment Tiers:

Tier | Use Case | Components
Web Server | HTTP requests | ELB + ASG + EC2
Worker | Background tasks | SQS + ASG + EC2
┌─────────────────────────────────────┐   ┌─────────────────────────────────────┐
│         Web Server Tier             │   │           Worker Tier               │
│  (myapp.us-east-1.elasticbeanstalk) │   │                                     │
├─────────────────────────────────────┤   ├─────────────────────────────────────┤
│              ┌─────┐                │   │            ┌───────────┐            │
│              │ ELB │                │   │            │ SQS Queue │            │
│              └──┬──┘                │   │            └─────┬─────┘            │
│                 │                   │   │          pull messages              │
│     ┌───────────┴───────────┐       │   │       ┌───────────┴───────────┐     │
│     ▼                       ▼       │   │       ▼                       ▼     │
│ ┌────────┐ ASG        ┌────────┐    │   │   ┌────────┐ ASG        ┌────────┐  │
│ │  EC2   │            │  EC2   │    │   │   │  EC2   │            │  EC2   │  │
│ │(WebSrv)│            │(WebSrv)│    │   │   │(Worker)│            │(Worker)│  │
│ └────────┘            └────────┘    │   │   └────────┘            └────────┘  │
│   AZ 1                  AZ 2        │   │     AZ 1                  AZ 2      │
└─────────────────────────────────────┘   └─────────────────────────────────────┘

Worker Tier Details:

Deployment Modes:

Mode | Components | Use Case
Single Instance | Elastic IP + EC2 + RDS | Dev/test
High Availability | ALB + ASG + Multi-AZ RDS | Production
┌─────────────────────────┐   ┌─────────────────────────────────────────────┐
│    Single Instance      │   │     High Availability with Load Balancer   │
│    (Great for dev)      │   │     (Great for prod)                       │
├─────────────────────────┤   ├─────────────────────────────────────────────┤
│      Elastic IP         │   │                  ┌─────┐                   │
│          │              │   │                  │ ALB │                   │
│          ▼              │   │                  └──┬──┘                   │
│     ┌────────┐          │   │        ┌───────────┴───────────┐           │
│     │  EC2   │          │   │        ▼                       ▼           │
│     └────────┘          │   │   ┌────────┐  ASG         ┌────────┐       │
│          │              │   │   │  EC2   │              │  EC2   │       │
│          ▼              │   │   └────────┘              └────────┘       │
│     ┌────────┐          │   │      AZ 1                   AZ 2           │
│     │  RDS   │          │   │        │                       │           │
│     │ Master │          │   │        ▼                       ▼           │
│     └────────┘          │   │   ┌────────┐              ┌────────┐       │
│       AZ 1              │   │   │  RDS   │              │  RDS   │       │
│                         │   │   │ Master │              │Standby │       │
└─────────────────────────┘   │   └────────┘              └────────┘       │
                              │      AZ 1                   AZ 2           │
                              └─────────────────────────────────────────────┘

Supported Platforms: Go, Java SE, Java/Tomcat, .NET Core/Linux, .NET/Windows, Node.js, PHP, Python, Ruby, Docker (Single/Multi-container), Packer Builder

⚠️ Exam trap: Beanstalk is free — you pay for EC2, RDS, ELB, etc. that it provisions!

⚠️ Exam trap - Slow Beanstalk deployments:

Deployment Strategies:

Strategy | Downtime | Deploy Time | Rollback | Use Case
All-at-once | Yes ⚠️ | ⚡ Fastest | Redeploy | Dev/test
Rolling | No | Slow | Redeploy | Prod, cost-conscious
Rolling with additional batch | No | Slower | Redeploy | Prod, maintain capacity
Immutable | No | Slowest | Terminate new ASG | Prod, safest
Blue/Green | No | Fast | Swap URL | Prod, instant rollback

Deployment Details:

Strategy | How It Works
All-at-once | Deploy to all instances at the same time; brief outage
Rolling | Deploy in batches, old instances keep serving while each batch updates
Rolling with additional batch | Like rolling, but spins up NEW instances first (maintains capacity)
Immutable | New ASG with new instances → swap → terminate old ASG
Blue/Green | New environment → Route 53/ELB swap → terminate old env
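
The capacity trade-off between these strategies can be illustrated with a small model. This is a simplification that assumes equally sized instances and ignores health-check and warm-up time; the strategy strings are labels chosen for this example, not Beanstalk API values.

```python
def min_capacity_during_deploy(strategy: str, instances: int, batch_size: int) -> int:
    """Lowest number of instances still serving traffic during a deployment."""
    if strategy == "all-at-once":
        return 0  # every instance is updated together, so a brief outage
    if strategy == "rolling":
        return max(0, instances - batch_size)  # one batch is out of service at a time
    if strategy in ("rolling-with-additional-batch", "immutable", "blue/green"):
        return instances  # new instances are added before old ones are touched
    raise ValueError(f"unknown strategy: {strategy}")


for name in ("all-at-once", "rolling", "rolling-with-additional-batch", "immutable"):
    print(name, min_capacity_during_deploy(name, instances=4, batch_size=2))
```

With four instances and a batch size of two, plain rolling drops to half capacity, while the strategies that launch extra instances keep all four serving at the cost of temporarily paying for more.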

⚠️ Exam trap - Deployment strategies:

.ebextensions:

Saved Configurations:

Monitoring:

Amazon CloudWatch is a service that monitors applications, responds to performance changes, optimizes resource use, and provides insights into operational health. By collecting data across AWS resources, CloudWatch gives visibility into system-wide performance and allows users to set alarms, automatically react to changes, and gain a unified view of operational health.

Important metrics:

Amazon CloudWatch Alarms are used to trigger notifications for any metric.

Amazon CloudWatch Logs:

Amazon EventBridge is a serverless event bus that ingests data from your own apps, SaaS apps, and AWS services and routes that data to targets.

AWS CloudTrail is a service that helps you enable operational and risk auditing, governance, and compliance of your AWS account. Actions taken by a user, role, or an AWS service are recorded as events in CloudTrail. It provides a history of events and API calls made through the AWS Management Console, AWS Command Line Interface, and AWS SDKs and APIs. CloudTrail logs can be sent to CloudWatch Logs or S3, giving you an audit of all users’ events and activities.

AWS X-Ray provides a complete view of requests as they travel through your application and filters visual data across payloads, functions, traces, services, APIs, and more with no-code and low-code motions.

Amazon CodeGuru is a static application security testing (SAST) tool that combines machine learning (ML) and automated reasoning to identify vulnerabilities in your code, provide recommendations on how to fix the identified vulnerabilities, and track the status of the vulnerabilities until closure.

AWS Health Dashboard: view the overall status and health of AWS services. AWS Health Dashboard - Your Account provides alerts and remediation guidance when AWS is experiencing events that may impact you.


CloudWatch Metrics


CloudWatch Logs

Log Structure:

Log Sources:

Log Destinations:

Encryption:


CloudWatch Logs Insights


CloudWatch Logs — S3 Export vs Subscriptions

Method | Latency | Use Case
S3 Export | Up to 12 hours | Batch archive, compliance
Subscriptions | Real-time / Near real-time | Live processing, streaming

S3 Export:


CloudWatch Logs Subscriptions

Get real-time log events from CloudWatch Logs for processing and analysis.

Subscription Filter = filter which logs are delivered to destination

CloudWatch Logs ──► Subscription Filter ──┬──► Lambda (real-time) ──► OpenSearch
                                          │
                                          ├──► Kinesis Firehose (near real-time) ──► S3
                                          │
                                          └──► Kinesis Data Streams ──► KDF/KDA/EC2/Lambda

Subscription Destinations:

Destination | Latency | Use Case
Lambda | Real-time | Transform, send to OpenSearch
Kinesis Firehose | Near real-time | Deliver to S3, Redshift, OpenSearch
Kinesis Data Streams | Real-time | Custom consumers, analytics

CloudWatch Logs Aggregation (Multi-Account & Multi-Region)

Aggregate logs from multiple accounts and regions into a central location:

Account A / Region 1 ──► Subscription Filter ──┐
                                               │
Account B / Region 2 ──► Subscription Filter ──┼──► Kinesis Data Streams ──► Firehose ──► S3
                                               │                            (near real-time)
Account B / Region 3 ──► Subscription Filter ──┘

⚠️ Exam trap: “Aggregate logs from multiple accounts/regions” → Subscription Filters to central Kinesis Data Streams, then Firehose to S3.


CloudWatch Metric Streams

Continually stream CloudWatch metrics to a destination with near-real-time delivery.

CloudWatch Metrics ──► Kinesis Data Firehose ──┬──► S3 ──► Athena
                      (near real-time)         │
                                               ├──► Redshift
                                               │
                                               └──► OpenSearch

Destinations:

Features:

⚠️ Exam trap: “Stream metrics to S3/Redshift/3rd party” → CloudWatch Metric Streams via Firehose.


CloudWatch Agent for EC2

By default, NO logs from EC2 go to CloudWatch!

Agent Types:

Agent | Metrics | Logs | Notes
CloudWatch Logs Agent | ❌ No | ✅ Yes | Old version, logs only
CloudWatch Unified Agent | ✅ Yes | ✅ Yes | Recommended, more metrics

CloudWatch Unified Agent:

Unified Agent Metrics (Linux/EC2):

Category | Metrics
CPU | active, guest, idle, system, user, steal
Disk | free, used, total
Disk IO | writes, reads, bytes, iops
RAM | free, inactive, used, total, cached
Netstat | TCP/UDP connections, net packets, bytes
Processes | total, dead, blocked, idle, running, sleep
Swap | free, used, used %

⚠️ Exam trap: “Monitor RAM on EC2” or “EC2 memory usage” → CloudWatch Unified Agent required! RAM is NOT a default EC2 metric.

⚠️ Exam trap: Default EC2 metrics = CPU, Disk, Network (high-level). For RAM, processes, detailed disk IO → Unified Agent.

CloudWatch Alarms

Alarm States:

State | Meaning
OK | Metric within threshold
ALARM | Metric breached threshold
INSUFFICIENT_DATA | Not enough data yet

Period:

Alarm Targets:

Target | Action
EC2 | Stop, Terminate, Reboot, or Recover
Auto Scaling | Trigger scaling action (scale out/in)
SNS | Send notification (then trigger Lambda, etc.)

Composite Alarms:


CloudWatch Alarms from Logs (Metric Filters)

Create alarms based on CloudWatch Logs using Metric Filters:

CW Logs ──► Metric Filter ──► CW Metric ──► CW Alarm ──► SNS (alert)
            (pattern match)   (count)       (threshold)

How it works:

  1. Logs arrive in CloudWatch Logs (e.g., RDS, Lambda, application logs)
  2. Metric Filter scans for pattern (e.g., “Error”, “Exception”)
  3. Each match increments a custom metric (count)
  4. Alarm monitors metric → triggers when threshold exceeded
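
The four steps above can be mimicked locally to see the data flow. This is only a model: it uses plain substring matching, whereas real metric filters have their own pattern syntax, and the log lines and threshold are invented for the example.

```python
def count_matches(log_lines: list, pattern: str) -> int:
    """Steps 2 and 3: each log line containing the pattern adds 1 to the metric."""
    return sum(1 for line in log_lines if pattern in line)


def alarm_state(metric_value: int, threshold: int) -> str:
    """Step 4: the alarm fires when the metric exceeds the threshold."""
    return "ALARM" if metric_value > threshold else "OK"


# Hypothetical log events for one evaluation period.
period_logs = [
    "2024-01-01T10:00:00Z INFO connection accepted",
    "2024-01-01T10:00:02Z Error: could not connect to replica",
    "2024-01-01T10:00:05Z INFO query completed",
    "2024-01-01T10:00:09Z Error: lock wait timeout exceeded",
]

error_count = count_matches(period_logs, "Error")
print(error_count, alarm_state(error_count, threshold=0))
```

The alarm never reads the logs itself; it only sees the numeric metric that the filter publishes, which is why this pattern needs no Lambda or polling.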

Example: RDS Error Alerting

RDS DB Logs ──► CloudWatch Logs ──► Metric Filter ──► Metric ──► Alarm ──► SNS
                                    ("Error")         (count)    (>0)

⚠️ Exam trap: “Alert on keyword in logs” (Error, Exception, etc.) → Metric Filter + Alarm.

⚠️ Exam trap: Don’t use Lambda polling (expensive, not real-time). Don’t use Config (monitors resource config, not log content).


EC2 Instance Recovery

Status Checks:

Check | What it monitors
Instance status | EC2 VM (software)
System status | Underlying hardware
Attached EBS status | EBS volumes

Recovery with CloudWatch Alarm:

EC2 Instance ◄── monitor ── CloudWatch Alarm ──► alert ──► SNS Topic
      │                    (StatusCheckFailed_System)
      │
      └── EC2 Instance Recovery

What’s preserved after recovery:

⚠️ Exam trap: “Auto-recover EC2 on hardware failure” → CloudWatch Alarm on StatusCheckFailed_System → EC2 Recovery action.

⚠️ Exam trap: “Most cost-optimal way to auto-reboot/stop/recover EC2” → CloudWatch Alarm → EC2 Action (direct). NOT CW Alarm → SNS → Lambda → EC2 API (over-engineered, 3 services). NOT EventBridge → Lambda (unnecessary compute). CW Alarms have built-in EC2 actions (Stop, Terminate, Reboot, Recover) — no Lambda needed.


Testing CloudWatch Alarms

Test alarms manually using CLI:

aws cloudwatch set-alarm-state \
  --alarm-name "myalarm" \
  --state-value ALARM \
  --state-reason "testing purposes"

CloudWatch Network Synthetic Monitor

Monitor network issues between AWS and on-premises data center.

AWS Cloud
┌────────────────────────────┐
│  ┌──────────────────────┐  │
│  │   Private Subnet     │  │
│  │   ┌────────────┐     │  │
│  │   │EC2 Instance│     │  │
│  │   └────────────┘     │  │
│  └──────────────────────┘  │
│                            │
│  CloudWatch Metrics ◄──────┼──── DX Connection ──┬──► Corporate Data Center
│                            │         or          │         │
└────────────────────────────┘    VPN Connection   │      Server
                                                   │

Features:

⚠️ Exam trap: “Monitor network connectivity to on-premises” or “detect packet loss/latency over DX/VPN” → CloudWatch Network Synthetic Monitor.


CloudWatch Insights (4 Types)

Insight Type | Target | Use Case
Container Insights | ECS, EKS, K8s on EC2, Fargate | Metrics + logs from containers
Lambda Insights | Lambda functions | Cold starts, memory, CPU, shutdowns
Contributor Insights | CloudWatch Logs | Find top-N talkers, bad hosts, heavy users
Application Insights | EC2 apps (Java, .NET, IIS) | Auto-dashboard for app troubleshooting

CloudWatch Container Insights


CloudWatch Lambda Insights


CloudWatch Contributor Insights


CloudWatch Application Insights

⚠️ Exam trap: “Find top talkers” or “heaviest network users from logs” → Contributor Insights.

⚠️ Exam trap: “Monitor Lambda cold starts” or “Lambda memory/CPU” → Lambda Insights (Lambda Layer).

⚠️ Exam trap: “Auto-dashboard for .NET/Java app issues” → Application Insights (SageMaker-powered).


AWS X-Ray

Distributed tracing for analyzing and debugging applications.

Client ──► API Gateway ──► Lambda ──► DynamoDB
              │              │            │
              └──────────────┴────────────┘
                    X-Ray collects traces
                           │
                           ▼
                    ┌─────────────┐
                    │ Service Map │ ◄── Visual representation
                    │  (latency,  │     of request flow
                    │   errors)   │
                    └─────────────┘

Key Concepts:

Concept | Description
Segments | Data about work done by a service
Subsegments | More granular timing (e.g., DB calls)
Trace | End-to-end path of a request
Annotations | Key-value pairs for filtering traces (indexed)
Metadata | Key-value pairs for additional data (NOT indexed)
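
To show where annotations and metadata live, the sketch below builds a segment document by hand using only the standard library. In practice the X-Ray SDK generates this for you; the service name and the key-value pairs are hypothetical, and the helper names are not part of any AWS library.

```python
import json
import os
import time
from typing import Optional


def new_trace_id(now: Optional[float] = None) -> str:
    """Trace ID in the form 1-<8 hex digits of epoch seconds>-<24 random hex digits>."""
    epoch = int(now if now is not None else time.time())
    return f"1-{epoch:08x}-{os.urandom(12).hex()}"


def build_segment(name: str, annotations: dict, metadata: dict, duration_s: float = 0.05) -> dict:
    """Minimal segment document for one unit of work."""
    end = time.time()
    return {
        "name": name,
        "id": os.urandom(8).hex(),      # 16 hex digits
        "trace_id": new_trace_id(end),
        "start_time": end - duration_s,
        "end_time": end,
        "annotations": annotations,     # indexed: usable in trace filter expressions
        "metadata": metadata,           # not indexed: extra context only
    }


segment = build_segment(
    "checkout-service",                          # hypothetical service name
    annotations={"customer_tier": "gold"},       # something you want to filter by
    metadata={"cart": {"items": 3}},             # detail you only want to inspect
)
print(json.dumps(segment, indent=2))
```

Anything you expect to search on belongs in `annotations`; larger or nested debugging detail belongs in `metadata`.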

Sampling Rules:

X-Ray Daemon:

Integrations:

Service | How to Enable
Lambda | Enable “Active Tracing” in config
API Gateway | Enable X-Ray in stage settings
ECS/EKS | Run X-Ray daemon as sidecar
Elastic Beanstalk | .ebextensions config
EC2 | Install and run X-Ray daemon
ELB | Automatically adds trace header

X-Ray APIs:

API | Purpose
PutTraceSegments | Upload segment documents
PutTelemetryRecords | Upload telemetry
GetSamplingRules | Retrieve sampling rules
GetSamplingTargets | Get sampling decisions
GetServiceGraph | Get visual service map
GetTraceSummaries | Get trace IDs and annotations
BatchGetTraces | Get full traces by ID

⚠️ Exam trap: “Debug microservices” or “trace request across services” → X-Ray.

⚠️ Exam trap: “Filter traces by custom attribute” → Use Annotations (indexed), NOT Metadata.

⚠️ Exam trap: X-Ray daemon listens on UDP 2000 — ensure Security Group allows it.


CloudWatch Synthetics Canaries

Configurable scripts that monitor endpoints and APIs.

CloudWatch Synthetics
        │
        ▼
┌───────────────┐     ┌──────────────┐     ┌─────────────┐
│    Canary     │────►│  Endpoint/   │────►│  CloudWatch │
│  (scheduled)  │     │    API       │     │   Metrics   │
└───────────────┘     └──────────────┘     └─────────────┘
        │                                         │
        ▼                                         ▼
  S3 (screenshots,                          CW Alarms
   HAR files)                              (alert on failure)

Key Features:

Canary Blueprints:

Blueprint | Use Case
Heartbeat Monitor | Load URL, store screenshot, check availability
API Canary | Test REST APIs (GET, POST, etc.)
Broken Link Checker | Check all links on a page
Visual Monitoring | Compare screenshots against baseline
Canary Recorder | Record actions in Chrome, generate script
GUI Workflow Builder | Test multi-step workflows (login, checkout)

Schedule: Run once or on schedule (rate or cron expression)

⚠️ Exam trap: “Monitor website availability” or “test API endpoint regularly” → Synthetics Canaries.

⚠️ Exam trap: Canaries are NOT for load testing — they’re for monitoring.


AWS Health Dashboard

Two components:

Dashboard | Scope | Purpose
Service Health | All AWS | Global AWS service status
Your Account Health | Your account | Events affecting YOUR resources

Your Account Health Dashboard:

EventBridge Integration:

AWS Health Event ──► EventBridge ──► Lambda/SNS/etc.
(your account)         (rule)        (automate response)
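
A sketch of what the rule in the middle of that flow might look like. The `source` and `detail-type` values are the ones AWS Health events carry; the narrower `detail` filter, the sample events, and the simplified matcher (exact-value lists only, no prefix or numeric operators) are assumptions for illustration.

```python
def event_matches(pattern: dict, event: dict) -> bool:
    """Simplified EventBridge matching: every pattern field must match.

    Leaf values in the pattern are lists of accepted values; nested
    dictionaries are matched recursively.
    """
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not event_matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Rule pattern: EC2 scheduled changes reported by AWS Health.
health_pattern = {
    "source": ["aws.health"],
    "detail-type": ["AWS Health Event"],
    "detail": {"service": ["EC2"], "eventTypeCategory": ["scheduledChange"]},
}

# Hypothetical incoming events.
maintenance_event = {
    "source": "aws.health",
    "detail-type": "AWS Health Event",
    "detail": {"service": "EC2", "eventTypeCategory": "scheduledChange"},
}
rds_issue_event = {
    "source": "aws.health",
    "detail-type": "AWS Health Event",
    "detail": {"service": "RDS", "eventTypeCategory": "issue"},
}

print(event_matches(health_pattern, maintenance_event))
print(event_matches(health_pattern, rds_issue_event))
```

Only the first event would reach the rule's target (Lambda, SNS, and so on); dropping the `detail` block from the pattern would forward every Health event for the account.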

Use cases:

Health Event Types:

Type | Description
Scheduled Change | Planned maintenance
Account Notification | Account-specific issues
Issue | Ongoing service problem

⚠️ Exam trap: “React to AWS service issues affecting my resources” → Health Dashboard + EventBridge.

⚠️ Exam trap: Service Health = public status. Your Account Health = personalized to your resources.


CloudWatch Evidently

Feature flags and A/B testing for applications.

Application ──► Evidently ──► Feature Flag / Variation
                   │
                   ▼
            Metrics collected
                   │
                   ▼
            Analyze results

Key Features:

Feature | Description
Feature Flags | Safely launch features (enable/disable remotely)
A/B Testing | Compare variations to measure impact
Launches | Gradual rollout to percentage of users
Experiments | Compare metrics between variations

Use Cases:

⚠️ Exam trap: “Gradual feature rollout” or “A/B testing” → CloudWatch Evidently.

⚠️ Exam trap: Evidently is for application features, NOT infrastructure testing.


EventBridge Deep Dive

Event buses:

Bus Type | Description
Default | Receives events from AWS services
Custom | Your application events
Partner | SaaS integrations (Datadog, Zendesk, etc.)

Event Flow:

Event Sources              EventBridge                    Targets
┌─────────────┐           ┌───────────┐                 ┌─────────┐
│ AWS Services│──────────►│           │                 │ Lambda  │
├─────────────┤           │   Event   │    Rules        ├─────────┤
│ Custom Apps │──────────►│    Bus    │────(filter)────►│ SQS/SNS │
├─────────────┤           │           │                 ├─────────┤
│ SaaS Partners│─────────►│           │                 │ Step Fn │
└─────────────┘           └───────────┘                 └─────────┘

Schema Registry:

Resource-based Policies:

Account A ──► EventBridge (Account A) ──► Event Bus (Account B - central)
Account B ──► EventBridge (Account B) ──► Event Bus (Account B - central)
Account C ──► EventBridge (Account C) ──► Event Bus (Account B - central)
                                              │
                                              ▼
                                         Central processing

⚠️ Exam trap: “Aggregate events from multiple accounts” → EventBridge Resource-based Policy for cross-account access.

⚠️ Exam trap: Schema Registry = auto-discover event structure, NOT define schemas manually.
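The cross-account aggregation pattern relies on a resource-based policy attached to the central event bus. A minimal sketch of what that policy document could look like; the account IDs, region, and bus name are placeholders invented for illustration:

```python
import json

CENTRAL_ACCOUNT = "222222222222"                    # hypothetical Account B
SENDER_ACCOUNTS = ["111111111111", "333333333333"]  # hypothetical Accounts A and C
REGION = "us-east-1"                                # assumed region
BUS_NAME = "central-bus"                            # hypothetical bus name

def central_bus_policy() -> dict:
    """Resource-based policy letting the sender accounts put events on the central bus."""
    bus_arn = f"arn:aws:events:{REGION}:{CENTRAL_ACCOUNT}:event-bus/{BUS_NAME}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowSenderAccountsToPutEvents",
                "Effect": "Allow",
                "Principal": {
                    "AWS": [f"arn:aws:iam::{acct}:root" for acct in SENDER_ACCOUNTS]
                },
                "Action": "events:PutEvents",
                "Resource": bus_arn,
            }
        ],
    }

print(json.dumps(central_bus_policy(), indent=2))
```

Each sender account still needs a rule on its own bus with the central bus as the target.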


AWS CloudTrail

Governance, compliance, and audit for your AWS Account.

Sources                      CloudTrail                    Destinations
┌─────────┐                                               ┌─────────────────┐
│   SDK   │──┐                                        ┌──►│ CloudWatch Logs │
├─────────┤  │                                        │   └─────────────────┘
│   CLI   │──┼──►  CloudTrail  ──► Inspect & Audit ──┤
├─────────┤  │                                        │   ┌─────────────────┐
│ Console │──┤                                        └──►│    S3 Bucket    │
├─────────┤  │                                            └─────────────────┘
│IAM Users│──┘
│& Roles  │
└─────────┘

Key Points:


CloudTrail Event Types

Event Type | Default | What it logs
Management Events | ✅ Enabled | Operations on resources (IAM, EC2, CloudTrail config)
Data Events | ❌ Disabled | High-volume: S3 object-level, Lambda Invoke
Insights Events | ❌ Disabled | Unusual activity detection

Management Events:

Data Events:


CloudTrail Insights

Detect unusual activity in your account:

Management Events ──► Continuous ──► CloudTrail ──► Insights ──┬──► CloudTrail Console
                      analysis        Insights       Events    │
                                                               ├──► S3 Bucket
                                                               │
                                                               └──► EventBridge (automation)

How it works:

  1. Analyzes normal management events → creates baseline
  2. Continuously analyzes write events → detects unusual patterns
  3. Anomalies appear in console, sent to S3, generate EventBridge event

CloudTrail Events Retention

Storage | Retention | Use Case
CloudTrail | 90 days | Quick lookup, recent activity
S3 Bucket | Long-term | Compliance, historical analysis

Long-term analysis: Log to S3 → query with Athena

Event Types:              CloudTrail           S3 Bucket           Athena
┌──────────────────┐     ┌─────────┐          ┌─────────┐        ┌─────────┐
│ Management Events│────►│         │          │         │        │         │
│ Data Events      │────►│ 90 days │───log───►│Long-term│──SQL──►│ Analyze │
│ Insights Events  │────►│retention│          │retention│        │         │
└──────────────────┘     └─────────┘          └─────────┘        └─────────┘

⚠️ Exam trap: “Who deleted the resource?” or “API call history” → CloudTrail.

⚠️ Exam trap: “Detect unusual IAM activity” or “burst of API calls” → CloudTrail Insights.

⚠️ Exam trap: “Keep CloudTrail logs beyond 90 days” → Log to S3, query with Athena.

⚠️ Exam trap: “Data Events disabled by default” — S3 object-level and Lambda Invoke need explicit enabling.
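As a small illustration of the "who deleted the resource?" question, the sketch below scans CloudTrail-style records for a given API call and reports the caller. The two sample records are hand-written in the log-file layout (a top-level `Records` array) and are not real data; `who_called` is a hypothetical helper name.

```python
import json

def who_called(records, event_name):
    """Return (eventTime, caller ARN, source IP) for every record of the given API call."""
    hits = []
    for record in records:
        if record.get("eventName") == event_name:
            identity = record.get("userIdentity", {})
            hits.append(
                (record.get("eventTime"), identity.get("arn"), record.get("sourceIPAddress"))
            )
    return hits

# Hand-written sample in the CloudTrail log-file layout (illustrative values only).
sample_log = json.loads("""
{
  "Records": [
    {
      "eventTime": "2024-01-01T10:00:00Z",
      "eventSource": "dynamodb.amazonaws.com",
      "eventName": "DeleteTable",
      "sourceIPAddress": "203.0.113.10",
      "userIdentity": {"type": "IAMUser", "arn": "arn:aws:iam::111122223333:user/alice"}
    },
    {
      "eventTime": "2024-01-01T10:05:00Z",
      "eventSource": "ec2.amazonaws.com",
      "eventName": "DescribeInstances",
      "sourceIPAddress": "203.0.113.11",
      "userIdentity": {"type": "IAMUser", "arn": "arn:aws:iam::111122223333:user/bob"}
    }
  ]
}
""")

for when, arn, ip in who_called(sample_log["Records"], "DeleteTable"):
    print(when, arn, ip)
```

For logs at scale, the same question is usually asked as a SQL query in Athena over the S3 bucket.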


EventBridge Archive and Replay

Store events and replay them later — built-in feature, no custom code needed.

Feature | Description
Archive | Store events from any event bus (indefinitely or set retention)
Replay | Re-send archived events to same or different event bus
Filter | Archive only matching events (use event patterns)
Use case | Replay production events in dev/test environment

Production Event Bus                     Dev Event Bus
     │                                        ▲
     ▼                                        │
  Archive ──────► Stored Events ─────► Replay │
  (filter)        (S3, managed)        (6 months later)

Key use case: Store production events → replay in dev environment for testing (periodically or on-demand).

⚠️ Exam trap: “Store EventBridge events for later replay” → Archive and Replay (NOT Lambda + S3/DynamoDB, which is over-engineered).

⚠️ Exam trap: “Most efficient and cost-effective way to store and replay events” → built-in feature wins over custom Lambda solutions.


EventBridge + CloudTrail Integration

Pattern: React to any API call with alerts/automation.

User ──► API Call ──► AWS Service ──► CloudTrail ──► EventBridge ──► SNS/Lambda
                      (logs API)       (event)        (alert/automate)

Examples:

Trigger | Flow
User assumes IAM Role | STS (AssumeRole) → CloudTrail → EventBridge → SNS
Security Group modified | EC2 (AuthorizeSecurityGroupIngress) → CloudTrail → EventBridge → SNS
DynamoDB table deleted | DynamoDB (DeleteTable) → CloudTrail → EventBridge → SNS

Example 1: IAM Role Assumption Alert
User ──► AssumeRole ──► STS ──► CloudTrail ──► EventBridge ──► SNS
                              (API Call log)    (event)       (alert)

Example 2: Security Group Change Alert  
User ──► Edit SG Rules ──► EC2 ──► CloudTrail ──► EventBridge ──► SNS
         (AuthorizeSecurityGroupIngress)

Example 3: DynamoDB Table Deletion Alert
User ──► DeleteTable ──► DynamoDB ──► CloudTrail ──► EventBridge ──► SNS

Key insight: CloudTrail logs all API calls → EventBridge can react to any of them!

⚠️ Exam trap: “Alert when user assumes role” or “notify on Security Group changes” → CloudTrail + EventBridge + SNS.
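The CloudTrail → EventBridge pattern comes down to an event pattern that selects specific API calls. The sketch below shows such a pattern for Security Group changes, plus a deliberately simplified matcher (exact-value matching on nested keys only, a small subset of real EventBridge pattern semantics) to show how the filter behaves. The sample events are hand-written and `matches` is a hypothetical helper.

```python
# Pattern for a rule that fires on Security Group ingress changes recorded by CloudTrail.
sg_change_pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["ec2.amazonaws.com"],
        "eventName": ["AuthorizeSecurityGroupIngress", "RevokeSecurityGroupIngress"],
    },
}

def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: every pattern key must exist in the event and
    the event's value must be one of the listed values (nested dicts recurse)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True

# Hand-written sample events (illustrative, trimmed to the relevant fields).
sg_event = {
    "source": "aws.ec2",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {"eventSource": "ec2.amazonaws.com", "eventName": "AuthorizeSecurityGroupIngress"},
}
read_event = {
    "source": "aws.ec2",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {"eventSource": "ec2.amazonaws.com", "eventName": "DescribeInstances"},
}

print(matches(sg_change_pattern, sg_event))    # the rule would fire, SNS gets notified
print(matches(sg_change_pattern, read_event))  # filtered out
```

Swapping `eventSource` and `eventName` (for example `dynamodb.amazonaws.com` and `DeleteTable`, with source `aws.dynamodb`) adapts the same pattern to the other examples.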


AWS Config

Auditing and recording compliance of AWS resources over time.

Use cases:

Key Points:


Config Rules

Evaluate whether resources are compliant with desired configurations.

Rule Type | Description
AWS Managed Rules | 75+ pre-built rules
Custom Rules | Defined in Lambda

Examples:

Rule Triggers:

⚠️ Config Rules do NOT prevent actions (no deny); they only evaluate compliance!
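A custom rule is ultimately a function that looks at a resource's recorded configuration and returns a compliance verdict. The sketch below shows only that decision logic, as it might sit inside a Lambda-backed custom rule. The configuration item shape is a simplified assumption (real items carry many more fields and may represent CIDR ranges differently), and `evaluate_security_group` is a hypothetical name.

```python
def evaluate_security_group(configuration_item: dict, blocked_port: int = 22) -> str:
    """Return a Config compliance type for a (simplified) security group item."""
    if configuration_item.get("resourceType") != "AWS::EC2::SecurityGroup":
        return "NOT_APPLICABLE"

    permissions = configuration_item.get("configuration", {}).get("ipPermissions", [])
    for perm in permissions:
        from_port = perm.get("fromPort")
        to_port = perm.get("toPort")
        # Missing ports are treated as "all ports" (assumption for this sketch).
        covers_port = (
            from_port is None or to_port is None or from_port <= blocked_port <= to_port
        )
        open_to_world = "0.0.0.0/0" in perm.get("ipRanges", [])
        if covers_port and open_to_world:
            return "NON_COMPLIANT"
    return "COMPLIANT"

# Hand-written sample item (assumed shape, illustrative values).
open_ssh = {
    "resourceType": "AWS::EC2::SecurityGroup",
    "configuration": {
        "ipPermissions": [{"fromPort": 22, "toPort": 22, "ipRanges": ["0.0.0.0/0"]}]
    },
}
print(evaluate_security_group(open_ssh))
```

The verdict only reports state: the open port stays open until something else (a person, or a remediation action) closes it.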


Config Rules — Notifications

Two notification patterns:

Pattern 1: EventBridge (filtered, action-oriented)

AWS Resources ──► AWS Config ──► NON_COMPLIANT ──► EventBridge ──┬──► Lambda
                  (monitor)                        (trigger)     ├──► SNS
                                                                 └──► SQS

Pattern 2: SNS (all events)

AWS Resources ──► AWS Config ──► All events ──► SNS ──► Admin
                  (monitor)      (config changes,       (notification)
                                 compliance state)

Use SNS Filtering or client-side filtering for Pattern 2.
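For Pattern 2, client-side filtering can be as small as the sketch below: the subscriber parses each message body and keeps only compliance changes that ended up non-compliant. The field names (`messageType`, `newEvaluationResult`, `complianceType`) are assumed from the usual layout of Config notifications and should be checked against a real message; `is_noncompliance_alert` is a hypothetical helper.

```python
import json

def is_noncompliance_alert(message_body: str) -> bool:
    """True only for compliance-change notifications whose new state is NON_COMPLIANT."""
    message = json.loads(message_body)
    if message.get("messageType") != "ComplianceChangeNotification":
        return False
    result = message.get("newEvaluationResult", {})
    return result.get("complianceType") == "NON_COMPLIANT"

# Hand-written sample bodies (assumed field names, illustrative values).
compliance_change = json.dumps({
    "messageType": "ComplianceChangeNotification",
    "configRuleName": "restricted-ssh",
    "newEvaluationResult": {"complianceType": "NON_COMPLIANT"},
})
config_change = json.dumps({"messageType": "ConfigurationItemChangeNotification"})

print(is_noncompliance_alert(compliance_change))
print(is_noncompliance_alert(config_change))
```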


Config Rules — Remediations

Auto-fix non-compliant resources using SSM Automation Documents.

Non-Compliant Resource ──► AWS Config ──► SSM Automation ──► Auto-Remediation
(e.g., expired IAM key)    (detect)       Document            (deactivate key)
                                          (Retries: 5)

Remediation Options:

Option | Description
AWS-Managed Documents | Pre-built remediation actions
Custom Documents | Your own automation (can invoke Lambda)
Remediation Retries | Retry if still non-compliant after auto-fix

Example:

⚠️ Exam trap: “Auto-remediate non-compliant resources” → AWS Config + SSM Automation Documents.

⚠️ Exam trap: “Config Rules” = detect/evaluate only. “Auto-remediation” = SSM Automation.

⚠️ Exam trap: Config does NOT prevent actions (no deny); it only detects non-compliance after the fact.


CloudWatch vs CloudTrail vs Config

Service | Purpose | Question it Answers
CloudWatch | Performance monitoring, dashboards, alerts, logs | How is my app performing?
CloudTrail | API call history, audit | WHO made changes?
Config | Configuration compliance, change timeline | Is my resource compliant? How did it change?

Quick Decision:

"Performance/metrics/dashboard"     → CloudWatch
"Who did it? / API calls / audit"   → CloudTrail  
"Is it compliant? / config history" → Config

Example: ELB Monitoring (CloudWatch vs CloudTrail vs Config)

Service | ELB Use Case
CloudWatch | Monitor incoming connections, visualize error codes %, dashboard for performance
Config | Track SG rules, track config changes, ensure SSL certificate always assigned (compliance)
CloudTrail | Track WHO made changes to the Load Balancer (API calls)

⚠️ Exam trap: This comparison is an exam favorite! Remember:


🚫 Common Wrong Answers Explained

Scenario: “Alert on keyword in logs (Error, Exception)”

Wrong Answer | Why Wrong
❌ Lambda polling logs hourly | Expensive compute, not real-time, over-engineered
❌ AWS Config Rule | Config monitors resource configuration, not log content
❌ CloudTrail | CloudTrail logs API calls, not application logs
✅ Metric Filter + Alarm | Built-in, near real-time, cost-effective

Scenario: “Monitor EC2 memory/RAM”

Wrong Answer | Why Wrong
❌ CloudWatch default metrics | RAM is NOT a default metric (only CPU, Disk, Network)
❌ Enable detailed monitoring | Detailed = 1-minute instead of 5-minute, still no RAM
❌ CloudTrail | CloudTrail is for API audit, not metrics
✅ CloudWatch Unified Agent | Required for OS-level metrics (RAM, processes, disk IO)

Scenario: “Who deleted the resource / API audit”

Wrong Answer | Why Wrong
❌ CloudWatch Logs | Logs application output, not API calls
❌ AWS Config | Config tracks config state over time, not who made changes
❌ VPC Flow Logs | Network traffic, not API calls
✅ CloudTrail | Records ALL API calls with user identity

Scenario: “Is resource compliant? / Track config changes”

Wrong Answer | Why Wrong
❌ CloudTrail | Shows API calls, not current config state or compliance
❌ CloudWatch | Performance metrics, not configuration compliance
❌ IAM Access Analyzer | Analyzes IAM policies, not general resource config
✅ AWS Config | Records config changes + evaluates compliance rules

Scenario: “Prevent non-compliant resource creation”

Wrong Answer | Why Wrong
❌ AWS Config Rules | Config detects after the fact, doesn’t prevent
❌ CloudTrail | Audit only, no enforcement
✅ SCPs | Prevent at Organization level
✅ IAM Policies | Prevent at user/role level
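To show what "prevent" looks like in practice, here is a sketch of a Service Control Policy that denies launching any EC2 instance that is not of an approved type. The approved type (`t3.micro`) and the `Sid` are illustrative assumptions; the policy is built as a Python dict so it can be serialized for the Organizations console or API.

```python
import json

APPROVED_INSTANCE_TYPE = "t3.micro"  # illustrative choice

# SCP: an explicit Deny evaluated before the action happens, so the
# non-compliant instance is never created in the first place.
require_instance_type_scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "RequireApprovedInstanceType",
            "Effect": "Deny",
            "Action": "ec2:RunInstances",
            "Resource": "arn:aws:ec2:*:*:instance/*",
            "Condition": {
                "StringNotEquals": {"ec2:InstanceType": APPROVED_INSTANCE_TYPE}
            },
        }
    ],
}

print(json.dumps(require_instance_type_scp, indent=2))
```

A Config rule checking instance types would flag the same violation, but only after the instance already exists.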

Scenario: “Auto-remediate non-compliant resources”

Wrong Answer | Why Wrong
❌ Config Rules alone | Rules only detect, don’t fix
❌ CloudWatch Alarms | Alarms alert, don’t remediate config
❌ Lambda (without trigger) | No automatic invocation mechanism
✅ Config + SSM Automation | Config detects → SSM Document remediates

Scenario: “Real-time log processing / streaming”

Wrong Answer | Why Wrong
❌ S3 Export (CreateExportTask) | Batch only, up to 12 hours latency
❌ CloudWatch Logs Insights | Query engine, not real-time stream
✅ Subscription Filters | Real-time to Lambda/Kinesis

Scenario: “Debug microservices / distributed tracing”

Wrong Answer | Why Wrong
❌ CloudWatch Logs | Shows individual service logs, not request path
❌ CloudWatch Metrics | Shows aggregated metrics, not individual traces
❌ VPC Flow Logs | Network packets, not application-level tracing
✅ X-Ray | Traces requests across services, shows latency per hop

Scenario: “React to AWS service events / automate on events”

Wrong Answer | Why Wrong
❌ CloudWatch Alarms | Only for metric thresholds, not AWS events
❌ SNS alone | Needs something to trigger it
❌ Lambda scheduled | Polling, not event-driven
✅ EventBridge | Native integration with 100+ AWS services

Scenario: “Keep CloudTrail logs beyond 90 days”

Wrong Answer | Why Wrong
❌ Increase CloudTrail retention | Not configurable, always 90 days in console
❌ CloudWatch Logs | Different service, not where CloudTrail stores
✅ S3 + Athena | S3 for storage, Athena for SQL queries

Scenario: “Alert on API call (AssumeRole, Security Group change)”

Wrong Answer | Why Wrong
❌ CloudWatch Alarm | Alarms are for metrics, not API events
❌ Config Rule | Config checks compliance, not individual API calls
❌ GuardDuty | Threat detection, not general API alerting
✅ CloudTrail + EventBridge + SNS | CloudTrail logs → EventBridge triggers → SNS alerts

Scenario: “Monitor Security Group rules / port exposure”

Wrong Answer | Why Wrong
❌ CloudWatch Metrics | Metrics = performance (CPU, network), not config state
❌ CloudTrail | Logs API calls (who changed SG), not current config state
❌ Lambda on schedule | Works but over-engineered; Config has built-in rules
✅ Config Rules | Continuously monitors Security Group configurations

Example Config Rules for Security Groups:


Scenario: “Config feature for email/alert on config change”

Wrong Answer | Why Wrong
❌ Config Rules | Evaluate compliance; doesn’t send notifications itself
❌ Config Remediations | Auto-fix resources; not for alerting
✅ Config Notifications | Send alerts via SNS on config changes

Config Features Quick Reference:

Feature | Purpose
Config Rules | EVALUATE: is it compliant?
Config Notifications | ALERT: send SNS/email
Config Remediations | FIX: auto-correct via SSM

Scenario: “Store EventBridge events for later replay/testing”

Wrong Answer | Why Wrong
❌ Lambda → S3 | Over-engineered; built-in feature exists
❌ Lambda → DynamoDB | Not designed for event replay
❌ Kinesis Firehose → S3 | Extra service, no native replay
✅ EventBridge Archive and Replay | Native feature, cost-effective, replay to any event bus

Quick Reference: Service Confusion Matrix

If Question Says | NOT This | Use This Instead
“Monitor RAM/memory” | Default CW metrics | Unified Agent
“Who did it / audit” | CloudWatch, Config | CloudTrail
“Is it compliant” | CloudTrail | Config
“Prevent action” | Config Rules | SCPs / IAM
“Real-time logs” | S3 Export | Subscriptions
“Log keyword alert” | Lambda polling, Config | Metric Filter
“Distributed tracing” | CW Logs, VPC Flow | X-Ray
“React to events” | CW Alarms | EventBridge
“Auto-fix non-compliant” | Config alone | Config + SSM
“Port/SG exposed” | CloudWatch, CloudTrail | Config Rules
“Store/replay events” | Lambda+S3, DynamoDB | EventBridge Archive
“Monitor website/API” | CloudWatch Metrics | Synthetics Canaries
“A/B testing / feature flags” | Lambda, custom code | Evidently
“AWS outage affects me” | Service Health | Your Account Health + EventBridge


🎯 MASTER SUMMARY: Monitoring & Observability Exam Guide

Part 1: Core Principles (Understand WHY → Derive WHAT)

Principle 1: The Three Pillars — Performance vs Audit vs Compliance

WHY: AWS separates concerns into three distinct services because each answers a fundamentally different question:

Service | Question | Data Type
CloudWatch | “How is it performing?” | Metrics, logs, dashboards
CloudTrail | “Who did what?” | API call history
Config | “Is it compliant?” | Resource configuration state

Application: When you see keywords, map to the pillar:


Principle 2: Real-time vs Batch — Know the Latency

WHY: The exam tests whether you know which service provides real-time data vs batch processing.

Need | NOT This (Batch) | Use This (Real-time)
Log processing | S3 Export (12h) | Subscription Filters
Metrics streaming | Pull from API | Metric Streams
Event reaction | Scheduled Lambda | EventBridge

Principle 3: Default Metrics vs Agent Required

WHY: The EC2 hypervisor can only see certain metrics. OS-level metrics require an agent.

Metric Type | Default (Hypervisor) | Agent Required
CPU | ✅ | -
Network | ✅ | -
Disk (high-level) | ✅ | -
RAM/Memory | ❌ | ✅ Unified Agent
Processes | ❌ | ✅ Unified Agent
Disk IO detailed | ❌ | ✅ Unified Agent

Principle 4: Detect vs Prevent vs Remediate

WHY: Config ONLY detects. It cannot prevent or fix.

PREVENT          DETECT           REMEDIATE
   │                │                 │
   ▼                ▼                 ▼
SCPs/IAM ───► Config Rules ───► SSM Automation
(before)      (after the fact)    (auto-fix)

Application:


Principle 5: CloudTrail Retention — 90 Days is the Limit

WHY: CloudTrail console only keeps 90 days. For long-term, you MUST export.

CloudTrail (90 days) ──► S3 (unlimited) ──► Athena (query)

Principle 6: X-Ray is for Distributed Tracing ONLY

WHY: X-Ray traces request flow across services. It’s NOT for:

Application: “Debug latency in microservices” or “find bottleneck between services” → X-Ray


Principle 7: EventBridge is the Event Router

WHY: EventBridge connects AWS services, custom apps, and SaaS. CloudWatch Alarms only handle metric thresholds.

Event Type | Service
Metric crosses threshold | CloudWatch Alarm
AWS service state change | EventBridge
API call made | CloudTrail → EventBridge
Scheduled task | EventBridge Scheduler

Principle 8: Synthetics vs X-Ray — External vs Internal

WHY: They solve different problems:

Service | Perspective | Use Case
Synthetics Canaries | External (customer view) | Is my site/API up?
X-Ray | Internal (developer view) | Where’s the bottleneck?

Part 2: Decision Tree (Follow Keywords → Find Answer)

START: What does the question ask about?
│
├─► "Who did it?" / "audit" / "API history"
│   └─► CloudTrail
│
├─► "Is it compliant?" / "configuration state" / "rules"
│   └─► Config
│
├─► "Performance" / "metrics" / "dashboard" / "alarm"
│   └─► CloudWatch
│       │
│       ├─► "RAM/memory on EC2" → Unified Agent
│       ├─► "Keyword in logs" → Metric Filter + Alarm
│       ├─► "Real-time log stream" → Subscription Filter
│       └─► "Stream metrics to S3" → Metric Streams
│
├─► "Trace requests" / "microservices debug" / "latency between services"
│   └─► X-Ray
│
├─► "React to event" / "automate on state change"
│   └─► EventBridge
│       │
│       ├─► "Store/replay events" → Archive and Replay
│       └─► "Cross-account events" → Resource-based Policy
│
├─► "Monitor website/API availability"
│   └─► Synthetics Canaries
│
├─► "Feature flags" / "A/B testing" / "gradual rollout"
│   └─► Evidently
│
└─► "AWS service issue affecting me"
    └─► Health Dashboard + EventBridge

Part 3: The “CANNOT” List

Service | CANNOT Do
CloudWatch default metrics | Monitor RAM/memory
Config Rules | Prevent resource creation (only detect after)
CloudTrail | Keep logs beyond 90 days (without S3)
S3 Export (logs) | Real-time processing (up to 12h delay)
CloudWatch Alarms | React to AWS service events (only metrics)
X-Ray | Show aggregated metrics (only traces)
Metric Filters | Filter BEFORE logs arrive (only after)

Part 4: Quick Reference Tables

CloudWatch Insights Comparison

Insight Type | What it Monitors | Key Feature
Container Insights | ECS/EKS/Fargate | Container metrics + logs
Lambda Insights | Lambda functions | Cold starts, memory
Contributor Insights | Log data | Top-N talkers
Application Insights | Java/.NET apps | SageMaker-powered dashboards

Log Destinations Comparison

Destination | Latency | Use Case
S3 Export | Up to 12 hours | Archive, compliance
Subscription → Lambda | Real-time | Transform, OpenSearch
Subscription → Firehose | Near real-time | S3, Redshift delivery
Subscription → Kinesis | Real-time | Custom analytics

CloudTrail Event Types

Event Type | Default | Examples
Management Events | ✅ ON | IAM, EC2, CloudTrail config
Data Events | ❌ OFF | S3 object-level, Lambda Invoke
Insights Events | ❌ OFF | Unusual activity detection

Part 5: Instant-Answer Table

Question Contains | → Instant Answer
“Monitor RAM/memory EC2” | Unified Agent
“Who deleted resource” | CloudTrail
“API call history” | CloudTrail
“Is resource compliant” | Config
“Track config changes over time” | Config
“Prevent non-compliant creation” | SCPs / IAM Policies
“Auto-remediate non-compliant” | Config + SSM Automation
“Alert on log keyword” | Metric Filter + Alarm
“Real-time log processing” | Subscription Filters
“Stream metrics to S3” | Metric Streams via Firehose
“Aggregate logs multi-account” | Subscriptions → Kinesis
“Debug microservices” | X-Ray
“Trace request across services” | X-Ray
“Filter traces by attribute” | X-Ray Annotations
“React to AWS service event” | EventBridge
“Schedule task (cron)” | EventBridge Scheduler
“Store/replay events” | EventBridge Archive
“Cross-account event bus” | EventBridge Resource Policy
“CloudTrail beyond 90 days” | S3 + Athena
“Unusual IAM activity” | CloudTrail Insights
“Monitor website/API up” | Synthetics Canaries
“A/B testing” | Evidently
“Feature flags” | Evidently
“Gradual feature rollout” | Evidently
“AWS outage affecting me” | Health Dashboard + EventBridge
“Top network users in logs” | Contributor Insights
“Lambda cold starts” | Lambda Insights
“Java/.NET app dashboard” | Application Insights
“Container metrics ECS/EKS” | Container Insights
“Network to on-premises” | Network Synthetic Monitor
“SG port exposure check” | Config Rules

Part 6: Elimination Checklist

□ Is it about WHO did something?
  → Yes = CloudTrail
  → No = Continue

□ Is it about COMPLIANCE or configuration state?
  → Yes = Config
  → No = Continue

□ Is it about PREVENTING creation?
  → Yes = SCPs/IAM (Config can't prevent)
  → No = Continue

□ Is it about REAL-TIME log processing?
  → Yes = Subscription Filters (S3 Export has 12h delay)
  → No = Continue

□ Is it about RAM/memory metrics?
  → Yes = Unified Agent (not default metrics)
  → No = Continue

□ Is it about distributed tracing?
  → Yes = X-Ray
  → No = Continue

□ Is it about reacting to AWS events?
  → Yes = EventBridge (not CloudWatch Alarms)
  → No = Continue

□ Is it about website/API monitoring?
  → Yes = Synthetics Canaries
  → No = Continue

□ Is it about feature rollout/A/B testing?
  → Yes = Evidently
  → No = Continue

🏆 The Golden Rules

  1. RAM = Agent (EC2 memory requires Unified Agent)
  2. Who = Trail (CloudTrail for API audit)
  3. Compliant = Config (Config for rules and compliance)
  4. Prevent ≠ Config (Config detects, doesn’t prevent)
  5. Real-time logs = Subscriptions (S3 Export is batch)
  6. 90 days = S3 (CloudTrail needs S3 for long-term)
  7. Trace = X-Ray (Distributed tracing across services)
  8. Events = EventBridge (Not CloudWatch Alarms)
  9. Keyword alert = Metric Filter (Not Lambda polling)
  10. Remediate = SSM (Config + SSM Automation)
  11. Top talkers = Contributor (Contributor Insights)
  12. Lambda health = Lambda Insights (As a Lambda Layer)
  13. Website up = Canaries (Synthetics Canaries)
  14. Feature flags = Evidently (CloudWatch Evidently)
  15. Store events = Archive (EventBridge Archive and Replay)

Security:

AWS Security Groups (SG):

Security Group (Firewall) controls how traffic is allowed into or out of EC2 Instances or other Security Groups. Can be attached to multiple instances. Locked down to a region/VPC combination. Lives “outside” the EC2 instance: if traffic is blocked, the instance won’t see it.
By default, a Security Group denies all inbound traffic and contains only allow rules. All outbound traffic is authorised.

Security Group Rules regulate:

Best practices:

Troubleshooting:

Amazon S3 - Security:

IAM Access Analyzer for S3:

Network Protection:

DDoS Protection on AWS:

AWS WAF: filters specific requests based on rules and protects web applications from common web exploits (Layer 7). Deploy on Application Load Balancer, API Gateway and CloudFront.

Define Web ACL (Web Access Control List):

AWS Network Firewall protects an entire Amazon VPC (from Layer 3 to Layer 7). Inspection directions:

AWS Firewall Manager manages security rules in all accounts of an AWS Organization. Rules are applied to new resources as they are created (good for compliance) across all current and future accounts in your Organization.

Security policies:

Penetration Testing and Abusive Activities on AWS Cloud:

AWS Acceptable use policy:

Eight services that are allowed without prior approval from AWS to carry out security assessments or penetration tests:

Prohibited Activities:

AWS Abuse: report suspected AWS resources used for abusive or illegal purposes.

Abusive & prohibited behaviors are:

Encryption Management:

AWS KMS (Key Management Service): service that helps manage encryption keys.

Encryption Opt-in:

Encryption Automatically enabled:

AWS CloudHSM (Cloud Hardware Security Module): AWS provisions the encryption hardware; the customer manages all encryption keys.

CloudHSM vs KMS:

Feature | AWS KMS | AWS CloudHSM
Key Management | AWS manages keys | Customer manages keys
Access Control | IAM policies + Key policies | You manage users in HSM
Hardware | Shared (multi-tenant) | Dedicated hardware (single-tenant)
FIPS 140-2 | Level 2 | Level 3 (tamper-evident)
High Availability | AWS managed | You must set up cluster across AZs
Integration | Native with 100+ AWS services | Custom integration needed
Cost | Pay per key + API calls | ~$1.50/hour per HSM
Use Case | Most encryption needs | Strict compliance, BYOK, SSL/TLS offload

Key Insight:

KMS:                                    CloudHSM:
┌─────────────────────────┐            ┌─────────────────────────┐
│      AWS manages        │            │    Customer manages     │
│  ┌─────────────────┐    │            │  ┌─────────────────┐    │
│  │   Key Material  │    │            │  │   Key Material  │    │
│  │  (AWS controls) │    │            │  │ (YOU control)   │    │
│  └────────▲────────┘    │            │  └────────▲────────┘    │
│           │             │            │           │             │
│     IAM Policy          │            │    HSM Users/Certs      │
│   (access control)      │            │   (you manage)          │
└─────────────────────────┘            └─────────────────────────┘

⚠️ Exam trap: “Customer needs to manage their own encryption keys with FIPS 140-2 Level 3” → CloudHSM (KMS is Level 2)

⚠️ Exam trap: “AWS should NOT have access to encryption keys” → CloudHSM (with KMS, AWS manages key material)

⚠️ Exam trap: “Multi-region” + “Global database” + “client-side encryption” → KMS Multi-Region Keys (NOT CloudHSM)

Scenario | Answer | Why NOT other
Aurora Global + client-side encryption | KMS Multi-Region Keys | CloudHSM can’t replicate keys across regions
FIPS 140-2 Level 3 compliance | CloudHSM | KMS is Level 2 only
AWS must NOT access keys | CloudHSM | KMS = AWS manages key material

Types of KMS Keys (based on creation, management, and rotation policies):

AWS Certificate Manager (ACM) is a managed service to provision, manage, and deploy public and private SSL/TLS certificates with AWS services and internal connected resources. Integrated with Elastic Load Balancer, CloudFront Distributions, APIs on API Gateway.

AWS Secrets Manager helps you manage, retrieve, and rotate database credentials, API keys, and other secrets throughout their lifecycles. Integrated with AWS Lambda, AWS RDS (MySQL, PostgreSQL, Aurora).

Secret rotation is the process of periodically updating a secret. When you rotate a secret, you update the credentials in both the secret and the database or service. In Secrets Manager, you can set up automatic rotation for your secrets.

AWS Systems Manager Parameter Store provides secure, hierarchical storage for configuration data management and secrets management. You can store data such as passwords, database strings, and license codes as parameter values.
Parameter Store doesn’t provide automatic rotation services for stored secrets.

Other Security Tools:

Amazon GuardDuty: intelligent threat discovery (uses Machine Learning) to protect your AWS account.

Input data includes:

Amazon Inspector automatically discovers workloads, such as Amazon EC2 instances, containers, and Lambda functions, and scans them for software vulnerabilities and unintended network exposure.

AWS Config is a configuration tool that helps you assess, audit, and evaluate the configurations and relationships of your resources. Configuration data can be stored in S3 (and analyzed with Athena), and you can receive alerts (SNS notifications) for any changes. Per-region service, but can be aggregated across regions and accounts.

Amazon Macie is a fully managed data security and data privacy service that uses machine learning and pattern matching to discover and protect your sensitive data in AWS. Macie helps identify and alert you to sensitive data, such as personally identifiable information (PII).

Amazon Detective analyzes, investigates and quickly identifies the root cause of security issues or suspicious activities (using ML and graphs). Automatically collects and processes events from VPC Flow Logs, CloudTrail, GuardDuty and creates a unified view.

AWS Security Hub: central security tool to manage security across several AWS accounts and automate security checks. Integrated dashboards show current security and compliance status so you can act quickly. Not enabled by default.

Automatically aggregates alerts:

Security Compliances and Reports:

AWS Artifact (not really a service): portal that provides customers with on-demand access to AWS compliance documentation and AWS agreements. Can be used to support internal audit or compliance.

IAM Access Analyzer: finds out which resources are shared outside a defined Zone of Trust (AWS account or AWS Organization):

Encryption Types:

Type | Where Encrypted | Who Has Keys | Use Case
Encryption in Flight (TLS/SSL) | During transmission | TLS certificates | Protect data in transit, prevent MITM attacks
Server-Side Encryption (SSE) | At rest, on server | Server (AWS manages) | S3, EBS, RDS - data protected at rest
Client-Side Encryption | Before sending | Client only | Server should NOT see plaintext (zero-trust)

Client-Side Encryption:
┌──────────┐                              ┌─────────────────┐
│  Client  │  encrypted data              │  Storage (S3)   │
│ ┌──────┐ │  ─────────────────────────►  │                 │
│ │ Key  │ │                              │ Encrypted blob  │
│ └──────┘ │  ◄─────────────────────────  │ (can't decrypt) │
└──────────┘  encrypted data              └─────────────────┘

Server-Side Encryption:
┌──────────┐   plaintext     ┌─────────────────────────────────┐
│  Client  │  ────────────►  │  AWS Service (S3)               │
│          │   HTTPS         │  ┌─────┐  encrypt   ┌────────┐  │
│          │                 │  │ Key │ ─────────► │ Stored │  │
│          │  ◄────────────  │  └─────┘  decrypt   └────────┘  │
└──────────┘   plaintext     └─────────────────────────────────┘

⚠️ Exam trap: “[Service] Client-side Encryption” terminology

When question says “data must not be disclosed even by company admins”: → Client-side encryption (service stores only ciphertext, can’t decrypt)

AWS KMS (Key Management Service):

AWS KMS manages encryption keys for AWS services. Anytime you hear “encryption” for an AWS service, it’s most likely KMS.

KMS Key Types:

Key Type | Description | Access to Key Material
Symmetric (AES-256) | Single key for encrypt/decrypt | Never (must use KMS API)
Asymmetric (RSA/ECC) | Public + Private key pair | Public key downloadable, private never

Asymmetric Key Usage (IMPORTANT):

Key Type | Key Usage | Can Do | Cannot Do
RSA | ENCRYPT_DECRYPT | Encrypt, Decrypt | Sign, Verify
RSA | SIGN_VERIFY | Sign, Verify | Encrypt, Decrypt
ECC | SIGN_VERIFY only | Sign, Verify | Encrypt, Decrypt (never!)

⚠️ Exam trap: “Asymmetric key for encryption AND signing” → IMPOSSIBLE with single key. Need TWO separate keys.

KMS Key Ownership & Pricing:

Key Type | Cost | Example | Rotation
AWS Owned | Free | SSE-S3, SSE-SQS, SSE-DDB | AWS manages
AWS Managed | Free | aws/rds, aws/ebs | Auto every 1 year
Customer Managed (created) | $1/month + API calls | Your keys | Must enable, auto every 1 year
Customer Managed (imported) | $1/month + API calls | BYOK | Manual only (use alias)

KMS Key Rotation Deep Dive:

Key Type | Auto Rotation | Period | Notes
AWS Managed | ✅ Always ON | 1 year | Cannot disable
Customer Managed | Optional (must enable) | 1 year | On-demand also available
Imported | ❌ Not available | N/A | Manual only via alias

How rotation works:

Before rotation:          After rotation:
┌─────────────────┐       ┌─────────────────┐
│ Key ID: abc-123 │       │ Key ID: abc-123 │  ◄── Same ID!
│ ┌─────────────┐ │       │ ┌─────────────┐ │
│ │ Key Material│ │       │ │OLD Material │ │  ◄── Kept for decrypt
│ │ (v1)        │ │       │ │(v1)         │ │
│ └─────────────┘ │       │ ├─────────────┤ │
└─────────────────┘       │ │NEW Material │ │  ◄── Used for encrypt
                          │ │(v2)         │ │
                          │ └─────────────┘ │
                          └─────────────────┘

⚠️ Exam trap: Rotation period = 1 year FIXED (cannot be changed to 90 days, 6 months, etc.)

⚠️ Exam trap: Imported keys can ONLY be rotated manually (no automatic rotation)
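The rotation diagram can be mirrored in a toy model: the key ID stays fixed, each rotation appends new key material, new encryptions use the latest material, and older ciphertexts still decrypt because they remember which version produced them. The XOR "cipher" below is a placeholder for illustration only and is NOT real encryption; `ToyKmsKey` is a hypothetical class, not a KMS API.

```python
import secrets
from itertools import cycle

class ToyKmsKey:
    """Toy model of a rotating key: one stable ID, many material versions."""

    def __init__(self, key_id: str):
        self.key_id = key_id
        self.materials = [secrets.token_bytes(32)]  # version 0

    def rotate(self) -> None:
        # New material is appended; old versions are kept for decryption.
        self.materials.append(secrets.token_bytes(32))

    def encrypt(self, plaintext: bytes):
        version = len(self.materials) - 1  # always the newest material
        return version, self._xor(plaintext, self.materials[version])

    def decrypt(self, blob) -> bytes:
        version, data = blob  # the blob records which version encrypted it
        return self._xor(data, self.materials[version])

    @staticmethod
    def _xor(data: bytes, material: bytes) -> bytes:
        # Placeholder transformation, NOT secure encryption.
        return bytes(b ^ k for b, k in zip(data, cycle(material)))

key = ToyKmsKey("abc-123")
old_blob = key.encrypt(b"written before rotation")
key.rotate()
new_blob = key.encrypt(b"written after rotation")

print(key.key_id)             # unchanged by rotation
print(key.decrypt(old_blob))  # old data still readable
print(new_blob[0])            # version used for new encryptions
```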

Manual Rotation (for custom rotation periods):

If policy requires rotation more frequently than 1 year (e.g., 6 months):

  1. Create a new KMS CMK
  2. Update the Key Alias to point to new key
  3. Keep old key (needed to decrypt old data)
  4. Applications using alias automatically use new key

6 months ago:                    Now (after manual rotation):
┌─────────────────┐              ┌─────────────────┐
│ Alias: my-key   │──────────┐   │ Alias: my-key   │──────────┐
└─────────────────┘          │   └─────────────────┘          │
                             ▼                                 ▼
                    ┌─────────────┐               ┌─────────────┐
                    │ CMK-OLD     │               │ CMK-NEW     │
                    │ (key-111)   │               │ (key-222)   │
                    └─────────────┘               └─────────────┘
                           │                             │
                           │ Still exists!               │
                           │ (decrypt old data)          │ (new encryptions)
                           ▼                             ▼

⚠️ Exam trap: “Rotate every 6 months” → Manual rotation with aliases (exam answers assume auto rotation is 1 year only and cannot be configured)
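The manual rotation steps can be sketched in Python. This is a minimal sketch, assuming `kms` behaves like `boto3.client("kms")`; `rotate_via_alias` is a hypothetical helper name, while `create_key` and `update_alias` are the KMS calls for steps 1 and 2.

```python
def rotate_via_alias(kms, alias_name, description="rotated key"):
    """Create a new KMS key and repoint an existing alias at it.

    `kms` is assumed to behave like boto3.client("kms"). The old key is
    left in place so data encrypted under it can still be decrypted.
    """
    # Step 1: create the replacement key.
    new_key_id = kms.create_key(Description=description)["KeyMetadata"]["KeyId"]
    # Step 2: move the alias; callers that encrypt via the alias now use the new key.
    kms.update_alias(AliasName=alias_name, TargetKeyId=new_key_id)
    # Step 3: deliberately no deletion of the old key.
    return new_key_id
```

Applications that reference `alias/my-key` rather than a key ID need no change after the alias moves.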

Wrong answers explained:

KMS Access Control (Key Policies):

Key Policy = PRIMARY way to control access to KMS keys (resource-based policy).

Access Method | Description | Required?
Key Policy | Resource-based policy ON the key | Always required
IAM Policy | Identity-based policy on user/role | Optional (works WITH key policy)
Grants | Temporary, delegated access | Optional

Critical difference from other AWS services:

S3 Access:                          KMS Access:
IAM Policy ──► S3 Bucket            IAM Policy ──┐
     │                                           │
     └─► Access granted!                         ▼
                                    Key Policy ──► KMS Key
                                         │
                                         └─► BOTH needed!

Default Key Policy:

Custom Key Policy - use cases:

⚠️ Exam trap: “KMS IAM Policy” alone → NOT enough! Key Policy is required. IAM policies work only if Key Policy allows it.
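The interaction between the key policy and IAM can be sketched as a small decision function. This is a deliberately simplified model, not the real policy evaluation engine: it ignores explicit denies, grants, and conditions, and `kms_access_allowed` and its parameters are hypothetical names.

```python
def kms_access_allowed(principal_arn, account_id, key_policy_principals, iam_allows):
    """Simplified model of who can use a KMS key.

    key_policy_principals: set of principal ARNs allowed by the key policy.
    iam_allows: whether the principal's IAM policy allows the KMS action.
    """
    # The key policy can name the principal directly...
    if principal_arn in key_policy_principals:
        return True
    # ...or delegate to the account root, which lets IAM policies take effect.
    root = f"arn:aws:iam::{account_id}:root"
    return root in key_policy_principals and bool(iam_allows)
```

With an empty key policy the function returns `False` even when `iam_allows` is `True`, which is the trap described above.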

⚠️ Exam trap: “KMS ACL” → Does NOT exist! (Unlike S3, KMS has no ACLs)

KMS Grants:

Copying Snapshots Across Accounts:

  1. Create snapshot encrypted with Customer Managed Key
  2. Attach KMS Key Policy for cross-account access
  3. Share the encrypted snapshot
  4. (Target account) Copy snapshot, re-encrypt with target account’s CMK
  5. Create volume from snapshot

Copying Snapshots Across Regions:

Region A (eu-west-2)              Region B (ap-southeast-2)
┌─────────────┐                   ┌─────────────┐
│ EBS Volume  │                   │ EBS Volume  │
│ (KMS Key A) │                   │ (KMS Key B) │
└──────┬──────┘                   └──────▲──────┘
       │                                 │
       ▼                                 │
┌─────────────┐   ReEncrypt with   ┌─────────────┐
│ Snapshot    │   KMS Key B        │ Snapshot    │
│ (Key A)     │ ─────────────────► │ (Key B)     │
└─────────────┘                    └─────────────┘

KMS Multi-Region Keys:

⚠️ Exam trap: “The same KMS key cannot exist in two regions” → FALSE with Multi-Region keys. Regular KMS keys are regional, but Multi-Region keys CAN exist in multiple regions with same key ID.

                    ┌─────────────────┐
                    │   us-west-2     │
                    │ Replica Key     │
                    │ mrk-1234...     │
                    └────────▲────────┘
                             │ sync
┌─────────────────┐          │          ┌─────────────────┐
│   us-east-1     │──────────┴──────────│   eu-west-1     │
│ PRIMARY Key     │       sync          │ Replica Key     │
│ mrk-1234...     │─────────────────────│ mrk-1234...     │
└─────────────────┘                     └─────────────────┘

Multi-Region Key Use Cases:

⚠️ Exam trap: Multi-Region keys are NOT “global keys” - each replica is managed independently in its region

AMI Sharing with KMS Encryption:

When sharing encrypted AMI across accounts, you must share BOTH the AMI AND the KMS key access.

Account A (Source)                        Account B (Target)
┌────────────────────────────────┐       ┌────────────────────────────────┐
│                                │       │                                │
│  ┌──────────────┐              │       │              ┌──────────────┐  │
│  │ AMI          │              │       │              │ EC2 Instance │  │
│  │ (encrypted)  │──────────────┼──────►│──────────────│ (launched)   │  │
│  └──────┬───────┘   Share AMI  │       │   Launch     └──────────────┘  │
│         │                      │       │                      ▲         │
│         │ encrypted with       │       │                      │         │
│         ▼                      │       │                      │         │
│  ┌──────────────┐              │       │              uses key to       │
│  │ KMS Key      │──────────────┼──────►│──────────────decrypt           │
│  │ (CMK)        │  Share Key   │       │                                │
│  └──────────────┘  (Key Policy)│       │                                │
│                                │       │                                │
└────────────────────────────────┘       └────────────────────────────────┘

Steps to share encrypted AMI:

  1. Source Account: AMI encrypted with Customer Managed Key (CMK)
  2. Modify KMS Key Policy: Add target account as authorized user
  3. Share AMI: Grant LaunchPermission to target account
  4. Target Account: Launch instance - KMS automatically decrypts
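Steps 2 and 3 can be sketched as the two request payloads involved. This is a minimal sketch under assumptions: the function names and the `Sid` are hypothetical, and the action list is the commonly documented set for cross-account use of a key with EBS-backed resources. The second dictionary follows the keyword shape of the EC2 `ModifyImageAttribute` call.

```python
def key_policy_statement_for(target_account_id):
    """Key policy statement letting another account use the key."""
    return {
        "Sid": "AllowUseOfKeyByTargetAccount",  # hypothetical Sid
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{target_account_id}:root"},
        "Action": [
            "kms:Decrypt",
            "kms:DescribeKey",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*",
            "kms:CreateGrant",
        ],
        "Resource": "*",
    }


def launch_permission_request(image_id, target_account_id):
    """Keyword arguments in the shape of EC2 ModifyImageAttribute."""
    return {
        "ImageId": image_id,
        "LaunchPermission": {"Add": [{"UserId": target_account_id}]},
    }
```

Both pieces are needed: the launch permission alone shares the AMI, but the target account cannot decrypt its snapshots without the key policy statement.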

⚠️ Exam trap: Cannot share AMI encrypted with AWS Managed Key (aws/ebs) - must use Customer Managed Key

S3 Replication - Encryption Considerations:

Encryption Type | Replication Behavior
Unencrypted | Replicated by default
SSE-S3 | Replicated by default
SSE-C (customer provided key) | Can be replicated
SSE-KMS | Must enable option explicitly

SSE-KMS Replication Requirements:

⚠️ Exam trap: Multi-Region KMS keys are treated as independent keys by S3 - object is still decrypted then re-encrypted (no optimization)

AWS Secrets Manager:

AWS Secrets Manager stores and manages secrets with automatic rotation.

Multi-Region Secrets:

us-east-1 (Primary)                    us-west-2 (Secondary)
┌─────────────────┐     replicate      ┌─────────────────┐
│ Secrets Manager │ ─────────────────► │ Secrets Manager │
│   MySecret-A    │                    │   MySecret-A    │
│   (primary)     │                    │   (replica)     │
└─────────────────┘                    └─────────────────┘

SSM Parameter Store vs Secrets Manager:

Feature | SSM Parameter Store | Secrets Manager
Cost | Free tier (Standard), charges for Advanced | $0.40/secret/month + API calls
Auto Rotation | ❌ No | ✅ Yes (built-in Lambda)
RDS Integration | Manual | ✅ Native (MySQL, PostgreSQL, Aurora)
KMS Encryption | Optional (SecureString) | ✅ Always encrypted
Hierarchy | ✅ Path-based (/app/dev/db-password) | ❌ Flat
Multi-Region | ❌ No | ✅ Yes (replicas)
Version Tracking | ✅ Built-in | ✅ Built-in
Pull from CF/CDK | ✅ Direct reference | ✅ Direct reference

SSM Parameter Store - Version Tracking:

Parameter: /app/db-password
┌─────────────────────────────────────────┐
│ Version 1: "oldpass123"    (2024-01-01) │
│ Version 2: "newpass456"    (2024-06-01) │
│ Version 3: "latestpass789" (2025-01-01) │ ◄── Current
└─────────────────────────────────────────┘

⚠️ Exam trap: “Track secret values over time” → SSM Parameter Store (built-in versioning)

⚠️ Exam trap: “KMS Versioning” → Does NOT exist! KMS has key rotation (new key material), not value versioning

Where to Store Configuration/Secrets - Decision Guide:

Requirement | Best Service | Why NOT others
Config values + version history | SSM Parameter Store | DynamoDB (overkill), S3 (not designed for this), EBS (storage volume)
DB credentials + auto rotation | Secrets Manager | SSM (no auto rotation), KMS (encryption only)
Hierarchical config (/app/prod/db) | SSM Parameter Store | Secrets Manager (flat structure)
Sensitive + multi-region | Secrets Manager | SSM (no multi-region)

⚠️ Exam trap: “RDS password + automatic rotation” → Secrets Manager

Why SSM Parameter Store for “externally maintain config”:

Wrong answers explained:

When to use which:

⚠️ Exam trap: “Automatic rotation for DB credentials” → Secrets Manager (Parameter Store has NO auto rotation)

Lambda + Secrets - Security Options (worst to best):

Option | Security Level | Why
❌ Embed in code | WORST | Visible in source control, logs, anyone with code access
❌ Plaintext env var | BAD | Visible in Lambda console, CloudWatch logs
✅ Encrypted env var + KMS | GOOD | Encrypted at rest, decrypted at runtime
✅✅ Secrets Manager/SSM | BEST | Centralized, audit trail, rotation, no env vars

Encrypted Environment Variable Flow:

1. Store secret as encrypted env var (using KMS)
   ┌─────────────────────────────────────────┐
   │ Lambda Config                           │
   │ DB_PASSWORD = AQICAHh...encrypted...    │
   └─────────────────────────────────────────┘
                      │
2. At runtime, Lambda decrypts using KMS
                      │
                      ▼
   ┌──────────┐    decrypt    ┌──────────┐
   │  Lambda  │ ────────────► │   KMS    │
   │  code    │ ◄──────────── │   CMK    │
   └──────────┘   plaintext   └──────────┘
                      │
3. Use decrypted value to connect to DB
                      │
                      ▼
               ┌──────────┐
               │    RDS   │
               └──────────┘
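Step 2 of the flow can be sketched as follows. This is a minimal sketch, assuming `kms` behaves like `boto3.client("kms")` and the variable holds a base64-encoded ciphertext; `decrypt_env_var` is a hypothetical helper, and any encryption context the value was encrypted with is omitted here and would have to be passed to `decrypt` as well.

```python
import base64


def decrypt_env_var(kms, environ, name):
    """Decrypt a base64-encoded KMS ciphertext stored in an environment variable.

    `kms` is assumed to behave like boto3.client("kms");
    `environ` is a mapping such as os.environ.
    """
    ciphertext = base64.b64decode(environ[name])
    response = kms.decrypt(CiphertextBlob=ciphertext)
    # KMS returns the plaintext as bytes.
    return response["Plaintext"].decode("utf-8")
```

In a Lambda function this would typically run once outside the handler, so the decrypted value is reused across warm invocations instead of calling KMS on every request.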

Why encrypted env var is “most secure” in the question:

⚠️ Exam context: If Secrets Manager is an option, it’s usually the BEST answer (centralized + rotation + audit). But among the 3 options given, encrypted env var wins.

AWS Certificate Manager (ACM):

ACM provisions, manages, and deploys TLS/SSL certificates.

ACM Integrations:

Service | Notes
ELB (CLB, ALB, NLB) | Provision certs directly
CloudFront | Must be in us-east-1
API Gateway | Edge-optimized or Regional

⚠️ Exam trap: Cannot use ACM with EC2 directly (private key can’t be extracted)

ACM + API Gateway - Certificate Region Rules:

API Gateway Type | Certificate Location
Edge-Optimized | ACM cert must be in us-east-1 (CloudFront region)
Regional | ACM cert must be in same region as API Gateway

Memory trick: “Where does TLS terminate?”

API Gateway Endpoint Types Explained:

Type | Audience | How It Works | ACM Region
Edge-Optimized (default) | Global clients | Requests routed via CloudFront edge locations → reduces latency | us-east-1 only
Regional | Same-region clients | Direct access, can add your own CloudFront for more control | Same as API Gateway
Private | VPC only | Access via VPC Interface Endpoint (ENI) | Same as API Gateway

Edge-Optimized (default):

Regional:

Edge-Optimized:                         Regional:
┌─────────────┐                        ┌─────────────────┐
│  us-east-1  │                        │  ap-southeast-2 │
│ ┌─────────┐ │                        │ ┌─────────────┐ │
│ │   ACM   │─┼──► CloudFront          │ │ API Gateway │ │
│ └─────────┘ │   (AWS managed)        │ └──────┬──────┘ │
└─────────────┘        │               │        │        │
                       ▼               │ ┌──────▼──────┐ │
               ┌─────────────┐         │ │     ACM     │ │
               │ API Gateway │         │ │ (same rgn)  │ │
               │ (any region)│         │ └─────────────┘ │
               └─────────────┘         └─────────────────┘

Regional + Custom CloudFront (more DDoS control):

┌─────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  us-east-1  │     │  ap-southeast-2 │     │  ap-southeast-2 │
│ ┌─────────┐ │     │ ┌─────────────┐ │     │ ┌─────────────┐ │
│ │   ACM   │─┼────►│ │ CloudFront  │─┼────►│ │ API Gateway │ │
│ │(for CF) │ │     │ │(your own)   │ │     │ │  Regional   │ │
│ └─────────┘ │     │ └─────────────┘ │     │ └─────────────┘ │
└─────────────┘     └─────────────────┘     └─────────────────┘
                    + WAF attached here

⚠️ Exam trap: Edge-Optimized uses CloudFront but certificate must be in us-east-1, NOT in the API Gateway’s region
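The region rule reduces to a short lookup. This is a sketch only; `acm_certificate_region` and its endpoint labels are hypothetical names that encode the rules stated in this section.

```python
def acm_certificate_region(endpoint_type, resource_region):
    """Return the region where the ACM certificate must live."""
    kind = endpoint_type.strip().lower()
    # Anything fronted by CloudFront terminates TLS at the edge.
    if kind in ("cloudfront", "edge-optimized"):
        return "us-east-1"
    # Regional endpoints and ALBs terminate TLS in their own region.
    if kind in ("regional", "alb"):
        return resource_region
    raise ValueError(f"no rule for endpoint type: {endpoint_type!r}")
```

For example, an Edge-Optimized API deployed in ap-southeast-2 still maps to us-east-1, while a Regional API in the same region maps to ap-southeast-2.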

Route 53 Setup:

ACM + ALB - HTTP to HTTPS Redirect:

User ──► HTTP ──► ALB ──► Redirect to HTTPS
     ◄── 301 ◄──────┘
User ──► HTTPS ──► ALB ──► EC2 (Auto Scaling)
                    │
                    ▼
                   ACM (provision/maintain certs)

Importing Public Certificates:

⚠️ Exam trap — ACM certificate expiry monitoring (created vs imported):

Feature | ACM-Created Certs | Imported (Third-Party) Certs
Auto-renewal | ✅ Yes (60 days before) | ❌ No (must manually re-import)
CW DaysToExpiry metric | ✅ Yes | ❌ No
EventBridge events | ✅ Yes (daily, 45 days before) | ✅ Yes (daily, 45 days before)
AWS Config rule | ✅ Works | ✅ Works (best for imported)
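The EventBridge row can be illustrated with an event pattern. This is a sketch under assumptions: `acm_expiry_event_pattern` is a hypothetical helper, and the `detail-type` string and the `DaysToExpiry` detail field are the names ACM is understood to use for its expiry events, so verify them against the ACM documentation before relying on them.

```python
import json


def acm_expiry_event_pattern(max_days=None):
    """Build an EventBridge pattern for ACM certificate expiry events."""
    pattern = {
        "source": ["aws.acm"],
        "detail-type": ["ACM Certificate Approaching Expiration"],
    }
    if max_days is not None:
        # Optional numeric filter so the rule only fires close to expiry.
        pattern["detail"] = {"DaysToExpiry": [{"numeric": ["<=", max_days]}]}
    return json.dumps(pattern)
```

A rule using this pattern would usually target an SNS topic or a Lambda function that forwards the notification.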

AWS WAF (Web Application Firewall):

AWS WAF protects web apps from Layer 7 (HTTP) exploits.

Deploy on:

Web ACL Rules:

Rule Type | Description
IP Set | Up to 10,000 IPs (use multiple rules for more)
String match | HTTP headers, body, URI strings
SQL injection | Block SQLi attacks
XSS | Block Cross-Site Scripting
Size constraints | Limit request size
Geo-match | Block countries
Rate-based | DDoS protection (count events)

WAF + Fixed IP (Load Balancer):

⚠️ Exam trap: “Attach WAF to NLB” → IMPOSSIBLE! WAF = Layer 7 (HTTP), NLB = Layer 4 (TCP/UDP). Use ALB instead, or put Global Accelerator in front for fixed IPs.

WAF-Compatible Services:

✅ Supported | ❌ NOT Supported
ALB | NLB
API Gateway | EC2 directly
CloudFront | Route 53
AppSync | CLB (Classic)
Cognito User Pool | -

Users ──► Global Accelerator ──► ALB ◄── WAF (WebACL)
          (Fixed IP: 1.2.3.4)     │       (same region)
                                  ▼
                             EC2 Instances

WAF vs Firewall Manager vs Shield:

Service | Use Case | Scope
WAF | Granular protection, Web ACL rules | Single resource
Firewall Manager | Manage WAF across accounts, auto-protect new resources | AWS Organization
Shield Advanced | DDoS protection, SRT support, cost protection | Enhanced DDoS

Decision Guide:

AWS Shield (DDoS Protection):

Feature | Shield Standard | Shield Advanced
Cost | Free (all customers) | $3,000/month/org
Layer | Layer 3/4 | Layer 3/4/7
Protection | SYN/UDP floods, reflection | + sophisticated attacks
Resources | All | EC2, ELB, CloudFront, Global Accelerator, Route 53
DDoS Response Team | - | ✅ 24/7 access to DRT (now the Shield Response Team, SRT)
Cost Protection | - | ✅ (no higher fees during attack)
Auto WAF rules | - | ✅ (creates rules for L7 attacks)

AWS Firewall Manager:

Firewall Manager manages security rules across all accounts in an AWS Organization.

Manages:

⚠️ Exam keywords → Firewall Manager:

Why NOT others for “centrally manage across accounts”:

DDoS Resiliency Best Practices:

AWS DDoS Best Practices Reference Architecture:

                            AWS Edge Services
┌─────────────────────────────────────────────────────────────────────┐
│  BP1: Global Accelerator    BP3: Route 53    BP1/BP2: CloudFront   │
│  (fixed IPs, Shield)        (DNS at edge)    (cache + WAF)         │
└─────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────┐
│                              Region                                  │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │                        VPC (BP5)                              │   │
│  │   ┌─────────────┐    ┌─────────────┐    ┌─────────────────┐  │   │
│  │   │ Public      │    │ BP6: ELB    │    │ Private Subnet  │  │   │
│  │   │ Subnet      │───►│ + WAF (BP2) │───►│ BP7: Auto       │  │   │
│  │   │ (NACLs)     │    │ + API GW    │    │ Scaling Group   │  │   │
│  │   └─────────────┘    └─────────────┘    └─────────────────┘  │   │
│  │                            │                                  │   │
│  │                            ▼                                  │   │
│  │                    Security Groups                            │   │
│  └──────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────┘

BP Summary Table:

BP | Service | Layer | Purpose
BP1 | CloudFront, Global Accelerator | Edge | Absorb DDoS at edge, reduce origin load
BP2 | WAF | L7 | Filter malicious requests, rate limiting
BP3 | Route 53 | DNS | DNS at edge, shuffle sharding, health checks
BP4 | API Gateway | L7 | Hide backend, burst limits, API keys
BP5 | VPC (SG + NACL) | L3/L4 | Filter IPs at subnet/ENI level
BP6 | ELB | L4/L7 | Distribute traffic, scales automatically
BP7 | Auto Scaling | Infra | Scale EC2 during traffic surges

1. Edge Location Mitigation (BP1, BP3):

Internet ──► CloudFront (BP1) ──► Origin
             │
             ├─ Caches static content (reduces origin requests)
             ├─ Absorbs L3/L4 attacks (SYN floods, UDP reflection)
             └─ Geo-blocking available

Internet ──► Global Accelerator (BP1) ──► ALB/NLB/EC2
             │
             ├─ Fixed Anycast IPs (2 IPs)
             ├─ Routes via AWS backbone (not public internet)
             ├─ Shield integration
             └─ Use when CloudFront not compatible (non-HTTP)

Internet ──► Route 53 (BP3) ──► Your resources
             │
             ├─ DNS resolution at edge
             ├─ Built-in DDoS protection
             └─ Health checks + failover

When to use which:


2. Infrastructure Layer Defense (BP1, BP3, BP6, BP7):

                    DDoS Attack
                         │
                         ▼
┌─────────────────────────────────────────┐
│            Edge Services                │
│  (absorb volumetric attacks)            │
└────────────────────┬────────────────────┘
                     │ reduced traffic
                     ▼
┌─────────────────────────────────────────┐
│         ELB (BP6) - scales auto         │
│  (distributes across instances)         │
└────────────────────┬────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────┐
│     Auto Scaling Group (BP7)            │
│  (adds instances during surge)          │
│     ┌────┐  ┌────┐  ┌────┐  ┌────┐     │
│     │EC2 │  │EC2 │  │EC2 │  │EC2 │     │
│     └────┘  └────┘  └────┘  └────┘     │
└─────────────────────────────────────────┘

Key point: ELB + Auto Scaling = absorb legitimate traffic surges AND DDoS


3. Application Layer Defense (BP1, BP2):

Malicious Request ──► CloudFront ──► WAF (BP2) ──► ALB ──► App
                          │              │
                          │              ├─ SQL injection? BLOCK
                          │              ├─ XSS? BLOCK  
                          │              ├─ Rate > 2000/5min? BLOCK IP
                          │              ├─ Bad IP reputation? BLOCK
                          │              └─ Geo = blocked country? BLOCK
                          │
                          └─ Cached? Return from edge (origin never hit)
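The "Rate > 2000/5min" check in the diagram corresponds to a rate-based rule. The sketch below builds a rule dictionary in the shape used by WAFv2 web ACLs; `rate_based_rule` is a hypothetical helper, and the rule and metric names are placeholders.

```python
def rate_based_rule(name, limit=2000, priority=0):
    """Build a WAFv2-style rule that blocks IPs exceeding `limit` requests."""
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {
            # Count requests per source IP over the evaluation window.
            "RateBasedStatement": {"Limit": limit, "AggregateKeyType": "IP"}
        },
        "Action": {"Block": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }
```

A dictionary like this would be one entry in the `Rules` list of a web ACL attached to CloudFront or the ALB.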

WAF Rules for DDoS:

Shield Advanced (BP1, BP2, BP6):


4. Attack Surface Reduction (BP1, BP4, BP5, BP6):

                    Attacker
                        │
                        ▼
              ┌─────────────────┐
              │   CloudFront    │ ◄── Only this IP is public
              │   (or API GW)   │
              └────────┬────────┘
                       │
        ┌──────────────┼──────────────┐
        │              │              │
        ▼              ▼              ▼
   ❌ Can't reach  ❌ Can't reach  ❌ Can't reach
      EC2 IPs        Lambda          RDS
      directly       directly        directly

Obfuscation = hide your backend:

Security Groups + NACLs (BP5):

Amazon GuardDuty:

GuardDuty provides intelligent threat detection using ML.

Core Data Sources (always analyzed):

✅ Source | What It Detects
CloudTrail Management Events | Unusual API calls, create VPC, create trail
CloudTrail S3 Data Events | Get/list/delete object anomalies
VPC Flow Logs | Unusual traffic, suspicious IPs
DNS Logs | Compromised EC2 sending encoded DNS queries

Optional Features: EKS Audit Logs, RDS & Aurora login, EBS, Lambda, S3 Data Events

NOT a GuardDuty data source:

❌ NOT Scanned | Why it’s a trap
CloudWatch Logs | Common confusion - GuardDuty uses its own log analysis
Application logs | GuardDuty = infrastructure threats, not app logs
Custom logs | Not supported

⚠️ Exam trap: “GuardDuty scans CloudWatch Logs” → FALSE! GuardDuty scans CloudTrail, VPC Flow Logs, DNS Logs (not CloudWatch Logs)

Memory hook - GuardDuty sources: “CVD”

┌─────────────────┐
│ VPC Flow Logs   │──┐
├─────────────────┤  │     ┌───────────┐     ┌─────────────┐
│ CloudTrail Logs │──┼────►│ GuardDuty │────►│ EventBridge │──► SNS/Lambda
├─────────────────┤  │     └───────────┘     └─────────────┘
│ DNS Logs        │──┘
└─────────────────┘
  + Optional: S3, EBS, Lambda, RDS, EKS
  ❌ NOT: CloudWatch Logs
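The GuardDuty to EventBridge hop in the diagram is driven by an event pattern. This is a minimal sketch: `guardduty_finding_pattern` is a hypothetical helper, and the severity threshold is an illustrative choice rather than a recommendation.

```python
import json


def guardduty_finding_pattern(min_severity=None):
    """Build an EventBridge pattern matching GuardDuty findings."""
    pattern = {
        "source": ["aws.guardduty"],
        "detail-type": ["GuardDuty Finding"],
    }
    if min_severity is not None:
        # Only forward findings at or above the chosen severity.
        pattern["detail"] = {"severity": [{"numeric": [">=", min_severity]}]}
    return json.dumps(pattern)
```

The rule's target would then be the SNS topic or Lambda function shown on the right of the diagram.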

Amazon Inspector:

Inspector performs automated security assessments.

Scans:

Target | What’s Scanned | Requires
EC2 instances | OS vulnerabilities, network reachability | SSM Agent
ECR Container Images | Vulnerabilities on push | -
Lambda Functions | Code vulnerabilities, package dependencies | -

Lambda ──────┐
             │
SSM Agent ───┼────► Inspector ────► Security Hub
(EC2)        │         │             EventBridge
             │         ▼
ECR Images ──┘    Findings + Risk Score

GuardDuty vs Inspector vs Macie vs Config:

Service | What It Does | Looks At | Use Case
GuardDuty | Threat detection | CloudTrail, VPC Flow, DNS | “Is someone attacking me?”
Inspector | Vulnerability scanning | EC2 OS, ECR images, Lambda | “Do I have unpatched CVEs?”
Macie | Sensitive data discovery | S3 buckets | “Do I have exposed PII?”
Config | Configuration compliance | Resource configs | “Are my resources compliant?”

⚠️ Exam trap keywords:

Wrong answers for “OS vulnerabilities” question:

Amazon Macie:

Macie discovers and protects sensitive data using ML and pattern matching.

S3 Buckets ────► Macie ────► EventBridge ────► integrations
              (discover PII)    (notify)      (Lambda, SNS, etc.)


🎯 MASTER SUMMARY: AWS Security Exam Guide

Part 1: Core Principles (Understand WHY → Derive WHAT)

Principle 1: Encryption Ownership Spectrum

AWS security is about WHO controls the keys and WHERE encryption happens.

Most AWS Control ◄──────────────────────────────────► Most Customer Control

SSE-S3          SSE-KMS           SSE-KMS (CMK)      SSE-C           Client-Side
(AWS owns)      (AWS managed)     (Customer managed)  (Customer key)  (Customer encrypts)

Key insight: The more control you want, the more responsibility you have.

Principle 2: Key Policy is King (KMS Access)

Unlike S3/Lambda, KMS requires Key Policy — IAM policy alone is NOT enough.

Why? KMS keys are highly sensitive. AWS designed it so you MUST explicitly allow access at the key level.

Derivation: If question mentions “IAM policy for KMS” → check if Key Policy allows it. No Key Policy = No Access.

Principle 3: Regional vs Global Services

Understanding where services “live” determines where certificates/keys must be.

Service | Scope | Certificate/Key Location
CloudFront | Global (us-east-1) | ACM in us-east-1
API Gateway Edge-Optimized | Uses CloudFront | ACM in us-east-1
API Gateway Regional | Regional | ACM in same region
KMS | Regional | Must re-encrypt when crossing regions
KMS Multi-Region | Multi-region | Same key ID across regions
CloudHSM | Regional | No cross-region replication

Derivation: “Where does TLS terminate?” → that’s where cert must be.

Principle 4: Detection vs Protection vs Management

Security services fall into three categories:

Category | Services | Action
Detection | GuardDuty, Inspector, Macie, Config | Find problems
Protection | WAF, Shield, Network Firewall | Block attacks
Management | Firewall Manager, Security Hub | Centralize/aggregate

Derivation: “Centrally manage across accounts” → Management category → Firewall Manager

Principle 5: Layer Determines Service

Network attacks happen at different layers:

Layer | Attacks | Protection
L3/L4 | SYN floods, UDP reflection | Shield, NACLs, Security Groups
L7 | SQL injection, XSS, DDoS | WAF, API Gateway throttling

Derivation: “NLB + WAF” → IMPOSSIBLE (NLB = L4, WAF = L7)

Principle 6: Rotation ≠ Versioning

Don’t confuse these:

Concept | What Changes | Service
Key Rotation | New key material, same key ID | KMS (1 year fixed)
Secret Rotation | New password/credential | Secrets Manager (configurable)
Version History | Track all previous values | SSM Parameter Store

Derivation: “Rotate every 6 months” → Manual rotation with aliases (KMS auto is 1 year only)

Principle 7: Storage Service Owns Client-Side Naming

“[Service] Client-Side Encryption” means YOUR APP encrypts before sending to [Service].

Derivation: Client-side encryption question → identify the STORAGE service

Principle 8: Auto-Rotation is Rare

Most services do NOT auto-rotate:

Service | Auto-Rotation?
Secrets Manager | ✅ Yes (built-in)
KMS | ✅ Yes (1 year only)
SSM Parameter Store | ❌ No
IAM Access Keys | ❌ No

Derivation: “DB credentials + auto rotation” → Secrets Manager (only option)


Part 2: Decision Trees (Follow Keywords → Find Answer)

Encryption Service Decision Tree

Need encryption?
│
├─► "DB credentials" + "auto rotation"
│   └─► Secrets Manager
│
├─► "Config values" + "version history"
│   └─► SSM Parameter Store
│
├─► "FIPS 140-2 Level 3" OR "AWS cannot access keys"
│   └─► CloudHSM
│
├─► "Multi-region" + "Global DB"
│   └─► KMS Multi-Region Keys (NOT CloudHSM)
│
├─► "Admins cannot see data"
│   └─► Client-Side Encryption
│
└─► Standard encryption
    └─► KMS (default choice)

Security Service Decision Tree

Security question?
│
├─► "Threat" / "attack" / "compromised" / "unusual API"
│   └─► GuardDuty
│
├─► "Vulnerability" / "CVE" / "patch" / "OS security"
│   └─► Inspector
│
├─► "PII" / "sensitive data" / "S3 data discovery"
│   └─► Macie
│
├─► "Compliance" / "configuration audit"
│   └─► Config
│
├─► "Centrally manage" / "across accounts" / "Organization"
│   └─► Firewall Manager
│
├─► "DDoS protection"
│   └─► Shield (Standard=free, Advanced=$3k/mo)
│
└─► "Layer 7" / "SQL injection" / "XSS" / "rate limiting"
    └─► WAF
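The tree above amounts to a first-match keyword lookup, sketched below. `RULES` and `pick_service` are hypothetical names, the keywords are copied from the tree, and the ordering matters: a question containing both "attack" and "SQL injection" would match GuardDuty first, so treat this as a study aid rather than a classifier.

```python
# Branches in the same order as the decision tree; first match wins.
RULES = [
    (("threat", "attack", "compromised", "unusual api"), "GuardDuty"),
    (("vulnerability", "cve", "patch", "os security"), "Inspector"),
    (("pii", "sensitive data", "s3 data discovery"), "Macie"),
    (("compliance", "configuration audit"), "Config"),
    (("centrally manage", "across accounts", "organization"), "Firewall Manager"),
    (("ddos",), "Shield"),
    (("layer 7", "sql injection", "xss", "rate limiting"), "WAF"),
]


def pick_service(question):
    """Return the first service whose keywords appear in the question, else None."""
    text = question.lower()
    for keywords, service in RULES:
        if any(keyword in text for keyword in keywords):
            return service
    return None
```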

ACM Certificate Location Decision Tree

Where to put ACM certificate?
│
├─► CloudFront distribution?
│   └─► us-east-1
│
├─► Edge-Optimized API Gateway?
│   └─► us-east-1 (uses CloudFront behind scenes)
│
├─► Regional API Gateway?
│   └─► Same region as API Gateway
│
└─► ALB?
    └─► Same region as ALB

The “CANNOT” List

❌ Impossible | Why
WAF + NLB | WAF = L7, NLB = L4
ACM + EC2 directly | Can’t extract private key
KMS auto-rotate < 1 year | Fixed at 1 year
Imported key auto-rotation | Manual only via alias
CloudHSM multi-region replication | Single-region only
GuardDuty scan CloudWatch Logs | Uses CloudTrail, VPC Flow, DNS only
Single asymmetric key for encrypt + sign | Choose one at creation
Share AMI with AWS Managed Key | Must use Customer Managed Key

Part 3: Scenario Pattern Recognition

Pattern: “RDS credentials + automatic rotation”

Keywords: RDS, password, credentials, automatic rotation
Answer: Secrets Manager
Why: Only service with native RDS rotation integration


Pattern: “FIPS 140-2 Level 3 compliance”

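Keywords: FIPS, Level 3, compliance, tamper-evident
Answer: CloudHSM
Why: Exam material treats KMS as Level 2 and CloudHSM as Level 3 (KMS HSMs have since been validated at Level 3, so verify against current AWS documentation)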


Pattern: “AWS should not have access to encryption keys”

Keywords: AWS cannot access, customer-managed hardware
Answer: CloudHSM
Why: KMS = AWS manages key material; CloudHSM = you manage entirely


Pattern: “Aurora Global + client-side encryption”

Keywords: Global database, multi-region, client-side, encrypt
Answer: KMS Multi-Region Keys
Why: CloudHSM can’t replicate keys across regions


Pattern: “Centrally manage Security Groups across accounts”

Keywords: centrally, manage, multiple accounts, Organization
Answer: Firewall Manager
Why: Only service that manages security rules across Organization


Pattern: “Edge-Optimized API Gateway + ACM certificate”

Keywords: Edge-Optimized, API Gateway, certificate, SSL
Answer: us-east-1
Why: Edge-Optimized uses CloudFront → CloudFront = us-east-1


Pattern: “Notify 30 days before certificate expires”

Keywords: certificate expiry, notification, X days before
Answer: Depends on cert type:


Pattern: “Fixed IP address + WAF protection”

Keywords: fixed IP, static IP, WAF, DDoS
Answer: Global Accelerator + ALB + WAF
Why: WAF can’t attach to NLB; Global Accelerator provides fixed IPs to ALB


Pattern: “OS vulnerabilities on EC2 instances”

Keywords: vulnerability, CVE, patch, EC2, OS
Answer: Inspector (with SSM Agent)
Why: Inspector scans for CVEs; GuardDuty detects threats, not vulnerabilities


Pattern: “Track configuration changes over time”

Keywords: configuration, history, version, changes
Answer: SSM Parameter Store (for config values) or Config (for resources)
Why: Built-in versioning for every change


Pattern: “Sensitive data discovery in S3”

Keywords: PII, sensitive data, S3, discover
Answer: Macie
Why: ML-based PII discovery specifically for S3


Pattern: “Unusual API calls detected”

Keywords: unusual API, suspicious activity, threat, compromise
Answer: GuardDuty
Why: Analyzes CloudTrail for anomalous API patterns


Pattern: “Protect against SQL injection”

Keywords: SQL injection, XSS, Layer 7, web exploits
Answer: WAF
Why: WAF has managed rules for common web attacks


Pattern: “DDoS protection with 24/7 support team”

Keywords: DDoS, response team, SRT, cost protection
Answer: Shield Advanced
Why: Shield Advanced includes DDoS Response Team access


Pattern: “Rotate KMS key every 6 months”

Keywords: rotate, 6 months, 90 days, custom period
Answer: Manual rotation with Key Alias
Why: Auto-rotation is fixed at 1 year; manual rotation for custom periods


Part 4: Quick Reference Tables

Security Services Comparison

Service | Detects | Scans | Output
GuardDuty | Threats, attacks | CloudTrail, VPC Flow, DNS | EventBridge
Inspector | Vulnerabilities | EC2 OS, ECR, Lambda | Security Hub
Macie | Sensitive data | S3 buckets | EventBridge
Config | Non-compliance | Resource configs | SNS, S3

Secrets/Config Storage Comparison

Requirement | Service
DB credentials + auto rotation | Secrets Manager
Config values + versioning | SSM Parameter Store
Hierarchical config paths | SSM Parameter Store
Multi-region secrets | Secrets Manager
Free tier needed | SSM Parameter Store

KMS Key Types

Key Type | Auto-Rotation | Period | Manual Rotation
AWS Managed | Always ON | 1 year | N/A
Customer Managed | Optional | 1 year | Via alias
Imported | ❌ Never | N/A | Via alias only

WAF Compatibility

✅ Works | ❌ Doesn’t Work
ALB | NLB
CloudFront | CLB
API Gateway | EC2 directly
AppSync | Route 53
Cognito User Pool | -

Part 5: Ultimate Instant-Answer Table

Question Contains | Instant Answer
“RDS” + “auto rotation” | Secrets Manager
“FIPS 140-2 Level 3” | CloudHSM
“AWS cannot access keys” | CloudHSM
“Multi-region” + “encryption” + “Global DB” | KMS Multi-Region Keys
“Edge-Optimized” + “certificate” | us-east-1
“Regional API Gateway” + “certificate” | Same region as API
“CloudFront” + “certificate” | us-east-1
“Centrally manage” + “accounts” | Firewall Manager
“Security Groups” + “Organization” | Firewall Manager
“WAF” + “multiple accounts” | Firewall Manager
“OS vulnerability” / “CVE” | Inspector
“Threat” / “unusual API” / “compromised” | GuardDuty
“PII” / “sensitive data” + “S3” | Macie
“Configuration compliance” | Config
“Fixed IP” + “WAF” | Global Accelerator + ALB + WAF
“NLB” + “WAF” | IMPOSSIBLE
“Rotate every 6 months” | Manual rotation with alias
“Asymmetric” + “encrypt AND sign” | Two separate keys needed
“Version history” + “config” | SSM Parameter Store
“Certificate expiry notification” | Imported = Config rule; ACM-created = EventBridge/CW
“DDoS” + “response team” | Shield Advanced
“DDoS” + “free” | Shield Standard
“SQL injection” / “XSS” | WAF
“Layer 7 protection” | WAF
“Layer 3/4 protection” | Shield, Security Groups, NACLs
“Cross-account encrypted AMI” | Customer Managed Key + Key Policy
“Admins cannot see data” | Client-Side Encryption
“Lambda Client-side Encryption” | WRONG ANSWER (Lambda isn’t storage)
“KMS IAM Policy” alone | NOT enough (Key Policy required)
“ACM” + “EC2 directly” | IMPOSSIBLE
“GuardDuty” + “CloudWatch Logs” | FALSE (not a data source)

Part 6: Elimination Checklist

For Encryption Questions:

□ Does it mention "auto rotation" for DB?
  → Yes = Secrets Manager
  → No = continue

□ Does it mention "FIPS Level 3" or "AWS cannot access"?
  → Yes = CloudHSM
  → No = continue

□ Does it mention "multi-region" + "Global database"?
  → Yes = KMS Multi-Region Keys
  → No = continue

□ Does it mention "config values" or "version history"?
  → Yes = SSM Parameter Store
  → No = probably KMS

For Security Service Questions:

□ Is it about DETECTION (finding problems)?
  → Threats/attacks = GuardDuty
  → Vulnerabilities/CVE = Inspector
  → Sensitive data = Macie
  → Configuration = Config

□ Is it about PROTECTION (blocking attacks)?
  → Layer 7 (HTTP) = WAF
  → Layer 3/4 (network) = Shield, SG, NACL

□ Is it about MANAGEMENT (centralize/aggregate)?
  → Across accounts = Firewall Manager
  → Aggregate findings = Security Hub

For Certificate Location Questions:

□ Is CloudFront involved (directly or Edge-Optimized)?
  → Yes = us-east-1
  → No = same region as the service

🏆 The Golden Rules

  1. KMS Key Policy is mandatory (IAM alone never works for KMS)
  2. KMS automatic rotation = 1 year by default (exam answer for a custom period: manual rotation + alias; newer KMS releases also allow a configurable rotation period)
  3. Edge-Optimized = us-east-1 (because CloudFront)
  4. WAF = Layer 7 only (can’t attach to NLB)
  5. CloudHSM = single-region (no multi-region replication)
  6. Secrets Manager = auto-rotation (SSM Parameter Store doesn’t have it)
  7. GuardDuty sources = CVD (CloudTrail, VPC Flow, DNS - NOT CloudWatch!)
  8. “Centrally manage across accounts” = Firewall Manager (it’s in the name)
  9. Inspector = vulnerabilities (GuardDuty = threats - different!)
  10. Client-side encryption = storage service name (Lambda Client-side = nonsense)
  11. Cross-account AMI = Customer Managed Key (AWS Managed Key can’t be shared)
  12. Asymmetric key = encrypt OR sign (never both with same key)
  13. ACM + EC2 directly = impossible (can’t extract private key)
  14. Certificate expiry alerts — imported = Config rule + SNS; ACM-created = EventBridge/CW DaysToExpiry
  15. Multi-region encryption = KMS Multi-Region Keys (CloudHSM can’t do it)

Deployment (IaC) and software development (CI/CD):

AWS CloudFormation is a declarative way of outlining and creating your AWS infrastructure: you describe the resources in a template, and CloudFormation creates them in the right order with the exact configuration you specify.
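
As a rough illustration of "declarative", the sketch below builds a minimal template as a Python dict and serializes it to JSON. The logical IDs `AppBucket` and `AppBucketName` are hypothetical; the template only describes the desired end state and contains no creation steps.

```python
import json

# Minimal CloudFormation template expressed as a Python dict.
# "AppBucket" and "AppBucketName" are hypothetical logical IDs.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Minimal sketch: one versioned S3 bucket",
    "Resources": {
        "AppBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {
                "VersioningConfiguration": {"Status": "Enabled"}
            },
        }
    },
    "Outputs": {
        # Ref on an S3 bucket resolves to the bucket name.
        "AppBucketName": {"Value": {"Ref": "AppBucket"}}
    },
}

# This JSON string is what would be submitted as the template body.
template_body = json.dumps(template, indent=2)
print(template_body)
```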

CloudFormation Service Role:

User (cloudformation:*, iam:PassRole) ──► CloudFormation ──► Service Role (s3:*, ec2:*) ──► Resources
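
The delegation in the diagram above can be sketched as three IAM policy documents, written here as Python dicts. The account ID and role name are made-up placeholders, and the broad `s3:*` / `ec2:*` actions simply mirror the diagram rather than a recommended production policy.

```python
import json

# Hypothetical account ID and role name, used only for illustration.
SERVICE_ROLE_ARN = "arn:aws:iam::123456789012:role/cfn-service-role"

# Identity policy for the user: manage stacks and pass the role to CloudFormation.
user_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "cloudformation:*", "Resource": "*"},
        {"Effect": "Allow", "Action": "iam:PassRole", "Resource": SERVICE_ROLE_ARN},
    ],
}

# Trust policy on the service role: only CloudFormation may assume it.
service_role_trust = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "cloudformation.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Permissions policy on the service role: the actual resource access.
service_role_permissions = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:*", "ec2:*"], "Resource": "*"}
    ],
}

# The user never holds s3:* or ec2:* directly.
user_actions = {stmt["Action"] for stmt in user_policy["Statement"]}
print(json.dumps(sorted(user_actions)))
```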

AWS Infrastructure Composer (formerly Application Composer): visually design and build serverless applications quickly on AWS. Deploy AWS infrastructure code without needing to be an expert in AWS. Configure how your resources interact with each other. Import existing CloudFormation / SAM templates to visualize them. Helps you visualize, build, and deploy modern applications from all AWS services that are supported by AWS CloudFormation.

AWS Cloud Development Kit (CDK) accelerates cloud development by letting you model your applications in common programming languages and deploy infrastructure and application runtime code together.

AWS Elastic Beanstalk is a managed Platform as a Service (PaaS) offering: a developer-centric view of deploying an application on AWS (using EC2, ASG, ELB, RDS, etc.). Elastic Beanstalk handles instance configuration, OS handling, deployment strategy, capacity provisioning, load balancing, auto scaling, and application health monitoring and responsiveness; only the application code itself remains your responsibility.

AWS CodeDeploy is a fully managed deployment service that automates software deployments to various compute services, such as Amazon Elastic Compute Cloud (EC2), Amazon Elastic Container Service (ECS), AWS Lambda, and your on-premises servers. Use CodeDeploy to automate software deployments, eliminating the need for error-prone manual operations.

AWS CodeCommit is a fully managed, scalable and highly available code repository, using Git technology. Collaborate with others on code. Code changes are automatically versioned.

AWS CodeBuild is a fully managed, serverless, scalable and highly available build service in the cloud. It compiles source code, runs tests, and produces packages that are ready to be deployed.

AWS CodePipeline is a fully managed continuous delivery service that helps you automate your release pipelines for fast and reliable application and infrastructure updates. Compatible with CodeCommit, CodeBuild, CodeDeploy, Elastic Beanstalk, CloudFormation, GitHub, etc.

AWS CodeArtifact is a secure, scalable, and cost-effective package management service for software development.

AWS CodeStar is a unified UI to easily manage software development activities in one place.

AWS Cloud9 is a cloud IDE (Integrated Development Environment) for writing, running and debugging code.

AWS Step Functions: serverless visual workflow service to orchestrate Lambda functions. Supports sequences, parallel branches, conditions, timeouts, error handling, human approval, etc. Integrates with EC2, ECS, on-premises servers, API Gateway, SQS queues, etc.
AWS Step Functions excel in complex workflow orchestration scenarios, offering advanced features such as state management, error handling, and parallel execution.
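
To make the state management and error handling concrete, here is a hedged sketch of a small state machine in Amazon States Language, written as a Python dict. The Lambda ARNs, state names, and the `$.valid` input field are hypothetical; the helper only checks that every transition points at a defined state and does not execute anything.

```python
import json

# Hypothetical Lambda ARNs, for illustration only.
VALIDATE_ARN = "arn:aws:lambda:us-east-1:123456789012:function:validate-order"
CHARGE_ARN = "arn:aws:lambda:us-east-1:123456789012:function:charge-card"

state_machine = {
    "Comment": "Sketch: validate, branch on the result, retry and catch errors",
    "StartAt": "Validate",
    "States": {
        "Validate": {
            "Type": "Task",
            "Resource": VALIDATE_ARN,
            "TimeoutSeconds": 30,
            "Retry": [{"ErrorEquals": ["States.Timeout"],
                       "IntervalSeconds": 2, "MaxAttempts": 3, "BackoffRate": 2.0}],
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "Failed"}],
            "Next": "IsValid",
        },
        "IsValid": {
            "Type": "Choice",
            "Choices": [{"Variable": "$.valid", "BooleanEquals": True,
                         "Next": "Charge"}],
            "Default": "Failed",
        },
        "Charge": {"Type": "Task", "Resource": CHARGE_ARN, "End": True},
        "Failed": {"Type": "Fail", "Error": "OrderRejected",
                   "Cause": "Validation failed"},
    },
}


def dangling_targets(definition: dict) -> set:
    """Return transition targets that are not defined as states."""
    states = definition["States"]
    targets = {definition["StartAt"]}
    for state in states.values():
        for key in ("Next", "Default"):
            if key in state:
                targets.add(state[key])
        for rule in state.get("Choices", []) + state.get("Catch", []):
            targets.add(rule["Next"])
    return targets - set(states)


print(json.dumps(state_machine, indent=2))
```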

AWS Amplify: a set of tools and services that helps you develop and deploy scalable full-stack web and mobile applications. Authentication, Storage, API (REST, GraphQL), CI/CD, PubSub, Analytics, AI/ML Predictions, Monitoring, Source Code from AWS, GitHub, etc.
Amplify's serverless architecture simplifies maintenance and scales automatically. There is no need to provision or manage EC2 instances. Lambda and API Gateway handle availability and respond to traffic spikes automatically. Upload code and let Amplify handle deploying and running it.

AWS Device Farm: fully managed service that tests your web and mobile apps against desktop browsers, real mobile devices and tablets. Run tests concurrently on multiple devices. Ability to configure device settings (GPS, language, WiFi, Bluetooth, etc.).

AWS Systems Manager (SSM) — hybrid AWS service to manage infrastructure at scale (EC2 + on-premises servers). Requires SSM Agent installed on managed instances.

SSM Session Manager:

SSM Run Command:

SSM Patch Manager:

SSM Maintenance Windows:

Maintenance Windows ── trigger (e.g., every 24h) ──► Run Command ──► EC2 (with SSM Agent)
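
As a sketch of the first box in this flow, the dict below holds the kind of keyword arguments one might pass to boto3's `ssm.create_maintenance_window`. The window name and the cron expression (intended as daily at 02:00 UTC) are assumptions to verify against the Systems Manager documentation; no AWS call is made here.

```python
# Keyword arguments one might pass to ssm.create_maintenance_window (boto3).
# The name and schedule are hypothetical; verify the cron syntax before use.
window_params = {
    "Name": "daily-run-command",
    "Schedule": "cron(0 2 ? * * *)",    # assumed: every day at 02:00 UTC
    "Duration": 3,                      # window length in hours
    "Cutoff": 1,                        # stop starting new tasks 1 hour before the end
    "AllowUnassociatedTargets": False,  # only registered targets run tasks
}


def hours_accepting_new_tasks(params: dict) -> int:
    """Hours during which the window may still start new tasks."""
    return params["Duration"] - params["Cutoff"]


print(hours_accepting_new_tasks(window_params))
```

A Run Command task would then be registered against the window so that it executes on the targeted instances during each occurrence.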

SSM Automation:

⚠️ Exam trap: “Secure shell access without SSH/port 22” → SSM Session Manager. “Run script on 100s of instances” → SSM Run Command. “Automate patching schedule” → SSM Patch Manager + Maintenance Windows.



🎯 MASTER SUMMARY: Deployment, IaC & CI/CD Exam Guide

Part 1: Core Principles (Understand WHY → Derive WHAT)

Principle 1: IaC Tools Differ by Abstraction Level

Principle 2: CI/CD Pipeline = Commit → Build → Deploy

AWS CI/CD chain: CodeCommit (source) → CodeBuild (build/test) → CodeDeploy (deploy) → orchestrated by CodePipeline. Each service is independent and can integrate with third-party tools.
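
The chain can be pictured as an abbreviated CodePipeline definition. This is a sketch only: a real pipeline also needs a role ARN, an artifact store, and per-action configuration, which are omitted here, and the pipeline and action names are hypothetical.

```python
# Abbreviated pipeline structure; role, artifact store and action
# configuration are intentionally left out of this sketch.
def action(name: str, category: str, provider: str) -> dict:
    return {
        "name": name,
        "actionTypeId": {
            "category": category,
            "owner": "AWS",
            "provider": provider,
            "version": "1",
        },
    }


pipeline = {
    "name": "demo-pipeline",  # hypothetical name
    "stages": [
        {"name": "Source", "actions": [action("Checkout", "Source", "CodeCommit")]},
        {"name": "Build", "actions": [action("BuildAndTest", "Build", "CodeBuild")]},
        {"name": "Deploy", "actions": [action("Release", "Deploy", "CodeDeploy")]},
    ],
}

# Providers in execution order.
providers = [
    act["actionTypeId"]["provider"]
    for stage in pipeline["stages"]
    for act in stage["actions"]
]
print(" -> ".join(providers))
```

Swapping the Source provider for a third-party one such as GitHub changes only that stage, which is the sense in which each service is independent.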

Principle 3: CloudFormation Service Role = Least Privilege

Users don’t need permissions to underlying resources. They only need cloudformation:* + iam:PassRole. The Service Role has the resource permissions. This enables delegation without over-granting.

Principle 4: SSM = Manage Instances at Scale Without SSH

SSM Agent enables secure management of EC2 + on-prem servers. Five key features map to five different exam patterns:

Principle 5: CodeDeploy = Multi-Target Deployment

CodeDeploy works on EC2, ECS, Lambda, AND on-premises — it’s the deployment tool that bridges cloud and on-prem.


Part 2: Decision Trees

IaC Tool Decision Tree

How do you want to define infrastructure?
│
├─ "YAML/JSON templates, full control" → CloudFormation
├─ "Programming language (Python/TS)" → CDK
├─ "Just upload my code, handle the rest" → Elastic Beanstalk
├─ "Full-stack web/mobile app" → Amplify
└─ "Visual drag-and-drop designer" → Infrastructure Composer

CI/CD Service Decision Tree

What CI/CD step do you need?
│
├─ SOURCE (store code) → CodeCommit (or GitHub)
├─ BUILD (compile/test) → CodeBuild
├─ DEPLOY (push to infra) → CodeDeploy
├─ ORCHESTRATE (chain all) → CodePipeline
├─ PACKAGE MGMT → CodeArtifact
└─ UNIFIED UI → CodeStar

SSM Feature Decision Tree

What do you need to do on EC2/on-prem?
│
├─ "Shell access without SSH" → Session Manager
├─ "Run script on many instances" → Run Command
├─ "Automate OS patching" → Patch Manager
├─ "Schedule maintenance tasks" → Maintenance Windows
├─ "Complex task automation / Config fix" → Automation (Runbooks)

Part 3: Scenario Pattern Recognition

Pattern: “Deploy infrastructure as code with full AWS resource control”

Keywords: IaC, template, declarative, all resources
Answer: CloudFormation
Why: Native AWS IaC, supports almost all resources, with custom resources for anything unsupported.


Pattern: “Let users deploy stacks without giving them resource permissions”

Keywords: least privilege, deploy stacks, iam:PassRole
Answer: CloudFormation Service Role
Why: Service Role has resource permissions; user only needs cloudformation:* + iam:PassRole.


Pattern: “Just upload code, AWS handles everything else”

Keywords: PaaS, developer-centric, auto-scaling, health monitoring
Answer: Elastic Beanstalk
Why: Handles capacity, load balancing, scaling, monitoring. You only write code.


Pattern: “Define infrastructure using Python/TypeScript”

Keywords: programming language, CDK, familiar syntax
Answer: AWS CDK
Why: CDK compiles to CloudFormation. Use familiar languages instead of YAML/JSON.


Pattern: “Automate release pipeline: source → build → deploy”

Keywords: CI/CD, pipeline, automate releases
Answer: CodePipeline (orchestrates CodeCommit + CodeBuild + CodeDeploy)


Pattern: “Deploy to EC2, ECS, Lambda, AND on-premises”

Keywords: deploy, multi-target, on-premises
Answer: CodeDeploy
Why: Only AWS deployment service that supports both cloud and on-prem.


Pattern: “Secure shell to EC2 without SSH keys or port 22”

Keywords: no SSH, no bastion, no port 22, secure shell
Answer: SSM Session Manager
Why: Uses SSM Agent + IAM permissions. Logs to S3/CloudWatch.


Pattern: “Run a script across 500 EC2 instances”

Keywords: fleet, multiple instances, run command, no SSH
Answer: SSM Run Command


Pattern: “Schedule OS patching every Sunday at 2 AM”

Keywords: patch, schedule, compliance, OS updates
Answer: SSM Patch Manager + Maintenance Windows


Pattern: “Auto-remediate non-compliant AWS Config rules”

Keywords: Config, remediate, auto-fix, non-compliant
Answer: AWS Config + SSM Automation (Runbooks)


Pattern: “Full-stack web/mobile app with serverless backend”

Keywords: web app, mobile app, full-stack, serverless, Amplify
Answer: AWS Amplify


Pattern: “Test mobile app on real devices”

Keywords: mobile testing, real devices, browsers
Answer: AWS Device Farm


Part 4: Quick Reference Tables

IaC & Deployment Services

Service | What It Does | Abstraction
CloudFormation | IaC templates → AWS resources | Low (full control)
CDK | Code → CloudFormation templates | Medium
Beanstalk | Upload code → full environment | High (PaaS)
Amplify | Full-stack web/mobile framework | High (serverless)
Infrastructure Composer | Visual CloudFormation designer | Visual

CI/CD Pipeline Services

Service | Role | Serverless?
CodeCommit | Source repository (Git) |
CodeBuild | Build + test |
CodeDeploy | Deploy to EC2/ECS/Lambda/on-prem |
CodePipeline | Orchestrate pipeline |
CodeArtifact | Package management |

SSM Features

Feature | Purpose | Key Differentiator
Session Manager | Secure shell | No SSH/port 22, IAM-based
Run Command | Execute scripts on fleet | No SSH, EventBridge trigger
Patch Manager | Automate patching | Compliance reports
Maintenance Windows | Schedule operations | Schedule + duration + tasks
Automation | Complex task runbooks | Config remediation trigger

Part 5: Ultimate Instant-Answer Table

Question Contains → Instant Answer
“IaC, declarative templates” → CloudFormation
“Custom resources for unsupported” → CloudFormation
“Service Role, iam:PassRole” → CloudFormation Service Role
“Visual designer for CloudFormation” → Infrastructure Composer
“Define infra in Python/TypeScript” → CDK
“PaaS, upload code, handles rest” → Elastic Beanstalk
“Full-stack web/mobile serverless” → Amplify
“Automate CI/CD pipeline” → CodePipeline
“Source code repository (Git)” → CodeCommit
“Build and test code” → CodeBuild
“Deploy to EC2/ECS/Lambda/on-prem” → CodeDeploy
“Package management” → CodeArtifact
“No SSH, no port 22, secure shell” → SSM Session Manager
“Run script on fleet of instances” → SSM Run Command
“Automate OS patching” → SSM Patch Manager
“Schedule maintenance tasks” → SSM Maintenance Windows
“Config auto-remediation” → SSM Automation + Config
“Workflow orchestration, state machine” → Step Functions
“Test on real mobile devices” → Device Farm

🏆 The Golden Rules

  1. CloudFormation = IaC king (declarative, almost all resources)
  2. CDK → CloudFormation (CDK generates CF templates, not a replacement)
  3. Beanstalk = PaaS (you code, AWS does everything else)
  4. CodePipeline = orchestrator (chains Commit → Build → Deploy)
  5. CodeDeploy = only multi-target (EC2 + ECS + Lambda + on-prem)
  6. Session Manager = no SSH (no port 22, no bastion, no keys)
  7. Run Command = fleet scripts (no SSH, resource groups, EventBridge)
  8. Patch Manager + Maintenance Windows = scheduled patching
  9. SSM Automation = Config remediation (Runbooks fix non-compliant resources)
  10. Service Role = least privilege delegation (user doesn’t need resource permissions)

Machine Learning (ML) Services:

Amazon Rekognition automates image recognition and video analysis for your applications without machine learning (ML) experience.

Rekognition Content Moderation:

⚠️ Exam trap: “Moderate user-uploaded images” or “detect inappropriate content” → Rekognition Content Moderation + optionally A2I for human review.

Amazon Transcribe automatically converts speech to text, using a deep learning process called automatic speech recognition (ASR).

⚠️ Exam trap: “Remove PII from audio/transcripts” → Amazon Transcribe with PII Redaction enabled.

Amazon Polly turns text into lifelike speech using deep learning. Lets you create applications that talk.

Amazon Translate provides natural and accurate language translation.

Amazon Lex (the technology that powers Alexa) lets you easily add AI that understands intent, maintains context, and automates simple tasks across many languages, in order to build chatbots and call center bots.

Amazon Connect is an omnichannel cloud contact center that helps companies provide superior customer service at a lower cost. Amazon Connect provides a seamless experience across voice and chat for customers and agents.

Amazon Comprehend is a fully managed, serverless service for natural language processing (NLP) that uses machine learning to find insights and relationships in text: it detects the language of the text, extracts key phrases, understands sentiment, etc. It can also group articles by the topics that Comprehend uncovers.

Amazon Comprehend Medical:

Amazon SageMaker is a fully managed service for developers / data scientists to build ML models.

Amazon Forecast is a fully managed service that uses ML to deliver highly accurate forecasts (product demand planning, financial planning, resource planning, etc).

Amazon Kendra is a fully managed document search service powered by ML.

Kendra Architecture:
Data Sources ──► indexing ──► Knowledge Index ──► "Where is IT support?" ──► "1st floor"
(S3, RDS, etc)               (powered by ML)          (natural language)        (answer)

Amazon Personalize is a fully managed ML-service to build apps with real-time personalized recommendations.

Personalize Architecture:
S3 (historical data) ─────────┐
                              ├──► Amazon Personalize ──► Websites, Mobile Apps, SMS, Emails
Personalize API (real-time) ──┘    (customized API)

Amazon Textract automatically extracts text, handwriting and data from any scanned documents using AI and ML.

Textract Flow:
Document (ID, form, etc) ──► analyze ──► Amazon Textract ──► Structured JSON
                                                              {"Document ID": "123",
                                                               "Name": "...",
                                                               "DOB": "23.05.1997"}

Lex + Connect Integration (Call Center Pattern):

Phone Call ──► Connect ──► stream ──► Lex ──► invoke ──► Lambda ──► CRM
(schedule      (contact    (audio)   (intent              (action)   (database)
appointment)   center)              recognized)
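
The "invoke ──► Lambda" step can be sketched as a fulfillment handler that follows the Lex V2 event and response shapes. The intent name `BookAppointment` and slot name `AppointmentDate` are hypothetical, and the CRM write is reduced to a comment, so treat this as an outline rather than a working integration.

```python
def lambda_handler(event, context):
    """Sketch of a Lex V2 fulfillment handler for an appointment intent."""
    intent = event["sessionState"]["intent"]
    slots = intent.get("slots") or {}
    # "AppointmentDate" is a hypothetical slot name.
    date_slot = slots.get("AppointmentDate") or {}
    date = (date_slot.get("value") or {}).get("interpretedValue")

    # A real handler would write the appointment to the CRM here.

    return {
        "sessionState": {
            "dialogAction": {"type": "Close"},
            # Copy the intent and mark it fulfilled without mutating the event.
            "intent": dict(intent, state="Fulfilled"),
        },
        "messages": [
            {
                "contentType": "PlainText",
                "content": f"Your appointment is booked for {date}.",
            }
        ],
    }


sample_event = {
    "sessionState": {
        "intent": {
            "name": "BookAppointment",
            "slots": {
                "AppointmentDate": {"value": {"interpretedValue": "2025-03-01"}}
            },
        }
    }
}
print(lambda_handler(sample_event, None)["messages"][0]["content"])
```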

AWS Machine Learning - Quick Reference

Service | Purpose | Key Feature
Rekognition | Image/video analysis | Face detection, content moderation
Transcribe | Speech → Text | PII redaction, multi-language
Polly | Text → Speech | Lexicons, SSML
Translate | Language translation | Localization
Lex | Chatbots | ASR + NLU (powers Alexa)
Connect | Contact center | 80% cheaper, cloud-based
Comprehend | NLP text analysis | Sentiment, topics, entities
Comprehend Medical | Clinical text NLP | PHI detection
SageMaker | Build custom ML models | Full ML workflow
Forecast | Time-series predictions | Demand/resource planning
Kendra | Document search | Natural language, incremental learning
Personalize | Recommendations | Same as Amazon.com
Textract | Document data extraction | Forms, tables, handwriting
Bedrock | Generative AI (Foundation Models) | Claude, Llama, Titan, Stable Diffusion

Amazon Bedrock is a fully managed service for building generative AI applications using foundation models (FMs).

⚠️ Exam trap: “Generative AI” or “Foundation Models” or “LLM on AWS” → Amazon Bedrock. SageMaker = build your own ML models.

Amazon Augmented AI (A2I) provides human review workflows for ML predictions.

⚠️ Exam trap: “Human review of ML predictions” or “manual review when confidence low” → Amazon A2I
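
The human-in-the-loop idea reduces to a confidence threshold. The sketch below is plain Python with a made-up `Prediction` type and an assumed threshold of 0.8; it does not call A2I, it only illustrates the routing decision that A2I workflows automate.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    label: str
    confidence: float  # 0.0 to 1.0


def route(prediction: Prediction, threshold: float = 0.8) -> str:
    """Send low-confidence predictions to a human reviewer."""
    if prediction.confidence < threshold:
        return "human_review"
    return "auto_accept"


for p in (Prediction("invoice", 0.97), Prediction("receipt", 0.55)):
    print(p.label, route(p))
```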


ML Service Decision Tree

What do you need to do?
         │
    ┌────┴────┬─────────┬──────────┬──────────┬────────────┐
    ▼         ▼         ▼          ▼          ▼            ▼
 VISION    SPEECH    TEXT/NLP   SEARCH    PREDICT     GEN AI
    │         │         │          │          │            │
    ▼         ▼         ▼          ▼          ▼            ▼
Rekognition  ┌─┴─┐   ┌──┴──┐    ┌──┴──┐   ┌──┴───┐     Bedrock
             │   │   │     │    │     │   │      │
          Speech→ Text→  Comprehend Kendra Forecast  (Foundation
          Text   Speech  (NLP)    (docs) (time)     Models)
             │      │       │                │
             ▼      ▼       ▼                ▼
         Transcribe Polly  Medical?     Personalize
                              │         (recommend)
                              ▼
                         Comprehend
                          Medical

When to use which search service?

Search Type?
     │
     ├── Document Q&A ("Where is IT support?") ──► Kendra
     │   (natural language answers)
     │
     └── Full-text search (logs, partial match) ──► OpenSearch
         (search engine, analytics)

When to use which text extraction?

Extract from documents?
     │
     ├── Forms, tables, structured data ──► Textract
     │   (invoices, IDs, medical records)
     │
     └── Text in images/videos ──► Rekognition
         (signs, banners, license plates)

Custom ML vs Managed Services?

Need ML capability?
     │
     ├── Pre-built solution exists? ──► Use managed service
     │   (Rekognition, Comprehend, Forecast, etc.)
     │
     └── Need custom model? ──► SageMaker
         (your own algorithms, data)

Additional Exam Traps

⚠️ Exam trap: Kendra vs OpenSearch:

⚠️ Exam trap: Textract vs Rekognition text:

⚠️ Exam trap: SageMaker vs Managed Services:

⚠️ Exam trap: Bedrock vs SageMaker:

⚠️ Exam trap: Comprehend vs Comprehend Medical:



🎯 MASTER SUMMARY: Machine Learning Exam Guide

Part 1: Core Principles

Principle 1: Managed ML vs Custom ML

AWS offers two paths:

Rule: If a managed service exists for your use case → use it. Custom ML only when needed.

Principle 2: Each Service Has ONE Primary Purpose

Purpose | Service
Image/Video analysis | Rekognition
Speech → Text | Transcribe
Text → Speech | Polly
Translation | Translate
Chatbots | Lex
Contact Center | Connect
Text NLP | Comprehend
Document Q&A | Kendra
Recommendations | Personalize
Document extraction | Textract
Time-series forecast | Forecast
Custom ML | SageMaker
Generative AI | Bedrock

Principle 3: Integration Patterns

Common AWS ML patterns:

Principle 4: Human-in-the-Loop = A2I

When ML confidence is low, route to human review:


Part 2: Instant-Answer Table

Question Contains → Instant Answer
“image recognition” → Rekognition
“video analysis” → Rekognition
“face detection” → Rekognition
“content moderation” + images → Rekognition
“celebrity recognition” → Rekognition
“speech to text” → Transcribe
“transcribe calls” → Transcribe
“remove PII from audio” → Transcribe (Redaction)
“closed captioning” → Transcribe
“text to speech” → Polly
“applications that talk” → Polly
“translate languages” → Translate
“localize content” → Translate
“chatbot” → Lex
“conversational bot” → Lex
“powers Alexa” → Lex
“call center” → Connect
“contact center” → Connect
“80% cheaper contact” → Connect
“sentiment analysis” → Comprehend
“NLP” + “text insights” → Comprehend
“clinical text” + “PHI” → Comprehend Medical
“physician notes” → Comprehend Medical
“document search” + “Q&A” → Kendra
“natural language search” → Kendra
“incremental learning” → Kendra
“product recommendations” → Personalize
“same as Amazon.com” → Personalize
“personalized marketing” → Personalize
“extract from forms/tables” → Textract
“invoice processing” → Textract
“ID documents” → Textract
“handwriting extraction” → Textract
“demand forecasting” → Forecast
“time-series prediction” → Forecast
“build custom ML model” → SageMaker
“train ML model” → SageMaker
“generative AI” → Bedrock
“foundation models” → Bedrock
“LLM on AWS” → Bedrock
“Claude/Llama/Titan” → Bedrock
“human review ML” → A2I
“manual review when low confidence” → A2I

Part 3: The “CANNOT” / Common Confusions

Confusion | Clarification
Kendra vs OpenSearch | Kendra = document Q&A; OpenSearch = full-text search/logs
Textract vs Rekognition | Textract = forms/tables; Rekognition = text in images
SageMaker vs Bedrock | SageMaker = custom models; Bedrock = use foundation models
Comprehend vs Medical | Comprehend = general; Medical = clinical/PHI
Polly vs Transcribe | Polly = text→speech; Transcribe = speech→text
Lex vs Connect | Lex = chatbot logic; Connect = phone/contact center

🏆 The Golden Rules

  1. Rekognition = images/videos (faces, objects, content moderation)
  2. Transcribe = speech→text (PII redaction, subtitles)
  3. Polly = text→speech (Lexicons, SSML)
  4. Lex = chatbots (powers Alexa, ASR+NLU)
  5. Connect = contact center (80% cheaper)
  6. Comprehend = text NLP (sentiment, entities, topics)
  7. Comprehend Medical = clinical text (PHI, HIPAA)
  8. Kendra = document Q&A (natural language, incremental learning)
  9. Personalize = recommendations (same as Amazon.com)
  10. Textract = document extraction (forms, tables, IDs)
  11. Forecast = time-series (demand planning)
  12. SageMaker = custom ML (build/train/deploy your own)
  13. Bedrock = generative AI (foundation models: Claude, Llama, Titan)
  14. A2I = human review (low confidence → manual review)
  15. Managed service first (only SageMaker when no pre-built option)

Other AWS Services:

Amazon WorkSpaces: managed Desktop as a Service (DaaS) solution to easily provision Windows or Linux desktops. Cloud alternative to managing on-premises Virtual Desktop Infrastructure (VDI). Scalable to thousands of users. Integrates with KMS. Pay-as-you-go pricing.

Amazon AppStream 2.0: desktop application streaming service. The application is delivered within a web browser. The instance type (CPU, RAM, GPU) can be configured per application type.

AWS IoT Core: serverless, secure service that lets you easily connect IoT devices to the AWS Cloud and scales to billions of messages.

AWS AppSync: store and sync data across mobile and web apps in real time. Makes use of GraphQL (a query language originally developed at Facebook). Integrates with DynamoDB / Lambda.

AWS Ground Station: a fully managed service that lets you control satellite communications, process data, and scale your satellite operations (weather forecasting, surface imaging, video broadcasting, etc.). Provides a global network of satellite ground stations near AWS Regions. Lets you download satellite data to an AWS VPC within seconds and send it to S3 or EC2 instances.

Amazon Pinpoint: scalable two-way (outbound/inbound) marketing communications service. Supports email, SMS, push, voice and in-app messaging. Ability to segment and personalize messages with right content to customers. Possibility to receive replies. Scales to billions of messages per day. Use cases: run campaigns by sending marketing, bulk, transactional SMS messages. Stream events (TEXT_SUCCESS, TEXT_DELIVERED) → SNS, Kinesis Data Firehose, CloudWatch Logs. Versus Amazon SNS or Amazon SES: In SNS & SES you manage each message’s audience, content, and delivery schedule. In Pinpoint, you create message templates, delivery schedules, highly-targeted segments, and full campaigns.

Amazon Simple Email Service (SES): fully managed service to send emails securely, globally, and at scale. Allows inbound/outbound emails. Reputation dashboard, performance insights, anti-spam feedback. Statistics: email deliveries, bounces, feedback loop results, email open rates. Supports DKIM and SPF. Flexible IP deployment: shared, dedicated, customer-owned. Send via AWS Console, APIs, or SMTP. Use cases: transactional, marketing, and bulk email communications.

Amazon AppFlow: fully managed integration service to securely transfer data between SaaS applications and AWS. Sources: Salesforce, SAP, Zendesk, Slack, ServiceNow. Destinations: S3, Redshift, or non-AWS (Snowflake, Salesforce). Frequency: schedule, event-driven, or on-demand. Data transformation: filtering and validation. Encrypted over public internet or privately over AWS PrivateLink.

Instance Scheduler on AWS: AWS solution (deployed via CloudFormation, not a service) that automatically starts and stops instances (EC2, RDS) on a schedule to reduce costs (up to 70%).

AWS Marketplace: digital catalog with thousands of software listings from independent software vendors (third-party).

AWS Data Exchange: find, subscribe to, and use third-party data in the cloud. Data providers publish data products → subscribers consume via S3, API, or Lake Formation. Use cases: financial data, weather, healthcare. No need to build custom ETL pipelines for external data.

AWS Data Pipeline: managed ETL service to process and move data between AWS compute and storage services, and on-premises sources. Defines data-driven workflows (dependencies). Runs on EC2 or EMR. Retries on failure. Legacy — prefer AWS Glue or Step Functions for new workloads.

⚠️ Exam trap: “Data Pipeline” on exam is usually legacy — modern answer is Glue (serverless ETL) or Step Functions (orchestration). But know Data Pipeline exists.

AWS Proton: fully managed delivery service for container and serverless applications. Platform teams create templates → developers deploy using self-service. Manages infrastructure provisioning + CI/CD. Think “Service Catalog for containers/serverless.”

AWS Wavelength: deploy AWS compute/storage at 5G telecom edge locations. Ultra-low latency for mobile devices. Extends VPC to Wavelength Zones. Use cases: real-time gaming, ML inference at edge, AR/VR, connected vehicles.

Amazon ECS Anywhere / EKS Anywhere: run ECS or EKS on on-premises or customer-managed infrastructure.

⚠️ Exam trap: “Run containers on-premises but manage from AWS” → ECS/EKS Anywhere. “Fully self-managed Kubernetes, same as EKS” → EKS Distro.

Amazon Elastic Transcoder: transcode media files (video/audio) stored in S3 into formats needed by consumer devices (phones, tablets, PCs). Pay per transcoding minute. Being replaced by AWS Elemental MediaConvert (more features, same purpose).

AWS License Manager: manage software licenses from vendors (Microsoft, SAP, Oracle). Track license usage, set rules, enforce limits. Integrates with EC2, RDS. Prevent license violations. Shared via AWS RAM across accounts.

Amazon Managed Grafana: fully managed Grafana for operational dashboards and observability. Queries from CloudWatch, Prometheus, X-Ray, Elasticsearch, Timestream. Workspace-based, integrates with IAM Identity Center for access.

Amazon Managed Service for Prometheus: fully managed, serverless Prometheus-compatible monitoring for containers (EKS, ECS). Stores metrics at scale. Query with PromQL. Pairs with Managed Grafana for visualization.

⚠️ Exam trap: “Container monitoring with Prometheus” → Managed Prometheus (metrics) + Managed Grafana (dashboards). NOT CloudWatch Container Insights (different approach).

AWS Audit Manager: continuously audit AWS usage to assess risk and compliance. Maps to frameworks (GDPR, HIPAA, SOC 2, PCI DSS). Collects evidence automatically from CloudTrail, Config, Security Hub. Generates audit-ready reports.

⚠️ Exam trap: “Continuous compliance auditing with evidence collection” → Audit Manager. “Compliance documents/agreements” → AWS Artifact. Different purposes.

Amazon Fraud Detector: fully managed service to identify potentially fraudulent online activities (online payment fraud, fake account creation, etc). Uses ML models trained on your data + Amazon’s fraud detection expertise. No ML experience needed.

AWS Serverless Application Repository: managed repository to deploy and publish serverless applications. Find pre-built Lambda functions and SAM templates. Supports public and private sharing.

Amazon Kinesis Video Streams: securely stream video from devices to AWS for analytics, ML, playback. Use cases: smart home cameras, industrial monitoring, computer vision with Rekognition.



🎯 MASTER SUMMARY: Other AWS Services Exam Guide

Part 1: Core Principles (Understand WHY → Derive WHAT)

Principle 1: “Managed Desktop/App” = WorkSpaces vs AppStream

Principle 2: Communication Services = Pinpoint vs SNS vs SES

The key distinction is WHO manages the campaign logic:

Principle 3: Integration Services Fill SaaS ↔ AWS Gaps

Principle 4: Instance Scheduler = Solution, Not a Service

Deployed via CloudFormation. Uses DynamoDB + Lambda + tags. Supports cross-account/cross-region. Key for cost optimization questions.
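
The tag-driven behaviour can be approximated in a few lines. In this sketch the `Schedule` tag key, the `office-hours` schedule, and the in-memory schedule table are all assumptions for illustration; the actual solution keeps its schedules in DynamoDB and runs the evaluation in Lambda.

```python
from datetime import datetime, time

# Hypothetical schedule table: name -> (start, stop, weekdays with Monday = 0).
SCHEDULES = {
    "office-hours": (time(9, 0), time(17, 0), {0, 1, 2, 3, 4}),
}


def desired_state(tags: dict, now: datetime) -> str:
    """Decide whether a tagged instance should be running at `now`."""
    name = tags.get("Schedule")  # assumed tag key
    if name not in SCHEDULES:
        return "unmanaged"       # untagged instances are left alone
    start, stop, weekdays = SCHEDULES[name]
    in_window = now.weekday() in weekdays and start <= now.time() < stop
    return "running" if in_window else "stopped"


# Monday 10:00 falls inside the window, Saturday 10:00 does not.
print(desired_state({"Schedule": "office-hours"}, datetime(2024, 1, 1, 10, 0)))
print(desired_state({"Schedule": "office-hours"}, datetime(2024, 1, 6, 10, 0)))
```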


Part 2: Instant-Answer Table

Question Contains→ Instant Answer
“managed virtual desktop” → WorkSpaces
“DaaS, VDI replacement” → WorkSpaces
“stream desktop application via browser” → AppStream 2.0
“IoT devices to cloud” → IoT Core
“GraphQL, real-time sync” → AppSync
“satellite communications” → Ground Station
“marketing campaigns, segments” → Pinpoint
“two-way SMS/email campaigns” → Pinpoint
“transactional email, DKIM, SPF” → SES
“bulk email at scale” → SES
“transfer data from Salesforce/SAP to S3” → AppFlow
“SaaS integration” → AppFlow
“PrivateLink for SaaS data transfer” → AppFlow
“stop/start EC2 to save costs” → Instance Scheduler
“schedule EC2 on/off business hours” → Instance Scheduler
“third-party software catalog” → Marketplace
“subscribe to third-party data” → Data Exchange
“ETL pipeline, legacy orchestration” → Data Pipeline (prefer Glue/Step Functions)
“platform team templates for containers” → Proton
“5G edge, ultra-low latency mobile” → Wavelength
“run ECS/EKS on-premises” → ECS/EKS Anywhere
“self-managed Kubernetes like EKS” → EKS Distro
“transcode video in S3” → Elastic Transcoder / MediaConvert
“manage software licenses” → License Manager
“Grafana dashboards, observability” → Managed Grafana
“Prometheus container metrics” → Managed Prometheus
“continuous compliance audit” → Audit Manager
“detect online fraud with ML” → Fraud Detector
“pre-built Lambda/SAM templates” → Serverless App Repository
“stream video from devices” → Kinesis Video Streams

Part 3: Common Confusions

Confusion | Clarification
WorkSpaces vs AppStream | WorkSpaces = full desktop; AppStream = one app in browser
Pinpoint vs SNS | Pinpoint = campaigns/segments/templates; SNS = per-message notifications
Pinpoint vs SES | Pinpoint = marketing campaigns; SES = transactional/bulk email
AppFlow vs Glue | AppFlow = SaaS sources; Glue = AWS data sources (S3, RDS, etc.)
AppSync vs API Gateway | AppSync = GraphQL + real-time; API Gateway = REST/HTTP/WebSocket
Instance Scheduler vs ASG | Scheduler = stop/start on schedule; ASG = scale based on demand
Data Pipeline vs Glue | Data Pipeline = legacy ETL (EC2/EMR); Glue = modern serverless ETL
Audit Manager vs Artifact | Audit Manager = continuous audit with evidence; Artifact = download compliance docs
Proton vs Service Catalog | Proton = container/serverless templates; Service Catalog = any CloudFormation product
Managed Prometheus vs CloudWatch | Prometheus = PromQL, container-native; CloudWatch = AWS-native metrics
EKS Anywhere vs EKS Distro | Anywhere = AWS-managed on your infra; Distro = fully self-managed
Elastic Transcoder vs MediaConvert | Transcoder = legacy; MediaConvert = modern replacement (more features)

🏆 The Golden Rules

  1. WorkSpaces = full desktop (DaaS, Windows/Linux, KMS)
  2. AppStream = app streaming (browser-based, no desktop)
  3. Pinpoint = marketing campaigns (segments, templates, schedules)
  4. SES = email service (DKIM, SPF, transactional)
  5. SNS = notifications (pub/sub, no campaign logic)
  6. AppFlow = SaaS ↔ AWS (Salesforce, SAP → S3, Redshift)
  7. AppSync = GraphQL (real-time mobile/web, DynamoDB)
  8. Instance Scheduler = CloudFormation solution (DynamoDB + Lambda + tags)
  9. IoT Core = billions of IoT messages (serverless, secure)
  10. Ground Station = satellites (download to VPC, S3, EC2)
  11. Wavelength = 5G edge (ultra-low latency for mobile devices)
  12. Audit Manager = continuous compliance audit (evidence collection, frameworks)
  13. Proton = container/serverless templates (platform teams → developers)
  14. Data Pipeline = legacy (prefer Glue for ETL, Step Functions for orchestration)
  15. Managed Grafana + Prometheus = container observability stack (PromQL metrics + dashboards)

Backup and Restore:

AWS Backup: fully managed service to centrally manage and automate backups across AWS services. On-demand and scheduled backups. Supports PITR (Point-in-Time Recovery). Retention periods, lifecycle management, backup policies. Cross-Region backup. Cross-account backup (using AWS Organizations).

Supported services: EC2, EBS, S3, RDS (all engines), Aurora, DynamoDB, DocumentDB, Neptune, EFS, FSx (Lustre & Windows), Storage Gateway (Volume Gateway)
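
A backup plan's schedule, retention and lifecycle settings can be sketched as plain data. This is a minimal sketch, assuming the rule shape used by boto3's `create_backup_plan`; the helper `build_backup_rule`, the plan name and the vault name are hypothetical, and nothing here calls AWS.

```python
def build_backup_rule(rule_name, vault_name, schedule_cron,
                      cold_after_days=None, delete_after_days=None):
    """Return one rule dict in the shape create_backup_plan expects (assumed)."""
    lifecycle = {}
    if cold_after_days is not None:
        lifecycle["MoveToColdStorageAfterDays"] = cold_after_days
    if delete_after_days is not None:
        lifecycle["DeleteAfterDays"] = delete_after_days
    # Backups moved to cold storage must stay there at least 90 days, so
    # retention has to exceed the cold-storage transition by 90 days or more.
    if cold_after_days is not None and delete_after_days is not None:
        if delete_after_days < cold_after_days + 90:
            raise ValueError("delete_after_days must be >= cold_after_days + 90")
    rule = {
        "RuleName": rule_name,
        "TargetBackupVaultName": vault_name,
        "ScheduleExpression": schedule_cron,
    }
    if lifecycle:
        rule["Lifecycle"] = lifecycle
    return rule


plan = {
    "BackupPlanName": "daily-35-day-retention",  # hypothetical name
    "Rules": [
        build_backup_rule("daily", "Default", "cron(0 5 ? * * *)",
                          delete_after_days=35),
    ],
}
# With boto3 installed and credentials configured, the dict would be passed as:
#   boto3.client("backup").create_backup_plan(BackupPlan=plan)
```

Keeping the plan as data makes the retention rule easy to check before anything is sent to AWS.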

Backup Plans:

AWS Backup Vault Lock:

⚠️ Exam trap: “Prevent anyone including root from deleting backups” → Backup Vault Lock (WORM). Similar to S3 Object Lock but for AWS Backup.

AWS DataSync: move large amounts of data from on-premises to AWS (or between AWS storage services).

⚠️ Exam trap: DataSync = data movement/sync (on-prem ↔ AWS, AWS ↔ AWS). AWS Backup = backup automation across AWS services. DataSync moves files; Backup creates snapshots/backups.

AWS Elastic Disaster Recovery (DRS): quickly and easily recover physical, virtual, and cloud-based servers into AWS.

⚠️ Exam trap: “Continuous replication of servers for DR” → DRS (Elastic Disaster Recovery). “Lift-and-shift migration” → MGN. Both use agents + continuous replication, but DRS = DR (failover/failback), MGN = one-time migration.

AWS Fault Injection Simulator (FIS) — fully managed service for Chaos Engineering on AWS workloads.

⚠️ Exam trap: “Test resilience by randomly terminating instances” → FIS. “Netflix Simian Army” → inspiration for FIS but not an AWS service.


Disaster Recovery Overview:

Disaster = any event that negatively impacts business continuity or finances. Disaster Recovery (DR) = preparing for and recovering from a disaster.

DR scenarios:


RPO and RTO:

RPO (Recovery Point Objective): how much data loss you can tolerate (time between last backup and disaster).
RTO (Recovery Time Objective): how much downtime you can tolerate (time between disaster and recovery).

◄─── Data loss ───►◄─── Downtime ──►
                   │
    ●              ⚡              ●
   RPO          Disaster          RTO
(last backup)                  (back online)

⚠️ Exam trap: RPO = data loss (backward-looking). RTO = downtime (forward-looking). Don’t confuse them — “minimize data loss” → optimize RPO. “Minimize downtime” → optimize RTO.
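
The two windows in the diagram can be measured from three timestamps. A minimal sketch with made-up times; strictly speaking RPO and RTO are targets, so the function below returns the actual data-loss and downtime windows that you would compare against those targets.

```python
from datetime import datetime, timedelta


def data_loss_and_downtime(last_backup, disaster, back_online):
    """Return (data-loss window, downtime window) around a disaster."""
    if not (last_backup <= disaster <= back_online):
        raise ValueError("expected last_backup <= disaster <= back_online")
    return disaster - last_backup, back_online - disaster


# Hypothetical incident timeline
last_backup = datetime(2024, 1, 1, 2, 0)
disaster = datetime(2024, 1, 1, 11, 30)
back_online = datetime(2024, 1, 1, 13, 0)

data_loss, downtime = data_loss_and_downtime(last_backup, disaster, back_online)

# Compare against hypothetical objectives: RPO of 12 h, RTO of 1 h
meets_rpo = data_loss <= timedelta(hours=12)
meets_rto = downtime <= timedelta(hours=1)
```

Here the data-loss window (backward from the disaster) is within the RPO, while the downtime (forward from the disaster) exceeds the RTO.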


Disaster Recovery Strategies:

Four strategies, ordered from slowest/cheapest to fastest/most expensive:

Slower RTO ◄─────────────────────────────────────► Faster RTO
Cheaper                                            Expensive

 ┌──────────┬──────────┬──────────┬──────────┐
 │ Backup & │  Pilot   │  Warm    │ Multi    │
 │ Restore  │  Light   │ Standby  │ Site     │
 └──────────┴──────────┴──────────┴──────────┘
   Hours        10s min     Minutes    Seconds

1. Backup & Restore (High RPO/RTO, cheapest)

On-prem ──► Storage Gateway / Snowball ──► S3 ──► Glacier (lifecycle)
AWS:  EBS / RDS / Redshift ──► Scheduled Snapshots
Recovery: Snapshots ──► AMI ──► EC2 + RDS restore

2. Pilot Light (Faster than backup)

On-prem (active)          AWS Cloud
┌────────────┐            ┌────────────────────┐
│ App Server │            │ EC2 (NOT running)  │
│ Primary DB │──repl──►   │ RDS (running)      │
└────────────┘            └────────────────────┘
                          Route 53 (failover)

3. Warm Standby (Minutes RTO)

On-prem (active)          AWS Cloud
┌────────────┐            ┌──────────────────────┐
│ App Server │            │ EC2 ASG (minimum)    │
│ Primary DB │──repl──►   │ RDS Secondary        │
└────────────┘            └──────────────────────┘
                          Route 53 → scale up on failover

4. Multi Site / Hot Site (Seconds RTO, most expensive)

On-prem (active)          AWS Cloud (active)
┌────────────┐            ┌──────────────────────┐
│ App Server │◄──R53──►   │ ELB → EC2 ASG (full) │
│ Primary DB │──repl──►   │ RDS Secondary         │
└────────────┘            └──────────────────────┘
   Route 53 active-active (or Aurora Global)

All AWS Multi Region = same as Multi Site, but both sides run in AWS.

Comparison Table:

Strategy | RTO | RPO | Cost | What’s Running in AWS
Backup & Restore | Hours | High | 💰 | Nothing (just backups in S3)
Pilot Light | 10s of min | Medium | 💰💰 | DB only (EC2 stopped)
Warm Standby | Minutes | Low | 💰💰💰 | Everything at minimum size
Multi Site / Hot Site | Seconds | Very low | 💰💰💰💰 | Everything at full production

⚠️ Exam trap: Pilot Light vs Warm Standby — both have DB replicating. The difference: Pilot Light has EC2 stopped (need to start), Warm Standby has EC2 running at minimum (need to scale up).

⚠️ Exam trap: “Cheapest DR” → Backup & Restore. “Lowest RTO/RPO” → Multi Site. “Balance cost and recovery” → Warm Standby.

⚠️ Exam trap: “Critical infrastructure up and running” = Pilot Light (only critical = DB). “Everything running at minimum” = Warm Standby. “Nothing running” = Backup & Restore. Key word is “critical” → Pilot Light.
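
The "what's running" distinction can be written down as a small lookup. This is only a sketch of these notes' own comparison (DB state, app-server state, failover action); the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrStrategy:
    name: str
    db: str            # state of the database in the DR site
    app_servers: str   # state of the application servers in the DR site
    on_failover: str   # what still has to happen when disaster strikes


STRATEGIES = [
    DrStrategy("Backup & Restore", "snapshots only", "nothing", "restore everything"),
    DrStrategy("Pilot Light", "running", "stopped", "start EC2, then scale"),
    DrStrategy("Warm Standby", "running", "minimum capacity", "scale up"),
    DrStrategy("Multi Site", "running", "full production", "already active"),
]


def classify(db_running: bool, app_state: str) -> DrStrategy:
    """Find the strategy matching the state of the DR site."""
    for strategy in STRATEGIES:
        if (strategy.db == "running") == db_running and strategy.app_servers == app_state:
            return strategy
    raise LookupError(f"no strategy matches db_running={db_running}, app_state={app_state!r}")


pilot = classify(db_running=True, app_state="stopped")
warm = classify(db_running=True, app_state="minimum capacity")
```

The less there is left to do in `on_failover`, the lower the RTO and the higher the standing cost.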


Disaster Recovery Tips:


DMS – Database Migration Service:

AWS DMS — quickly and securely migrate databases to AWS.

Migration types:

Homogeneous:   Source DB ──► EC2 (DMS) ──► Target DB  (same engine)
Heterogeneous: Source DB ──► SCT (schema) + DMS (data) ──► Target DB  (different engine)

Continuous Replication (CDC):

Corporate DC                        AWS Cloud (VPC)
┌──────────────┐                    ┌─────────────────────────────┐
│ Oracle DB    │── data migration ─►│ DMS Replication Instance    │
│ (source)     │                    │ (Full load + CDC)           │
│              │                    │    Public Subnet            │
│ Server with  │                    │         │                   │
│ AWS SCT      │── schema convert ─►│         ▼                   │
│              │                    │ RDS MySQL (target)          │
└──────────────┘                    │    Private Subnet           │
                                    └─────────────────────────────┘
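
The "Full load + CDC" label in the diagram can be pictured with a toy model: copy a snapshot, then replay only the changes recorded after it. This illustrates the idea only and is not how DMS is implemented; the log positions, row format and function names are all hypothetical.

```python
def full_load(source_rows):
    """Copy a point-in-time snapshot of the source table."""
    return dict(source_rows)


def apply_changes(target, change_log, snapshot_position):
    """Replay, in order, the changes recorded after the snapshot position."""
    for position, op, key, value in sorted(change_log, key=lambda c: c[0]):
        if position <= snapshot_position:
            continue  # already contained in the full load
        if op == "delete":
            target.pop(key, None)
        else:  # "insert" or "update"
            target[key] = value
    return target


# Source table as it looked at log position 100
snapshot = {1: "alice", 2: "bob"}
# Changes captured on the source, some of them while the full load was running
change_log = [
    (99, "insert", 2, "bob"),      # before the snapshot, so skipped
    (101, "update", 2, "bobby"),
    (102, "insert", 3, "carol"),
    (103, "delete", 1, None),
]

target = apply_changes(full_load(snapshot), change_log, snapshot_position=100)
```

Because the source keeps accepting writes throughout, this pattern is what lets the source database stay available until cutover.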

DMS Sources: On-prem DBs (Oracle, SQL Server, MySQL, MariaDB, PostgreSQL, MongoDB, SAP, DB2), Azure SQL, RDS (all incl. Aurora), S3, DocumentDB

DMS Targets: On-prem DBs, RDS, Redshift, DynamoDB, S3, OpenSearch, Kinesis Data Streams, Apache Kafka, DocumentDB, Neptune, Redis, Babelfish

AWS SCT (Schema Conversion Tool):

⚠️ Exam trap: “Migrate database with minimal downtime, source stays available” → DMS. “Different DB engines” → DMS + SCT. “Same engine, different platform” (e.g., on-prem PostgreSQL → RDS PostgreSQL) → DMS only, no SCT needed.

⚠️ Exam trap: SCT converts schema, DMS migrates data — never reversed. Heterogeneous migration order: SCT first (convert schema) → DMS second (move data into converted schema).
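
The "is SCT needed, and in what order" rule can be sketched as a small function. The engine names and the idea of treating an `aurora-` prefix as the same engine family are simplifying assumptions for illustration, not an official compatibility matrix.

```python
def engine_family(engine: str) -> str:
    """Normalise an engine name; 'aurora-mysql' counts as 'mysql' (assumption)."""
    name = engine.strip().lower()
    prefix = "aurora-"
    return name[len(prefix):] if name.startswith(prefix) else name


def migration_steps(source_engine: str, target_engine: str) -> list:
    """Return the ordered steps for a database migration."""
    if engine_family(source_engine) == engine_family(target_engine):
        # Homogeneous: the schema carries over, only the data has to move
        return ["DMS: migrate data"]
    # Heterogeneous: convert the schema first, then move data into it
    return ["SCT: convert schema", "DMS: migrate data"]


oracle_to_aurora_pg = migration_steps("oracle", "aurora-postgresql")
pg_to_rds_pg = migration_steps("postgresql", "postgresql")
```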


RDS & Aurora Migrations:

RDS MySQL → Aurora MySQL:

External MySQL → Aurora MySQL:

⚠️ Exam trap: RDS → Aurora = snapshot (native, cheapest, simplest). S3 dump path is for external/on-prem MySQL → Aurora. If both source and target are inside AWS, snapshot is always the best and most cost-effective option.

RDS PostgreSQL → Aurora PostgreSQL:

External PostgreSQL → Aurora PostgreSQL:

Both databases running? → Use DMS for continuous replication


The 7 R’s of Cloud Migration:

Strategy | Description | AWS Service | Example
Retire | Turn off what you don’t need | - | Kill legacy apps (save up to 20%)
Retain | Keep on-prem for now | - | Compliance, unresolved dependencies
Relocate | Move to cloud version as-is | VMware Cloud on AWS | VMware SDDC → VMware Cloud on AWS
Rehosting (Lift & Shift) | Move as-is to AWS, no optimizations | MGN | VM → EC2 (save ~30%)
Replatforming (Lift & Reshape) | Minor cloud optimizations, no core changes | DMS, Beanstalk | MySQL → RDS MySQL
Repurchasing (Drop & Shop) | Switch to SaaS product | - | CRM → Salesforce, HR → Workday
Refactoring (Re-architect) | Rebuild cloud-native | Lambda, DynamoDB | Monolith → microservices

⚠️ Exam trap: “Lift-and-shift” = Rehosting (MGN). “Move to RDS without code changes” = Replatforming. “Rewrite as serverless” = Refactoring. “Move VMware SDDC to VMware Cloud on AWS” = Relocate. Don’t confuse Rehosting with Replatforming — rehosting changes nothing, replatforming makes small optimizations.

⚠️ Exam trap: The course says 7 R’s (includes Relocate). Some sources say 6 R’s (no Relocate). Know both — exam may reference either count.


On-Premises Strategy with AWS:


AWS Application Migration Service (MGN):

AWS MGN — the “AWS evolution” of CloudEndure Migration, replacing AWS Server Migration Service (SMS).

Corporate DC / Any Cloud                    AWS Cloud
┌──────────────────────┐                    ┌───────────────────────────────┐
│ OS    ┐              │                    │  Staging          Production │
│ Apps  ├─► Replication │── continuous ──►   │  Low-cost EC2  → Target EC2  │
│ DB    │    Agent      │   replication      │  & EBS volumes   & EBS vols  │
│ Disks ┘              │                    │         (cutover) ──►        │
└──────────────────────┘                    └───────────────────────────────┘

⚠️ Exam trap: “Lift-and-shift to AWS, minimal downtime” → AWS MGN (Application Migration Service). NOT DMS (that’s for databases only). NOT SMS (deprecated, replaced by MGN).


VMware Cloud on AWS:

VMware Cloud on AWS — extend VMware-based on-prem data centers to AWS while keeping VMware Cloud software.


Transferring Large Data into AWS:

Example: 200 TB, 100 Mbps internet connection:

Method | Setup Time | Transfer Time | Notes
Internet / VPN | Immediate | ~185 days | 200 TB × 8 / 100 Mbps
Direct Connect 1 Gbps | >1 month | ~18.5 days | Faster but long setup
Snowball | ~1 week | ~1 week | End-to-end, can combine with DMS

For ongoing replication: Site-to-Site VPN or DX with DMS or DataSync

⚠️ Exam trap: “Transfer 200 TB quickly” → Snowball (~1 week). NOT internet (185 days). NOT DX (setup alone >1 month). For ongoing sync after initial transfer → DMS or DataSync.
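
The ~185-day and ~18.5-day figures follow from simple arithmetic. A minimal sketch, assuming decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits/s) and a fully utilised link with no protocol overhead, so real transfers would take longer.

```python
def transfer_days(size_tb: float, link_mbps: float) -> float:
    """Days needed to push size_tb terabytes through a link_mbps link."""
    bits = size_tb * 1e12 * 8              # terabytes -> bits
    seconds = bits / (link_mbps * 1e6)     # bits / (bits per second)
    return seconds / 86_400                # seconds -> days


internet_days = transfer_days(200, 100)    # 100 Mbps internet / VPN
dx_days = transfer_days(200, 1000)         # 1 Gbps Direct Connect

print(f"Internet: {internet_days:.1f} days, Direct Connect: {dx_days:.1f} days")
```

Both numbers exclude setup time, which is what rules out Direct Connect for an urgent one-time transfer.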



🎯 MASTER SUMMARY: Disaster Recovery & Migration Exam Guide

Part 1: Core Principles (Understand WHY → Derive WHAT)

Principle 1: RPO and RTO Are Cost Tradeoffs

Lower RPO/RTO = more money. RPO = how much data you can lose (backward). RTO = how much downtime you can accept (forward). Every DR strategy is a position on the cost ↔ speed spectrum. If the question says “regardless of cost” → Multi-Site. If “cheapest” → Backup & Restore.

Principle 2: DR Strategies Are a Spectrum of “What’s Running”

The 4 strategies differ by how much infrastructure is pre-provisioned in the DR region:

Key insight: You don’t need to memorize RTO numbers. Just ask: “How much needs to start/scale on failover?” More startup = more time = higher RTO.

Principle 3: Migration Tool = What You’re Moving

Each tool moves a specific thing. The exam tests whether you can pick the right one:

Principle 4: SCT + DMS Have Distinct, Non-Overlapping Roles

SCT converts schema (structure). DMS migrates data (content). They’re never reversed. If engines are the same → no SCT needed. If engines differ → SCT first, DMS second.

Derivation trick: Schema = blueprint of the house. Data = furniture. You must build the house (SCT) before moving furniture in (DMS).

Principle 5: “Same Engine” = Simpler Migration Path

Same engine (homogeneous) eliminates complexity everywhere:

Different engine (heterogeneous) always adds an extra step (SCT, conversion).

Principle 6: RDS → Aurora Is a Special Case (Stays Inside AWS)

When both source and target are AWS services, use native AWS operations (snapshot, read replica promotion). S3 dump/import path is for external databases entering AWS. The exam tests this: “most cost-effective RDS → Aurora” = snapshot, NOT S3.

Principle 7: DMS Runs on EC2 (You Manage the Instance)

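In its standard form, DMS is not serverless: it runs on a replication instance (EC2) whose instance type you choose. (A separate DMS Serverless option exists, but these notes cover the replication-instance model.) Multi-AZ deployment gives HA for the replication instance itself. The source DB stays available during migration (non-disruptive).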

Principle 8: AWS Backup ≠ DataSync ≠ S3 Lifecycle

Three different things that sound similar:

Principle 9: MGN Replaced Both CloudEndure AND SMS

Historical evolution: CloudEndure Migration + AWS SMS → AWS MGN. If the exam mentions CloudEndure or SMS, the modern answer is MGN. Similarly, CloudEndure Disaster Recovery → AWS DRS.

Principle 10: Snowball Beats Internet for Large One-Time Transfers

Physics: shipping a device is faster than transferring hundreds of TB over a wire. Rule of thumb: if transfer calculation shows weeks/months → Snowball wins. Direct Connect needs >1 month setup, so for urgent large transfers it’s too slow.


Part 2: Decision Trees (Follow Keywords → Find Answer)

DR Strategy Decision Tree

What does the question ask for?
│
├─ "Cheapest" / "lowest cost" / "budget"
│  └─► Backup & Restore
│
├─ "Critical infrastructure running" / "core running"
│  └─► Pilot Light
│
├─ "Everything running at minimum" / "scaled down"
│  └─► Warm Standby
│
├─ "Lowest RTO" / "fastest recovery" / "regardless of cost" / "active-active"
│  └─► Multi-Site
│
└─ "Balance cost and recovery"
   └─► Warm Standby

Migration Tool Decision Tree

What are you migrating?
│
├─ DATABASE
│  ├─ Same engine? → DMS only
│  ├─ Different engine? → SCT + DMS
│  ├─ RDS → Aurora (same family)? → Snapshot restore
│  └─ External DB → Aurora? → S3 import (Percona/mysqldump) or DMS
│
├─ SERVERS / VMs / APPLICATIONS
│  ├─ Migration (one-time move)? → MGN
│  └─ DR (ongoing failover)? → DRS
│
├─ FILES / DATA
│  ├─ On-prem ↔ AWS sync? → DataSync
│  └─ Bulk physical transfer? → Snowball
│
└─ BACKUP MANAGEMENT
   └─ Centralized backup across services? → AWS Backup

The CANNOT List

You CANNOT… | Why
Use SCT for data migration | SCT only converts schema
Use DMS for schema conversion | DMS only moves data
Skip SCT for heterogeneous migration | Different engines need schema conversion
Use S3 dump for RDS → Aurora (cost-effectively) | Snapshot is native and free
Use MGN for database migration | MGN migrates servers, not databases
Delete Backup Vault Lock backups (even root) | WORM protection
Use DataSync without an agent (on-prem) | Agent required for on-prem source
Set up Direct Connect in < 1 month | Physical provisioning required

Part 3: Scenario Pattern Recognition

Pattern: “Migrate on-premises Oracle to Aurora PostgreSQL”

Keywords: different engines, Oracle, Aurora
Answer: SCT (convert schema) + DMS (migrate data)
Why: Heterogeneous migration. Oracle ≠ PostgreSQL, so schema conversion is required first.


Pattern: “Migrate RDS MySQL to Aurora MySQL, most cost-effective”

Keywords: RDS → Aurora, same engine family, cost-effective
Answer: Create snapshot from RDS → restore as Aurora
Why: Native AWS operation, no intermediate storage cost. S3 path is for external databases.


Pattern: “Lift-and-shift on-premises servers to AWS with minimal downtime”

Keywords: lift-and-shift, servers, minimal downtime
Answer: AWS MGN (Application Migration Service)
Why: MGN does continuous replication of servers → cutover with minimal downtime. NOT DMS (databases only).


Pattern: “DR with critical infrastructure always running”

Keywords: critical, running, DR
Answer: Pilot Light
Why: Only critical components (DB) always on. EC2 stopped until disaster. “Critical” is the keyword.


Pattern: “Fastest possible disaster recovery”

Keywords: lowest RTO, fastest, regardless of cost
Answer: Multi-Site / Hot Site
Why: Full production on both sides, active-active routing = seconds RTO.


Pattern: “Transfer 200 TB of data to AWS quickly”

Keywords: large data, TB, quickly
Answer: AWS Snowball
Why: Internet = months, DX = weeks + setup time. Snowball = ~1 week end-to-end.


Pattern: “Centrally manage backups across RDS, DynamoDB, EFS, EBS”

Keywords: centrally manage, backups, multiple services
Answer: AWS Backup
Why: Only service that orchestrates backups across all these services. S3 Lifecycle only manages S3 objects.


Pattern: “Prevent backup deletion even by root user”

Keywords: prevent deletion, root, immutable, compliance
Answer: AWS Backup Vault Lock (WORM)
Why: WORM = Write Once Read Many. Even root can’t delete. Similar to S3 Object Lock.


Pattern: “Migrate on-prem PostgreSQL to RDS PostgreSQL”

Keywords: same engine, different platform
Answer: DMS only (no SCT needed)
Why: Same engine = homogeneous migration. SCT only needed when engines differ.


Pattern: “Ongoing replication after initial database migration”

Keywords: continuous, ongoing, replication, CDC
Answer: DMS with CDC (Change Data Capture)
Why: DMS supports continuous replication, not just one-time migration.


Pattern: “Gather information about on-prem servers before migration”

Keywords: discovery, planning, inventory, on-premises
Answer: AWS Application Discovery Service → Migration Hub
Why: Agentless (VM inventory) or Agent-based (processes, network). Results viewed in Migration Hub.


Pattern: “DR for servers with continuous block-level replication”

Keywords: DR, servers, continuous replication, failover/failback
Answer: AWS DRS (Elastic Disaster Recovery)
Why: DRS = ongoing DR with failover. MGN = one-time migration. Both use continuous replication but for different purposes.


Pattern: “Move large files from on-prem NFS to S3/EFS”

Keywords: files, on-prem, NFS, SMB, sync
Answer: AWS DataSync
Why: Agent-based, preserves file permissions, incremental sync. Not DMS (databases) or Snowball (physical).


Pattern: “Extend VMware environment to AWS”

Keywords: VMware, vSphere, hybrid, extend
Answer: VMware Cloud on AWS
Why: Runs vSphere/vSAN/NSX on dedicated AWS hardware. Keep VMware tools, access AWS services.


Pattern: “Test application resilience by injecting faults”

Keywords: chaos, fault injection, resilience, stress test
Answer: AWS FIS (Fault Injection Simulator)
Why: Managed chaos engineering: CPU stress, stop instances, API errors. Pre-built templates.


Pattern: “Build a business case for migration to AWS”

Keywords: business case, cost analysis, baseline, current state
Answer: AWS Migration Evaluator
Why: Agentless Collector discovers on-prem footprint → analyzes → builds data-driven migration plan. NOT Application Discovery Service (that discovers servers, not costs).


Pattern: “Track migration progress across multiple services”

Keywords: track, central dashboard, migration status, MGN + DMS
Answer: AWS Migration Hub (+ Orchestrator for enterprise app templates)
Why: Central location aggregating status from MGN and DMS. Orchestrator has pre-built templates for SAP, SQL Server.


Migration Services Comparison

Service | Migrates | Direction | Key Feature
DMS | Databases | Any direction | CDC, source stays available
SCT | DB Schema | N/A (conversion) | Heterogeneous engine conversion
MGN | Servers/VMs/Apps | To AWS | Lift-and-shift, replaces SMS
DRS | Servers (DR) | To AWS | Failover/failback, replaces CloudEndure DR
DataSync | Files/Data | On-prem ↔ AWS, AWS ↔ AWS | Agent-based, incremental
Snowball | Bulk data | Physical shipping | Large one-time transfers
AWS Backup | Backups | Within AWS | Centralized backup management
Migration Evaluator | Business case | Assessment | Data-driven cost analysis
Migration Hub | Tracking | Central dashboard | Tracks MGN + DMS progress
App Discovery | Server inventory | On-prem → AWS | Agentless or agent-based

DR Strategy Quick Compare

(attribute) | Backup & Restore | Pilot Light | Warm Standby | Multi-Site
RTO | Hours | 10s of min | Minutes | Seconds
Cost | 💰 | 💰💰 | 💰💰💰 | 💰💰💰💰
DB | Snapshots only | Running | Running | Running
App servers | Nothing | Stopped | Min capacity | Full prod
Route 53 | Manual update | Failover | Failover | Active-active
On failover | Restore everything | Start EC2, scale | Scale up | Already active

RDS/Aurora Migration Paths

From | To | Best Method
RDS MySQL | Aurora MySQL | Snapshot restore
RDS PostgreSQL | Aurora PostgreSQL | Snapshot restore
External MySQL | Aurora MySQL | Percona XtraBackup → S3
External PostgreSQL | Aurora PostgreSQL | Backup → S3 → aws_s3 extension
Any DB (ongoing) | Any target | DMS with CDC
Different engine | Different engine | SCT + DMS

Legacy → Modern Service Mapping

Legacy | Modern Replacement
CloudEndure Migration | AWS MGN
AWS SMS (Server Migration) | AWS MGN
CloudEndure Disaster Recovery | AWS DRS

Part 5: Ultimate Instant-Answer Table

Question Contains | → Instant Answer
“Lift-and-shift” / “rehost” | MGN
“Database migration” | DMS
“Different DB engines” / “heterogeneous” | SCT + DMS
“Same engine, different platform” | DMS only (no SCT)
“Schema conversion” | SCT
“RDS → Aurora, cost-effective” | Snapshot restore
“External MySQL → Aurora” | S3 (Percona XtraBackup)
“Continuous DB replication” / “CDC” | DMS
“DR, continuous block replication” | DRS
“Cheapest DR” | Backup & Restore
“Critical infrastructure running” | Pilot Light
“Everything running at minimum” | Warm Standby
“Lowest RTO, regardless of cost” | Multi-Site
“Active-active DR” | Multi-Site
“Centralized backup automation” | AWS Backup
“Prevent backup deletion by root” | Backup Vault Lock (WORM)
“Move files on-prem ↔ AWS” | DataSync
“Transfer 200 TB quickly” | Snowball
“Discover on-prem servers for migration” | Application Discovery Service
“Build business case for migration” | Migration Evaluator
“Migration planning and tracking” | Migration Hub
“Pre-built migration templates (SAP, SQL Server)” | Migration Hub Orchestrator
“Chaos engineering” / “fault injection” | FIS
“Extend VMware to AWS” / “Relocate VMware” | VMware Cloud on AWS
“Source DB stays available during migration” | DMS
“Migrate VMs to EC2” | VM Import/Export or MGN
“Replace CloudEndure Migration” | MGN
“Replace SMS” | MGN
“Replace CloudEndure DR” | DRS
“Backup across RDS, DynamoDB, EFS, EBS” | AWS Backup
“PITR (Point-in-time Recovery)” | AWS Backup
“Minimize data loss” | Optimize RPO
“Minimize downtime” | Optimize RTO
“Ongoing sync after initial transfer” | DataSync or DMS
“Replatform” / “minor optimizations” | DMS (e.g., MySQL → RDS MySQL)
“Refactor” / “re-architect” | Serverless / cloud-native rebuild

Part 6: Elimination Checklist

Choosing a DR Strategy

□ Is cost the primary concern?
  → Yes = Backup & Restore
  → No = continue
□ Does it mention "critical" infrastructure running?
  → Yes = Pilot Light
  → No = continue
□ Does it say "everything running" at minimum/scaled down?
  → Yes = Warm Standby
  → No = continue
□ Does it say "fastest" / "lowest RTO" / "active-active"?
  → Yes = Multi-Site

Choosing a Migration Tool

□ Are you migrating a DATABASE?
  → Yes: Same engine? → DMS only
  → Yes: Different engine? → SCT + DMS
  → Yes: RDS → Aurora (same family)? → Snapshot
  → No = continue
□ Are you migrating SERVERS / VMs / APPS?
  → For migration (one-time)? → MGN
  → For DR (ongoing failover)? → DRS
□ Are you moving FILES / DATA?
  → On-prem ↔ AWS sync? → DataSync
  → Bulk physical? → Snowball
□ Are you managing BACKUPS?
  → Across AWS services? → AWS Backup

Is SCT Needed?

□ Are source and target DB engines DIFFERENT?
  → Yes = SCT + DMS
  → No (same engine) = DMS only, NO SCT

🏆 The Golden Rules

  1. RPO = data loss, RTO = downtime (backward vs forward from disaster)
  2. More money = faster recovery (the entire DR spectrum is a cost tradeoff)
  3. “Critical running” = Pilot Light (not Warm Standby, not Backup & Restore)
  4. SCT = schema, DMS = data (never reversed, SCT always first)
  5. Same engine = no SCT (homogeneous migration skips schema conversion)
  6. RDS → Aurora = snapshot (native, cheapest — S3 path is for external DBs)
  7. Servers → MGN, Databases → DMS (never confuse what each tool migrates)
  8. MGN replaced SMS AND CloudEndure Migration (always pick MGN for lift-and-shift)
  9. DRS replaced CloudEndure DR (always pick DRS for disaster recovery of servers)
  10. Backup Vault Lock = even root can’t delete (WORM, like S3 Object Lock)
  11. DataSync = files, AWS Backup = snapshots (different mechanisms, different purpose)
  12. Snowball wins for large one-time transfers (faster than internet or DX when > 100 TB)
  13. DMS source stays available (non-disruptive migration — key selling point)
  14. Application Discovery → Migration Evaluator → Migration Hub (discover → build business case → track)
  15. 7 R’s: Rehost (MGN) ≠ Replatform (DMS) ≠ Relocate (VMware) ≠ Refactor (rebuild) (know which R matches which tool)


🎯 CROSS-TOPIC DECISION TREES

These cut across multiple MASTER SUMMARY sections — use when the question doesn’t clearly fit one topic.

Decision Tree 1: “Data Needs to Move”

What kind of data is moving?
│
├─► DATABASE
│   ├─ Same engine? → DMS only
│   ├─ Different engine? → SCT + DMS
│   ├─ RDS → Aurora (same family)? → Snapshot restore
│   └─ External MySQL → Aurora? → Percona XtraBackup → S3
│
├─► FILES / OBJECTS
│   ├─ Network OK (< 1 week)?
│   │   ├─ One-time / scheduled sync → DataSync
│   │   ├─ Ongoing hybrid access → Storage Gateway
│   │   └─ FTP/SFTP for external users → Transfer Family
│   └─ Network bad (> 1 week)?
│       ├─ < 14 TB → Snowcone
│       └─ > 14 TB → Snowball Edge
│
├─► SERVERS / VMs
│   ├─ Migrate to AWS (one-time) → MGN (lift-and-shift)
│   └─ DR failover/failback → DRS
│
└─► CROSS-REGION / CROSS-ACCOUNT within AWS
    ├─ S3 → S3 → S3 Replication (CRR/SRR)
    ├─ S3 → EFS/FSx → DataSync (no agent needed)
    ├─ RDS/Aurora → Read Replica → promote
    ├─ DynamoDB → Global Tables
    └─ EBS → Snapshots → copy to target region
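
The FILES / OBJECTS branch of the tree can be sketched as a function of data size and link speed. It assumes a fully utilised link and decimal units; the tree does not say what to do at exactly 14 TB, so treating that as Snowball Edge is an assumption. The function name is hypothetical.

```python
def pick_file_transfer(size_tb: float, link_mbps: float) -> str:
    """Choose a transfer option using the one-week and 14 TB thresholds."""
    # Ideal transfer time in days over the network
    days = size_tb * 8e12 / (link_mbps * 1e6) / 86_400
    if days < 7:
        return "DataSync"          # network is good enough
    # Network too slow: fall back to a physical device
    return "Snowcone" if size_tb < 14 else "Snowball Edge"


big_slow = pick_file_transfer(200, 100)     # about 185 days over the wire
small_fast = pick_file_transfer(5, 1000)    # well under a day
small_slow = pick_file_transfer(10, 10)     # about 93 days over the wire
```

For the "network OK" case the tree also lists Storage Gateway and Transfer Family; those depend on the access pattern rather than on size, so this sketch leaves them out.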

Decision Tree 2: “Real-Time Processing”

What needs to happen in real-time?
│
├─► STREAMING DATA (continuous, ordered)
│   ├─ Need ordering + replay? → Kinesis Data Streams
│   ├─ Need delivery to S3/Redshift/OpenSearch? → Kinesis Firehose
│   ├─ Need SQL on streams? → Kinesis Data Analytics
│   └─ Need Apache Kafka compatible? → Amazon MSK
│
├─► EVENT-DRIVEN (discrete events, react)
│   ├─ AWS service state change? → EventBridge
│   ├─ Metric threshold crossed? → CloudWatch Alarm
│   ├─ Message queue (decouple)? → SQS
│   ├─ Fan-out to many? → SNS (or SNS + SQS)
│   └─ Orchestrate steps? → Step Functions
│
├─► LOG PROCESSING
│   ├─ Real-time → CloudWatch Subscription Filters
│   ├─ Near real-time to S3 → Firehose
│   └─ Batch/archive → S3 Export (up to 12h delay)
│
└─► API / REQUEST PROCESSING
    ├─ Sync (immediate response) → Lambda + API Gateway
    ├─ Async (fire-and-forget) → Lambda + SQS/SNS
    └─ Long-running → Step Functions / ECS tasks

Decision Tree 3: “Search / Query Data”

What kind of search/query?
│
├─► FULL-TEXT SEARCH (partial match, any field)
│   └─► OpenSearch
│       Pattern: DynamoDB (storage) + OpenSearch (search)
│
├─► STRUCTURED QUERIES (SQL)
│   ├─ On data in S3? → Athena (serverless, pay-per-query)
│   ├─ On data warehouse? → Redshift
│   ├─ On CloudTrail logs in S3? → Athena
│   └─ On relational data? → RDS / Aurora
│
├─► KEY-VALUE LOOKUP (by primary key)
│   └─► DynamoDB (single-digit ms)
│
├─► LOG SEARCH
│   ├─ CloudWatch Logs → Logs Insights
│   ├─ Custom logs at scale → OpenSearch
│   └─ VPC traffic → VPC Flow Logs + Athena
│
└─► WHO DID WHAT (audit)
    └─► CloudTrail → S3 → Athena

Decision Tree 4: “Speed Up / Reduce Latency”

What needs to be faster?
│
├─► CONTENT DELIVERY (static/dynamic to users)
│   ├─ Global users, cacheable → CloudFront
│   ├─ Global users, TCP/UDP (gaming, IoT) → Global Accelerator
│   └─ Specific geo + legal needs → CloudFront + Geo Restriction
│
├─► DATABASE READS
│   ├─ Same queries repeated → ElastiCache (Redis/Memcached)
│   ├─ Read-heavy RDS → Read Replicas (up to 15 for Aurora)
│   ├─ DynamoDB reads → DAX (microsecond cache)
│   └─ Global reads → DynamoDB Global Tables / Aurora Global DB
│
├─► API RESPONSES
│   ├─ API Gateway → enable caching
│   ├─ Lambda cold starts → Provisioned Concurrency
│   └─ Lambda + RDS → RDS Proxy (connection pooling)
│
├─► EC2 LAUNCH / BOOT TIME
│   ├─ Static components → Golden AMI (pre-baked)
│   ├─ Dynamic config → User Data scripts
│   ├─ Both → Hybrid (Golden AMI + User Data)
│   └─ EBS volumes → enable EBS Fast Snapshot Restore
│
└─► NETWORK / DATA TRANSFER
    ├─ On-prem ↔ AWS → Direct Connect (dedicated)
    ├─ Backup DX path → Site-to-Site VPN
    ├─ EC2 ↔ EC2 same AZ → Placement Group (cluster)
    └─ HPC storage → FSx for Lustre

Decision Tree 5: “Secure This”

What needs securing?
│
├─► DATA AT REST
│   ├─ S3 → SSE-S3, SSE-KMS, or SSE-C
│   ├─ EBS → KMS encryption
│   ├─ RDS/Aurora → KMS (enable at creation)
│   ├─ DynamoDB → KMS (AWS owned or customer managed)
│   └─ Secrets → Secrets Manager (rotation) or SSM Parameter Store
│
├─► DATA IN TRANSIT
│   ├─ HTTPS everywhere → ACM certificates
│   ├─ S3 → bucket policy with aws:SecureTransport
│   └─ VPN / DX → Site-to-Site VPN is IPsec-encrypted; DX is not encrypted by default (add VPN over DX or MACsec)
│
├─► NETWORK
│   ├─ Instance level → Security Groups (stateful)
│   ├─ Subnet level → NACLs (stateless)
│   ├─ VPC level → Network Firewall (L3-L7)
│   ├─ Web apps → WAF (L7, CloudFront/ALB/API GW)
│   └─ DDoS → Shield (Standard free, Advanced paid)
│
├─► ACCESS CONTROL
│   ├─ "Who can access AWS resources" → IAM Policies
│   ├─ "Org-wide guardrails" → SCPs
│   ├─ "Cross-account" → Resource Policy or IAM Role
│   ├─ "Temporary credentials" → STS AssumeRole
│   └─ "External identity" → Cognito / SSO (IAM Identity Center)
│
└─► AUDIT / COMPLIANCE
    ├─ "Who did what" → CloudTrail
    ├─ "Is it compliant" → Config
    ├─ "Automated compliance audit" → Audit Manager
    └─ "Security findings dashboard" → Security Hub
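
The "bucket policy with aws:SecureTransport" entry in the tree refers to a deny statement like the one built below. A minimal sketch that only constructs the JSON; the bucket name and the function name are hypothetical, and attaching the policy to a bucket is left out.

```python
import json


def deny_insecure_transport_policy(bucket_name: str) -> dict:
    """Build a bucket policy that rejects any request not sent over HTTPS."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                # aws:SecureTransport is "false" for plain-HTTP requests
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


policy = deny_insecure_transport_policy("example-bucket")  # hypothetical bucket
policy_json = json.dumps(policy, indent=2)
```

An explicit Deny is used because it overrides any Allow elsewhere, so HTTP requests are refused regardless of other permissions.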

Cross-Topic Instant-Answer Table

Scenario Keywords | → Answer | Topic Area
“Reduce boot time” + “static + dynamic” | Golden AMI + User Data | EC2/Deployment
“Search any field” / “partial text” | OpenSearch (not DynamoDB) | Database
“Query S3 data with SQL” | Athena | Database/Analytics
“React to S3 upload” | S3 Event → Lambda or EventBridge | Serverless
“Decouple microservices” | SQS (or SNS for fan-out) | Messaging
“Global low-latency DB” | DynamoDB Global Tables | Database
“Global low-latency SQL” | Aurora Global Database | Database
“Cache DB queries” (relational) | ElastiCache | Database
“Cache DB queries” (DynamoDB) | DAX | Database
“Cache API responses” | API Gateway Caching | Serverless
“Migrate DB, no downtime” | DMS with CDC | DR/Migration
“Move servers to AWS” | MGN | DR/Migration
“Multi-account security baseline” | Control Tower + SCPs | Security
“Central log analysis” | CloudWatch + Subscription Filters | Monitoring
“Cost per project/team” | Cost Allocation Tags | Billing
“Prevent action org-wide” | SCP (not Config; Config only detects) | Security
“Auto-fix non-compliant” | Config + SSM Automation | Monitoring
“Encrypt at rest, auto-rotate key” | KMS with automatic rotation | Security
“Share resources cross-account” | AWS RAM | IAM/Networking
“DNS failover” | Route 53 Failover routing + Health Check | Route 53

AWS Cloud Practitioner certificate:

https://www.w3schools.com/aws/aws_quiz.php

https://pages.awscloud.com/NAMER-partner-GC-Partner-Cert-Readiness-Cloud-Practitioner-2024-conf.html

https://www.udemy.com/course/aws-certified-cloud-practitioner-new/

https://media.datacumulus.com/aws-ccp/AWS%20Certified%20Cloud%20Practitioner%20Slides%20v28.pdf

in progress..

https://www.examtopics.com/discussions/amazon/view/68991-exam-aws-certified-solutions-architect-associate-saa-c02/