What is Kubernetes?

Container orchestration system. Automatically manages many containers across many servers.

Problems it solves

  • Decide which server runs which container
  • Restart crashed containers
  • Scale up/down based on load
  • Route traffic to healthy containers
  • Roll out updates without downtime

Key Terms

Term          | Meaning
Cluster       | Entire Kubernetes system (Control Plane + all Nodes)
Node          | Server (physical/VM) that runs containers
Pod           | Smallest deployable unit; one or more containers sharing storage/network
Control Plane | The “brain” that makes decisions (scheduling, monitoring, scaling)

┌─────────────────────────────────────────────────────────────────┐
│                     Kubernetes Cluster                          │
│                                                                 │
│  You tell Kubernetes: "I want 3 copies of my web app running"  │
│                              │                                  │
│                              ▼                                  │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │              Control Plane (the brain)                   │   │
│  │  - Receives your request                                 │   │
│  │  - Decides which servers have capacity                   │   │
│  │  - Schedules containers onto servers                     │   │
│  │  - Monitors health, restarts failed containers           │   │
│  └─────────────────────────────────────────────────────────┘   │
│                              │                                  │
│              ┌───────────────┼───────────────┐                  │
│              ▼               ▼               ▼                  │
│  ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐   │
│  │   Node 1        │ │   Node 2        │ │   Node 3        │   │
│  │  ┌───────────┐  │ │  ┌───────────┐  │ │  ┌───────────┐  │   │
│  │  │  Pod      │  │ │  │  Pod      │  │ │  │  Pod      │  │   │
│  │  │ (web app) │  │ │  │ (web app) │  │ │  │ (web app) │  │   │
│  │  └───────────┘  │ │  └───────────┘  │ │  └───────────┘  │   │
│  └─────────────────┘ └─────────────────┘ └─────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
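
For example, a minimal sketch of that request using kubectl (assuming kubectl is already configured for the cluster; the deployment and image names are illustrative):

# Ask Kubernetes for 3 copies of a web app
kubectl create deployment web-app --image=my-repo/web-app:v1 --replicas=3

# The Control Plane schedules 3 Pods across the available Nodes
kubectl get pods -o wide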

What is EKS (Elastic Kubernetes Service)?

AWS-managed Kubernetes Control Plane + your worker nodes

Problem it solves

Running Kubernetes yourself is complex:

  • Managing Control Plane (API Server, etcd, Scheduler, Controller Manager)
  • Handling upgrades
  • Ensuring high availability
  • Patching security vulnerabilities

EKS removes this burden.

┌─────────────────────────────────────────────────────────────┐
│                        EKS Cluster                          │
├─────────────────────────┬───────────────────────────────────┤
│   Control Plane         │         Data Plane                │
│   (AWS manages)         │         (You manage OR AWS)       │
│                         │                                   │
│  ┌─────────────────┐    │    ┌──────────────────────────┐   │
│  │ API Server      │    │    │ Worker Nodes             │   │
│  │ etcd            │◄───┼───►│ (EC2 or Fargate)         │   │
│  │ Scheduler       │    │    │                          │   │
│  │ Controller Mgr  │    │    │  ┌─────┐ ┌─────┐ ┌─────┐ │   │
│  └─────────────────┘    │    │  │Pod A│ │Pod B│ │Pod C│ │   │
│                         │    │  └─────┘ └─────┘ └─────┘ │   │
│  Runs in AWS-managed    │    └──────────────────────────┘   │
│  VPC (hidden from you)  │                                   │
└─────────────────────────┴───────────────────────────────────┘

ECS vs EKS Terminology Mapping

ECS                      | EKS (Kubernetes)
Task Definition          | Pod spec (in Deployment YAML)
Task                     | Pod
Service                  | Deployment + Service
Cluster                  | Cluster
Container Instance (EC2) | Node
Fargate                  | Fargate (same in both)

Task (ECS) ≈ Pod (EKS)

Both are the smallest deployable unit.

               | ECS Task                                 | EKS Pod
What it is     | One or more containers running together  | One or more containers running together
Share network? | Yes                                      | Yes
Share storage? | Yes                                      | Yes
Defined by     | Task Definition (JSON)                   | Pod spec (YAML)

Key difference: Pod is a Kubernetes concept (industry standard). Task is AWS-specific (ECS only).
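
For reference, a minimal Pod spec looks like this (a sketch with illustrative names; in practice you usually let a Deployment create Pods rather than writing bare Pods):

apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
  - name: web
    image: nginx:1.25       # the container image, analogous to the image in an ECS Task Definition
    ports:
    - containerPort: 80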

Service (ECS) ≠ Node (EKS)

Completely different concepts.

           | ECS Service                            | EKS Node
What it is | Keeps N copies of a Task running       | Server that runs Pods
Purpose    | “I want 3 web servers always running”  | Physical/virtual machine providing compute

  • ECS equivalent of Node = EC2 instance (or Fargate)
  • EKS equivalent of ECS Service = Deployment + Service

ECS Service vs Kubernetes Deployment + Service

ECS: One concept does two things

┌─────────────────────────────────────────────────────────┐
│                    ECS Service                          │
│                                                         │
│  ┌─────────────────────────────────────────────────┐   │
│  │  "Keep 3 Tasks running"                          │   │
│  │  "Register them with ALB target group"           │   │
│  │  "Replace unhealthy Tasks"                       │   │
│  │  "Rolling update when Task Definition changes"   │   │
│  └─────────────────────────────────────────────────┘   │
│                          │                              │
│           ┌──────────────┼──────────────┐              │
│           ▼              ▼              ▼              │
│       ┌──────┐       ┌──────┐       ┌──────┐          │
│       │Task 1│       │Task 2│       │Task 3│          │
│       └──────┘       └──────┘       └──────┘          │
└─────────────────────────────────────────────────────────┘

Kubernetes: Two separate concepts

Deployment = Pod management

  • “Keep 3 Pods running”
  • “Rolling update when Pod spec changes”
  • “Replace crashed Pods”

Service = Network management

  • “Give these Pods a stable IP/DNS name”
  • “Load balance traffic across Pods”
  • “Track which Pods are healthy”
┌─────────────────────────────────────────────────────────────────┐
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                    Deployment                            │   │
│  │  "Keep 3 Pods running with this container image"         │   │
│  │                                                          │   │
│  │      ┌──────┐       ┌──────┐       ┌──────┐             │   │
│  │      │Pod 1 │       │Pod 2 │       │Pod 3 │             │   │
│  │      └──┬───┘       └──┬───┘       └──┬───┘             │   │
│  └─────────┼──────────────┼──────────────┼──────────────────┘   │
│            └──────────────┼──────────────┘                      │
│                           │                                     │
│  ┌────────────────────────┼────────────────────────────────┐   │
│  │                    Service                               │   │
│  │  "Expose these Pods on cluster IP 10.0.0.50:80"         │   │
│  │  "DNS name: my-app.default.svc.cluster.local"           │   │
│  │  "Load balance incoming requests across all Pods"        │   │
│  └──────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
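
In YAML, the pair above might look like this (a sketch; the names, labels, and image are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3                     # "Keep 3 Pods running"
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-repo/my-app:v1  # illustrative image
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  type: ClusterIP                 # stable internal IP + DNS name
  selector:
    app: my-app                   # route to Pods with this label
  ports:
  - port: 80
    targetPort: 80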

Why does Kubernetes split them?

Flexibility. You can mix and match:

Scenario                              | What you create
Internal microservice                 | Deployment + Service (ClusterIP)
Public web app                        | Deployment + Service (LoadBalancer)
Background worker (no network needed) | Deployment only (no Service)
Expose existing external DB           | Service only (no Deployment)
Canary deployment                     | 2 Deployments + 1 Service
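
The canary row works because a Service selects Pods by label, regardless of which Deployment created them. A sketch (names are illustrative): run Deployment my-app-v1 with 9 replicas and my-app-v2 with 1 replica, both labeled app: my-app, and have one Service select that label, so roughly 10% of traffic reaches v2.

apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app          # matches Pods from BOTH Deployments (v1 and v2)
  ports:
  - port: 80
    targetPort: 80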

Kubernetes Service Types

Type         | Accessible from                 | Use case
ClusterIP    | Inside cluster only             | Pod-to-Pod communication
NodePort     | Each Node’s IP at a static port | Rarely used directly
LoadBalancer | Internet                        | External users accessing your app

Multiple Pods/Tasks per Node/EC2

Normal case. One Node/EC2 instance runs multiple Pods/Tasks.

┌─────────────────────────────────────────────────────────────┐
│                   EC2 Instance (Node)                        │
│                   (e.g., m5.large: 2 vCPU, 8GB RAM)         │
│                                                              │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │   Pod A     │  │   Pod B     │  │   Pod C     │          │
│  │  (web app)  │  │  (api)      │  │  (worker)   │          │
│  │  256MB RAM  │  │  512MB RAM  │  │  1GB RAM    │          │
│  └─────────────┘  └─────────────┘  └─────────────┘          │
│                                                              │
│  Total used: ~1.8GB of 8GB available                        │
│  Room for more Pods until resources exhausted               │
└─────────────────────────────────────────────────────────────┘

How scheduler decides placement

  1. Available resources - Does the Node have enough CPU/memory?
  2. Constraints - Does the Pod require specific Node type (GPU, etc.)?
  3. Spreading - Avoid putting all copies on same Node (fault tolerance)
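
These rules come from the Pod spec itself. A sketch of the relevant fragment (the node label and app label are illustrative):

spec:
  containers:
  - name: web
    resources:
      requests:                        # 1. resources: Node must have this much unreserved
        cpu: 500m
        memory: 256Mi
  nodeSelector:                        # 2. constraints: only Nodes carrying this label
    accelerator: gpu                   #    (illustrative label)
  topologySpreadConstraints:           # 3. spreading: spread replicas across Nodes
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: web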

How ClusterIP Service Works

Frontend Pod calling API Pod:

┌─────────────────────────────────────────────────────────────────┐
│                        EKS Cluster                               │
│                                                                  │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │ Deployment: frontend (3 Pods)                            │    │
│  │  calls: http://api-service:8080/users                    │    │
│  └───────────────────────────┬──────────────────────────────┘    │
│                              │                                   │
│                              ▼                                   │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │ Service: api-service (ClusterIP)                         │    │
│  │ IP: 10.100.50.25 (internal only)                         │    │
│  │ DNS: api-service.default.svc.cluster.local               │    │
│  └───────────────────────────┬──────────────────────────────┘    │
│                              │ load balances                     │
│                              ▼                                   │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │ Deployment: api (5 Pods)                                 │    │
│  │  ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐     │    │
│  │  │Pod    │ │Pod    │ │Pod    │ │Pod    │ │Pod    │     │    │
│  │  │:8080  │ │:8080  │ │:8080  │ │:8080  │ │:8080  │     │    │
│  │  └───────┘ └───────┘ └───────┘ └───────┘ └───────┘     │    │
│  └─────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────┘

What is ClusterIP?

Virtual IP. No actual network interface has this IP. It only exists in iptables rules.

Packet flow

1. Frontend Pod sends HTTP request to api-service:8080

2. DNS resolves: api-service → 10.100.50.25 (ClusterIP)

3. Packet sent: dst: 10.100.50.25:8080

4. Node's iptables intercepts packet

5. iptables rewrites destination:
   10.100.50.25:8080 → 10.0.2.47:8080 (actual Pod IP)

6. Packet routed to API Pod
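
You can see both sides with kubectl (a sketch; output columns trimmed, IPs match the example above):

kubectl get service api-service
# NAME          TYPE        CLUSTER-IP     PORT(S)
# api-service   ClusterIP   10.100.50.25   8080/TCP

kubectl get endpoints api-service
# NAME          ENDPOINTS
# api-service   10.0.2.47:8080,10.0.2.48:8080,10.0.3.22:8080 + 2 more...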

kube-proxy

A component that runs on every Node and manages that Node's iptables rules.

┌─────────────────────────────────────────────────────────────┐
│                    Control Plane                             │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ API Server                                            │   │
│  │ Knows:                                                │   │
│  │ - Service "api-service" has ClusterIP 10.100.50.25   │   │
│  │ - Pods behind it: 10.0.2.47, 10.0.2.48, 10.0.3.22    │   │
│  └───────────────────────┬──────────────────────────────┘   │
└──────────────────────────┼──────────────────────────────────┘
                           │ watches for changes
                           ▼
┌──────────────────────────────────────────────────────────────┐
│  Node 1                  │  Node 2                           │
│  ┌─────────────────┐     │  ┌─────────────────┐              │
│  │ kube-proxy      │     │  │ kube-proxy      │              │
│  │ Updates local   │     │  │ Updates local   │              │
│  │ iptables rules  │     │  │ iptables rules  │              │
│  └─────────────────┘     │  └─────────────────┘              │
└──────────────────────────┴───────────────────────────────────┘

When Pods change

API Pod 10.0.2.48 crashes
         │
         ▼
Control Plane detects Pod gone
         │
         ▼
API Server updates Endpoints:
  api-service → [10.0.2.47, 10.0.3.22]  (removed .48)
         │
         ▼
kube-proxy on all Nodes sees change
         │
         ▼
iptables rules updated:
  10.100.50.25 → [10.0.2.47, 10.0.3.22]

New requests never go to dead Pod.

Why Use Containers?

Problems without containers

Problem 1: “Works on my machine”

Developer's laptop:        Production server:
- Python 3.11              - Python 3.8
- Library v2.1             - Library v1.9

App works locally → Crashes in production

Problem 2: Dependency conflicts

One EC2 running two apps:

App A needs: Python 3.8, OpenSSL 1.1
App B needs: Python 3.11, OpenSSL 3.0

They conflict. Need separate servers = more cost.

Problem 3: Slow deployment

Traditional deployment:
1. SSH into server
2. Stop old app
3. Pull new code
4. Install dependencies (5-10 minutes)
5. Start new app
6. Hope it works

If it fails → rollback is painful

Problem 4: Slow scaling

Traffic spike → need more servers

1. Launch new EC2 (2-3 minutes)
2. Install OS packages
3. Install app dependencies
4. Configure app
5. Start app
6. Register with load balancer

Total: 5-10 minutes. Users already left.

Solutions with containers

Solution 1: Package everything together

Container image includes:
- Your app code
- Exact Python version
- Exact library versions
- Exact OS libraries

Same image runs identically everywhere:
laptop = staging = production

Solution 2: Isolation

One EC2 running two containers:

┌─────────────────────────────────────────┐
│              EC2 Instance               │
│  ┌─────────────────┐ ┌─────────────────┐│
│  │ Container A     │ │ Container B     ││
│  │ Python 3.8      │ │ Python 3.11     ││
│  │ OpenSSL 1.1     │ │ OpenSSL 3.0     ││
│  │ (isolated)      │ │ (isolated)      ││
│  └─────────────────┘ └─────────────────┘│
└─────────────────────────────────────────┘

No conflicts. Both run on same server.

Solution 3: Fast deployment

Container deployment:
1. Pull new image (already built, seconds)
2. Start new container
3. Health check passes
4. Route traffic to new container
5. Stop old container

Rollback = start old image (seconds)

Solution 4: Fast scaling

Traffic spike → need more containers

1. Container image already exists
2. Start new container (seconds)
3. Health check passes
4. Route traffic

Total: 10-30 seconds

ECS vs EKS: When to Use

              | ECS                      | EKS
System        | AWS-proprietary          | Industry-standard Kubernetes
Complexity    | Simpler, fewer concepts  | More complex, more features
Skills        | AWS-only                 | Portable (any cloud/on-prem)
Ecosystem     | Tight AWS integration    | Large open ecosystem (Helm, Istio, etc.)
Configuration | Less                     | More configuration, more control

Use ECS when:

  • Simpler container workloads
  • Team doesn’t know Kubernetes
  • Want less operational overhead
  • Only using AWS

Use EKS when:

  • Team already knows Kubernetes
  • Need Kubernetes-specific features
  • Want portability across clouds
  • Need ecosystem tools (Helm, service mesh, etc.)

Node Management: ECS vs EKS

ECS Node Management

Two modes:

Mode            | Who manages EC2?
EC2 Launch Type | You (via ASG)
Fargate         | AWS (no EC2 to manage)

EC2 Launch Type: You create EC2 instances with the ECS Agent pre-installed (ECS-optimized AMI). The agent auto-registers the instance with the cluster.

┌─────────────────────────────────────────────────────────────┐
│                    EC2 Instance                              │
│                                                              │
│  ┌─────────────────────────────────────────────────────┐    │
│  │ ECS Agent                                            │    │
│  │ - Registers with ECS Control Plane                   │    │
│  │ - Reports available CPU/memory                       │    │
│  │ - Receives "Run this Task" commands                  │    │
│  │ - Starts/stops containers                            │    │
│  └─────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘

EKS Node Management

Three modes:

Mode                | Who manages EC2? | Who joins to cluster?
Self-Managed Nodes  | You (via ASG)    | You (bootstrap script)
Managed Node Groups | AWS              | AWS
Fargate             | AWS              | N/A

Self-Managed: You create EC2 instances with the EKS-optimized AMI and a bootstrap script. kubelet registers the Node with the API Server.

┌─────────────────────────────────────────────────────────────┐
│                    EC2 Instance (Node)                       │
│                                                              │
│  ┌─────────────────────────────────────────────────────┐    │
│  │ kubelet                                              │    │
│  │ - Registers with Kubernetes API Server               │    │
│  │ - Reports Node status                                │    │
│  │ - Receives "Run this Pod" commands                   │    │
│  │ - Manages Pod lifecycle                              │    │
│  └─────────────────────────────────────────────────────┘    │
│  ┌─────────────────────────────────────────────────────┐    │
│  │ kube-proxy                                           │    │
│  │ - Manages iptables rules for Service routing         │    │
│  └─────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘

Managed Node Groups: You specify instance type and count. AWS handles provisioning, joining, and rolling updates.


Using ASG with ECS/EKS

ASG manages EC2 instance count. Registration to cluster is automatic via User Data script.

ECS + ASG Setup

1. Create ECS Cluster (name only)

2. Create Launch Template:
   - AMI: ECS-optimized AMI
   - IAM Role: ecsInstanceRole
   - User Data:
     #!/bin/bash
     echo "ECS_CLUSTER=my-cluster" >> /etc/ecs/ecs.config

3. Create ASG with Launch Template
   - No Target Group needed
   - Instances auto-register to ECS cluster

EKS + ASG Setup (Self-Managed)

1. Create EKS Cluster

2. Create IAM Role for Nodes:
   - AmazonEKSWorkerNodePolicy
   - AmazonEC2ContainerRegistryReadOnly
   - AmazonEKS_CNI_Policy

3. Update aws-auth ConfigMap (allow IAM role)

4. Create Launch Template:
   - AMI: EKS-optimized AMI
   - IAM Role: (from step 2)
   - User Data:
     #!/bin/bash
     /etc/eks/bootstrap.sh my-eks-cluster

5. Create ASG with Launch Template
   - No Target Group needed
   - Instances auto-register as Nodes

Key difference: EKS requires aws-auth ConfigMap update for authorization.


Where is the Kubernetes API Server?

In EKS: AWS manages it in their own VPC. You don’t see the EC2 instances running it.

┌─────────────────────────────────────────────────────────────┐
│                    AWS-Managed VPC                           │
│                    (hidden from you)                         │
│                                                              │
│  ┌─────────────────────────────────────────────────────┐    │
│  │              EKS Control Plane                       │    │
│  │                                                      │    │
│  │  ┌────────────┐  ┌────────────┐  ┌────────────┐    │    │
│  │  │ API Server │  │ API Server │  │ API Server │    │    │
│  │  │ (HA)       │  │ (HA)       │  │ (HA)       │    │    │
│  │  └────────────┘  └────────────┘  └────────────┘    │    │
│  │                                                      │    │
│  │  ┌────────────┐                                     │    │
│  │  │ etcd       │  (stores cluster state)             │    │
│  │  └────────────┘                                     │    │
│  │                                                      │    │
│  │  Endpoint: https://XXXXX.eks.amazonaws.com          │    │
│  └─────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘
                              │
                              │ HTTPS
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                    Your VPC                                  │
│                                                              │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │ Node         │  │ Node         │  │ Node         │       │
│  │ (kubelet     │  │ (kubelet     │  │ (kubelet     │       │
│  │  talks to    │  │  talks to    │  │  talks to    │       │
│  │  API Server) │  │  API Server) │  │  API Server) │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
│                                                              │
│  kubectl commands also go to API Server endpoint            │
└─────────────────────────────────────────────────────────────┘

Endpoint access options:

  • Public endpoint (accessible from internet)
  • Private endpoint (only from within VPC)
  • Both

ASG vs Target Group

These are independent concepts.

Concept      | Purpose                          | Required for cluster?
ASG          | Scale EC2 instance count         | Optional (can use Managed Node Groups or Fargate)
Target Group | Route ALB/NLB traffic to targets | Only if you need a load balancer

ASG: Manages Node count

ASG: "Keep 3-10 EC2 instances running"

Nothing to do with traffic routing.

Target Group: Routes traffic to Pods/Tasks

ALB → Target Group → Pod/Task IPs

Who registers Pods to Target Group?
- ECS: ECS Service (automatic)
- EKS: AWS Load Balancer Controller (you install)

In container workloads, traffic goes to Pods/Tasks, not to EC2 instances directly, so the Target Group (with target type ip) contains Pod IPs rather than Node IPs.


What is ConfigMap?

Kubernetes object storing configuration as key-value pairs. Pods can read this data as environment variables or files.

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  DATABASE_HOST: "db.example.com"
  LOG_LEVEL: "info"

Purpose: Separate configuration from container image. Change config without rebuilding.
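
A Pod consumes it like this (a sketch of the container fragment; the app name and image are illustrative):

spec:
  containers:
  - name: my-app
    image: my-repo/my-app:v1
    envFrom:
    - configMapRef:
        name: app-config        # DATABASE_HOST and LOG_LEVEL become environment variables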


What is aws-auth ConfigMap?

Special ConfigMap that maps AWS IAM identities to Kubernetes permissions.

Kubernetes doesn’t understand IAM. aws-auth bridges them.

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::123456789:role/my-node-role
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes

Why Nodes need it

1. New EC2 Node starts, kubelet calls API Server: "Register me"

2. kubelet authenticates using EC2's IAM Role

3. API Server checks aws-auth ConfigMap:
   "Is this IAM Role allowed to be a Node?"

4. If IAM Role is in aws-auth → Node joins cluster
   If not → Rejected

Key groups

Group          | Permission
system:nodes   | Allows Node operations
system:masters | Full admin access

Fargate Pricing vs EC2

Fargate costs more per compute unit, but can be cheaper overall.

When Fargate is cheaper

  • Variable/bursty workloads
  • Low utilization
  • Short-running tasks
  • Don’t want to manage nodes

When EC2 is cheaper

  • Steady 24/7 workloads
  • High utilization (>70%)
  • Can use Reserved Instances or Spot

Example comparison (1 vCPU, 2GB)

                     | Fargate (1 vCPU, 2GB) | EC2 m5.large (2 vCPU, 8GB)
Hourly               | ~$0.05                | ~$0.10
8 hours/day, 30 days | $12                   | $72 (24/7) or $24 (8 hr/day)

Fargate wins at low utilization. EC2 wins at high utilization.


EKS Autoscaling Components

Overview

Component                       | What it does
Horizontal Pod Autoscaler (HPA) | Adds/removes pod replicas based on CPU, memory, or custom metrics
Vertical Pod Autoscaler (VPA)   | Adjusts CPU/memory requests for existing pods
Cluster Autoscaler              | Adds/removes EC2 nodes when pods can’t be scheduled
Karpenter                       | Alternative to Cluster Autoscaler; provisions optimal EC2 directly
AWS Load Balancer Controller    | Creates ALB/NLB when you define Ingress or LoadBalancer Service

How They Work Together

Traffic increases
     │
     ▼
┌─────────────────────────────────────────────────────────────────┐
│ AWS Load Balancer Controller                                    │
│   Creates/manages ALB or NLB to route traffic to pods           │
└─────────────────────────────────────────────────────────────────┘
     │
     ▼
┌─────────────────────────────────────────────────────────────────┐
│ Horizontal Pod Autoscaler (HPA)                                 │
│   "CPU at 80%? Add more pod replicas"                           │
│   Scales: 3 pods → 10 pods                                      │
└─────────────────────────────────────────────────────────────────┘
     │
     ▼
┌─────────────────────────────────────────────────────────────────┐
│ Cluster Autoscaler / Karpenter                                  │
│   "10 pods need scheduling but nodes are full? Add EC2 nodes"   │
│   Scales: 2 nodes → 5 nodes                                     │
└─────────────────────────────────────────────────────────────────┘

Resource Requests and Limits

Every pod specifies how much CPU/memory it needs:

containers:
- name: my-app
  resources:
    requests:          # Guaranteed minimum - scheduler uses this
      cpu: 500m        # 500 millicores = 0.5 CPU
      memory: 256Mi
    limits:            # Maximum allowed - killed if exceeds memory
      cpu: 1000m
      memory: 512Mi

Term         | What it means
Request      | “I need at least this much” - used for scheduling
Limit        | “Never give me more than this” - enforced at runtime
Actual usage | What the container is really using right now

Horizontal Pod Autoscaler (HPA)

Adds/removes pod replicas based on metrics.

How HPA Calculates

HPA uses average across all pods, calculated relative to requests (not limits):

Formula:
  desiredReplicas = currentReplicas × (currentMetricValue / targetMetricValue)

Example:
  Target: 70% CPU utilization
  Current replicas: 3
  Pod requests: 500m CPU each
  
  Pod 1 actual: 400m (80% of request)
  Pod 2 actual: 450m (90% of request)
  Pod 3 actual: 350m (70% of request)
  
  Average: (80 + 90 + 70) / 3 = 80%
  
  desiredReplicas = 3 × (80% / 70%) = 3.43 → round up to 4 pods

Key: Utilization % = actual usage / request (NOT actual / limit)

HPA Configuration

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

HPA Scaling Flow

┌─────────────────────────────────────────────────────────────────┐
│ 1. Metrics Server collects data (every 15s)                     │
│    Pod 1: 80%, Pod 2: 90%, Pod 3: 70%                           │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│ 2. HPA checks metrics (every 15s)                               │
│    Average: 80% > Target: 70% → scale UP                        │
│    Desired = 3 × (80/70) = 4 pods                               │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│ 3. HPA updates Deployment replicas: 3 → 4                       │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│ 4. Scheduler places new pod                                     │
│    If no node has capacity → Pod stays "Pending"                │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼ (if Pending)
┌─────────────────────────────────────────────────────────────────┐
│ 5. Cluster Autoscaler adds new EC2 node                         │
│    Pod scheduled on new node                                    │
└─────────────────────────────────────────────────────────────────┘

Vertical Pod Autoscaler (VPA)

Adjusts CPU/memory requests for existing pods. Makes pods bigger, not more numerous.

VPA Configuration

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"    # Auto-restart pods with new values
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 4
        memory: 8Gi

Update Modes

Mode    | Behavior
Off     | Recommend only, no changes
Initial | Apply to new pods only
Auto    | Restart pods with new values

HPA vs VPA

VPA solves: "My pod always uses more memory than requested"
  Before: request=256Mi, actual=800Mi → OOMKilled
  After:  request=1Gi, actual=800Mi → stable

HPA solves: "Traffic increased, need more parallel processing"
  Before: 3 pods, 1000 req/s → each pod overloaded
  After:  30 pods, 1000 req/s → load distributed

Don’t use both on same metric - they conflict:

HPA: "CPU at 80%, add pods" → CPU drops to 40%
VPA: "CPU at 40%, lower request" → CPU jumps to 80%
→ Infinite loop

Safe combination: HPA on custom metrics (requests/sec), VPA on CPU/memory.
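
A sketch of the HPA half of that split, assuming a custom-metrics adapter (e.g., Prometheus Adapter) already exposes a per-pod requests-per-second metric; the metric name is illustrative:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 50
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second   # custom metric, not CPU/memory
      target:
        type: AverageValue
        averageValue: "100"              # aim for ~100 req/s per pod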


Cluster Autoscaler

Adds/removes EC2 nodes based on pending pods and node utilization.

How It Works

┌─────────────────────────────────────────────────────────────────┐
│ Scale UP trigger:                                               │
│   Pod is Pending because no node has enough resources           │
│   → Increase ASG desired capacity                               │
│   → New EC2 launches, joins cluster                             │
│   → Pod scheduled on new node                                   │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ Scale DOWN trigger:                                             │
│   Node underutilized for 10+ minutes                            │
│   All pods can be moved to other nodes                          │
│   → Drain node (evict pods)                                     │
│   → Decrease ASG desired capacity                               │
│   → EC2 terminated                                              │
└─────────────────────────────────────────────────────────────────┘

Configuration

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
spec:
  template:
    spec:
      containers:
      - name: cluster-autoscaler
        command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
        - --scale-down-delay-after-add=10m
        - --scale-down-unneeded-time=10m

Karpenter (Alternative to Cluster Autoscaler)

Provisions EC2 instances directly (not via ASG). Picks optimal instance type per workload.

Karpenter vs Cluster Autoscaler

Aspect        | Cluster Autoscaler   | Karpenter
How it scales | Adjusts ASG size     | Provisions EC2 directly
Node types    | Fixed per node group | Chooses best instance per pod
Speed         | Slower (ASG → EC2)   | Faster (direct EC2 API)
Bin packing   | Basic                | Smart (fits pods efficiently)
Spot handling | Manual setup         | Built-in with fallback
Cost          | Often overprovisions | Better right-sizing

Example

Pod needs 3 CPU, 2Gi memory

Cluster Autoscaler (node group = m5.xlarge only):
  → Launches m5.xlarge (4 CPU, 16Gi)
  → Wasted: 1 CPU, 14Gi
  → Cost: $0.192/hr

Karpenter (can choose from multiple types):
  → Picks optimal instance or bins multiple pods
  → Minimal waste
  → Or picks Spot: ~$0.06/hr

Karpenter Configuration

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        name: default        # references an EC2NodeClass (defined separately; not shown here)
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m5.large", "m5.xlarge", "c5.large", "c5.xlarge"]
  limits:
    cpu: 100
  disruption:
    consolidationPolicy: WhenUnderutilized
    consolidateAfter: 30s

Consolidation

After traffic drops:

Cluster Autoscaler:
  Node 1: 20% utilized
  Node 2: 30% utilized
  Node 3: 25% utilized
  → Keeps all 3 (none below threshold)

Karpenter:
  → "I can fit all pods on 1 larger node"
  → Consolidates to 1 node
  → Terminates 2 nodes

Typical savings: 20-50% EC2 cost reduction vs Cluster Autoscaler.


AWS Load Balancer Controller

Creates and manages ALB/NLB when you define Ingress or LoadBalancer Service.

Ingress → ALB

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-ingress
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-app-service
            port:
              number: 80

Service type LoadBalancer → NLB

apiVersion: v1
kind: Service
metadata:
  name: my-app-nlb
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: external
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
  - port: 443
    targetPort: 8080

Target Types

Type     | How traffic routes
ip       | Direct to Pod IP (recommended)
instance | To Node, then kube-proxy routes to Pod

Why Requests Matter for Scheduling

Scheduler uses requests, not actual usage:

Node capacity: 4 CPU, 8Gi memory

Pod A requests: 1 CPU  ─┐
Pod B requests: 1 CPU   ├─ Total requested: 3 CPU
Pod C requests: 1 CPU  ─┘

Remaining for scheduling: 1 CPU

New pod wants: 2 CPU
→ CANNOT schedule (only 1 CPU available by request)
→ Even if actual usage is low!

Actual usage might be:
  Pod A: 0.3 CPU (30% of request)
  Pod B: 0.5 CPU (50% of request)
  Pod C: 0.2 CPU (20% of request)
  Total actual: 1 CPU

But scheduler only looks at REQUESTS.
This is why VPA is useful - right-sizes requests.

Autoscaling Summary

Component          | Looks at                       | Compares to                 | Action
HPA                | Average actual usage           | Requests (as %)             | Add/remove pods
VPA                | Individual pod usage over time | Current requests            | Adjust request values
Cluster Autoscaler | Pending pods                   | Node capacity (by requests) | Add/remove nodes
Karpenter          | Pending pods                   | Available instance types    | Provision optimal nodes
Scheduler          | Pod requests                   | Node unrequested capacity   | Place pods on nodes

Monitoring: CloudWatch vs AMP (Prometheus)

Two Approaches

CloudWatch method:
┌─────────────────────────────────────────────────────────────────┐
│ EKS Node                                                        │
│                                                                 │
│   ┌─────────────────┐                                           │
│   │ CloudWatch Agent│───────► CloudWatch                        │
│   │ (system metrics)│                                           │
│   └─────────────────┘                                           │
│                                                                 │
│   ┌─────────────────┐                                           │
│   │ App             │───────► CloudWatch                        │
│   │ (SDK push)      │         (two senders)                     │
│   └─────────────────┘                                           │
└─────────────────────────────────────────────────────────────────┘

Prometheus method:
┌─────────────────────────────────────────────────────────────────┐
│ EKS Node                                                        │
│                                                                 │
│   ┌─────────────────┐      scrape       ┌─────────────────┐     │
│   │ Prometheus Agent│◄──────────────────│ App :9090       │     │
│   │                 │◄──────────────────│ node-exporter   │     │
│   │                 │◄──────────────────│ kube-state-     │     │
│   │                 │◄──────────────────│ metrics         │     │
│   └────────┬────────┘                   └─────────────────┘     │
│            │                                                    │
│            │ one sender                                         │
│            ▼                                                    │
│   ┌──────────────┐                                              │
│   │ AMP          │                                              │
│   └──────────────┘                                              │
└─────────────────────────────────────────────────────────────────┘

Key Difference

                       | CloudWatch          | Prometheus/AMP
Who sends to backend   | Agent + App (both)  | Agent only
App’s job              | Call CloudWatch API | Expose HTTP /metrics endpoint
Network calls from app | Yes (to CloudWatch) | No (agent scrapes locally)

Amazon Managed Service for Prometheus (AMP)

Fully managed Prometheus-compatible monitoring. You send metrics, AWS handles storage/scaling.

What is Prometheus?

Open-source monitoring system. Industry standard for Kubernetes monitoring.

System Metrics Collection

Prometheus agent scrapes exporters that expose /metrics:

┌─────────────────────────────────────────────────────────────────┐
│ EKS Node                                                        │
│                                                                 │
│   ┌─────────────────┐                                           │
│   │ node-exporter   │ ← CPU, memory, disk, network              │
│   │ :9100/metrics   │   (DaemonSet)                             │
│   └────────┬────────┘                                           │
│            │ scrape                                             │
│   ┌────────┴────────┐                                           │
│   │ Prometheus Agent│───────► AMP Workspace                     │
│   └────────┬────────┘                                           │
│            │ scrape                                             │
│   ┌────────┴────────┐                                           │
│   │ kube-state-     │ ← K8s object states (pods, deployments)   │
│   │ metrics         │                                           │
│   └─────────────────┘                                           │
└─────────────────────────────────────────────────────────────────┘
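
A minimal sketch of the agent's scrape-and-forward config, assuming an AMP workspace already exists (the workspace URL and region are placeholders):

scrape_configs:
  - job_name: node-exporter
    kubernetes_sd_configs:
      - role: endpoints                 # discover targets from the Kubernetes API
    relabel_configs:
      - source_labels: [__meta_kubernetes_endpoints_name]
        regex: node-exporter
        action: keep                    # keep only the node-exporter endpoints

remote_write:
  - url: https://aps-workspaces.<region>.amazonaws.com/workspaces/<workspace-id>/api/v1/remote_write
    sigv4:
      region: <region>                  # sign requests with SigV4 so AMP accepts them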

CloudWatch vs AMP

Aspect           | CloudWatch          | AMP
Best for         | AWS services        | Kubernetes/containers
Query language   | CloudWatch Insights | PromQL (industry standard)
High-cardinality | Expensive at scale  | Designed for it
Ecosystem        | AWS-native          | 1000s of Prometheus exporters
Portability      | AWS only            | Same queries work anywhere

When to Use AMP

  • Running EKS and want Prometheus-compatible monitoring
  • Need PromQL queries
  • High-cardinality metrics (e.g., per-customer metrics)
  • Want to reuse existing Prometheus dashboards
  • Multi-cloud/hybrid (same queries everywhere)

AWS X-Ray (Distributed Tracing)

What is a Trace?

Tracks a single request as it flows through multiple services.

User clicks "Buy"
         │
         ▼
┌─────────────────────────────────────────────────────────────────┐
│ Trace ID: abc-123                                               │
│                                                                 │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Span 1: API Gateway (50ms)                                  │ │
│ │ ├── Span 2: Order Service (200ms)                           │ │
│ │ │   ├── Span 3: Inventory Service (80ms)                    │ │
│ │ │   ├── Span 4: Payment Service (100ms)  ← bottleneck       │ │
│ │ │   └── Span 5: DynamoDB (20ms)                             │ │
│ └─────────────────────────────────────────────────────────────┘ │
│                                                                 │
│ Total: 250ms                                                    │
└─────────────────────────────────────────────────────────────────┘
  • Span = One unit of work (service call, DB query)
  • Trace = Collection of spans for one request

Metrics vs Traces

            | Metrics                       | Traces
What        | Aggregated numbers            | Individual request paths
Question    | “How many errors per minute?” | “Why was THIS request slow?”
Granularity | Summary (avg, p99)            | Per-request detail

Metrics: "5% of requests are slow"
Traces:  "This slow request spent 2s waiting for DB"

X-Ray Service Map

Auto-generated visual of your architecture:

        ┌─────────┐
        │ API GW  │
        │ 99% OK  │
        └────┬────┘
             │
             ▼
        ┌─────────┐
        │ Lambda  │
        │ 95% OK  │ ← 5% errors visible
        └────┬────┘
             │
        ┌────┴────┐
        ▼         ▼
   ┌─────────┐ ┌─────────┐
   │ DynamoDB│ │ S3      │
   │ 15ms    │ │ 50ms    │
   └─────────┘ └─────────┘

When to Use X-Ray

  • Debug slow requests
  • Find errors in distributed systems
  • Understand service dependencies
  • Identify bottlenecks

ADOT (AWS Distro for OpenTelemetry)

AWS’s distribution of the OpenTelemetry Collector. It can replace the Prometheus agent.

What it Does

┌─────────────────────────────────────────────────────────────────┐
│ ADOT Collector                                                  │
│                                                                 │
│   ┌─────────────┐     ┌─────────────┐     ┌─────────────┐       │
│   │ Receivers   │────►│ Processors  │────►│ Exporters   │       │
│   │             │     │             │     │             │       │
│   │ - Prometheus│     │ - Filter    │     │ - AMP       │       │
│   │ - OTLP      │     │ - Transform │     │ - CloudWatch│       │
│   │ - StatsD    │     │             │     │ - X-Ray     │       │
│   └─────────────┘     └─────────────┘     └─────────────┘       │
└─────────────────────────────────────────────────────────────────┘
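
A minimal collector config matching the diagram might look like this (a sketch; the AMP endpoint is a placeholder, and SigV4 auth for AMP is omitted for brevity):

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: apps
          kubernetes_sd_configs:
            - role: pod                 # scrape Pods that expose /metrics
  otlp:
    protocols:
      grpc: {}                          # apps send traces via OTLP

processors:
  batch: {}

exporters:
  prometheusremotewrite:
    endpoint: https://aps-workspaces.<region>.amazonaws.com/workspaces/<workspace-id>/api/v1/remote_write
  awsxray: {}

service:
  pipelines:
    metrics:
      receivers: [prometheus, otlp]
      processors: [batch]
      exporters: [prometheusremotewrite]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [awsxray]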

ADOT vs Prometheus Agent

Aspect   | Prometheus Agent        | ADOT
Purpose  | Metrics only            | Metrics + Traces + Logs
Input    | Prometheus only         | Many formats
Output   | AMP only                | AMP, CloudWatch, X-Ray, etc.
Use case | Simple Prometheus setup | Multi-destination, traces

When to Use ADOT

  • Need metrics AND traces (AMP + X-Ray)
  • Want to send same metrics to multiple destinations
  • Want vendor-neutral OpenTelemetry standard

Observability Summary

Tool       | Data Type     | Use For
CloudWatch | Metrics, Logs | AWS-native monitoring, simple setup
AMP        | Metrics       | Prometheus ecosystem, PromQL, K8s-native
X-Ray      | Traces        | Debugging requests, finding bottlenecks
ADOT       | All           | Unified collection, multi-destination

Common EKS setup: AMP for metrics + X-Ray for traces, collected via ADOT.