What is Kubernetes?
Container orchestration system. Automatically manages many containers across many servers.
Problems it solves
- Decide which server runs which container
- Restart crashed containers
- Scale up/down based on load
- Route traffic to healthy containers
- Roll out updates without downtime
Key Terms
| Term | Meaning |
|---|---|
| Cluster | Entire Kubernetes system (Control Plane + all Nodes) |
| Node | Server (physical/VM) that runs containers |
| Pod | Smallest deployable unit. One or more containers sharing storage/network |
| Control Plane | The “brain” that makes decisions (scheduling, monitoring, scaling) |
┌─────────────────────────────────────────────────────────────────┐
│ Kubernetes Cluster │
│ │
│ You tell Kubernetes: "I want 3 copies of my web app running" │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Control Plane (the brain) │ │
│ │ - Receives your request │ │
│ │ - Decides which servers have capacity │ │
│ │ - Schedules containers onto servers │ │
│ │ - Monitors health, restarts failed containers │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌───────────────┼───────────────┐ │
│ ▼ ▼ ▼ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Node 1 │ │ Node 2 │ │ Node 3 │ │
│ │ ┌───────────┐ │ │ ┌───────────┐ │ │ ┌───────────┐ │ │
│ │ │ Pod │ │ │ │ Pod │ │ │ │ Pod │ │ │
│ │ │ (web app) │ │ │ │ (web app) │ │ │ │ (web app) │ │ │
│ │ └───────────┘ │ │ └───────────┘ │ │ └───────────┘ │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
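In YAML terms, "I want 3 copies of my web app running" is a Deployment. A minimal sketch (the name, labels, and image are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3                     # "I want 3 copies"
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: web
        image: registry.example.com/my-app:1.0   # placeholder image
        ports:
        - containerPort: 8080

The Control Plane then schedules the 3 Pods across whichever Nodes have capacity.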
What is EKS (Elastic Kubernetes Service)?
AWS-managed Kubernetes Control Plane + your worker nodes
Problem it solves
Running Kubernetes yourself is complex:
- Managing Control Plane (API Server, etcd, Scheduler, Controller Manager)
- Handling upgrades
- Ensuring high availability
- Patching security vulnerabilities
EKS removes this burden.
┌─────────────────────────────────────────────────────────────┐
│ EKS Cluster │
├─────────────────────────┬───────────────────────────────────┤
│ Control Plane │ Data Plane │
│ (AWS manages) │ (You manage OR AWS) │
│ │ │
│ ┌─────────────────┐ │ ┌──────────────────────────┐ │
│ │ API Server │ │ │ Worker Nodes │ │
│ │ etcd │◄───┼───►│ (EC2 or Fargate) │ │
│ │ Scheduler │ │ │ │ │
│ │ Controller Mgr │ │ │ ┌─────┐ ┌─────┐ ┌─────┐ │ │
│ └─────────────────┘ │ │ │Pod A│ │Pod B│ │Pod C│ │ │
│ │ │ └─────┘ └─────┘ └─────┘ │ │
│ Runs in AWS-managed │ └──────────────────────────┘ │
│ VPC (hidden from you) │ │
└─────────────────────────┴───────────────────────────────────┘
ECS vs EKS Terminology Mapping
| ECS | EKS (Kubernetes) |
|---|---|
| Task Definition | Pod spec (in Deployment YAML) |
| Task | Pod |
| Service | Deployment + Service |
| Cluster | Cluster |
| Container Instance (EC2) | Node |
| Fargate | Fargate (same in both) |
Task (ECS) ≈ Pod (EKS)
Both are the smallest deployable unit.
| | ECS Task | EKS Pod |
|---|---|---|
| What it is | One or more containers running together | One or more containers running together |
| Share network? | Yes | Yes |
| Share storage? | Yes | Yes |
| Defined by | Task Definition (JSON) | Pod spec (YAML) |
Key difference: Pod is a Kubernetes concept (industry standard). Task is AWS-specific (ECS only).
Service (ECS) ≠ Node (EKS)
Completely different concepts.
| | ECS Service | EKS Node |
|---|---|---|
| What it is | Keeps N copies of a Task running | Server that runs Pods |
| Purpose | “I want 3 web servers always running” | Physical/virtual machine providing compute |
- ECS equivalent of Node = EC2 instance (or Fargate)
- EKS equivalent of ECS Service = Deployment + Service
ECS Service vs Kubernetes Deployment + Service
ECS: One concept does two things
┌─────────────────────────────────────────────────────────┐
│ ECS Service │
│ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ "Keep 3 Tasks running" │ │
│ │ "Register them with ALB target group" │ │
│ │ "Replace unhealthy Tasks" │ │
│ │ "Rolling update when Task Definition changes" │ │
│ └─────────────────────────────────────────────────┘ │
│ │ │
│ ┌──────────────┼──────────────┐ │
│ ▼ ▼ ▼ │
│ ┌──────┐ ┌──────┐ ┌──────┐ │
│ │Task 1│ │Task 2│ │Task 3│ │
│ └──────┘ └──────┘ └──────┘ │
└─────────────────────────────────────────────────────────┘
Kubernetes: Two separate concepts
Deployment = Pod management
- “Keep 3 Pods running”
- “Rolling update when Pod spec changes”
- “Replace crashed Pods”
Service = Network management
- “Give these Pods a stable IP/DNS name”
- “Load balance traffic across Pods”
- “Track which Pods are healthy”
┌─────────────────────────────────────────────────────────────────┐
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Deployment │ │
│ │ "Keep 3 Pods running with this container image" │ │
│ │ │ │
│ │ ┌──────┐ ┌──────┐ ┌──────┐ │ │
│ │ │Pod 1 │ │Pod 2 │ │Pod 3 │ │ │
│ │ └──┬───┘ └──┬───┘ └──┬───┘ │ │
│ └─────────┼──────────────┼──────────────┼──────────────────┘ │
│ └──────────────┼──────────────┘ │
│ │ │
│ ┌────────────────────────┼────────────────────────────────┐ │
│ │ Service │ │
│ │ "Expose these Pods on cluster IP 10.0.0.50:80" │ │
│ │ "DNS name: my-app.default.svc.cluster.local" │ │
│ │ "Load balance incoming requests across all Pods" │ │
│ └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
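The Service in the diagram could be declared like this (a sketch; the ClusterIP is normally auto-assigned, and the selector is assumed to match the Deployment's Pod label app: my-app):

apiVersion: v1
kind: Service
metadata:
  name: my-app                    # DNS: my-app.default.svc.cluster.local
spec:
  type: ClusterIP                 # the default type
  selector:
    app: my-app                   # selects the Deployment's Pods
  ports:
  - port: 80                      # stable Service port
    targetPort: 8080              # container port on each Pod

Note the decoupling: the Deployment never mentions the Service, and the Service only knows a label selector.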
Why does Kubernetes split them?
Flexibility. You can mix and match:
| Scenario | What you create |
|---|---|
| Internal microservice | Deployment + Service (ClusterIP) |
| Public web app | Deployment + Service (LoadBalancer) |
| Background worker (no network needed) | Deployment only (no Service) |
| Expose existing external DB | Service only (no Deployment) |
| Canary deployment | 2 Deployments + 1 Service |
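For the canary row: two Deployments whose Pod templates share a label, and one Service selecting only that shared label, so traffic splits roughly by replica count. A sketch with placeholder names:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-stable
spec:
  replicas: 9
  selector:
    matchLabels: {app: my-app, track: stable}
  template:
    metadata:
      labels: {app: my-app, track: stable}   # app: my-app is the shared label
    spec:
      containers:
      - name: web
        image: my-app:1.0
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-canary
spec:
  replicas: 1                                # ~10% of traffic
  selector:
    matchLabels: {app: my-app, track: canary}
  template:
    metadata:
      labels: {app: my-app, track: canary}
    spec:
      containers:
      - name: web
        image: my-app:1.1

A Service selecting just app: my-app load balances across all 10 Pods.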
Kubernetes Service Types
| Type | Accessible from | Use case |
|---|---|---|
| ClusterIP | Inside cluster only | Pod-to-Pod communication |
| NodePort | Each Node's IP at a static port | Rarely used directly (debugging, no cloud load balancer) |
| LoadBalancer | Internet | External users accessing your app |
Multiple Pods/Tasks per Node/EC2
Normal case. One Node/EC2 instance runs multiple Pods/Tasks.
┌─────────────────────────────────────────────────────────────┐
│ EC2 Instance (Node) │
│ (e.g., m5.large: 2 vCPU, 8GB RAM) │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Pod A │ │ Pod B │ │ Pod C │ │
│ │ (web app) │ │ (api) │ │ (worker) │ │
│ │ 256MB RAM │ │ 512MB RAM │ │ 1GB RAM │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ Total used: ~1.8GB of 8GB available │
│ Room for more Pods until resources exhausted │
└─────────────────────────────────────────────────────────────┘
How scheduler decides placement
- Available resources - Does the Node have enough CPU/memory?
- Constraints - Does the Pod require specific Node type (GPU, etc.)?
- Spreading - Avoid putting all copies on same Node (fault tolerance)
How ClusterIP Service Works
Frontend Pod calling API Pod:
┌─────────────────────────────────────────────────────────────────┐
│ EKS Cluster │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Deployment: frontend (3 Pods) │ │
│ │ calls: http://api-service:8080/users │ │
│ └───────────────────────────┬──────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Service: api-service (ClusterIP) │ │
│ │ IP: 10.100.50.25 (internal only) │ │
│ │ DNS: api-service.default.svc.cluster.local │ │
│ └───────────────────────────┬──────────────────────────────┘ │
│ │ load balances │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Deployment: api (5 Pods) │ │
│ │ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ │ │
│ │ │Pod │ │Pod │ │Pod │ │Pod │ │Pod │ │ │
│ │ │:8080 │ │:8080 │ │:8080 │ │:8080 │ │:8080 │ │ │
│ │ └───────┘ └───────┘ └───────┘ └───────┘ └───────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
What is ClusterIP?
Virtual IP. No actual network interface has this IP. It only exists in iptables rules.
Packet flow
1. Frontend Pod sends HTTP request to api-service:8080
2. DNS resolves: api-service → 10.100.50.25 (ClusterIP)
3. Packet sent: dst: 10.100.50.25:8080
4. Node's iptables intercepts packet
5. iptables rewrites destination:
10.100.50.25:8080 → 10.0.2.47:8080 (actual Pod IP)
6. Packet routed to API Pod
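A manifest for api-service might look like this (a sketch; the ClusterIP is auto-assigned — 10.100.50.25 above is illustrative — and the Pod label app: api is assumed):

apiVersion: v1
kind: Service
metadata:
  name: api-service               # DNS: api-service.default.svc.cluster.local
spec:
  selector:
    app: api                      # assumed label on the api Deployment's Pods
  ports:
  - port: 8080                    # port callers use
    targetPort: 8080              # port Pods listen on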
kube-proxy
Component running on every Node that manages iptables rules.
┌─────────────────────────────────────────────────────────────┐
│ Control Plane │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ API Server │ │
│ │ Knows: │ │
│ │ - Service "api-service" has ClusterIP 10.100.50.25 │ │
│ │ - Pods behind it: 10.0.2.47, 10.0.2.48, 10.0.3.22 │ │
│ └───────────────────────┬──────────────────────────────┘ │
└──────────────────────────┼──────────────────────────────────┘
│ watches for changes
▼
┌──────────────────────────────────────────────────────────────┐
│ Node 1 │ Node 2 │
│ ┌─────────────────┐ │ ┌─────────────────┐ │
│ │ kube-proxy │ │ │ kube-proxy │ │
│ │ Updates local │ │ │ Updates local │ │
│ │ iptables rules │ │ │ iptables rules │ │
│ └─────────────────┘ │ └─────────────────┘ │
└──────────────────────────┴───────────────────────────────────┘
When Pods change
API Pod 10.0.2.48 crashes
│
▼
Control Plane detects Pod gone
│
▼
API Server updates Endpoints:
api-service → [10.0.2.47, 10.0.3.22] (removed .48)
│
▼
kube-proxy on all Nodes sees change
│
▼
iptables rules updated:
10.100.50.25 → [10.0.2.47, 10.0.3.22]
New requests never go to dead Pod.
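"Healthy" is determined per Pod by its readiness probe. A hedged container-spec snippet (path and port are assumptions) — a Pod failing this probe is removed from the Service's Endpoints exactly like a crashed one:

containers:
- name: api
  image: my-api:1.0               # placeholder image
  ports:
  - containerPort: 8080
  readinessProbe:
    httpGet:
      path: /healthz              # assumed health endpoint
      port: 8080
    periodSeconds: 5
    failureThreshold: 3           # ~15s of failures → removed from Endpoints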
Why Use Containers?
Problems without containers
Problem 1: “Works on my machine”
| Developer's laptop | Production server |
|---|---|
| Python 3.11 | Python 3.8 |
| Library v2.1 | Library v1.9 |
App works locally → Crashes in production
Problem 2: Dependency conflicts
One EC2 running two apps:
App A needs: Python 3.8, OpenSSL 1.1
App B needs: Python 3.11, OpenSSL 3.0
They conflict. Need separate servers = more cost.
Problem 3: Slow deployment
Traditional deployment:
1. SSH into server
2. Stop old app
3. Pull new code
4. Install dependencies (5-10 minutes)
5. Start new app
6. Hope it works
If it fails → rollback is painful
Problem 4: Slow scaling
Traffic spike → need more servers
1. Launch new EC2 (2-3 minutes)
2. Install OS packages
3. Install app dependencies
4. Configure app
5. Start app
6. Register with load balancer
Total: 5-10 minutes. Users already left.
Solutions with containers
Solution 1: Package everything together
Container image includes:
- Your app code
- Exact Python version
- Exact library versions
- Exact OS libraries
Same image runs identically everywhere:
laptop = staging = production
Solution 2: Isolation
One EC2 running two containers:
┌─────────────────────────────────────────┐
│ EC2 Instance │
│ ┌─────────────────┐ ┌─────────────────┐│
│ │ Container A │ │ Container B ││
│ │ Python 3.8 │ │ Python 3.11 ││
│ │ OpenSSL 1.1 │ │ OpenSSL 3.0 ││
│ │ (isolated) │ │ (isolated) ││
│ └─────────────────┘ └─────────────────┘│
└─────────────────────────────────────────┘
No conflicts. Both run on same server.
Solution 3: Fast deployment
Container deployment:
1. Pull new image (already built, seconds)
2. Start new container
3. Health check passes
4. Route traffic to new container
5. Stop old container
Rollback = start old image (seconds)
Solution 4: Fast scaling
Traffic spike → need more containers
1. Container image already exists
2. Start new container (seconds)
3. Health check passes
4. Route traffic
Total: 10-30 seconds
ECS vs EKS: When to Use
| | ECS | EKS |
|---|---|---|
| System | AWS-proprietary | Industry-standard Kubernetes |
| Complexity | Simpler, fewer concepts | More complex, more features |
| Skills | AWS-only | Portable (any cloud/on-prem) |
| Ecosystem | Tight AWS integration | Large open-source ecosystem (Helm, Istio, etc.) |
| Configuration | Less | More configuration, more control |
Use ECS when:
- Simpler container workloads
- Team doesn’t know Kubernetes
- Want less operational overhead
- Only using AWS
Use EKS when:
- Team already knows Kubernetes
- Need Kubernetes-specific features
- Want portability across clouds
- Need ecosystem tools (Helm, service mesh, etc.)
Node Management: ECS vs EKS
ECS Node Management
Two modes:
| Mode | Who manages EC2? |
|---|---|
| EC2 Launch Type | You (via ASG) |
| Fargate | AWS (no EC2 to manage) |
EC2 Launch Type: You create EC2 instances with ECS Agent pre-installed (ECS-optimized AMI). Agent auto-registers to cluster.
┌─────────────────────────────────────────────────────────────┐
│ EC2 Instance │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ ECS Agent │ │
│ │ - Registers with ECS Control Plane │ │
│ │ - Reports available CPU/memory │ │
│ │ - Receives "Run this Task" commands │ │
│ │ - Starts/stops containers │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
EKS Node Management
Three modes:
| Mode | Who manages EC2? | Who joins to cluster? |
|---|---|---|
| Self-Managed Nodes | You (via ASG) | You (bootstrap script) |
| Managed Node Groups | AWS | AWS |
| Fargate | AWS | N/A |
Self-Managed: You create EC2 with EKS-optimized AMI and bootstrap script. kubelet auto-registers to API Server.
┌─────────────────────────────────────────────────────────────┐
│ EC2 Instance (Node) │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ kubelet │ │
│ │ - Registers with Kubernetes API Server │ │
│ │ - Reports Node status │ │
│ │ - Receives "Run this Pod" commands │ │
│ │ - Manages Pod lifecycle │ │
│ └─────────────────────────────────────────────────────┘ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ kube-proxy │ │
│ │ - Manages iptables rules for Service routing │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Managed Node Groups: You specify instance type and count. AWS handles provisioning, joining, and rolling updates.
Using ASG with ECS/EKS
ASG manages EC2 instance count. Registration to cluster is automatic via User Data script.
ECS + ASG Setup
1. Create ECS Cluster (name only)
2. Create Launch Template:
- AMI: ECS-optimized AMI
- IAM Role: ecsInstanceRole
- User Data:
#!/bin/bash
echo "ECS_CLUSTER=my-cluster" >> /etc/ecs/ecs.config
3. Create ASG with Launch Template
- No Target Group needed
- Instances auto-register to ECS cluster
EKS + ASG Setup (Self-Managed)
1. Create EKS Cluster
2. Create IAM Role for Nodes:
- AmazonEKSWorkerNodePolicy
- AmazonEC2ContainerRegistryReadOnly
- AmazonEKS_CNI_Policy
3. Update aws-auth ConfigMap (allow IAM role)
4. Create Launch Template:
- AMI: EKS-optimized AMI
- IAM Role: (from step 2)
- User Data:
#!/bin/bash
/etc/eks/bootstrap.sh my-eks-cluster
5. Create ASG with Launch Template
- No Target Group needed
- Instances auto-register as Nodes
Key difference: EKS requires aws-auth ConfigMap update for authorization.
Where is the Kubernetes API Server?
In EKS: AWS manages it in their own VPC. You don’t see the EC2 instances running it.
┌─────────────────────────────────────────────────────────────┐
│ AWS-Managed VPC │
│ (hidden from you) │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ EKS Control Plane │ │
│ │ │ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │
│ │ │ API Server │ │ API Server │ │ API Server │ │ │
│ │ │ (HA) │ │ (HA) │ │ (HA) │ │ │
│ │ └────────────┘ └────────────┘ └────────────┘ │ │
│ │ │ │
│ │ ┌────────────┐ │ │
│ │ │ etcd │ (stores cluster state) │ │
│ │ └────────────┘ │ │
│ │ │ │
│ │ Endpoint: https://XXXXX.eks.amazonaws.com │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
│ HTTPS
▼
┌─────────────────────────────────────────────────────────────┐
│ Your VPC │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Node │ │ Node │ │ Node │ │
│ │ (kubelet │ │ (kubelet │ │ (kubelet │ │
│ │ talks to │ │ talks to │ │ talks to │ │
│ │ API Server) │ │ API Server) │ │ API Server) │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
│ kubectl commands also go to API Server endpoint │
└─────────────────────────────────────────────────────────────┘
Endpoint access options:
- Public endpoint (accessible from internet)
- Private endpoint (only from within VPC)
- Both
ASG vs Target Group
These are independent concepts.
| Concept | Purpose | Required for cluster? |
|---|---|---|
| ASG | Scale EC2 instance count | Optional (can use Managed Node Groups or Fargate) |
| Target Group | Route ALB/NLB traffic to targets | Only if you need load balancer |
ASG: Manages Node count
ASG: "Keep 3-10 EC2 instances running"
Nothing to do with traffic routing.
Target Group: Routes traffic to Pods/Tasks
ALB → Target Group → Pod/Task IPs
Who registers Pods to Target Group?
- ECS: ECS Service (automatic)
- EKS: AWS Load Balancer Controller (you install)
In container workloads: Traffic goes to Pods/Tasks, not EC2 instances. So Target Group contains Pod IPs, not Node IPs.
What is ConfigMap?
Kubernetes object storing configuration as key-value pairs. Pods can read this data as environment variables or files.
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config
data:
DATABASE_HOST: "db.example.com"
LOG_LEVEL: "info"
Purpose: Separate configuration from container image. Change config without rebuilding.
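A Pod template consumes it like this (a sketch; envFrom injects every key as an environment variable):

containers:
- name: my-app
  image: my-app:1.0               # placeholder image
  envFrom:
  - configMapRef:
      name: app-config            # DATABASE_HOST and LOG_LEVEL become env vars

Change the ConfigMap and restart the Pods to pick up new values — no image rebuild.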
What is aws-auth ConfigMap?
Special ConfigMap that maps AWS IAM identities to Kubernetes permissions.
Kubernetes doesn’t understand IAM. aws-auth bridges them.
apiVersion: v1
kind: ConfigMap
metadata:
name: aws-auth
namespace: kube-system
data:
mapRoles: |
- rolearn: arn:aws:iam::123456789:role/my-node-role
username: system:node:{{EC2PrivateDNSName}}
groups:
- system:bootstrappers
- system:nodes
Why Nodes need it
1. New EC2 Node starts, kubelet calls API Server: "Register me"
2. kubelet authenticates using EC2's IAM Role
3. API Server checks aws-auth ConfigMap:
"Is this IAM Role allowed to be a Node?"
4. If IAM Role is in aws-auth → Node joins cluster
If not → Rejected
Key groups
| Group | Permission |
|---|---|
| system:nodes | Allows Node operations |
| system:masters | Full admin access |
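Human admins are mapped the same way via mapUsers. A sketch (the ARN is a placeholder; grant system:masters sparingly):

data:
  mapUsers: |
    - userarn: arn:aws:iam::123456789:user/alice
      username: alice
      groups:
      - system:masters            # full cluster admin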
Fargate Pricing vs EC2
Fargate costs more per compute unit, but can be cheaper overall.
When Fargate is cheaper
- Variable/bursty workloads
- Low utilization
- Short-running tasks
- Don’t want to manage nodes
When EC2 is cheaper
- Steady 24/7 workloads
- High utilization (>70%)
- Can use Reserved Instances or Spot
Example comparison (1 vCPU, 2GB)
| | Fargate (1 vCPU, 2GB) | EC2 m5.large (2 vCPU, 8GB) |
|---|---|---|
| Hourly | ~$0.05 | ~$0.10 |
| 8 hours/day, 30 days | ~$12 | ~$72 (left running 24/7) or ~$24 (stopped when idle) |
Fargate wins at low utilization. EC2 wins at high utilization.
EKS Autoscaling Components
Overview
| Component | What it does |
|---|---|
| Horizontal Pod Autoscaler (HPA) | Adds/removes pod replicas based on CPU, memory, or custom metrics |
| Vertical Pod Autoscaler (VPA) | Adjusts CPU/memory requests for existing pods |
| Cluster Autoscaler | Adds/removes EC2 nodes when pods can’t be scheduled |
| Karpenter | Alternative to Cluster Autoscaler - provisions optimal EC2 directly |
| AWS Load Balancer Controller | Creates ALB/NLB when you define Ingress or LoadBalancer Service |
How They Work Together
Traffic increases
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ AWS Load Balancer Controller │
│ Creates/manages ALB or NLB to route traffic to pods │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Horizontal Pod Autoscaler (HPA) │
│ "CPU at 80%? Add more pod replicas" │
│ Scales: 3 pods → 10 pods │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Cluster Autoscaler / Karpenter │
│ "10 pods need scheduling but nodes are full? Add EC2 nodes" │
│ Scales: 2 nodes → 5 nodes │
└─────────────────────────────────────────────────────────────────┘
Resource Requests and Limits
Every pod specifies how much CPU/memory it needs:
containers:
- name: my-app
resources:
requests: # Guaranteed minimum - scheduler uses this
cpu: 500m # 500 millicores = 0.5 CPU
memory: 256Mi
limits: # Maximum allowed - killed if exceeds memory
cpu: 1000m
memory: 512Mi
| Term | What it means |
|---|---|
| Request | “I need at least this much” - used for scheduling |
| Limit | “Never give me more than this” - enforced at runtime |
| Actual usage | What the container is really using right now |
Horizontal Pod Autoscaler (HPA)
Adds/removes pod replicas based on metrics.
How HPA Calculates
HPA uses average across all pods, calculated relative to requests (not limits):
Formula:
desiredReplicas = currentReplicas × (currentMetricValue / targetMetricValue)
Example:
Target: 70% CPU utilization
Current replicas: 3
Pod requests: 500m CPU each
Pod 1 actual: 400m (80% of request)
Pod 2 actual: 450m (90% of request)
Pod 3 actual: 350m (70% of request)
Average: (80 + 90 + 70) / 3 = 80%
desiredReplicas = 3 × (80% / 70%) ≈ 3.43 → rounds up to 4 pods
Key: Utilization % = actual usage / request (NOT actual / limit)
HPA Configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: my-app-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: my-app
minReplicas: 2
maxReplicas: 50
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
HPA Scaling Flow
┌─────────────────────────────────────────────────────────────────┐
│ 1. Metrics Server collects data (every 15s) │
│ Pod 1: 80%, Pod 2: 90%, Pod 3: 70% │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 2. HPA checks metrics (every 15s) │
│ Average: 80% > Target: 70% → scale UP │
│ Desired = 3 × (80/70) = 4 pods │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 3. HPA updates Deployment replicas: 3 → 4 │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 4. Scheduler places new pod │
│ If no node has capacity → Pod stays "Pending" │
└─────────────────────────────────────────────────────────────────┘
│
▼ (if Pending)
┌─────────────────────────────────────────────────────────────────┐
│ 5. Cluster Autoscaler adds new EC2 node │
│ Pod scheduled on new node │
└─────────────────────────────────────────────────────────────────┘
Vertical Pod Autoscaler (VPA)
Adjusts CPU/memory requests for existing pods. Makes pods bigger, not more numerous.
VPA Configuration
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: my-app-vpa
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: my-app
updatePolicy:
updateMode: "Auto" # Auto-restart pods with new values
resourcePolicy:
containerPolicies:
- containerName: "*"
minAllowed:
cpu: 100m
memory: 128Mi
maxAllowed:
cpu: 4
memory: 8Gi
Update Modes
| Mode | Behavior |
|---|---|
| Off | Recommend only, no changes |
| Initial | Apply to new pods only |
| Auto | Restart pods with new values |
HPA vs VPA
VPA solves: "My pod always uses more memory than requested"
Before: request=256Mi, limit=256Mi, actual climbs to 800Mi → OOMKilled
After:  request=1Gi, limit=1Gi, actual=800Mi → stable
HPA solves: "Traffic increased, need more parallel processing"
Before: 3 pods, 1000 req/s → each pod overloaded
After: 30 pods, 1000 req/s → load distributed
Don’t use both on same metric - they conflict:
HPA: "CPU at 80%, add pods" → CPU drops to 40%
VPA: "CPU at 40%, lower request" → CPU jumps to 80%
→ Infinite loop
Safe combination: HPA on custom metrics (requests/sec), VPA on CPU/memory.
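A sketch of an HPA on requests/sec (assumes a metrics adapter, e.g. the Prometheus adapter, exposes http_requests_per_second as a Pods metric; names are placeholders):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa-rps
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 50
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second    # assumed adapter-provided metric
      target:
        type: AverageValue
        averageValue: "100"               # aim for ~100 req/s per pod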
Cluster Autoscaler
Adds/removes EC2 nodes based on pending pods and node utilization.
How It Works
┌─────────────────────────────────────────────────────────────────┐
│ Scale UP trigger: │
│ Pod is Pending because no node has enough resources │
│ → Increase ASG desired capacity │
│ → New EC2 launches, joins cluster │
│ → Pod scheduled on new node │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ Scale DOWN trigger: │
│ Node underutilized for 10+ minutes │
│ All pods can be moved to other nodes │
│ → Drain node (evict pods) │
│ → Decrease ASG desired capacity │
│ → EC2 terminated │
└─────────────────────────────────────────────────────────────────┘
Configuration
apiVersion: apps/v1
kind: Deployment
metadata:
name: cluster-autoscaler
spec:
template:
spec:
containers:
- name: cluster-autoscaler
command:
- ./cluster-autoscaler
- --cloud-provider=aws
- --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
- --scale-down-delay-after-add=10m
- --scale-down-unneeded-time=10m
Karpenter (Alternative to Cluster Autoscaler)
Provisions EC2 instances directly (not via ASG). Picks optimal instance type per workload.
Karpenter vs Cluster Autoscaler
| Aspect | Cluster Autoscaler | Karpenter |
|---|---|---|
| How it scales | Adjusts ASG size | Provisions EC2 directly |
| Node types | Fixed per node group | Chooses best instance per pod |
| Speed | Slower (ASG → EC2) | Faster (direct EC2 API) |
| Bin packing | Basic | Smart (fits pods efficiently) |
| Spot handling | Manual setup | Built-in with fallback |
| Cost | Often overprovisions | Better right-sizing |
Example
Pod needs 3 CPU, 2Gi memory
Cluster Autoscaler (node group = m5.xlarge only):
→ Launches m5.xlarge (4 CPU, 16Gi)
→ Wasted: 1 CPU, 14Gi
→ Cost: $0.192/hr
Karpenter (can choose from multiple types):
→ Picks optimal instance or bins multiple pods
→ Minimal waste
→ Or picks Spot: ~$0.06/hr
Karpenter Configuration
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
name: default
spec:
template:
spec:
requirements:
- key: karpenter.sh/capacity-type
operator: In
values: ["spot", "on-demand"]
- key: node.kubernetes.io/instance-type
operator: In
values: ["m5.large", "m5.xlarge", "c5.large", "c5.xlarge"]
limits:
cpu: 100
disruption:
    consolidationPolicy: WhenUnderutilized  # in v1beta1, consolidateAfter is only valid with WhenEmpty
Consolidation
After traffic drops:
Cluster Autoscaler:
Node 1: 20% utilized
Node 2: 30% utilized
Node 3: 25% utilized
→ Keeps all 3 (none below threshold)
Karpenter:
→ "I can fit all pods on 1 larger node"
→ Consolidates to 1 node
→ Terminates 2 nodes
Typical savings: 20-50% EC2 cost reduction vs Cluster Autoscaler.
AWS Load Balancer Controller
Creates and manages ALB/NLB when you define Ingress or LoadBalancer Service.
Ingress → ALB
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: my-app-ingress
annotations:
kubernetes.io/ingress.class: alb
alb.ingress.kubernetes.io/scheme: internet-facing
alb.ingress.kubernetes.io/target-type: ip
spec:
rules:
- host: app.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: my-app-service
port:
number: 80
Service type LoadBalancer → NLB
apiVersion: v1
kind: Service
metadata:
name: my-app-nlb
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
spec:
type: LoadBalancer
selector:
app: my-app
ports:
- port: 443
targetPort: 8080
Target Types
| Type | How traffic routes |
|---|---|
| ip | Direct to Pod IP (recommended) |
| instance | To Node, then kube-proxy routes to Pod |
Why Requests Matter for Scheduling
Scheduler uses requests, not actual usage:
Node capacity: 4 CPU, 8Gi memory
Pod A requests: 1 CPU ─┐
Pod B requests: 1 CPU ├─ Total requested: 3 CPU
Pod C requests: 1 CPU ─┘
Remaining for scheduling: 1 CPU
New pod wants: 2 CPU
→ CANNOT schedule (only 1 CPU available by request)
→ Even if actual usage is low!
Actual usage might be:
Pod A: 0.3 CPU (30% of request)
Pod B: 0.5 CPU (50% of request)
Pod C: 0.2 CPU (20% of request)
Total actual: 1 CPU
But scheduler only looks at REQUESTS.
This is why VPA is useful - right-sizes requests.
Autoscaling Summary
| Component | Looks at | Compares to | Action |
|---|---|---|---|
| HPA | Average actual usage | Requests (as %) | Add/remove pods |
| VPA | Individual pod usage over time | Current requests | Adjust request values |
| Cluster Autoscaler | Pending pods | Node capacity (by requests) | Add/remove nodes |
| Karpenter | Pending pods | Available instance types | Provision optimal nodes |
| Scheduler | Pod requests | Node unrequested capacity | Place pods on nodes |
Monitoring: CloudWatch vs AMP (Prometheus)
Two Approaches
CloudWatch method:
┌─────────────────────────────────────────────────────────────────┐
│ EKS Node │
│ │
│ ┌─────────────────┐ │
│ │ CloudWatch Agent│───────► CloudWatch │
│ │ (system metrics)│ │
│ └─────────────────┘ │
│ │
│ ┌─────────────────┐ │
│ │ App │───────► CloudWatch │
│ │ (SDK push) │ (two senders) │
│ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Prometheus method:
┌─────────────────────────────────────────────────────────────────┐
│ EKS Node │
│ │
│ ┌─────────────────┐ scrape ┌─────────────────┐ │
│ │ Prometheus Agent│◄──────────────────│ App :9090 │ │
│ │ │◄──────────────────│ node-exporter │ │
│ │ │◄──────────────────│ kube-state-metrics │
│ └────────┬────────┘ └─────────────────┘ │
│ │ │
│ │ one sender │
│ ▼ │
│ ┌──────────────┐ │
│ │ AMP │ │
│ └──────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Key Difference
| | CloudWatch | Prometheus/AMP |
|---|---|---|
| Who sends to backend | Agent + App (both) | Agent only |
| App’s job | Call CloudWatch API | Expose HTTP /metrics endpoint |
| Network calls from app | Yes (to CloudWatch) | No (agent scrapes locally) |
Amazon Managed Service for Prometheus (AMP)
Fully managed Prometheus-compatible monitoring. You send metrics, AWS handles storage/scaling.
What is Prometheus?
Open-source monitoring system. Industry standard for Kubernetes monitoring.
System Metrics Collection
Prometheus agent scrapes exporters that expose /metrics:
┌─────────────────────────────────────────────────────────────────┐
│ EKS Node │
│ │
│ ┌─────────────────┐ │
│ │ node-exporter │ ← CPU, memory, disk, network │
│ │ :9100/metrics │ (DaemonSet) │
│ └────────┬────────┘ │
│ │ scrape │
│ ┌────────┴────────┐ │
│ │ Prometheus Agent│───────► AMP Workspace │
│ └────────┬────────┘ │
│ │ scrape │
│ ┌────────┴────────┐ │
│ │ kube-state- │ ← K8s object states (pods, deployments) │
│ │ metrics │ │
│ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
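A minimal agent config for this setup might look like the following (a sketch; the workspace ID and region are placeholders):

scrape_configs:
- job_name: node-exporter
  static_configs:
  - targets: ['localhost:9100']
- job_name: kube-state-metrics
  static_configs:
  - targets: ['kube-state-metrics.kube-system:8080']

remote_write:
- url: https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-EXAMPLE/api/v1/remote_write
  sigv4:
    region: us-east-1             # signs requests with the Node/Pod IAM role

In a real cluster you would use kubernetes_sd_configs for service discovery rather than static targets.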
CloudWatch vs AMP
| Aspect | CloudWatch | AMP |
|---|---|---|
| Best for | AWS services | Kubernetes/containers |
| Query language | CloudWatch Insights | PromQL (industry standard) |
| High-cardinality | Expensive at scale | Designed for it |
| Ecosystem | AWS-native | 1000s of Prometheus exporters |
| Portability | AWS only | Same queries work anywhere |
When to Use AMP
- Running EKS and want Prometheus-compatible monitoring
- Need PromQL queries
- High-cardinality metrics (e.g., per-customer metrics)
- Want to reuse existing Prometheus dashboards
- Multi-cloud/hybrid (same queries everywhere)
AWS X-Ray (Distributed Tracing)
What is a Trace?
Tracks a single request as it flows through multiple services.
User clicks "Buy"
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Trace ID: abc-123 │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Span 1: API Gateway (50ms) │ │
│ │ ├── Span 2: Order Service (200ms) │ │
│ │ │ ├── Span 3: Inventory Service (80ms) │ │
│ │ │ ├── Span 4: Payment Service (100ms) ← bottleneck │ │
│ │ │ └── Span 5: DynamoDB (20ms) │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ Total: 250ms │
└─────────────────────────────────────────────────────────────────┘
- Span = One unit of work (service call, DB query)
- Trace = Collection of spans for one request
Metrics vs Traces
| | Metrics | Traces |
|---|---|---|
| What | Aggregated numbers | Individual request paths |
| Question | “How many errors per minute?” | “Why was THIS request slow?” |
| Granularity | Summary (avg, p99) | Per-request detail |
Metrics: "5% of requests are slow"
Traces: "This slow request spent 2s waiting for DB"
X-Ray Service Map
Auto-generated visual of your architecture:
┌─────────┐
│ API GW │
│ 99% OK │
└────┬────┘
│
▼
┌─────────┐
│ Lambda │
│ 95% OK │ ← 5% errors visible
└────┬────┘
│
┌────┴────┐
▼ ▼
┌─────────┐ ┌─────────┐
│ DynamoDB│ │ S3 │
│ 15ms │ │ 50ms │
└─────────┘ └─────────┘
When to Use X-Ray
- Debug slow requests
- Find errors in distributed systems
- Understand service dependencies
- Identify bottlenecks
ADOT (AWS Distro for OpenTelemetry)
AWS’s distribution of OpenTelemetry collector. Can replace Prometheus agent.
What it Does
┌─────────────────────────────────────────────────────────────────┐
│ ADOT Collector │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Receivers │────►│ Processors │────►│ Exporters │ │
│ │ │ │ │ │ │ │
│ │ - Prometheus│ │ - Filter │ │ - AMP │ │
│ │ - OTLP │ │ - Transform │ │ - CloudWatch│ │
│ │ - StatsD │ │ │ │ - X-Ray │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────┘
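A collector config matching the diagram might look like this (a sketch; the AMP endpoint and region are placeholders):

extensions:
  sigv4auth:
    region: us-east-1

receivers:
  prometheus:
    config:
      scrape_configs:
      - job_name: app
        static_configs:
        - targets: ['localhost:9090']
  otlp:
    protocols:
      grpc:

processors:
  batch:

exporters:
  prometheusremotewrite:
    endpoint: https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-EXAMPLE/api/v1/remote_write
    auth:
      authenticator: sigv4auth
  awsxray:

service:
  extensions: [sigv4auth]
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [prometheusremotewrite]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [awsxray]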
ADOT vs Prometheus Agent
| Aspect | Prometheus Agent | ADOT |
|---|---|---|
| Purpose | Metrics only | Metrics + Traces + Logs |
| Input | Prometheus only | Many formats |
| Output | AMP only | AMP, CloudWatch, X-Ray, etc. |
| Use case | Simple Prometheus setup | Multi-destination, traces |
When to Use ADOT
- Need metrics AND traces (AMP + X-Ray)
- Want to send same metrics to multiple destinations
- Want vendor-neutral OpenTelemetry standard
Observability Summary
| Tool | Data Type | Use For |
|---|---|---|
| CloudWatch | Metrics, Logs | AWS-native monitoring, simple setup |
| AMP | Metrics | Prometheus ecosystem, PromQL, K8s-native |
| X-Ray | Traces | Debugging requests, finding bottlenecks |
| ADOT | All | Unified collection, multi-destination |
Common EKS setup: AMP for metrics + X-Ray for traces, collected via ADOT.