What is Kubernetes?
Container orchestration system. Automatically manages many containers across many servers.
Problems it solves
- Decide which server runs which container
- Restart crashed containers
- Scale up/down based on load
- Route traffic to healthy containers
- Roll out updates without downtime
Key Terms
| Term | Meaning |
|---|---|
| Cluster | Entire Kubernetes system (Control Plane + all Nodes) |
| Node | Server (physical/VM) that runs containers |
| Pod | Smallest deployable unit. One or more containers sharing storage/network |
| Control Plane | The “brain” that makes decisions (scheduling, monitoring, scaling) |
┌─────────────────────────────────────────────────────────────────┐
│ Kubernetes Cluster │
│ │
│ You tell Kubernetes: "I want 3 copies of my web app running" │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Control Plane (the brain) │ │
│ │ - Receives your request │ │
│ │ - Decides which servers have capacity │ │
│ │ - Schedules containers onto servers │ │
│ │ - Monitors health, restarts failed containers │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌───────────────┼───────────────┐ │
│ ▼ ▼ ▼ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Node 1 │ │ Node 2 │ │ Node 3 │ │
│ │ ┌───────────┐ │ │ ┌───────────┐ │ │ ┌───────────┐ │ │
│ │ │ Pod │ │ │ │ Pod │ │ │ │ Pod │ │ │
│ │ │ (web app) │ │ │ │ (web app) │ │ │ │ (web app) │ │ │
│ │ └───────────┘ │ │ └───────────┘ │ │ └───────────┘ │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
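In YAML terms, "I want 3 copies of my web app running" is a Deployment. A minimal sketch (the name, labels, and image are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3                     # "I want 3 copies"
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: web
        image: registry.example.com/my-app:1.0   # placeholder image
        ports:
        - containerPort: 8080

The Control Plane then schedules the 3 Pods across whichever Nodes have capacity.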
What is EKS (Elastic Kubernetes Service)?
AWS-managed Kubernetes Control Plane + your worker nodes
Problem it solves
Running Kubernetes yourself is complex:
- Managing Control Plane (API Server, etcd, Scheduler, Controller Manager)
- Handling upgrades
- Ensuring high availability
- Patching security vulnerabilities
EKS removes this burden.
┌─────────────────────────────────────────────────────────────┐
│ EKS Cluster │
├─────────────────────────┬───────────────────────────────────┤
│ Control Plane │ Data Plane │
│ (AWS manages) │ (You manage OR AWS) │
│ │ │
│ ┌─────────────────┐ │ ┌──────────────────────────┐ │
│ │ API Server │ │ │ Worker Nodes │ │
│ │ etcd │◄───┼───►│ (EC2 or Fargate) │ │
│ │ Scheduler │ │ │ │ │
│ │ Controller Mgr │ │ │ ┌─────┐ ┌─────┐ ┌─────┐ │ │
│ └─────────────────┘ │ │ │Pod A│ │Pod B│ │Pod C│ │ │
│ │ │ └─────┘ └─────┘ └─────┘ │ │
│ Runs in AWS-managed │ └──────────────────────────┘ │
│ VPC (hidden from you) │ │
└─────────────────────────┴───────────────────────────────────┘
ECS vs EKS Terminology Mapping
| ECS | EKS (Kubernetes) |
|---|---|
| Task Definition | Pod spec (in Deployment YAML) |
| Task | Pod |
| Service | Deployment + Service |
| Cluster | Cluster |
| Container Instance (EC2) | Node |
| Fargate | Fargate (same in both) |
Task (ECS) ≈ Pod (EKS)
Both are the smallest deployable unit.
| | ECS Task | EKS Pod |
|---|---|---|
| What it is | One or more containers running together | One or more containers running together |
| Share network? | Yes | Yes |
| Share storage? | Yes | Yes |
| Defined by | Task Definition (JSON) | Pod spec (YAML) |
Key difference: Pod is a Kubernetes concept (industry standard). Task is AWS-specific (ECS only).
Service (ECS) ≠ Node (EKS)
Completely different concepts.
| | ECS Service | EKS Node |
|---|---|---|
| What it is | Keeps N copies of a Task running | Server that runs Pods |
| Purpose | “I want 3 web servers always running” | Physical/virtual machine providing compute |
- ECS equivalent of Node = EC2 instance (or Fargate)
- EKS equivalent of ECS Service = Deployment + Service
ECS Service vs Kubernetes Deployment + Service
ECS: One concept does two things
┌─────────────────────────────────────────────────────────┐
│ ECS Service │
│ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ "Keep 3 Tasks running" │ │
│ │ "Register them with ALB target group" │ │
│ │ "Replace unhealthy Tasks" │ │
│ │ "Rolling update when Task Definition changes" │ │
│ └─────────────────────────────────────────────────┘ │
│ │ │
│ ┌──────────────┼──────────────┐ │
│ ▼ ▼ ▼ │
│ ┌──────┐ ┌──────┐ ┌──────┐ │
│ │Task 1│ │Task 2│ │Task 3│ │
│ └──────┘ └──────┘ └──────┘ │
└─────────────────────────────────────────────────────────┘
Kubernetes: Two separate concepts
Deployment = Pod management
- “Keep 3 Pods running”
- “Rolling update when Pod spec changes”
- “Replace crashed Pods”
Service = Network management
- “Give these Pods a stable IP/DNS name”
- “Load balance traffic across Pods”
- “Track which Pods are healthy”
┌─────────────────────────────────────────────────────────────────┐
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Deployment │ │
│ │ "Keep 3 Pods running with this container image" │ │
│ │ │ │
│ │ ┌──────┐ ┌──────┐ ┌──────┐ │ │
│ │ │Pod 1 │ │Pod 2 │ │Pod 3 │ │ │
│ │ └──┬───┘ └──┬───┘ └──┬───┘ │ │
│ └─────────┼──────────────┼──────────────┼──────────────────┘ │
│ └──────────────┼──────────────┘ │
│ │ │
│ ┌────────────────────────┼────────────────────────────────┐ │
│ │ Service │ │
│ │ "Expose these Pods on cluster IP 10.0.0.50:80" │ │
│ │ "DNS name: my-app.default.svc.cluster.local" │ │
│ │ "Load balance incoming requests across all Pods" │ │
│ └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
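The Service in the diagram could be declared like this (a sketch; the ClusterIP is normally auto-assigned, and the selector is assumed to match the Deployment's Pod label app: my-app):

apiVersion: v1
kind: Service
metadata:
  name: my-app                    # DNS: my-app.default.svc.cluster.local
spec:
  type: ClusterIP                 # the default type
  selector:
    app: my-app                   # selects the Deployment's Pods
  ports:
  - port: 80                      # stable Service port
    targetPort: 8080              # container port on each Pod

Note the decoupling: the Deployment never mentions the Service, and the Service only knows a label selector.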
Why does Kubernetes split them?
Flexibility. You can mix and match:
| Scenario | What you create |
|---|---|
| Internal microservice | Deployment + Service (ClusterIP) |
| Public web app | Deployment + Service (LoadBalancer) |
| Background worker (no network needed) | Deployment only (no Service) |
| Expose existing external DB | Service only (no Deployment) |
| Canary deployment | 2 Deployments + 1 Service |
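For the canary row: two Deployments whose Pod templates share a label, and one Service selecting only that shared label, so traffic splits roughly by replica count. A sketch with placeholder names:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-stable
spec:
  replicas: 9
  selector:
    matchLabels: {app: my-app, track: stable}
  template:
    metadata:
      labels: {app: my-app, track: stable}   # app: my-app is the shared label
    spec:
      containers:
      - name: web
        image: my-app:1.0
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-canary
spec:
  replicas: 1                                # ~10% of traffic
  selector:
    matchLabels: {app: my-app, track: canary}
  template:
    metadata:
      labels: {app: my-app, track: canary}
    spec:
      containers:
      - name: web
        image: my-app:1.1

A Service selecting just app: my-app load balances across all 10 Pods.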
Kubernetes Service Types
| Type | Accessible from | Use case |
|---|---|---|
| ClusterIP | Inside cluster only | Pod-to-Pod communication |
| NodePort | Each Node's IP at a static port | Rarely used directly (debugging, no cloud load balancer) |
| LoadBalancer | Internet | External users accessing your app |
Multiple Pods/Tasks per Node/EC2
Normal case. One Node/EC2 instance runs multiple Pods/Tasks.
┌─────────────────────────────────────────────────────────────┐
│ EC2 Instance (Node) │
│ (e.g., m5.large: 2 vCPU, 8GB RAM) │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Pod A │ │ Pod B │ │ Pod C │ │
│ │ (web app) │ │ (api) │ │ (worker) │ │
│ │ 256MB RAM │ │ 512MB RAM │ │ 1GB RAM │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ Total used: ~1.8GB of 8GB available │
│ Room for more Pods until resources exhausted │
└─────────────────────────────────────────────────────────────┘
How scheduler decides placement
- Available resources - Does the Node have enough CPU/memory?
- Constraints - Does the Pod require specific Node type (GPU, etc.)?
- Spreading - Avoid putting all copies on same Node (fault tolerance)
How ClusterIP Service Works
Frontend Pod calling API Pod:
┌─────────────────────────────────────────────────────────────────┐
│ EKS Cluster │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Deployment: frontend (3 Pods) │ │
│ │ calls: http://api-service:8080/users │ │
│ └───────────────────────────┬──────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Service: api-service (ClusterIP) │ │
│ │ IP: 10.100.50.25 (internal only) │ │
│ │ DNS: api-service.default.svc.cluster.local │ │
│ └───────────────────────────┬──────────────────────────────┘ │
│ │ load balances │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Deployment: api (5 Pods) │ │
│ │ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ │ │
│ │ │Pod │ │Pod │ │Pod │ │Pod │ │Pod │ │ │
│ │ │:8080 │ │:8080 │ │:8080 │ │:8080 │ │:8080 │ │ │
│ │ └───────┘ └───────┘ └───────┘ └───────┘ └───────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
What is ClusterIP?
Virtual IP. No actual network interface has this IP. It only exists in iptables rules.
Packet flow
1. Frontend Pod sends HTTP request to api-service:8080
2. DNS resolves: api-service → 10.100.50.25 (ClusterIP)
3. Packet sent: dst: 10.100.50.25:8080
4. Node's iptables intercepts packet
5. iptables rewrites destination:
10.100.50.25:8080 → 10.0.2.47:8080 (actual Pod IP)
6. Packet routed to API Pod
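A manifest for api-service might look like this (a sketch; the ClusterIP is auto-assigned — 10.100.50.25 above is illustrative — and the Pod label app: api is assumed):

apiVersion: v1
kind: Service
metadata:
  name: api-service               # DNS: api-service.default.svc.cluster.local
spec:
  selector:
    app: api                      # assumed label on the api Deployment's Pods
  ports:
  - port: 8080                    # port callers use
    targetPort: 8080              # port Pods listen on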
kube-proxy
Component running on every Node that manages iptables rules.
┌─────────────────────────────────────────────────────────────┐
│ Control Plane │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ API Server │ │
│ │ Knows: │ │
│ │ - Service "api-service" has ClusterIP 10.100.50.25 │ │
│ │ - Pods behind it: 10.0.2.47, 10.0.2.48, 10.0.3.22 │ │
│ └───────────────────────┬──────────────────────────────┘ │
└──────────────────────────┼──────────────────────────────────┘
│ watches for changes
▼
┌──────────────────────────────────────────────────────────────┐
│ Node 1 │ Node 2 │
│ ┌─────────────────┐ │ ┌─────────────────┐ │
│ │ kube-proxy │ │ │ kube-proxy │ │
│ │ Updates local │ │ │ Updates local │ │
│ │ iptables rules │ │ │ iptables rules │ │
│ └─────────────────┘ │ └─────────────────┘ │
└──────────────────────────┴───────────────────────────────────┘
When Pods change
API Pod 10.0.2.48 crashes
│
▼
Control Plane detects Pod gone
│
▼
API Server updates Endpoints:
api-service → [10.0.2.47, 10.0.3.22] (removed .48)
│
▼
kube-proxy on all Nodes sees change
│
▼
iptables rules updated:
10.100.50.25 → [10.0.2.47, 10.0.3.22]
New requests never go to dead Pod.
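"Healthy" is determined per Pod by its readiness probe. A hedged container-spec snippet (path and port are assumptions) — a Pod failing this probe is removed from the Service's Endpoints exactly like a crashed one:

containers:
- name: api
  image: my-api:1.0               # placeholder image
  ports:
  - containerPort: 8080
  readinessProbe:
    httpGet:
      path: /healthz              # assumed health endpoint
      port: 8080
    periodSeconds: 5
    failureThreshold: 3           # ~15s of failures → removed from Endpoints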
Why Use Containers?
Problems without containers
Problem 1: “Works on my machine”
| Developer's laptop | Production server |
|---|---|
| Python 3.11 | Python 3.8 |
| Library v2.1 | Library v1.9 |
App works locally → Crashes in production
Problem 2: Dependency conflicts
One EC2 running two apps:
App A needs: Python 3.8, OpenSSL 1.1
App B needs: Python 3.11, OpenSSL 3.0
They conflict. Need separate servers = more cost.
Problem 3: Slow deployment
Traditional deployment:
1. SSH into server
2. Stop old app
3. Pull new code
4. Install dependencies (5-10 minutes)
5. Start new app
6. Hope it works
If it fails → rollback is painful
Problem 4: Slow scaling
Traffic spike → need more servers
1. Launch new EC2 (2-3 minutes)
2. Install OS packages
3. Install app dependencies
4. Configure app
5. Start app
6. Register with load balancer
Total: 5-10 minutes. Users already left.
Solutions with containers
Solution 1: Package everything together
Container image includes:
- Your app code
- Exact Python version
- Exact library versions
- Exact OS libraries
Same image runs identically everywhere:
laptop = staging = production
Solution 2: Isolation
One EC2 running two containers:
┌─────────────────────────────────────────┐
│ EC2 Instance │
│ ┌─────────────────┐ ┌─────────────────┐│
│ │ Container A │ │ Container B ││
│ │ Python 3.8 │ │ Python 3.11 ││
│ │ OpenSSL 1.1 │ │ OpenSSL 3.0 ││
│ │ (isolated) │ │ (isolated) ││
│ └─────────────────┘ └─────────────────┘│
└─────────────────────────────────────────┘
No conflicts. Both run on same server.
Solution 3: Fast deployment
Container deployment:
1. Pull new image (already built, seconds)
2. Start new container
3. Health check passes
4. Route traffic to new container
5. Stop old container
Rollback = start old image (seconds)
Solution 4: Fast scaling
Traffic spike → need more containers
1. Container image already exists
2. Start new container (seconds)
3. Health check passes
4. Route traffic
Total: 10-30 seconds
ECS vs EKS: When to Use
| | ECS | EKS |
|---|---|---|
| System | AWS-proprietary | Industry-standard Kubernetes |
| Complexity | Simpler, fewer concepts | More complex, more features |
| Skills | AWS-only | Portable (any cloud/on-prem) |
| Ecosystem | Tight AWS integration | Large open-source ecosystem (Helm, Istio, etc.) |
| Configuration | Less | More configuration, more control |
Use ECS when:
- Simpler container workloads
- Team doesn’t know Kubernetes
- Want less operational overhead
- Only using AWS
Use EKS when:
- Team already knows Kubernetes
- Need Kubernetes-specific features
- Want portability across clouds
- Need ecosystem tools (Helm, service mesh, etc.)
Node Management: ECS vs EKS
ECS Node Management
Two modes:
| Mode | Who manages EC2? |
|---|---|
| EC2 Launch Type | You (via ASG) |
| Fargate | AWS (no EC2 to manage) |
EC2 Launch Type: You create EC2 instances with ECS Agent pre-installed (ECS-optimized AMI). Agent auto-registers to cluster.
┌─────────────────────────────────────────────────────────────┐
│ EC2 Instance │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ ECS Agent │ │
│ │ - Registers with ECS Control Plane │ │
│ │ - Reports available CPU/memory │ │
│ │ - Receives "Run this Task" commands │ │
│ │ - Starts/stops containers │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
EKS Node Management
Three modes:
| Mode | Who manages EC2? | Who joins to cluster? |
|---|---|---|
| Self-Managed Nodes | You (via ASG) | You (bootstrap script) |
| Managed Node Groups | AWS | AWS |
| Fargate | AWS | N/A |
Self-Managed: You create EC2 with EKS-optimized AMI and bootstrap script. kubelet auto-registers to API Server.
┌─────────────────────────────────────────────────────────────┐
│ EC2 Instance (Node) │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ kubelet │ │
│ │ - Registers with Kubernetes API Server │ │
│ │ - Reports Node status │ │
│ │ - Receives "Run this Pod" commands │ │
│ │ - Manages Pod lifecycle │ │
│ └─────────────────────────────────────────────────────┘ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ kube-proxy │ │
│ │ - Manages iptables rules for Service routing │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Managed Node Groups: You specify instance type and count. AWS handles provisioning, joining, and rolling updates.
Using ASG with ECS/EKS
ASG manages EC2 instance count. Registration to cluster is automatic via User Data script.
ECS + ASG Setup
1. Create ECS Cluster (name only)
2. Create Launch Template:
- AMI: ECS-optimized AMI
- IAM Role: ecsInstanceRole
- User Data:
#!/bin/bash
echo "ECS_CLUSTER=my-cluster" >> /etc/ecs/ecs.config
3. Create ASG with Launch Template
- No Target Group needed
- Instances auto-register to ECS cluster
EKS + ASG Setup (Self-Managed)
1. Create EKS Cluster
2. Create IAM Role for Nodes:
- AmazonEKSWorkerNodePolicy
- AmazonEC2ContainerRegistryReadOnly
- AmazonEKS_CNI_Policy
3. Update aws-auth ConfigMap (allow IAM role)
4. Create Launch Template:
- AMI: EKS-optimized AMI
- IAM Role: (from step 2)
- User Data:
#!/bin/bash
/etc/eks/bootstrap.sh my-eks-cluster
5. Create ASG with Launch Template
- No Target Group needed
- Instances auto-register as Nodes
Key difference: EKS requires aws-auth ConfigMap update for authorization.
Where is the Kubernetes API Server?
In EKS: AWS manages it in their own VPC. You don’t see the EC2 instances running it.
┌─────────────────────────────────────────────────────────────┐
│ AWS-Managed VPC │
│ (hidden from you) │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ EKS Control Plane │ │
│ │ │ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │
│ │ │ API Server │ │ API Server │ │ API Server │ │ │
│ │ │ (HA) │ │ (HA) │ │ (HA) │ │ │
│ │ └────────────┘ └────────────┘ └────────────┘ │ │
│ │ │ │
│ │ ┌────────────┐ │ │
│ │ │ etcd │ (stores cluster state) │ │
│ │ └────────────┘ │ │
│ │ │ │
│ │ Endpoint: https://XXXXX.eks.amazonaws.com │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
│ HTTPS
▼
┌─────────────────────────────────────────────────────────────┐
│ Your VPC │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Node │ │ Node │ │ Node │ │
│ │ (kubelet │ │ (kubelet │ │ (kubelet │ │
│ │ talks to │ │ talks to │ │ talks to │ │
│ │ API Server) │ │ API Server) │ │ API Server) │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
│ kubectl commands also go to API Server endpoint │
└─────────────────────────────────────────────────────────────┘
Endpoint access options:
- Public endpoint (accessible from internet)
- Private endpoint (only from within VPC)
- Both
ASG vs Target Group
These are independent concepts.
| Concept | Purpose | Required for cluster? |
|---|---|---|
| ASG | Scale EC2 instance count | Optional (can use Managed Node Groups or Fargate) |
| Target Group | Route ALB/NLB traffic to targets | Only if you need load balancer |
ASG: Manages Node count
ASG: "Keep 3-10 EC2 instances running"
Nothing to do with traffic routing.
Target Group: Routes traffic to Pods/Tasks
ALB → Target Group → Pod/Task IPs
Who registers Pods to Target Group?
- ECS: ECS Service (automatic)
- EKS: AWS Load Balancer Controller (you install)
In container workloads: Traffic goes to Pods/Tasks, not EC2 instances. So Target Group contains Pod IPs, not Node IPs.
What is ConfigMap?
Kubernetes object storing configuration as key-value pairs. Pods can read this data as environment variables or files.
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config
data:
DATABASE_HOST: "db.example.com"
LOG_LEVEL: "info"
Purpose: Separate configuration from container image. Change config without rebuilding.
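A Pod template consumes it like this (a sketch; envFrom injects every key as an environment variable):

containers:
- name: my-app
  image: my-app:1.0               # placeholder image
  envFrom:
  - configMapRef:
      name: app-config            # DATABASE_HOST and LOG_LEVEL become env vars

Change the ConfigMap and restart the Pods to pick up new values — no image rebuild.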
What is aws-auth ConfigMap?
Special ConfigMap that maps AWS IAM identities to Kubernetes permissions.
Kubernetes doesn’t understand IAM. aws-auth bridges them.
apiVersion: v1
kind: ConfigMap
metadata:
name: aws-auth
namespace: kube-system
data:
mapRoles: |
- rolearn: arn:aws:iam::123456789:role/my-node-role
username: system:node:{{EC2PrivateDNSName}}
groups:
- system:bootstrappers
- system:nodes
Why Nodes need it
1. New EC2 Node starts, kubelet calls API Server: "Register me"
2. kubelet authenticates using EC2's IAM Role
3. API Server checks aws-auth ConfigMap:
"Is this IAM Role allowed to be a Node?"
4. If IAM Role is in aws-auth → Node joins cluster
If not → Rejected
Key groups
| Group | Permission |
|---|---|
| system:nodes | Allows Node operations |
| system:masters | Full admin access |
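Human admins are mapped the same way via mapUsers. A sketch (the ARN is a placeholder; grant system:masters sparingly):

data:
  mapUsers: |
    - userarn: arn:aws:iam::123456789:user/alice
      username: alice
      groups:
      - system:masters            # full cluster admin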
Fargate Pricing vs EC2
Fargate costs more per compute unit, but can be cheaper overall.
When Fargate is cheaper
- Variable/bursty workloads
- Low utilization
- Short-running tasks
- Don’t want to manage nodes
When EC2 is cheaper
- Steady 24/7 workloads
- High utilization (>70%)
- Can use Reserved Instances or Spot
Example comparison (1 vCPU, 2GB)
| | Fargate (1 vCPU, 2GB) | EC2 m5.large (2 vCPU, 8GB) |
|---|---|---|
| Hourly | ~$0.05 | ~$0.10 |
| 8 hours/day, 30 days | ~$12 | ~$72 (left running 24/7) or ~$24 (stopped when idle) |
Fargate wins at low utilization. EC2 wins at high utilization.
EKS Autoscaling Components
Overview
| Component | What it does |
|---|---|
| Horizontal Pod Autoscaler (HPA) | Adds/removes pod replicas based on CPU, memory, or custom metrics |
| Vertical Pod Autoscaler (VPA) | Adjusts CPU/memory requests for existing pods |
| Cluster Autoscaler | Adds/removes EC2 nodes when pods can’t be scheduled |
| Karpenter | Alternative to Cluster Autoscaler - provisions optimal EC2 directly |
| AWS Load Balancer Controller | Creates ALB/NLB when you define Ingress or LoadBalancer Service |
How They Work Together
Traffic increases
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ AWS Load Balancer Controller │
│ Creates/manages ALB or NLB to route traffic to pods │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Horizontal Pod Autoscaler (HPA) │
│ "CPU at 80%? Add more pod replicas" │
│ Scales: 3 pods → 10 pods │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Cluster Autoscaler / Karpenter │
│ "10 pods need scheduling but nodes are full? Add EC2 nodes" │
│ Scales: 2 nodes → 5 nodes │
└─────────────────────────────────────────────────────────────────┘
Resource Requests and Limits
Every pod specifies how much CPU/memory it needs:
containers:
- name: my-app
resources:
requests: # Guaranteed minimum - scheduler uses this
cpu: 500m # 500 millicores = 0.5 CPU
memory: 256Mi
limits: # Maximum allowed - killed if exceeds memory
cpu: 1000m
memory: 512Mi
| Term | What it means |
|---|---|
| Request | “I need at least this much” - used for scheduling |
| Limit | “Never give me more than this” - enforced at runtime |
| Actual usage | What the container is really using right now |
Horizontal Pod Autoscaler (HPA)
Adds/removes pod replicas based on metrics.
How HPA Calculates
HPA uses average across all pods, calculated relative to requests (not limits):
Formula:
desiredReplicas = currentReplicas × (currentMetricValue / targetMetricValue)
Example:
Target: 70% CPU utilization
Current replicas: 3
Pod requests: 500m CPU each
Pod 1 actual: 400m (80% of request)
Pod 2 actual: 450m (90% of request)
Pod 3 actual: 350m (70% of request)
Average: (80 + 90 + 70) / 3 = 80%
desiredReplicas = 3 × (80% / 70%) ≈ 3.43 → rounds up to 4 pods
Key: Utilization % = actual usage / request (NOT actual / limit)
HPA Configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: my-app-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: my-app
minReplicas: 2
maxReplicas: 50
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
HPA Scaling Flow
┌─────────────────────────────────────────────────────────────────┐
│ 1. Metrics Server collects data (every 15s) │
│ Pod 1: 80%, Pod 2: 90%, Pod 3: 70% │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 2. HPA checks metrics (every 15s) │
│ Average: 80% > Target: 70% → scale UP │
│ Desired = 3 × (80/70) = 4 pods │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 3. HPA updates Deployment replicas: 3 → 4 │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 4. Scheduler places new pod │
│ If no node has capacity → Pod stays "Pending" │
└─────────────────────────────────────────────────────────────────┘
│
▼ (if Pending)
┌─────────────────────────────────────────────────────────────────┐
│ 5. Cluster Autoscaler adds new EC2 node │
│ Pod scheduled on new node │
└─────────────────────────────────────────────────────────────────┘
Vertical Pod Autoscaler (VPA)
Adjusts CPU/memory requests for existing pods. Makes pods bigger, not more numerous.
VPA Configuration
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: my-app-vpa
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: my-app
updatePolicy:
updateMode: "Auto" # Auto-restart pods with new values
resourcePolicy:
containerPolicies:
- containerName: "*"
minAllowed:
cpu: 100m
memory: 128Mi
maxAllowed:
cpu: 4
memory: 8Gi
Update Modes
| Mode | Behavior |
|---|---|
| Off | Recommend only, no changes |
| Initial | Apply to new pods only |
| Auto | Restart pods with new values |
HPA vs VPA
VPA solves: "My pod always uses more memory than requested"
Before: request=256Mi, limit=256Mi, actual climbs to 800Mi → OOMKilled
After:  request=1Gi, limit=1Gi, actual=800Mi → stable
HPA solves: "Traffic increased, need more parallel processing"
Before: 3 pods, 1000 req/s → each pod overloaded
After: 30 pods, 1000 req/s → load distributed
Don’t use both on same metric - they conflict:
HPA: "CPU at 80%, add pods" → CPU drops to 40%
VPA: "CPU at 40%, lower request" → CPU jumps to 80%
→ Infinite loop
Safe combination: HPA on custom metrics (requests/sec), VPA on CPU/memory.
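A sketch of an HPA on requests/sec (assumes a metrics adapter, e.g. the Prometheus adapter, exposes http_requests_per_second as a Pods metric; names are placeholders):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa-rps
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 50
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second    # assumed adapter-provided metric
      target:
        type: AverageValue
        averageValue: "100"               # aim for ~100 req/s per pod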
Cluster Autoscaler
Adds/removes EC2 nodes based on pending pods and node utilization.
How It Works
┌─────────────────────────────────────────────────────────────────┐
│ Scale UP trigger: │
│ Pod is Pending because no node has enough resources │
│ → Increase ASG desired capacity │
│ → New EC2 launches, joins cluster │
│ → Pod scheduled on new node │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ Scale DOWN trigger: │
│ Node underutilized for 10+ minutes │
│ All pods can be moved to other nodes │
│ → Drain node (evict pods) │
│ → Decrease ASG desired capacity │
│ → EC2 terminated │
└─────────────────────────────────────────────────────────────────┘
Configuration
apiVersion: apps/v1
kind: Deployment
metadata:
name: cluster-autoscaler
spec:
template:
spec:
containers:
- name: cluster-autoscaler
command:
- ./cluster-autoscaler
- --cloud-provider=aws
- --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
- --scale-down-delay-after-add=10m
- --scale-down-unneeded-time=10m
Karpenter (Alternative to Cluster Autoscaler)
Provisions EC2 instances directly (not via ASG). Picks optimal instance type per workload.
Karpenter vs Cluster Autoscaler
| Aspect | Cluster Autoscaler | Karpenter |
|---|---|---|
| How it scales | Adjusts ASG size | Provisions EC2 directly |
| Node types | Fixed per node group | Chooses best instance per pod |
| Speed | Slower (ASG → EC2) | Faster (direct EC2 API) |
| Bin packing | Basic | Smart (fits pods efficiently) |
| Spot handling | Manual setup | Built-in with fallback |
| Cost | Often overprovisions | Better right-sizing |
Example
Pod needs 3 CPU, 2Gi memory
Cluster Autoscaler (node group = m5.xlarge only):
→ Launches m5.xlarge (4 CPU, 16Gi)
→ Wasted: 1 CPU, 14Gi
→ Cost: $0.192/hr
Karpenter (can choose from multiple types):
→ Picks optimal instance or bins multiple pods
→ Minimal waste
→ Or picks Spot: ~$0.06/hr
Karpenter Configuration
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
name: default
spec:
template:
spec:
requirements:
- key: karpenter.sh/capacity-type
operator: In
values: ["spot", "on-demand"]
- key: node.kubernetes.io/instance-type
operator: In
values: ["m5.large", "m5.xlarge", "c5.large", "c5.xlarge"]
limits:
cpu: 100
disruption:
    consolidationPolicy: WhenUnderutilized  # in v1beta1, consolidateAfter is only valid with WhenEmpty
Consolidation
After traffic drops:
Cluster Autoscaler:
Node 1: 20% utilized
Node 2: 30% utilized
Node 3: 25% utilized
→ Keeps all 3 (none below threshold)
Karpenter:
→ "I can fit all pods on 1 larger node"
→ Consolidates to 1 node
→ Terminates 2 nodes
Typical savings: 20-50% EC2 cost reduction vs Cluster Autoscaler.
AWS Load Balancer Controller
Creates and manages ALB/NLB when you define Ingress or LoadBalancer Service.
Ingress → ALB
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: my-app-ingress
annotations:
kubernetes.io/ingress.class: alb
alb.ingress.kubernetes.io/scheme: internet-facing
alb.ingress.kubernetes.io/target-type: ip
spec:
rules:
- host: app.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: my-app-service
port:
number: 80
Service type LoadBalancer → NLB
apiVersion: v1
kind: Service
metadata:
name: my-app-nlb
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
spec:
type: LoadBalancer
selector:
app: my-app
ports:
- port: 443
targetPort: 8080
Target Types
| Type | How traffic routes |
|---|---|
| ip | Direct to Pod IP (recommended) |
| instance | To Node, then kube-proxy routes to Pod |
Why Requests Matter for Scheduling
Scheduler uses requests, not actual usage:
Node capacity: 4 CPU, 8Gi memory
Pod A requests: 1 CPU ─┐
Pod B requests: 1 CPU ├─ Total requested: 3 CPU
Pod C requests: 1 CPU ─┘
Remaining for scheduling: 1 CPU
New pod wants: 2 CPU
→ CANNOT schedule (only 1 CPU available by request)
→ Even if actual usage is low!
Actual usage might be:
Pod A: 0.3 CPU (30% of request)
Pod B: 0.5 CPU (50% of request)
Pod C: 0.2 CPU (20% of request)
Total actual: 1 CPU
But scheduler only looks at REQUESTS.
This is why VPA is useful - right-sizes requests.
Autoscaling Summary
| Component | Looks at | Compares to | Action |
|---|---|---|---|
| HPA | Average actual usage | Requests (as %) | Add/remove pods |
| VPA | Individual pod usage over time | Current requests | Adjust request values |
| Cluster Autoscaler | Pending pods | Node capacity (by requests) | Add/remove nodes |
| Karpenter | Pending pods | Available instance types | Provision optimal nodes |
| Scheduler | Pod requests | Node unrequested capacity | Place pods on nodes |
Monitoring: CloudWatch vs AMP (Prometheus)
Two Approaches
CloudWatch method:
┌─────────────────────────────────────────────────────────────────┐
│ EKS Node │
│ │
│ ┌─────────────────┐ │
│ │ CloudWatch Agent│───────► CloudWatch │
│ │ (system metrics)│ │
│ └─────────────────┘ │
│ │
│ ┌─────────────────┐ │
│ │ App │───────► CloudWatch │
│ │ (SDK push) │ (two senders) │
│ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Prometheus method:
┌─────────────────────────────────────────────────────────────────┐
│ EKS Node │
│ │
│ ┌─────────────────┐ scrape ┌─────────────────┐ │
│ │ Prometheus Agent│◄──────────────────│ App :9090 │ │
│ │ │◄──────────────────│ node-exporter │ │
│ │ │◄──────────────────│ kube-state-metrics │
│ └────────┬────────┘ └─────────────────┘ │
│ │ │
│ │ one sender │
│ ▼ │
│ ┌──────────────┐ │
│ │ AMP │ │
│ └──────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Key Difference
| | CloudWatch | Prometheus/AMP |
|---|---|---|
| Who sends to backend | Agent + App (both) | Agent only |
| App’s job | Call CloudWatch API | Expose HTTP /metrics endpoint |
| Network calls from app | Yes (to CloudWatch) | No (agent scrapes locally) |
Amazon Managed Service for Prometheus (AMP)
Fully managed Prometheus-compatible monitoring. You send metrics, AWS handles storage/scaling.
What is Prometheus?
Open-source monitoring system. Industry standard for Kubernetes monitoring.
System Metrics Collection
Prometheus agent scrapes exporters that expose /metrics:
┌─────────────────────────────────────────────────────────────────┐
│ EKS Node │
│ │
│ ┌─────────────────┐ │
│ │ node-exporter │ ← CPU, memory, disk, network │
│ │ :9100/metrics │ (DaemonSet) │
│ └────────┬────────┘ │
│ │ scrape │
│ ┌────────┴────────┐ │
│ │ Prometheus Agent│───────► AMP Workspace │
│ └────────┬────────┘ │
│ │ scrape │
│ ┌────────┴────────┐ │
│ │ kube-state- │ ← K8s object states (pods, deployments) │
│ │ metrics │ │
│ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
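A minimal agent config for this setup might look like the following (a sketch; the workspace ID and region are placeholders):

scrape_configs:
- job_name: node-exporter
  static_configs:
  - targets: ['localhost:9100']
- job_name: kube-state-metrics
  static_configs:
  - targets: ['kube-state-metrics.kube-system:8080']

remote_write:
- url: https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-EXAMPLE/api/v1/remote_write
  sigv4:
    region: us-east-1             # signs requests with the Node/Pod IAM role

In a real cluster you would use kubernetes_sd_configs for service discovery rather than static targets.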
CloudWatch vs AMP
| Aspect | CloudWatch | AMP |
|---|---|---|
| Best for | AWS services | Kubernetes/containers |
| Query language | CloudWatch Insights | PromQL (industry standard) |
| High-cardinality | Expensive at scale | Designed for it |
| Ecosystem | AWS-native | 1000s of Prometheus exporters |
| Portability | AWS only | Same queries work anywhere |
When to Use AMP
- Running EKS and want Prometheus-compatible monitoring
- Need PromQL queries
- High-cardinality metrics (e.g., per-customer metrics)
- Want to reuse existing Prometheus dashboards
- Multi-cloud/hybrid (same queries everywhere)
AWS X-Ray (Distributed Tracing)
What is a Trace?
Tracks a single request as it flows through multiple services.
User clicks "Buy"
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Trace ID: abc-123 │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Span 1: API Gateway (50ms) │ │
│ │ ├── Span 2: Order Service (200ms) │ │
│ │ │ ├── Span 3: Inventory Service (80ms) │ │
│ │ │ ├── Span 4: Payment Service (100ms) ← bottleneck │ │
│ │ │ └── Span 5: DynamoDB (20ms) │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ Total: 250ms │
└─────────────────────────────────────────────────────────────────┘
- Span = One unit of work (service call, DB query)
- Trace = Collection of spans for one request
Metrics vs Traces
| | Metrics | Traces |
|---|---|---|
| What | Aggregated numbers | Individual request paths |
| Question | “How many errors per minute?” | “Why was THIS request slow?” |
| Granularity | Summary (avg, p99) | Per-request detail |
Metrics: "5% of requests are slow"
Traces: "This slow request spent 2s waiting for DB"
X-Ray Service Map
Auto-generated visual of your architecture:
┌─────────┐
│ API GW │
│ 99% OK │
└────┬────┘
│
▼
┌─────────┐
│ Lambda │
│ 95% OK │ ← 5% errors visible
└────┬────┘
│
┌────┴────┐
▼ ▼
┌─────────┐ ┌─────────┐
│ DynamoDB│ │ S3 │
│ 15ms │ │ 50ms │
└─────────┘ └─────────┘
When to Use X-Ray
- Debug slow requests
- Find errors in distributed systems
- Understand service dependencies
- Identify bottlenecks
ADOT (AWS Distro for OpenTelemetry)
AWS’s distribution of OpenTelemetry collector. Can replace Prometheus agent.
What it Does
┌─────────────────────────────────────────────────────────────────┐
│ ADOT Collector │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Receivers │────►│ Processors │────►│ Exporters │ │
│ │ │ │ │ │ │ │
│ │ - Prometheus│ │ - Filter │ │ - AMP │ │
│ │ - OTLP │ │ - Transform │ │ - CloudWatch│ │
│ │ - StatsD │ │ │ │ - X-Ray │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────┘
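A collector config matching the diagram might look like this (a sketch; the AMP endpoint and region are placeholders):

extensions:
  sigv4auth:
    region: us-east-1

receivers:
  prometheus:
    config:
      scrape_configs:
      - job_name: app
        static_configs:
        - targets: ['localhost:9090']
  otlp:
    protocols:
      grpc:

processors:
  batch:

exporters:
  prometheusremotewrite:
    endpoint: https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-EXAMPLE/api/v1/remote_write
    auth:
      authenticator: sigv4auth
  awsxray:

service:
  extensions: [sigv4auth]
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [prometheusremotewrite]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [awsxray]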
ADOT vs Prometheus Agent
| Aspect | Prometheus Agent | ADOT |
|---|---|---|
| Purpose | Metrics only | Metrics + Traces + Logs |
| Input | Prometheus only | Many formats |
| Output | AMP only | AMP, CloudWatch, X-Ray, etc. |
| Use case | Simple Prometheus setup | Multi-destination, traces |
When to Use ADOT
- Need metrics AND traces (AMP + X-Ray)
- Want to send same metrics to multiple destinations
- Want vendor-neutral OpenTelemetry standard
Observability Summary
| Tool | Data Type | Use For |
|---|---|---|
| CloudWatch | Metrics, Logs | AWS-native monitoring, simple setup |
| AMP | Metrics | Prometheus ecosystem, PromQL, K8s-native |
| X-Ray | Traces | Debugging requests, finding bottlenecks |
| ADOT | All | Unified collection, multi-destination |
Common EKS setup: AMP for metrics + X-Ray for traces, collected via ADOT.