What is AWS Systems Manager?

A collection of tools for managing and operating your infrastructure (EC2 instances, on-premises servers, containers) at scale. Core concept: the SSM Agent runs on each instance and communicates with the SSM service over outbound connections, so no inbound ports are needed.

Key Capabilities

| Category | Capability | What it does |
|---|---|---|
| Node Management | Fleet Manager | View and manage all servers from one console |
| Node Management | Session Manager | SSH/RDP without opening ports or managing keys |
| Node Management | Run Command | Execute scripts on multiple instances at once |
| Node Management | Patch Manager | Automate OS and application patching |
| Node Management | State Manager | Keep instances in a defined configuration state |
| Operations | OpsCenter | Central place to view and resolve operational issues |
| Operations | Incident Manager | Manage and respond to incidents |
| Operations | Explorer | Dashboard showing operational data across accounts |
| Application Management | Parameter Store | Store config values and secrets (free tier available) |
| Application Management | AppConfig | Deploy application configuration safely with rollback |
| Change Management | Automation | Run multi-step runbooks |
| Change Management | Change Manager | Approve and track operational changes |
| Change Management | Maintenance Windows | Schedule operations during defined time windows |

Run Command

Execute commands on multiple instances without SSH.

Configuration

| Setting | What to specify | Details |
|---|---|---|
| Document | Which command/script to run | AWS-provided (e.g., AWS-RunShellScript) or custom |
| Targets | Which instances | By instance ID, tag, resource group |
| Parameters | Input to the document | Commands to run, timeout, working directory |
| Rate Control | How many at once | Concurrency (10 or 50%) and error threshold |
| Output | Where to store results | S3 bucket, CloudWatch Logs |
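
The settings above map fairly directly onto the SendCommand API. The following is a minimal sketch, assuming Python with boto3 as the client library (these notes do not prescribe one). The helper name build_send_command_request, the tag, and the bucket name are hypothetical. The function only builds the request dictionary, so it can be inspected without AWS credentials.

```python
def build_send_command_request(commands, tag_key, tag_value,
                               max_concurrency="10", max_errors="5",
                               s3_bucket=None, timeout_seconds=600):
    """Build keyword arguments for the SSM SendCommand API (boto3: ssm.send_command)."""
    request = {
        "DocumentName": "AWS-RunShellScript",          # Document
        "Targets": [                                   # Targets (here: by tag)
            {"Key": f"tag:{tag_key}", "Values": [tag_value]}
        ],
        "Parameters": {"commands": list(commands)},    # Parameters
        "TimeoutSeconds": timeout_seconds,             # Delivery timeout
        "MaxConcurrency": max_concurrency,             # Rate control: "10" or "50%"
        "MaxErrors": max_errors,                       # Rate control: error threshold
    }
    if s3_bucket:
        request["OutputS3BucketName"] = s3_bucket      # Output
    return request


request = build_send_command_request(
    ["yum update -y", "systemctl restart nginx"],
    tag_key="Role", tag_value="WebServer",             # hypothetical tag
    s3_bucket="example-ssm-output",                    # hypothetical bucket
)
# With credentials configured, this could be sent as:
#   import boto3
#   boto3.client("ssm").send_command(**request)
```

MaxConcurrency and MaxErrors are strings because the API accepts either a count ("10") or a percentage ("50%").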

Common Documents

  • AWS-RunShellScript - Run bash on Linux
  • AWS-RunPowerShellScript - Run PowerShell on Windows
  • AWS-InstallApplication - Install, repair, or uninstall software from an MSI package on Windows

Execution Flow

You send Run Command
     │
     ▼
┌─────────────────────────────────────────────────────┐
│ SSM Service receives request                        │
│ Assigns a unique Command ID (a UUID)                │
└─────────────────────────────────────────────────────┘
     │
     ▼
┌─────────────────────────────────────────────────────┐
│ SSM Agent Polling (on each instance)                │
│                                                     │
│ Agent long-polls SSM service over HTTPS (port 443)  │
│ "Any commands for me?"                              │
│                                                     │
│ No inbound ports needed - agent initiates outbound  │
└─────────────────────────────────────────────────────┘
     │
     ▼
┌─────────────────────────────────────────────────────┐
│ Agent Executes Commands Locally                     │
│                                                     │
│ $ yum update -y                                     │
│ $ systemctl restart nginx                           │
│                                                     │
│ Captures: stdout, stderr, exit code                 │
└─────────────────────────────────────────────────────┘
     │
     ▼
┌─────────────────────────────────────────────────────┐
│ Agent Reports Results Back                          │
│                                                     │
│ i-abc123: SUCCESS (exit code 0)                     │
│ i-def456: FAILED (exit code 1)                      │
│                                                     │
│ Output stored in SSM console, S3, CloudWatch Logs   │
└─────────────────────────────────────────────────────┘
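
The last step of the flow, collecting per-instance results, can be sketched as a polling loop around GetCommandInvocation. This is an illustrative sketch, not a definitive implementation: it assumes `ssm` is a boto3 SSM client (or any object with a compatible get_command_invocation method), and the function name and polling defaults are hypothetical.

```python
import time

# Statuses after which an invocation will not change any further.
TERMINAL_STATUSES = {"Success", "Failed", "Cancelled", "TimedOut"}


def wait_for_invocation(ssm, command_id, instance_id,
                        poll_seconds=2.0, max_polls=30):
    """Poll GetCommandInvocation until the invocation reaches a terminal status."""
    for _ in range(max_polls):
        result = ssm.get_command_invocation(
            CommandId=command_id, InstanceId=instance_id
        )
        if result["Status"] in TERMINAL_STATUSES:
            return {
                "instance": instance_id,
                "status": result["Status"],
                "exit_code": result["ResponseCode"],
                "stdout": result["StandardOutputContent"],
                "stderr": result["StandardErrorContent"],
            }
        time.sleep(poll_seconds)
    raise TimeoutError(f"{instance_id} did not finish after {max_polls} polls")
```

In practice the first call may be made before the invocation is registered, so a real caller would probably also want to retry on the client's "invocation does not exist" error.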

Patch Manager

Automate patching of OS and applications.

Configuration

| Setting | What to specify | Details |
|---|---|---|
| Patch Baseline | Which patches to apply | Rules for auto-approval (severity, classification, days after release) |
| Patch Group | Which instances | Tag-based grouping (Patch Group = production) |
| Maintenance Window | When to patch | Schedule (cron), duration, cutoff time |
| Operation | Scan or Install | Scan = report only, Install = apply patches |
| Reboot Option | After patching | RebootIfNeeded, NoReboot |

Patch Baseline Rules Example

Product: AmazonLinux2
Classification: Security, Bugfix
Severity: Critical, Important
Auto-approve: 7 days after release
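
To make the auto-approval rule concrete, here is a simplified model of how such a rule could be evaluated. It is a sketch only: Patch Manager evaluates baselines on the service side, and the class names, dates, and sample patches below are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Patch:
    name: str
    classification: str
    severity: str
    released: date


@dataclass(frozen=True)
class ApprovalRule:
    classifications: frozenset
    severities: frozenset
    approve_after_days: int


def is_approved(patch, rule, today):
    """Return True when the patch matches the rule's filters and is old enough."""
    age_days = (today - patch.released).days
    return (
        patch.classification in rule.classifications
        and patch.severity in rule.severities
        and age_days >= rule.approve_after_days
    )


rule = ApprovalRule(
    classifications=frozenset({"Security", "Bugfix"}),
    severities=frozenset({"Critical", "Important"}),
    approve_after_days=7,
)
today = date(2024, 1, 15)
old_fix = Patch("openssl", "Security", "Important", date(2024, 1, 1))  # 14 days old
new_fix = Patch("kernel", "Security", "Critical", date(2024, 1, 12))   # 3 days old

print(is_approved(old_fix, rule, today))  # True: matches filters, older than 7 days
print(is_approved(new_fix, rule, today))  # False: matches filters, but too recent
```

The delay gives a newly released patch a few days to prove itself elsewhere before it is rolled out automatically.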

Scan vs Install

  • Scan: Only checks what’s missing, reports compliance status, no changes made
  • Install: Checks + installs patches, reports what was installed, may reboot instance
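
The choice between the two operations is passed as a parameter to the AWS-RunPatchBaseline document. The sketch below builds a Run Command request for it, assuming Python with boto3 as the client; the helper name is hypothetical, and the reboot values are the two I am sure AWS-RunPatchBaseline accepts (RebootIfNeeded and NoReboot).

```python
VALID_OPERATIONS = {"Scan", "Install"}
VALID_REBOOT_OPTIONS = {"RebootIfNeeded", "NoReboot"}


def build_patch_request(operation, patch_group, reboot_option="RebootIfNeeded"):
    """Build SendCommand arguments that run AWS-RunPatchBaseline on a patch group."""
    if operation not in VALID_OPERATIONS:
        raise ValueError(f"operation must be one of {sorted(VALID_OPERATIONS)}")
    if reboot_option not in VALID_REBOOT_OPTIONS:
        raise ValueError(f"reboot_option must be one of {sorted(VALID_REBOOT_OPTIONS)}")

    parameters = {"Operation": [operation]}
    if operation == "Install":
        # A scan makes no changes, so a reboot option only matters for Install.
        parameters["RebootOption"] = [reboot_option]

    return {
        "DocumentName": "AWS-RunPatchBaseline",
        "Targets": [{"Key": "tag:Patch Group", "Values": [patch_group]}],
        "Parameters": parameters,
    }


scan_request = build_patch_request("Scan", "production")
install_request = build_patch_request("Install", "production", reboot_option="NoReboot")
```

Running a Scan first and reviewing compliance before scheduling an Install is a common way to avoid surprise reboots.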

Execution Flow

Maintenance Window triggers (e.g., Sunday 2:00 AM)
     │
     ▼
┌─────────────────────────────────────────────────────┐
│ Identify Target Instances                           │
│ Patch Group tag: "Patch Group" = "production"       │
└─────────────────────────────────────────────────────┘
     │
     ▼
┌─────────────────────────────────────────────────────┐
│ Get Applicable Patches from Baseline                │
│                                                     │
│ Query patch repository:                             │
│   - kernel-5.10.102 (Security, Critical)            │
│   - openssl-1.1.1k (Security, Important)            │
│   - curl-7.79.1 (Bugfix)                            │
└─────────────────────────────────────────────────────┘
     │
     ▼
┌─────────────────────────────────────────────────────┐
│ Run AWS-RunPatchBaseline on Each Instance           │
│                                                     │
│ 1. Scan installed packages                          │
│ 2. Compare with baseline rules                      │
│ 3. Download missing patches                         │
│ 4. Install patches                                  │
│ 5. Reboot if required                               │
└─────────────────────────────────────────────────────┘
     │
     ▼
┌─────────────────────────────────────────────────────┐
│ Report Compliance                                   │
│                                                     │
│ i-abc123: COMPLIANT (3 patches installed)           │
│ i-def456: NON_COMPLIANT (1 patch failed)            │
└─────────────────────────────────────────────────────┘

State Manager

Keep instances in a desired configuration state continuously.

Configuration

| Setting | What to specify | Details |
|---|---|---|
| Association | Document + targets + schedule | The core unit of State Manager |
| Document | What configuration to enforce | AWS-GatherSoftwareInventory, AWS-ConfigureAWSPackage, custom |
| Targets | Which instances | By tag, instance ID, all instances |
| Schedule | How often to apply | Rate (every 30 min) or cron expression |
| Compliance Severity | How to report drift | Critical, High, Medium, Low |
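
An association bundles these settings into one CreateAssociation request. The following sketch assumes Python with boto3 as the client; the helper name and the association name are hypothetical, and the function only builds the request dictionary.

```python
ALLOWED_SEVERITIES = {"CRITICAL", "HIGH", "MEDIUM", "LOW", "UNSPECIFIED"}


def build_association_request(document, tag_key, tag_value, schedule,
                              parameters, severity="HIGH", name=None):
    """Build keyword arguments for the SSM CreateAssociation API."""
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"severity must be one of {sorted(ALLOWED_SEVERITIES)}")
    request = {
        "Name": document,                                     # Document
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        "ScheduleExpression": schedule,                       # Schedule
        # Document parameters are passed as lists of strings.
        "Parameters": {key: [value] for key, value in parameters.items()},
        "ComplianceSeverity": severity,                       # Compliance severity
    }
    if name:
        request["AssociationName"] = name
    return request


association = build_association_request(
    document="AWS-ConfigureAWSPackage",
    tag_key="Role", tag_value="WebServer",
    schedule="rate(30 minutes)",
    parameters={"action": "Install", "name": "AmazonCloudWatchAgent"},
    name="install-cloudwatch-agent",                          # hypothetical name
)
# With credentials configured:
#   import boto3
#   boto3.client("ssm").create_association(**association)
```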

Use Cases

  • Ensure antivirus is always installed
  • Keep CloudWatch agent configured
  • Collect inventory on schedule
  • Join instances to Active Directory

Execution Flow

Association created:
  Document: AWS-ConfigureAWSPackage
  Targets: tag:Role = WebServer
  Schedule: rate(30 minutes)
  Parameters: action=Install, name=AmazonCloudWatchAgent
     │
     ▼ (every 30 minutes, or when new instance matches)
┌─────────────────────────────────────────────────────┐
│ Evaluate Targets                                    │
│                                                     │
│ Found: i-abc123, i-def456                           │
│ New instance: i-ghi789 → Automatically included!    │
└─────────────────────────────────────────────────────┘
     │
     ▼
┌─────────────────────────────────────────────────────┐
│ Execute on Each Instance                            │
│                                                     │
│ i-abc123 (agent already installed):                 │
│   Check → Already compliant → No action             │
│                                                     │
│ i-ghi789 (new, agent not installed):                │
│   Check → Not installed → Install → Now compliant   │
└─────────────────────────────────────────────────────┘
     │
     ▼
┌─────────────────────────────────────────────────────┐
│ Track Compliance                                    │
│                                                     │
│ If someone uninstalls agent manually:               │
│   → Next run (within 30 min) reinstalls it          │
│   → "Desired state" continuously enforced           │
└─────────────────────────────────────────────────────┘
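
The check-then-act behaviour in the flow above is what makes an association safe to run repeatedly. Here is a toy simulation of that idea in plain Python. It models instances as sets of installed packages and does not call AWS; the function name is hypothetical.

```python
def apply_association(installed_by_instance, package):
    """One scheduled run: install the package only where it is missing."""
    actions = {}
    for instance_id, installed in installed_by_instance.items():
        if package in installed:
            actions[instance_id] = "no action"     # already compliant
        else:
            installed.add(package)                 # bring into desired state
            actions[instance_id] = "installed"
    return actions


fleet = {
    "i-abc123": {"AmazonCloudWatchAgent"},         # agent already installed
    "i-ghi789": set(),                             # new instance, nothing installed
}

first_run = apply_association(fleet, "AmazonCloudWatchAgent")
fleet["i-abc123"].discard("AmazonCloudWatchAgent")  # someone uninstalls it manually
second_run = apply_association(fleet, "AmazonCloudWatchAgent")

print(first_run)   # {'i-abc123': 'no action', 'i-ghi789': 'installed'}
print(second_run)  # {'i-abc123': 'installed', 'i-ghi789': 'no action'}
```

Because each run only acts on instances that have drifted, running it every 30 minutes costs little when the fleet is already compliant.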

Automation

Run multi-step workflows (runbooks) for complex operations.

Configuration

| Setting | What to specify | Details |
|---|---|---|
| Runbook | Which automation document | AWS-provided or custom (YAML/JSON) |
| Execution Mode | How to run | Simple, Rate control, Multi-account |
| Parameters | Input values | Instance IDs, AMI IDs, custom inputs |
| IAM Role | Permissions | AssumeRole for cross-account or elevated permissions |
| Targets | Which resources | Parameter values, tags, resource groups |

Execution Modes

| Mode | Behavior |
|---|---|
| Simple | Run once with specified parameters |
| Rate Control | Run on multiple targets with concurrency/error limits |
| Multi-account/Region | Run across multiple AWS accounts or regions |
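
The difference between the first two modes shows up in the shape of the StartAutomationExecution request. The sketch below assumes Python with boto3 as the client; both helper names are hypothetical, and the functions only build request dictionaries.

```python
def simple_execution(runbook, parameters):
    """Simple mode: run the runbook once with explicit parameter values."""
    return {
        "DocumentName": runbook,
        # Automation parameters are passed as lists of strings.
        "Parameters": {key: list(values) for key, values in parameters.items()},
    }


def rate_controlled_execution(runbook, target_parameter, tag_key, tag_value,
                              max_concurrency="10", max_errors="5"):
    """Rate control mode: fan out over tagged targets with limits."""
    return {
        "DocumentName": runbook,
        "TargetParameterName": target_parameter,   # runbook parameter filled per target
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        "MaxConcurrency": max_concurrency,
        "MaxErrors": max_errors,
    }


one_instance = simple_execution(
    "AWS-RestartEC2Instance", {"InstanceId": ["i-abc123"]}
)
fleet_restart = rate_controlled_execution(
    "AWS-RestartEC2Instance", "InstanceId",
    tag_key="Environment", tag_value="Production",
)
# With credentials configured:
#   import boto3
#   boto3.client("ssm").start_automation_execution(**fleet_restart)
```

Multi-account and multi-Region executions additionally pass a TargetLocations list, which is omitted here.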

Runbook Step Types

  • aws:executeAwsApi - Call any AWS API
  • aws:runCommand - Run command on instances
  • aws:executeScript - Run Python/PowerShell script
  • aws:approve - Wait for manual approval
  • aws:branch - Conditional logic
  • aws:sleep - Wait for duration
  • aws:changeInstanceState - Start/stop/terminate

Common AWS Runbooks

  • AWS-StopEC2Instance - Stop instance
  • AWS-CreateImage - Create AMI
  • AWS-PatchInstanceWithRollback - Patch with automatic rollback on failure
  • AWS-RestartEC2Instance - Stop → wait → start

Execution Flow with Approval

Trigger automation
     │
     ▼
┌─────────────────────────────────────────────────────┐
│ Step 1: aws:approve                                 │
│                                                     │
│ Status: WAITING                                     │
│ Sends SNS notification to approvers                 │
│ Execution PAUSES here                               │
└─────────────────────────────────────────────────────┘
     │
     ▼ (Approver clicks "Approve")
┌─────────────────────────────────────────────────────┐
│ Step 2: aws:changeInstanceState (stop)              │
│                                                     │
│ Calls ec2:StopInstances                             │
│ Status: SUCCESS                                     │
└─────────────────────────────────────────────────────┘
     │
     ▼
┌─────────────────────────────────────────────────────┐
│ Step 3: aws:waitForAwsResourceProperty              │
│                                                     │
│ Polls ec2:DescribeInstances                         │
│ Waiting for State = "stopped"                       │
└─────────────────────────────────────────────────────┘
     │
     ▼
┌─────────────────────────────────────────────────────┐
│ Step 4: aws:changeInstanceState (start)             │
│                                                     │
│ Calls ec2:StartInstances                            │
│ Status: SUCCESS                                     │
└─────────────────────────────────────────────────────┘
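
The four-step flow above could be expressed as a runbook roughly like the one below. It is a sketch under stated assumptions: the runbook is built as a Python dictionary and serialized to JSON (runbooks accept YAML or JSON), the function name is hypothetical, and both ARNs are placeholders.

```python
import json


def build_restart_runbook(approver_arn, topic_arn):
    """Return a schema 0.3 Automation runbook as a plain dictionary."""
    instance_ids = ["{{ InstanceId }}"]
    return {
        "schemaVersion": "0.3",
        "parameters": {"InstanceId": {"type": "String"}},
        "mainSteps": [
            {   # Step 1: pause until an approver responds
                "name": "approve",
                "action": "aws:approve",
                "inputs": {
                    "Approvers": [approver_arn],
                    "NotificationArn": topic_arn,
                    "Message": "Approve restart of {{ InstanceId }}?",
                },
            },
            {   # Step 2: stop the instance
                "name": "stopInstance",
                "action": "aws:changeInstanceState",
                "inputs": {"InstanceIds": instance_ids, "DesiredState": "stopped"},
            },
            {   # Step 3: poll DescribeInstances until the state is "stopped"
                "name": "waitForStopped",
                "action": "aws:waitForAwsResourceProperty",
                "inputs": {
                    "Service": "ec2",
                    "Api": "DescribeInstances",
                    "InstanceIds": instance_ids,
                    "PropertySelector": "$.Reservations[0].Instances[0].State.Name",
                    "DesiredValues": ["stopped"],
                },
            },
            {   # Step 4: start the instance again
                "name": "startInstance",
                "action": "aws:changeInstanceState",
                "inputs": {"InstanceIds": instance_ids, "DesiredState": "running"},
            },
        ],
    }


runbook = build_restart_runbook(
    approver_arn="arn:aws:iam::111122223333:user/example-approver",   # placeholder
    topic_arn="arn:aws:sns:us-east-1:111122223333:example-approvals", # placeholder
)
runbook_json = json.dumps(runbook, indent=2)
```

The resulting JSON string is what would be supplied as the document content when creating a custom Automation runbook.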

Error Handling

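schemaVersion: '0.3'
parameters:
  InstanceId:
    type: String
mainSteps:
  - name: stopInstance
    action: aws:changeInstanceState
    onFailure: step:rollback      # Jump to the rollback step if this fails
    inputs:
      InstanceIds: ['{{ InstanceId }}']
      DesiredState: stopped

  - name: doMaintenance           # Example change that needs a stopped instance
    action: aws:executeAwsApi     # (aws:runCommand cannot reach a stopped instance)
    onFailure: step:rollback
    inputs:
      Service: ec2
      Api: ModifyInstanceAttribute
      InstanceId: '{{ InstanceId }}'
      InstanceType:
        Value: t3.large           # Illustrative value

  - name: startInstance
    action: aws:changeInstanceState
    onFailure: Abort              # Stop execution entirely
    isEnd: true                   # Success path ends here, so rollback is skipped
    inputs:
      InstanceIds: ['{{ InstanceId }}']
      DesiredState: running

  - name: rollback
    action: aws:changeInstanceState
    inputs:
      InstanceIds: ['{{ InstanceId }}']
      DesiredState: running       # Restore instance if earlier step failed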
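
To see which steps run under different failures, the routing rules can be simulated in a few lines. This toy interpreter is a sketch only: it models just onFailure (step:NAME, Abort, Continue) and isEnd, with Abort assumed as the default when onFailure is omitted, and the function name is hypothetical.

```python
def run_steps(steps, failing=frozenset()):
    """Walk steps in order; return (visited step names, whether execution aborted)."""
    index_by_name = {step["name"]: i for i, step in enumerate(steps)}
    visited, aborted, i = [], False, 0
    while i < len(steps):
        step = steps[i]
        if step["name"] in visited:
            raise ValueError(f"cycle at step {step['name']}")
        visited.append(step["name"])
        if step["name"] in failing:
            on_failure = step.get("onFailure", "Abort")
            if on_failure.startswith("step:"):
                i = index_by_name[on_failure[len("step:"):]]  # jump to named step
                continue
            if on_failure != "Continue":
                aborted = True                                # Abort: stop here
                break
        elif step.get("isEnd"):
            break                                             # normal end of runbook
        i += 1
    return visited, aborted


steps = [
    {"name": "stopInstance", "onFailure": "step:rollback"},
    {"name": "doMaintenance", "onFailure": "step:rollback"},
    {"name": "startInstance", "onFailure": "Abort", "isEnd": True},
    {"name": "rollback"},
]

print(run_steps(steps))
# (['stopInstance', 'doMaintenance', 'startInstance'], False)
print(run_steps(steps, failing={"doMaintenance"}))
# (['stopInstance', 'doMaintenance', 'rollback'], False)
print(run_steps(steps, failing={"startInstance"}))
# (['stopInstance', 'doMaintenance', 'startInstance'], True)
```

Note how isEnd on startInstance keeps the rollback step from running after a fully successful execution.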

Rate Control Execution

Targets: 50 instances tagged Environment=Production
Concurrency: 10
Error Threshold: 5

Batch 1: i-001 to i-010 (parallel)
  → 1 failed, error count: 1

Batch 2: i-011 to i-020 (parallel)
  → 2 failed, error count: 3

Batch 3: i-021 to i-030 (parallel)
  → 3 failed, error count: 6 (exceeds threshold of 5)
  → STOP EXECUTION
  → Remaining 20 instances NOT processed
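
The batching logic can be simulated in plain Python. This is a simplified sketch: it processes strict batches, whereas the real service works with a sliding window of in-flight targets. It also assumes the run stops once the error count exceeds the threshold, which is how the AWS documentation describes MaxErrors; change the comparison to `>=` to stop on reaching it. The function name and the set of failing instances are hypothetical.

```python
def run_with_rate_control(targets, concurrency, max_errors, failing):
    """Process targets in batches; stop once errors exceed max_errors."""
    errors = 0
    processed = []
    for start in range(0, len(targets), concurrency):
        batch = targets[start:start + concurrency]
        processed.extend(batch)                       # whole batch runs in parallel
        errors += sum(1 for target in batch if target in failing)
        if errors > max_errors:                       # threshold exceeded: stop
            break
    skipped = targets[len(processed):]
    return processed, skipped, errors


targets = [f"i-{n:03d}" for n in range(1, 51)]        # i-001 .. i-050
failing = {"i-004", "i-012", "i-017", "i-021", "i-025", "i-029"}

processed, skipped, errors = run_with_rate_control(
    targets, concurrency=10, max_errors=5, failing=failing
)
print(len(processed), len(skipped), errors)           # 30 20 6
```

A low error threshold limits the blast radius of a bad change: most of the fleet is never touched if the first batches go wrong.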

Inventory

Stores and queries metadata about your instances (not the instances themselves).

What AWS-GatherSoftwareInventory Collects

  • Installed applications (name, version, publisher)
  • AWS components (SSM Agent version, etc.)
  • Network configuration (IP, MAC)
  • Windows updates, services, roles
  • Custom files you specify

Use Case

“Find all instances running Python 3.8” or “Which servers have outdated nginx?”
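
A query like the second one boils down to filtering inventory records by application name and version. The sketch below works on a hypothetical, simplified in-memory structure (instance ID mapped to a list of Name/Version records) rather than the Inventory API, and it assumes purely numeric dotted versions.

```python
def parse_version(text):
    """Turn '1.18.0' into (1, 18, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def find_outdated(inventory, app_name, minimum_version):
    """Return instance IDs where app_name is installed below minimum_version."""
    minimum = parse_version(minimum_version)
    return sorted(
        instance_id
        for instance_id, apps in inventory.items()
        for app in apps
        if app["Name"] == app_name and parse_version(app["Version"]) < minimum
    )


inventory = {
    "i-abc123": [
        {"Name": "nginx", "Version": "1.18.0"},
        {"Name": "python3", "Version": "3.8.10"},
    ],
    "i-def456": [
        {"Name": "nginx", "Version": "1.24.0"},
    ],
}

print(find_outdated(inventory, "nginx", "1.20.0"))    # ['i-abc123']
```

Real package versions often carry suffixes (for example release or distribution tags), so a production query would need a more tolerant version parser.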


Comparison Summary

| Aspect | Run Command | Patch Manager | State Manager | Automation |
|---|---|---|---|---|
| Purpose | One-time command | Patch OS/apps | Maintain desired state | Multi-step workflows |
| Trigger | Manual, EventBridge | Maintenance Window | Schedule (continuous) | Manual, EventBridge |
| Frequency | Once | Periodic | Continuous | Once or scheduled |
| Idempotent | No | Yes | Yes | Depends on steps |
| Use case | Ad-hoc tasks | Security compliance | Drift prevention | Complex operations |

Notes

  • SSM Agent: Must be installed and running on each instance. Preinstalled on Amazon Linux and Windows Server AMIs provided by AWS
  • IAM permissions: Each instance needs an IAM role (instance profile) with the AmazonSSMManagedInstanceCore policy
  • No inbound ports: Agent initiates outbound HTTPS to SSM endpoints
  • Hybrid: Can manage on-premises servers with SSM Agent + activation