2 changes: 1 addition & 1 deletion .claude-plugin/marketplace.json
@@ -86,7 +86,7 @@
"microvms",
"firecracker"
],
"version": "1.2.0"
"version": "1.3.0"
},
{
"category": "migration",
2 changes: 1 addition & 1 deletion plugins/aws-serverless/.claude-plugin/plugin.json
@@ -22,5 +22,5 @@
"license": "Apache-2.0",
"name": "aws-serverless",
"repository": "https://github.com/awslabs/agent-plugins",
"version": "1.2.0"
"version": "1.3.0"
}
2 changes: 1 addition & 1 deletion plugins/aws-serverless/.codex-plugin/plugin.json
@@ -1,6 +1,6 @@
{
"name": "aws-serverless",
"version": "1.2.0",
"version": "1.3.0",
"description": "Design, build, deploy, test, and debug serverless applications with AWS Serverless services.",
"author": {
"name": "Amazon Web Services",
@@ -4,13 +4,14 @@ description: >
Evaluate, configure, and migrate workloads to AWS Lambda Managed Instances (LMI).
Triggers on: Lambda Managed Instances, LMI, capacity provider, multi-concurrency Lambda,
dedicated instance Lambda, EC2-backed Lambda, cold start elimination, Graviton Lambda,
instance type for Lambda, Lambda cost optimization with Reserved Instances or Savings Plans.
Also trigger when users describe high-volume predictable workloads seeking cost savings,
instance type for Lambda, scheduled scaling for LMI, Lambda cost optimization with
Reserved Instances or Savings Plans. Also trigger when users describe high-volume
predictable workloads seeking cost savings, want to scale LMI capacity on a schedule,
or compare Lambda vs EC2 for steady-state traffic. For standard Lambda without LMI,
use the aws-lambda skill instead.
argument-hint: "[describe your workload or what you need help with]"
metadata:
tags: lambda, lmi, managed-instances, ec2, capacity-provider, multi-concurrency, cost-optimization
tags: lambda, lmi, managed-instances, ec2, capacity-provider, multi-concurrency, cost-optimization, scheduled-scaling
---

# AWS Lambda Managed Instances (LMI)
@@ -22,11 +23,11 @@ For standard Lambda development, see [aws-lambda skill](../aws-lambda/). For SAM
## When to Load Reference Files

- **Cost comparison**, **pricing analysis**, **Lambda vs LMI cost**, **Savings Plans**, or **Reserved Instances** -> see [references/cost-comparison.md](references/cost-comparison.md)
- **Instance types**, **memory sizing**, **vCPU ratios**, **scaling tuning**, or **capacity provider config** -> see [references/configuration-guide.md](references/configuration-guide.md)
- **Instance types**, **memory sizing**, **vCPU ratios**, **scaling tuning**, **scheduled scaling**, or **capacity provider config** -> see [references/configuration-guide.md](references/configuration-guide.md)
- **Thread safety**, **concurrency model**, **code review checklist**, **Powertools compatibility**, or **multi-concurrency readiness** -> see [references/thread-safety.md](references/thread-safety.md)
- **Before/after code examples**, **runtime-specific migration** (Node.js, Python, Java, .NET), or **connection pooling** -> see [references/migration-patterns.md](references/migration-patterns.md)
- **IAM roles**, **VPC setup**, **CLI commands**, **SAM template**, or **CDK example** -> see [references/infrastructure-setup.md](references/infrastructure-setup.md) and [scripts/setup-lmi.sh](scripts/setup-lmi.sh)
- **Errors**, **throttling**, **debugging**, or **stuck deployments** -> see [references/troubleshooting.md](references/troubleshooting.md)
- **IAM roles**, **VPC setup**, **CLI commands**, **SAM template**, **CDK example**, or **scheduled scaling setup (EventBridge Scheduler)** -> see [references/infrastructure-setup.md](references/infrastructure-setup.md) and [scripts/setup-lmi.sh](scripts/setup-lmi.sh)
- **Errors**, **throttling**, **debugging**, **stuck deployments**, **tuning configuration**, or **adjusting after deployment** -> see [references/troubleshooting.md](references/troubleshooting.md)

## Quick Decision: Is LMI Right for This Workload?

@@ -54,6 +55,38 @@ Gather these signals before recommending:
6. **Concurrency readiness**: Thread safety (Node.js/Java/.NET)? Shared `/tmp` paths? Per-invocation DB connections?
7. **VPC**: Already in a VPC? Private resource access needed?

#### Deriving LMI Configuration from Metrics

If Lambda Insights is enabled on the function, use the metrics named in the formulas below to calculate a starting configuration. If Lambda Insights is not enabled, suggest adding it to gather accurate workload data, but proceed only with the user's explicit confirmation, because adding the Insights layer may affect function performance or cold start times.

To check whether Lambda Insights is enabled, look for a LambdaInsightsExtension layer on the function. To add it, find the latest layer ARN for your region in the [Lambda Insights documentation](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Lambda-Insights-extension-versions.html), add that layer to the function, and attach the `CloudWatchLambdaInsightsExecutionRolePolicy` managed policy to the function's execution role.

**Target max concurrency** (from `cpu_total_time` and `Duration`):

```
PerExecutionEnvironmentMaxConcurrency = floor((0.5 × Duration) / cpu_total_time)
```

This targets 50% CPU utilization at full concurrency, leaving headroom for scaling. Use the same time unit for `Duration` and `cpu_total_time`.

**Memory allocation** (from `memory_utilization` and current memory):

```
MemorySize = min(32768, max(2048, MaxConcurrency × (memory_utilization / 100) × current_allocated_memory))
```

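This overestimates memory because it assumes every concurrent invocation needs the full observed footprint, with no base memory shared between invocations, but it provides a safe starting point. The inner `max` enforces the 2 GB (2048 MB) LMI minimum, and the outer `min` caps the result at the 32 GB (32768 MB) LMI maximum.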

**Minimum execution environments** (from baseline `ConcurrentExecutions`):

```
MinExecutionEnvironments = max(3, ceil(baseline_concurrent_executions × 2 / MaxConcurrency))
```

This targets 50% concurrency utilization at baseline load, leaving headroom for traffic bursts.

**Without Lambda Insights:** Start with the runtime's default max concurrency, 2 GB memory, and MinExecutionEnvironments = 3. Adjust during testing.
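The three formulas above can be combined into one small calculation. The following is a minimal sketch, not an official tool: the function name `derive_lmi_config` and its parameter names are hypothetical, and the floor of 1 on concurrency is an added assumption for the case where the formula would otherwise round down to 0. It does not adjust the memory value to match a memory-to-vCPU ratio.

```python
import math

LMI_MIN_MEMORY_MB = 2048    # 2 GB LMI minimum, as stated above
LMI_MAX_MEMORY_MB = 32768   # 32 GB LMI maximum, as stated above


def derive_lmi_config(duration_ms, cpu_total_time_ms, memory_utilization_pct,
                      current_memory_mb, baseline_concurrent_executions):
    """Apply the starting-point formulas above to observed metrics.

    Hypothetical helper: all parameter names are illustrative.
    """
    # Target 50% CPU utilization at full concurrency.
    max_concurrency = math.floor((0.5 * duration_ms) / cpu_total_time_ms)
    # Assumption (not in the formulas above): never go below 1.
    max_concurrency = max(1, max_concurrency)

    # Scale observed memory use by concurrency, then clamp to the LMI limits.
    memory_mb = max_concurrency * (memory_utilization_pct / 100) * current_memory_mb
    memory_mb = min(LMI_MAX_MEMORY_MB, max(LMI_MIN_MEMORY_MB, memory_mb))

    # Target 50% concurrency utilization, with a floor of 3 environments.
    min_envs = max(3, math.ceil(baseline_concurrent_executions * 2 / max_concurrency))

    return {
        "PerExecutionEnvironmentMaxConcurrency": max_concurrency,
        "MemorySize": math.ceil(memory_mb),
        "MinExecutionEnvironments": min_envs,
    }
```

For example, a function with a 200 ms duration, 10 ms of CPU time, 40% memory utilization at 1024 MB, and a baseline of 50 concurrent executions yields a max concurrency of 10, 4096 MB of memory, and 10 minimum execution environments. Treat the result as a starting point to adjust during testing.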

### Step 2: Build the Cost Comparison

REQUIRED: Present a cost comparison before recommending LMI. Compare at minimum:
@@ -71,12 +104,14 @@ For discount analysis (Savings Plans, Reserved Instances), refer users to the [A

**Instance families** (~450 types): C-series (compute, .xlarge+), M-series (general, .large+), R-series (memory, .large+). ARM (Graviton) for best price-performance.

**Memory-to-vCPU ratios**: 2:1 (compute), 4:1 (general, default), 8:1 (memory). Min 2 GB, max 32 GB.
**Memory-to-vCPU ratios**: 2:1 (default, CPU-bound work), 4:1 (general/mixed workloads), 8:1 (memory-heavy or Python apps). Min 2 GB, max 32 GB.

**Multi-concurrency defaults/vCPU**: Node.js 64, Java 32, .NET 32, Python 16.

**Scaling**: MinExecutionEnvironments (default 3), MaxVCpuCount (default 400), TargetResourceUtilization.

**Scheduled scaling**: For predictable traffic (business hours, marketing events), use EventBridge Scheduler to adjust Min/Max execution environments on a one-time or recurring schedule — scale up before peak, scale down or to zero when idle.

See [references/configuration-guide.md](references/configuration-guide.md) for decision trees and detailed tuning.

### Step 4: Migrate the Code
@@ -105,16 +140,16 @@ See [references/infrastructure-setup.md](references/infrastructure-setup.md) for
### Step 6: Validate and Cut Over

1. Deploy to a non-production environment first
2. Monitor CloudWatch: CPU utilization, memory, concurrency, throttle rate
3. Gradual traffic shift with weighted aliases (10% → 50% → 100%)
2. Monitor CloudWatch: CPU utilization, memory, concurrency, throttle rate. If you observe low CPU utilization or ongoing throttles, see [references/troubleshooting.md](references/troubleshooting.md) for metric-specific adjustment guidance.
3. Shift traffic to the LMI function (note: weighted alias shifting between LMI and non-LMI functions is not currently supported)
4. Compare costs after 1-2 weeks of production data
5. Decommission standard Lambda once stable

## Best Practices

### Configuration

- Do: Start with 4:1 ratio and runtime default concurrency
- Do: Start with 2:1 ratio and runtime default concurrency
- Do: Use ARM (Graviton) unless x86 dependencies exist
- Do: Let Lambda choose instance types unless specific hardware needed
- Do: Set MaxVCpuCount to control cost ceiling
@@ -125,7 +160,7 @@ See [references/infrastructure-setup.md](references/infrastructure-setup.md) for

- Do: Start with I/O-heavy functions (benefit most from multi-concurrency; CPU-bound functions compete for same CPU)
- Do: Review code for concurrency safety before attaching to capacity provider (thread safety for Node.js/Java/.NET; `/tmp` and memory for Python)
- Do: Use weighted aliases for gradual traffic shift
- Do: Plan traffic shifting strategy based on your invocation source (weighted alias shifting between LMI and non-LMI functions is not currently supported)
- Do: Include request IDs in all log statements
- Do: Initialize DB pools and SDK clients outside the handler
- Do: Estimate total `/tmp` usage under max concurrency
@@ -135,8 +170,10 @@ See [references/infrastructure-setup.md](references/infrastructure-setup.md) for
### Operations

- Do: Set CloudWatch alarms on throttle rate > 1% and CPU > 80%
- Do: Use scheduled scaling (EventBridge Scheduler) for predictable traffic — raise Min/Max before peak periods and lower them (or scale to zero) when idle
- Don't: Manually terminate LMI EC2 instances (delete the capacity provider instead)
- Don't: Forget to publish a version — unpublished functions cannot run on LMI
- Don't: Rely on a deactivated (Min=Max=0) function to self-recover — schedule an explicit scale-up to reactivate it

## Limits Quick Reference

@@ -172,7 +209,7 @@ REQUIRED: AWS credentials configured on the host machine.

### Regional Availability

Currently available: us-east-1, us-east-2, us-west-2, ap-northeast-1, eu-west-1. Expanding to all commercial regions soon.
Available in all commercial AWS Regions except Israel (Tel Aviv), Middle East (Bahrain), Middle East (UAE), and Asia Pacific (Auckland).

Check the [Lambda Managed Instances documentation](https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances.html) for the latest regional availability.

@@ -204,12 +241,14 @@ Override: "use SAM" → SAM YAML, "use CloudFormation" → CloudFormation YAML.

### Unsupported Region

- State: "Lambda Managed Instances is not yet available in [region]"
- List available regions
- State: "Lambda Managed Instances is not available in [region]"
- Name the excluded regions: Israel (Tel Aviv), Middle East (Bahrain), Middle East (UAE), Asia Pacific (Auckland)
- Suggest the nearest supported region

## Resources

- [Lambda Managed Instances Docs](https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances.html)
- [Scaling LMI & Scheduled Scaling Docs](https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances-scaling.html)
- [Introducing LMI (AWS Blog)](https://aws.amazon.com/blogs/aws/introducing-aws-lambda-managed-instances-serverless-simplicity-with-ec2-flexibility/)
- [Build High-Performance Apps with LMI](https://aws.amazon.com/blogs/compute/build-high-performance-apps-with-aws-lambda-managed-instances/)
- [Migrating Functions to LMI (AWS Blog)](https://aws.amazon.com/blogs/compute/migrating-your-functions-to-aws-lambda-managed-instances/)
@@ -5,17 +5,17 @@
- **CPU-intensive** (encoding, ML, compression) → C-series, 2:1 ratio, concurrency=1/vCPU
- **Memory-intensive** (caching, large datasets) → R-series, 8:1 ratio
- **Network-intensive** (streaming, data transfer) → Use AllowedInstanceTypes for n-suffix types, 4:1 ratio
- **General/balanced** (web APIs, microservices) → M-series, 4:1 ratio, default concurrency
- **General/balanced** (web APIs, microservices) → M-series, 2:1 ratio (default), default concurrency

Architecture: ARM (Graviton, g-suffix) for price-performance. x86 (i=Intel, a=AMD) when dependencies require it.

## Memory-to-vCPU Ratios

| Ratio | Profile | When to use | Memory examples |
| ----- | ------- | -------------------------- | --------------------- |
| 2:1 | Compute | CPU-bound work | 2GB/1vCPU, 4GB/2vCPU |
| 4:1 | General | Most workloads (default) | 4GB/1vCPU, 8GB/2vCPU |
| 8:1 | Memory | Caching, data, Python apps | 8GB/1vCPU, 16GB/2vCPU |
| Ratio | Profile | When to use | Memory examples |
| ----- | ------- | -------------------------------- | --------------------- |
| 2:1 | Compute | CPU-bound work (default) | 2GB/1vCPU, 4GB/2vCPU |
| 4:1 | General | Mixed CPU/memory-heavy workloads | 4GB/1vCPU, 8GB/2vCPU |
| 8:1 | Memory | Memory-heavy or Python apps | 8GB/1vCPU, 16GB/2vCPU |

Min: 2 GB / 1 vCPU. Max: 32 GB. Memory must align with ratio multiples.
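As a rough illustration of how memory size and ratio determine vCPU count, here is a minimal sketch. It assumes that "align with ratio multiples" means the memory size in GB must be a whole multiple of the ratio. That reading, the helper name `vcpus_for_memory`, and its parameters are assumptions, not part of the LMI API.

```python
SUPPORTED_RATIOS = (2, 4, 8)   # memory-to-vCPU ratios from the table above
MIN_MEMORY_GB = 2              # 2 GB minimum
MAX_MEMORY_GB = 32             # 32 GB maximum


def vcpus_for_memory(memory_gb, ratio):
    """Return the vCPU count implied by a memory size and ratio.

    Hypothetical helper: raises ValueError when the combination does not
    fit the limits above, under the assumption that memory (GB) must be a
    whole multiple of the ratio.
    """
    if ratio not in SUPPORTED_RATIOS:
        raise ValueError(f"ratio must be one of {SUPPORTED_RATIOS}")
    if not MIN_MEMORY_GB <= memory_gb <= MAX_MEMORY_GB:
        raise ValueError("memory must be between 2 GB and 32 GB")
    if memory_gb % ratio != 0:
        raise ValueError("memory must be a whole multiple of the ratio")
    return memory_gb // ratio
```

Under these assumptions, 4 GB at 2:1 and 8 GB at 4:1 both resolve to 2 vCPUs, which is why doubling memory and ratio together keeps the vCPU count unchanged.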

@@ -51,6 +51,26 @@ Total capacity = MinExecutionEnvironments × PerExecutionEnvironmentMaxConcurren
| AllowedInstanceTypes | All | Restrict only for specific hardware needs |
| ExcludedInstanceTypes | None | Exclude expensive types in dev/test |

## Scheduled Scaling (Predictable Traffic)

For workloads with known traffic patterns (business hours, marketing events, batch windows), use [Amazon EventBridge Scheduler](https://docs.aws.amazon.com/scheduler/latest/UserGuide/managing-targets-universal.html) to adjust a function's `MinExecutionEnvironments` and `MaxExecutionEnvironments` on a one-time or recurring schedule. A schedule (cron or rate expression) targets the Lambda `PutFunctionScalingConfig` API as an EventBridge Scheduler universal target, passing new Min/Max values in the input payload.

**Behavior:**

- Scheduled scaling sets the provisioned floor and ceiling. Actual scaling between Min and Max still responds to CPU utilization and concurrency saturation.
- If traffic more than doubles within 5 minutes of a scheduled scale-up, you may still see throttles while capacity provisions.
- Setting both `MinExecutionEnvironments` and `MaxExecutionEnvironments` to 0 deactivates the function version (instances terminate). A deactivated function does NOT auto-recover — schedule a separate action with non-zero values to reactivate it.

**Common patterns:**

| Pattern | Scale-up schedule | Scale-down schedule |
| ---------------------- | ----------------------------------- | -------------------------------- |
| Business hours | Raise Min/Max before work starts | Lower Min/Max after hours |
| Marketing/launch event | Raise Min ahead of the campaign | Restore baseline after the event |
| Idle scale-to-zero | Reactivate (non-zero) before demand | Set Min=Max=0 when idle |

See [infrastructure-setup.md](infrastructure-setup.md) for the EventBridge Scheduler IAM role and `create-schedule` CLI examples.

## Monitoring Thresholds

- **CPU > 80%**: reduce concurrency or add vCPUs
@@ -224,6 +224,83 @@ Resources:
CapacityProviderArn: !GetAtt MyCP.Arn
```

## Scheduled Scaling (EventBridge Scheduler)

For predictable traffic, adjust `MinExecutionEnvironments`/`MaxExecutionEnvironments` on a schedule using [Amazon EventBridge Scheduler](https://docs.aws.amazon.com/scheduler/latest/UserGuide/managing-targets-universal.html). The schedule calls the Lambda `PutFunctionScalingConfig` API directly as a universal target, so no Lambda code or extra glue is required.

### 1. Scheduler execution role

Trust policy (allow EventBridge Scheduler to assume the role):

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "scheduler.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }]
}
```

Permissions (call `PutFunctionScalingConfig` on the target function):

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": "lambda:PutFunctionScalingConfig",
    "Resource": "arn:aws:lambda:*:*:function:my-lmi-function"
  }]
}
```

### 2. Create schedules

Scale up before peak (08:00 UTC daily):

```bash
aws scheduler create-schedule \
  --name ScaleUpLmi \
  --schedule-expression "cron(0 8 * * ? *)" \
  --flexible-time-window '{"Mode": "OFF"}' \
  --target '{
    "Arn": "arn:aws:scheduler:::aws-sdk:lambda:PutFunctionScalingConfig",
    "RoleArn": "arn:aws:iam::<account-id>:role/eventbridge-scheduler-role",
    "Input": "{\"FunctionName\": \"my-lmi-function\", \"Qualifier\": \"$LATEST.PUBLISHED\", \"FunctionScalingConfig\": {\"MinExecutionEnvironments\": 100, \"MaxExecutionEnvironments\": 1000}}"
  }'
```

Scale down after peak (18:00 UTC daily):

```bash
aws scheduler create-schedule \
  --name ScaleDownLmi \
  --schedule-expression "cron(0 18 * * ? *)" \
  --flexible-time-window '{"Mode": "OFF"}' \
  --target '{
    "Arn": "arn:aws:scheduler:::aws-sdk:lambda:PutFunctionScalingConfig",
    "RoleArn": "arn:aws:iam::<account-id>:role/eventbridge-scheduler-role",
    "Input": "{\"FunctionName\": \"my-lmi-function\", \"Qualifier\": \"$LATEST.PUBLISHED\", \"FunctionScalingConfig\": {\"MinExecutionEnvironments\": 5, \"MaxExecutionEnvironments\": 20}}"
  }'
```

Set both values to `0` to deactivate during idle periods; schedule a separate non-zero action to reactivate (a deactivated function does not auto-recover).
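The `Input` field in the schedule targets above is a JSON document embedded as an escaped string, which is easy to get wrong by hand. The following is a minimal sketch of generating the target with `json.dumps` instead. The helper name `build_scaling_target` is hypothetical, the 0 to 15000 range comes from the limits stated under "Manual override", and the check that Min does not exceed Max is an added assumption rather than documented API behavior.

```python
import json

# Universal target ARN used by the create-schedule examples above.
SCHEDULER_TARGET_ARN = "arn:aws:scheduler:::aws-sdk:lambda:PutFunctionScalingConfig"


def build_scaling_target(function_name, role_arn, min_envs, max_envs,
                         qualifier="$LATEST.PUBLISHED"):
    """Build the --target structure for a scheduled scaling action.

    Hypothetical helper: it only assembles JSON and calls no AWS API.
    """
    for value in (min_envs, max_envs):
        if not 0 <= value <= 15000:
            raise ValueError("values must be between 0 and 15000")
    # Assumption: Min should not exceed Max.
    if min_envs > max_envs:
        raise ValueError("min_envs must not exceed max_envs")

    payload = {
        "FunctionName": function_name,
        "Qualifier": qualifier,
        "FunctionScalingConfig": {
            "MinExecutionEnvironments": min_envs,
            "MaxExecutionEnvironments": max_envs,
        },
    }
    return {
        "Arn": SCHEDULER_TARGET_ARN,
        "RoleArn": role_arn,
        # Scheduler expects Input as a JSON string, so serialize it here.
        "Input": json.dumps(payload),
    }
```

Serializing the returned dictionary with `json.dumps` produces a value that can be passed to `--target`. Passing `0` for both values produces the deactivation payload described above.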

### Manual override

Update scaling limits directly at any time:

```bash
aws lambda put-function-scaling-config \
  --function-name my-lmi-function \
  --qualifier '$LATEST.PUBLISHED' \
  --function-scaling-config MinExecutionEnvironments=5,MaxExecutionEnvironments=20
```

`MinExecutionEnvironments` and `MaxExecutionEnvironments` accept values from 0 to 15000 and must be set together. Setting them on `$LATEST.PUBLISHED` propagates to future published versions.

## Cleanup

```bash
@@ -1,5 +1,41 @@
# LMI Troubleshooting

## Testing Phase: Monitor and Adjust

After deploying your LMI function with a test workload, check these metrics and adjust:

**Duration increased vs. existing function:**

- This indicates the concurrency estimates used during setup may be off. Investigate by:
  - Checking ExecutionEnvironmentCPUUtilization and ExecutionEnvironmentMemoryUtilization for saturation
  - Reducing PerExecutionEnvironmentMaxConcurrency to see if duration improves
  - Reviewing instance types: switching to larger or more powerful instances may help if resources are constrained
- If reducing concurrency doesn't help, check the throttle metrics below

**Low ExecutionEnvironmentCPUUtilization (below 10%):**

- Increase PerExecutionEnvironmentMaxConcurrency to improve utilization
- Or lower MemorySize to reduce vCPUs per execution environment
- If memory utilization is also high, increase ExecutionEnvironmentMemoryGiBPerVCpu ratio instead

**Ongoing CPUThrottles:**

- Switch capacity provider to Manual scaling mode with a lower CPU utilization target (e.g., 25%)

**Ongoing MemoryThrottles:**

- Increase MemorySize
- To maintain the same vCPU count, adjust ratio proportionally (e.g., 4GB/2:1 → 8GB/4:1 keeps 2 vCPUs)

**Ongoing DiskThrottles:**

- Reduce per-invocation /tmp usage or reduce PerExecutionEnvironmentMaxConcurrency

**Ongoing ConcurrencyThrottles:**

- Increase PerExecutionEnvironmentMaxConcurrency (if CPU and memory have headroom)
- Check if MaxExecutionEnvironments or MaxVCpuCount is capping scale-out

## Common Issues

| Issue | Cause | Resolution |