diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json index cd5026f0..680f1071 100644 --- a/.claude-plugin/marketplace.json +++ b/.claude-plugin/marketplace.json @@ -86,7 +86,7 @@ "microvms", "firecracker" ], - "version": "1.2.0" + "version": "1.3.0" }, { "category": "migration", diff --git a/plugins/aws-serverless/.claude-plugin/plugin.json b/plugins/aws-serverless/.claude-plugin/plugin.json index c6f5d87c..7fb5939a 100644 --- a/plugins/aws-serverless/.claude-plugin/plugin.json +++ b/plugins/aws-serverless/.claude-plugin/plugin.json @@ -22,5 +22,5 @@ "license": "Apache-2.0", "name": "aws-serverless", "repository": "https://github.com/awslabs/agent-plugins", - "version": "1.2.0" + "version": "1.3.0" } diff --git a/plugins/aws-serverless/.codex-plugin/plugin.json b/plugins/aws-serverless/.codex-plugin/plugin.json index 015e0503..4fe79445 100644 --- a/plugins/aws-serverless/.codex-plugin/plugin.json +++ b/plugins/aws-serverless/.codex-plugin/plugin.json @@ -1,6 +1,6 @@ { "name": "aws-serverless", - "version": "1.2.0", + "version": "1.3.0", "description": "Design, build, deploy, test, and debug serverless applications with AWS Serverless services.", "author": { "name": "Amazon Web Services", diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/SKILL.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/SKILL.md index 4ea2189d..afa9cf5b 100644 --- a/plugins/aws-serverless/skills/aws-lambda-managed-instances/SKILL.md +++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/SKILL.md @@ -4,13 +4,14 @@ description: > Evaluate, configure, and migrate workloads to AWS Lambda Managed Instances (LMI). Triggers on: Lambda Managed Instances, LMI, capacity provider, multi-concurrency Lambda, dedicated instance Lambda, EC2-backed Lambda, cold start elimination, Graviton Lambda, - instance type for Lambda, Lambda cost optimization with Reserved Instances or Savings Plans. 
- Also trigger when users describe high-volume predictable workloads seeking cost savings, + instance type for Lambda, scheduled scaling for LMI, Lambda cost optimization with + Reserved Instances or Savings Plans. Also trigger when users describe high-volume + predictable workloads seeking cost savings, want to scale LMI capacity on a schedule, or compare Lambda vs EC2 for steady-state traffic. For standard Lambda without LMI, use the aws-lambda skill instead. argument-hint: "[describe your workload or what you need help with]" metadata: - tags: lambda, lmi, managed-instances, ec2, capacity-provider, multi-concurrency, cost-optimization + tags: lambda, lmi, managed-instances, ec2, capacity-provider, multi-concurrency, cost-optimization, scheduled-scaling --- # AWS Lambda Managed Instances (LMI) @@ -22,11 +23,11 @@ For standard Lambda development, see [aws-lambda skill](../aws-lambda/). For SAM ## When to Load Reference Files - **Cost comparison**, **pricing analysis**, **Lambda vs LMI cost**, **Savings Plans**, or **Reserved Instances** -> see [references/cost-comparison.md](references/cost-comparison.md) -- **Instance types**, **memory sizing**, **vCPU ratios**, **scaling tuning**, or **capacity provider config** -> see [references/configuration-guide.md](references/configuration-guide.md) +- **Instance types**, **memory sizing**, **vCPU ratios**, **scaling tuning**, **scheduled scaling**, or **capacity provider config** -> see [references/configuration-guide.md](references/configuration-guide.md) - **Thread safety**, **concurrency model**, **code review checklist**, **Powertools compatibility**, or **multi-concurrency readiness** -> see [references/thread-safety.md](references/thread-safety.md) - **Before/after code examples**, **runtime-specific migration** (Node.js, Python, Java, .NET), or **connection pooling** -> see [references/migration-patterns.md](references/migration-patterns.md) -- **IAM roles**, **VPC setup**, **CLI commands**, **SAM template**, or 
**CDK example** -> see [references/infrastructure-setup.md](references/infrastructure-setup.md) and [scripts/setup-lmi.sh](scripts/setup-lmi.sh) -- **Errors**, **throttling**, **debugging**, or **stuck deployments** -> see [references/troubleshooting.md](references/troubleshooting.md) +- **IAM roles**, **VPC setup**, **CLI commands**, **SAM template**, **CDK example**, or **scheduled scaling setup (EventBridge Scheduler)** -> see [references/infrastructure-setup.md](references/infrastructure-setup.md) and [scripts/setup-lmi.sh](scripts/setup-lmi.sh) +- **Errors**, **throttling**, **debugging**, **stuck deployments**, **tuning configuration**, or **adjusting after deployment** -> see [references/troubleshooting.md](references/troubleshooting.md) ## Quick Decision: Is LMI Right for This Workload? @@ -54,6 +55,38 @@ Gather these signals before recommending: 6. **Concurrency readiness**: Thread safety (Node.js/Java/.NET)? Shared `/tmp` paths? Per-invocation DB connections? 7. **VPC**: Already in a VPC? Private resource access needed? +#### Deriving LMI Configuration from Metrics + +If Lambda Insights is enabled on the function, use these metrics to calculate your starting configuration. If Lambda Insights is not enabled, suggest adding it to gather accurate workload data — but only proceed with the user's explicit confirmation, as adding the Insights layer may affect function performance or cold start times. + +To check if Lambda Insights is enabled, look for a LambdaInsightsExtension layer on the function. To add it, find the latest layer ARN for your region from the [Lambda Insights documentation](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Lambda-Insights-extension-versions.html) and attach the `CloudWatchLambdaInsightsExecutionRolePolicy` managed policy to the function's execution role. 
+ +**Target max concurrency** (from `cpu_total_time` and `Duration`): + +``` +PerExecutionEnvironmentMaxConcurrency = floor((0.5 × Duration) / cpu_total_time) +``` + +This targets 50% CPU utilization at full concurrency, leaving headroom for scaling. + +**Memory allocation** (from `memory_utilization` and current memory): + +``` +MemorySize = min(32768, max(2048, MaxConcurrency × (memory_utilization / 100) × current_allocated_memory)) +``` + +This overestimates (assumes no shared base memory) but provides a safe starting point. The outer `min` caps the result at the 32 GB (32768 MB) LMI maximum. + +**Minimum execution environments** (from baseline `ConcurrentExecutions`): + +``` +MinExecutionEnvironments = max(3, ceil(baseline_concurrent_executions × 2 / MaxConcurrency)) +``` + +Targets 50% concurrency utilization to leave headroom for traffic bursts. + +**Without Lambda Insights:** Start with the runtime's default max concurrency, 2 GB memory, and MinExecutionEnvironments = 3. Adjust during testing. + ### Step 2: Build the Cost Comparison REQUIRED: Present a cost comparison before recommending LMI. Compare at minimum: @@ -71,12 +104,14 @@ For discount analysis (Savings Plans, Reserved Instances), refer users to the [A **Instance families** (~450 types): C-series (compute, .xlarge+), M-series (general, .large+), R-series (memory, .large+). ARM (Graviton) for best price-performance. -**Memory-to-vCPU ratios**: 2:1 (compute), 4:1 (general, default), 8:1 (memory). Min 2 GB, max 32 GB. +**Memory-to-vCPU ratios**: 2:1 (default, CPU-bound work), 4:1 (general/mixed workloads), 8:1 (memory-heavy or Python apps). Min 2 GB, max 32 GB. **Multi-concurrency defaults/vCPU**: Node.js 64, Java 32, .NET 32, Python 16. **Scaling**: MinExecutionEnvironments (default 3), MaxVCpuCount (default 400), TargetResourceUtilization. 
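The three sizing formulas under "Deriving LMI Configuration from Metrics" can be sketched as a small helper. This is an illustrative calculation only, not part of the skill: the function name `derive_lmi_config`, the lower clamp of 1 on concurrency, and rounding memory up to a whole MB are assumptions, and `duration_ms` and `cpu_total_time_ms` are assumed to be in the same unit.

```python
import math

LMI_MIN_MEMORY_MB = 2048    # 2 GB minimum from the skill text
LMI_MAX_MEMORY_MB = 32768   # 32 GB maximum from the skill text


def derive_lmi_config(duration_ms, cpu_total_time_ms, memory_utilization_pct,
                      current_memory_mb, baseline_concurrent_executions):
    """Hypothetical helper applying the three sizing formulas to observed metrics."""
    # Target 50% CPU utilization at full concurrency.
    max_concurrency = math.floor((0.5 * duration_ms) / cpu_total_time_ms)
    # Assumption (not in the formulas): never go below 1, so CPU-bound
    # functions do not produce a zero concurrency and a division by zero.
    max_concurrency = max(1, max_concurrency)

    # Overestimate: assumes no base memory is shared between invocations.
    raw_memory_mb = max_concurrency * (memory_utilization_pct / 100) * current_memory_mb
    # Assumption: round up to a whole MB before clamping to the 2 GB..32 GB range.
    memory_mb = min(LMI_MAX_MEMORY_MB, max(LMI_MIN_MEMORY_MB, math.ceil(raw_memory_mb)))

    # Target 50% concurrency utilization, with a floor of 3 environments.
    min_envs = max(3, math.ceil(baseline_concurrent_executions * 2 / max_concurrency))

    return {
        "PerExecutionEnvironmentMaxConcurrency": max_concurrency,
        "MemorySize": memory_mb,
        "MinExecutionEnvironments": min_envs,
    }


if __name__ == "__main__":
    # Made-up metrics: 200 ms duration, 10 ms CPU time, 50% of 1024 MB used,
    # 40 baseline concurrent executions.
    print(derive_lmi_config(200, 10, 50, 1024, 40))
```

The result is only a starting point. The memory value may still need adjusting so that it aligns with the chosen memory-to-vCPU ratio.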
+**Scheduled scaling**: For predictable traffic (business hours, marketing events), use EventBridge Scheduler to adjust Min/Max execution environments on a one-time or recurring schedule — scale up before peak, scale down or to zero when idle. + See [references/configuration-guide.md](references/configuration-guide.md) for decision trees and detailed tuning. ### Step 4: Migrate the Code @@ -105,8 +140,8 @@ See [references/infrastructure-setup.md](references/infrastructure-setup.md) for ### Step 6: Validate and Cut Over 1. Deploy to a non-production environment first -2. Monitor CloudWatch: CPU utilization, memory, concurrency, throttle rate -3. Gradual traffic shift with weighted aliases (10% → 50% → 100%) +2. Monitor CloudWatch: CPU utilization, memory, concurrency, throttle rate. If you observe low CPU utilization or ongoing throttles, see [references/troubleshooting.md](references/troubleshooting.md) for metric-specific adjustment guidance. +3. Shift traffic to the LMI function (note: weighted alias shifting between LMI and non-LMI functions is not currently supported) 4. Compare costs after 1-2 weeks of production data 5. 
Decommission standard Lambda once stable @@ -114,7 +149,7 @@ See [references/infrastructure-setup.md](references/infrastructure-setup.md) for ### Configuration -- Do: Start with 4:1 ratio and runtime default concurrency +- Do: Start with 2:1 ratio and runtime default concurrency - Do: Use ARM (Graviton) unless x86 dependencies exist - Do: Let Lambda choose instance types unless specific hardware needed - Do: Set MaxVCpuCount to control cost ceiling @@ -125,7 +160,7 @@ See [references/infrastructure-setup.md](references/infrastructure-setup.md) for - Do: Start with I/O-heavy functions (benefit most from multi-concurrency; CPU-bound functions compete for same CPU) - Do: Review code for concurrency safety before attaching to capacity provider (thread safety for Node.js/Java/.NET; `/tmp` and memory for Python) -- Do: Use weighted aliases for gradual traffic shift +- Do: Plan traffic shifting strategy based on your invocation source (weighted alias shifting between LMI and non-LMI functions is not currently supported) - Do: Include request IDs in all log statements - Do: Initialize DB pools and SDK clients outside the handler - Do: Estimate total `/tmp` usage under max concurrency @@ -135,8 +170,10 @@ See [references/infrastructure-setup.md](references/infrastructure-setup.md) for ### Operations - Do: Set CloudWatch alarms on throttle rate > 1% and CPU > 80% +- Do: Use scheduled scaling (EventBridge Scheduler) for predictable traffic — raise Min/Max before peak periods and lower them (or scale to zero) when idle - Don't: Manually terminate LMI EC2 instances (delete the capacity provider instead) - Don't: Forget to publish a version — unpublished functions cannot run on LMI +- Don't: Rely on a deactivated (Min=Max=0) function to self-recover — schedule an explicit scale-up to reactivate it ## Limits Quick Reference @@ -172,7 +209,7 @@ REQUIRED: AWS credentials configured on the host machine. 
### Regional Availability -Currently available: us-east-1, us-east-2, us-west-2, ap-northeast-1, eu-west-1. Expanding to all commercial regions soon. +Available in all commercial AWS Regions except Israel (Tel Aviv), Middle East (Bahrain), Middle East (UAE), and Asia Pacific (Auckland). Check the [Lambda Managed Instances documentation](https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances.html) for the latest regional availability. @@ -204,12 +241,14 @@ Override: "use SAM" → SAM YAML, "use CloudFormation" → CloudFormation YAML. ### Unsupported Region -- State: "Lambda Managed Instances is not yet available in [region]" -- List available regions +- State: "Lambda Managed Instances is not available in [region]" +- Name the excluded regions: Israel (Tel Aviv), Middle East (Bahrain), Middle East (UAE), Asia Pacific (Auckland) +- Suggest the nearest supported region ## Resources - [Lambda Managed Instances Docs](https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances.html) +- [Scaling LMI & Scheduled Scaling Docs](https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances-scaling.html) - [Introducing LMI (AWS Blog)](https://aws.amazon.com/blogs/aws/introducing-aws-lambda-managed-instances-serverless-simplicity-with-ec2-flexibility/) - [Build High-Performance Apps with LMI](https://aws.amazon.com/blogs/compute/build-high-performance-apps-with-aws-lambda-managed-instances/) - [Migrating Functions to LMI (AWS Blog)](https://aws.amazon.com/blogs/compute/migrating-your-functions-to-aws-lambda-managed-instances/) diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/configuration-guide.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/configuration-guide.md index cc560c82..8e66a5be 100644 --- a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/configuration-guide.md +++ 
b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/configuration-guide.md @@ -5,17 +5,17 @@ - **CPU-intensive** (encoding, ML, compression) → C-series, 2:1 ratio, concurrency=1/vCPU - **Memory-intensive** (caching, large datasets) → R-series, 8:1 ratio - **Network-intensive** (streaming, data transfer) → Use AllowedInstanceTypes for n-suffix types, 4:1 ratio -- **General/balanced** (web APIs, microservices) → M-series, 4:1 ratio, default concurrency +- **General/balanced** (web APIs, microservices) → M-series, 2:1 ratio (default), default concurrency Architecture: ARM (Graviton, g-suffix) for price-performance. x86 (i=Intel, a=AMD) when dependencies require it. ## Memory-to-vCPU Ratios -| Ratio | Profile | When to use | Memory examples | -| ----- | ------- | -------------------------- | --------------------- | -| 2:1 | Compute | CPU-bound work | 2GB/1vCPU, 4GB/2vCPU | -| 4:1 | General | Most workloads (default) | 4GB/1vCPU, 8GB/2vCPU | -| 8:1 | Memory | Caching, data, Python apps | 8GB/1vCPU, 16GB/2vCPU | +| Ratio | Profile | When to use | Memory examples | +| ----- | ------- | -------------------------------- | --------------------- | +| 2:1 | Compute | CPU-bound work (default) | 2GB/1vCPU, 4GB/2vCPU | +| 4:1 | General | Mixed CPU/memory-heavy workloads | 4GB/1vCPU, 8GB/2vCPU | +| 8:1 | Memory | Memory-heavy or Python apps | 8GB/1vCPU, 16GB/2vCPU | Min: 2 GB / 1 vCPU. Max: 32 GB. Memory must align with ratio multiples. 
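As a rough check of the ratio table, the vCPU count implied by a memory size and ratio can be computed as below. This sketch assumes that "align with ratio multiples" means the memory in GB must be a whole multiple of the ratio, which matches the table's examples. The helper name `vcpus_for` is hypothetical.

```python
ALLOWED_RATIOS = (2, 4, 8)   # memory-to-vCPU ratios listed in the table
MIN_MEMORY_GB = 2
MAX_MEMORY_GB = 32


def vcpus_for(memory_gb: int, ratio: int) -> int:
    """Return the vCPU count implied by a memory size (GB) and ratio.

    Assumption: memory must be a whole multiple of the ratio.
    """
    if ratio not in ALLOWED_RATIOS:
        raise ValueError(f"ratio must be one of {ALLOWED_RATIOS}")
    if not MIN_MEMORY_GB <= memory_gb <= MAX_MEMORY_GB:
        raise ValueError("memory must be between 2 GB and 32 GB")
    if memory_gb % ratio != 0:
        raise ValueError(f"{memory_gb} GB is not a multiple of ratio {ratio}")
    return memory_gb // ratio


if __name__ == "__main__":
    # Doubling memory while moving from 2:1 to 4:1 keeps the vCPU count.
    print(vcpus_for(4, 2), vcpus_for(8, 4))
```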
@@ -51,6 +51,26 @@ Total capacity = MinExecutionEnvironments × PerExecutionEnvironmentMaxConcurren | AllowedInstanceTypes | All | Restrict only for specific hardware needs | | ExcludedInstanceTypes | None | Exclude expensive types in dev/test | +## Scheduled Scaling (Predictable Traffic) + +For workloads with known traffic patterns (business hours, marketing events, batch windows), use [Amazon EventBridge Scheduler](https://docs.aws.amazon.com/scheduler/latest/UserGuide/managing-targets-universal.html) to adjust a function's `MinExecutionEnvironments` and `MaxExecutionEnvironments` on a one-time or recurring schedule. A schedule (cron or rate expression) targets the Lambda `PutFunctionScalingConfig` API as an EventBridge Scheduler universal target, passing new Min/Max values in the input payload. + +**Behavior:** + +- Scheduled scaling sets the provisioned floor and ceiling. Actual scaling between Min and Max still responds to CPU utilization and concurrency saturation. +- If traffic more than doubles within 5 minutes of a scheduled scale-up, you may still see throttles while capacity provisions. +- Setting both `MinExecutionEnvironments` and `MaxExecutionEnvironments` to 0 deactivates the function version (instances terminate). A deactivated function does NOT auto-recover — schedule a separate action with non-zero values to reactivate it. + +**Common patterns:** + +| Pattern | Scale-up schedule | Scale-down schedule | +| ---------------------- | ----------------------------------- | -------------------------------- | +| Business hours | Raise Min/Max before work starts | Lower Min/Max after hours | +| Marketing/launch event | Raise Min ahead of the campaign | Restore baseline after the event | +| Idle scale-to-zero | Reactivate (non-zero) before demand | Set Min=Max=0 when idle | + +See [infrastructure-setup.md](infrastructure-setup.md) for the EventBridge Scheduler IAM role and `create-schedule` CLI examples. 
+ ## Monitoring Thresholds - **CPU > 80%**: reduce concurrency or add vCPUs diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/infrastructure-setup.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/infrastructure-setup.md index 83deaef3..e88a1f06 100644 --- a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/infrastructure-setup.md +++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/infrastructure-setup.md @@ -224,6 +224,83 @@ Resources: CapacityProviderArn: !GetAtt MyCP.Arn ``` +## Scheduled Scaling (EventBridge Scheduler) + +For predictable traffic, adjust `MinExecutionEnvironments`/`MaxExecutionEnvironments` on a schedule using [Amazon EventBridge Scheduler](https://docs.aws.amazon.com/scheduler/latest/UserGuide/managing-targets-universal.html). The schedule calls the Lambda `PutFunctionScalingConfig` API directly as a universal target — no Lambda code or extra glue required. + +### 1. Scheduler execution role + +Trust policy (allow EventBridge Scheduler to assume the role): + +```json +{ + "Version": "2012-10-17", + "Statement": [{ + "Effect": "Allow", + "Principal": { "Service": "scheduler.amazonaws.com" }, + "Action": "sts:AssumeRole" + }] +} +``` + +Permissions (call `PutFunctionScalingConfig` on the target function): + +```json +{ + "Version": "2012-10-17", + "Statement": [{ + "Effect": "Allow", + "Action": "lambda:PutFunctionScalingConfig", + "Resource": "arn:aws:lambda:*:*:function:my-lmi-function" + }] +} +``` + +### 2. Create schedules + +Scale up before peak (08:00 UTC daily): + +```bash +aws scheduler create-schedule \ + --name ScaleUpLmi \ + --schedule-expression "cron(0 8 * * ? 
*)" \ + --flexible-time-window '{"Mode": "OFF"}' \ + --target '{ + "Arn": "arn:aws:scheduler:::aws-sdk:lambda:PutFunctionScalingConfig", + "RoleArn": "arn:aws:iam:::role/eventbridge-scheduler-role", + "Input": "{\"FunctionName\": \"my-lmi-function\", \"Qualifier\": \"$LATEST.PUBLISHED\", \"FunctionScalingConfig\": {\"MinExecutionEnvironments\": 100, \"MaxExecutionEnvironments\": 1000}}" + }' +``` + +Scale down after peak (18:00 UTC daily): + +```bash +aws scheduler create-schedule \ + --name ScaleDownLmi \ + --schedule-expression "cron(0 18 * * ? *)" \ + --flexible-time-window '{"Mode": "OFF"}' \ + --target '{ + "Arn": "arn:aws:scheduler:::aws-sdk:lambda:PutFunctionScalingConfig", + "RoleArn": "arn:aws:iam:::role/eventbridge-scheduler-role", + "Input": "{\"FunctionName\": \"my-lmi-function\", \"Qualifier\": \"$LATEST.PUBLISHED\", \"FunctionScalingConfig\": {\"MinExecutionEnvironments\": 5, \"MaxExecutionEnvironments\": 20}}" + }' +``` + +Set both values to `0` to deactivate during idle periods; schedule a separate non-zero action to reactivate (a deactivated function does not auto-recover). + +### Manual override + +Update scaling limits directly at any time: + +```bash +aws lambda put-function-scaling-config \ + --function-name my-lmi-function \ + --qualifier '$LATEST.PUBLISHED' \ + --function-scaling-config MinExecutionEnvironments=5,MaxExecutionEnvironments=20 +``` + +`MinExecutionEnvironments` and `MaxExecutionEnvironments` accept values from 0 to 15000 and must be set together. Setting them on `$LATEST.PUBLISHED` propagates to future published versions. 
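The escaped JSON in the `--target` argument is easy to get wrong by hand. A minimal sketch for generating it is shown below. The helper name `build_scaling_target` is hypothetical, the target ARN and role ARN placeholder are copied unchanged from the CLI examples above, and the check that Min does not exceed Max is an assumption not stated in this document.

```python
import json

# Universal target ARN as written in the CLI examples above.
PUT_SCALING_TARGET_ARN = "arn:aws:scheduler:::aws-sdk:lambda:PutFunctionScalingConfig"
MAX_ENVIRONMENTS = 15000  # upper bound stated in this section


def build_scaling_target(function_name, min_envs, max_envs, role_arn,
                         qualifier="$LATEST.PUBLISHED"):
    """Hypothetical helper: build an EventBridge Scheduler target dict."""
    for value in (min_envs, max_envs):
        if not 0 <= value <= MAX_ENVIRONMENTS:
            raise ValueError("values must be between 0 and 15000")
    # Assumption: Min greater than Max is treated as a mistake.
    if min_envs > max_envs:
        raise ValueError("MinExecutionEnvironments exceeds MaxExecutionEnvironments")

    payload = {
        "FunctionName": function_name,
        "Qualifier": qualifier,
        "FunctionScalingConfig": {
            "MinExecutionEnvironments": min_envs,
            "MaxExecutionEnvironments": max_envs,
        },
    }
    # Scheduler expects Input as a JSON string, so serialize the payload here.
    return {
        "Arn": PUT_SCALING_TARGET_ARN,
        "RoleArn": role_arn,
        "Input": json.dumps(payload),
    }


if __name__ == "__main__":
    target = build_scaling_target(
        "my-lmi-function", 100, 1000,
        "arn:aws:iam:::role/eventbridge-scheduler-role",  # placeholder from the examples
    )
    # Paste the printed JSON into --target, or pass the dict as Target= in an SDK call.
    print(json.dumps(target, indent=2))
```

The same dict could be passed as the `Target` argument of the boto3 `scheduler` client's `create_schedule` call, alongside `Name`, `ScheduleExpression`, and `FlexibleTimeWindow`. That usage has not been tested against LMI here.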
+ ## Cleanup ```bash diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/troubleshooting.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/troubleshooting.md index d1055c7b..91ffcbf1 100644 --- a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/troubleshooting.md +++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/troubleshooting.md @@ -1,5 +1,41 @@ # LMI Troubleshooting +## Testing Phase: Monitor and Adjust + +After deploying your LMI function with a test workload, check these metrics and adjust: + +**Duration increased vs. existing function:** + +- This indicates the concurrency estimations used during setup may be off. Investigate by: + - Checking ExecutionEnvironmentCPUUtilization and ExecutionEnvironmentMemoryUtilization for saturation + - Reducing PerExecutionEnvironmentMaxConcurrency to see if duration improves + - Reviewing instance types — switching to larger or more powerful instances may help if resources are constrained +- If reducing concurrency doesn't help, check throttle metrics below + +**Low ExecutionEnvironmentCPUUtilization (below 10%):** + +- Increase PerExecutionEnvironmentMaxConcurrency to improve utilization +- Or lower MemorySize to reduce vCPUs per execution environment +- If memory utilization is also high, increase ExecutionEnvironmentMemoryGiBPerVCpu ratio instead + +**Ongoing CPUThrottles:** + +- Switch capacity provider to Manual scaling mode with a lower CPU utilization target (e.g., 25%) + +**Ongoing MemoryThrottles:** + +- Increase MemorySize +- To maintain the same vCPU count, adjust ratio proportionally (e.g., 4GB/2:1 → 8GB/4:1 keeps 2 vCPUs) + +**Ongoing DiskThrottles:** + +- Reduce per-invocation /tmp usage or reduce PerExecutionEnvironmentMaxConcurrency + +**Ongoing ConcurrencyThrottles:** + +- Increase PerExecutionEnvironmentMaxConcurrency (if CPU and memory have headroom) +- Check if MaxExecutionEnvironments or MaxVCpuCount 
is capping scale-out + ## Common Issues | Issue | Cause | Resolution |