EC2 Auto Scaling now lets you use AWS Lambda to control which instances are terminated on scale-in

2021.08.06

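This is Nakayama (順), currently in recovery.

EC2 Auto Scaling has long let you set a termination policy. Until now you could only choose from policies predefined by AWS, but a recent update lets you implement whatever selection logic you need in a Lambda function.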

Amazon EC2 Auto Scaling now lets you control which instances to terminate on scale-in

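Let's look at how to configure it and how it behaves.

Trying it out

We will proceed in the following order.

  • Preparation (creating the Launch Template and the Auto Scaling group)
  • Creating the Lambda function
  • Updating the Auto Scaling group (setting the termination policy)

Preparation (creating the Launch Template and the Auto Scaling group)

First, we create the Auto Scaling group that will be scaled in. Since this is only a test, we do not attach an ELB or do any initial setup through UserData.

Start by creating the Launch Template.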

LT_DATA_FILE_NAME='LaunchTemplateData.json'

cat << EOF > ${LT_DATA_FILE_NAME}
{
    "ImageId": "ami-09ebacdc178ae23b7",
    "InstanceType": "t3.nano"
}
EOF

aws ec2 create-launch-template \
    --launch-template-name test-lt-asg-termination \
    --launch-template-data file://${LT_DATA_FILE_NAME}
{
    "LaunchTemplate": {
        "LaunchTemplateId": "lt-0ef4832eb585a8e0b",
        "LaunchTemplateName": "test-lt-asg-termination",
        "CreateTime": "2021-08-06T10:16:20+00:00",
        "CreatedBy": "arn:aws:sts::xxxxxxxxxxxx:assumed-role/cm-nakayama.nobuhiro/cm-nakayama.nobuhiro",
        "DefaultVersionNumber": 1,
        "LatestVersionNumber": 1
    }
}

Next, create the Auto Scaling group. Here we launch three EC2 instances in the default VPC (one instance per AZ).

aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name test-asg-termination \
    --launch-template "LaunchTemplateId=lt-0ef4832eb585a8e0b" \
    --min-size 0 \
    --max-size 6 \
    --desired-capacity 3 \
    --vpc-zone-identifier "subnet-0b33c6ab5dbe1dda5,subnet-09166ccaa28459f22,subnet-03267eb84db1ec870"
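
For readers who script this step in Python, the same request could be sent through boto3. The following is only a sketch and not part of the original walkthrough: `build_group_request` and `create_test_group` are hypothetical helpers, and the client is passed in as a parameter (for example `boto3.client("autoscaling")`) so that the request shape can be inspected without calling AWS.

```python
def build_group_request(template_id, subnet_ids):
    """Return keyword arguments mirroring the CLI call above."""
    return {
        "AutoScalingGroupName": "test-asg-termination",
        "LaunchTemplate": {"LaunchTemplateId": template_id},
        "MinSize": 0,
        "MaxSize": 6,
        "DesiredCapacity": 3,
        # The API takes the subnets as one comma-separated string.
        "VPCZoneIdentifier": ",".join(subnet_ids),
    }


def create_test_group(autoscaling_client, template_id, subnet_ids):
    """Send the request with any client exposing create_auto_scaling_group."""
    request = build_group_request(template_id, subnet_ids)
    autoscaling_client.create_auto_scaling_group(**request)
    return request


request = build_group_request(
    "lt-0ef4832eb585a8e0b",
    ["subnet-0b33c6ab5dbe1dda5", "subnet-09166ccaa28459f22", "subnet-03267eb84db1ec870"],
)
print(request["VPCZoneIdentifier"])
```

Keeping the request construction separate from the API call makes it easy to review the parameters before anything is created.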

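Creating the Lambda function

Next, we create the Lambda function that controls which instances are terminated.

Before writing it, check the formats of the input data and the response data in the official documentation.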

Input data

{
  "AutoScalingGroupARN": "arn:aws:autoscaling:us-east-1:<account-id>:autoScalingGroup:d4738357-2d40-4038-ae7e-b00ae0227003:autoScalingGroupName/my-asg",
  "AutoScalingGroupName": "my-asg",
  "CapacityToTerminate": [
    {
      "AvailabilityZone": "us-east-1b",
      "Capacity": 2,
      "InstanceMarketOption": "OnDemand"
    },
    {
      "AvailabilityZone": "us-east-1b",
      "Capacity": 1,
      "InstanceMarketOption": "Spot"
    },
    {
      "AvailabilityZone": "us-east-1c",
      "Capacity": 3,
      "InstanceMarketOption": "OnDemand"
    }
  ],
  "Instances": [
    {
      "AvailabilityZone": "us-east-1b",
      "InstanceId": "i-0056faf8da3e1f75d",
      "InstanceType": "t2.nano",
      "InstanceMarketOption": "OnDemand"
    },
    {
      "AvailabilityZone": "us-east-1c",
      "InstanceId": "i-02e1c69383a3ed501",
      "InstanceType": "t2.nano",
      "InstanceMarketOption": "OnDemand"
    },
    {
      "AvailabilityZone": "us-east-1c",
      "InstanceId": "i-036bc44b6092c01c7",
      "InstanceType": "t2.nano",
      "InstanceMarketOption": "OnDemand"
    },
    ...
  ],
  "Cause": "SCALE_IN"
}

Response data

{
  "InstanceIDs": [
    "i-02e1c69383a3ed501",
    "i-036bc44b6092c01c7",
    ...
  ]
}
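
As an aside on how the two input lists relate, the sketch below reads `CapacityToTerminate` and returns at most that many candidates per Availability Zone and market option. This is an illustration only: `select_candidates` is a hypothetical name, the sketch assumes each instance counts as one unit of capacity, and whether capping the list like this is desirable depends on your use case.

```python
from collections import defaultdict


def select_candidates(event):
    """Pick up to `Capacity` instances per (AZ, market option) pair."""
    # Remaining capacity to terminate, keyed by (AZ, market option).
    remaining = defaultdict(int)
    for item in event.get("CapacityToTerminate", []):
        key = (item["AvailabilityZone"], item["InstanceMarketOption"])
        remaining[key] += item["Capacity"]

    selected = []
    for instance in event.get("Instances", []):
        key = (instance["AvailabilityZone"], instance["InstanceMarketOption"])
        if remaining[key] > 0:
            selected.append(instance["InstanceId"])
            remaining[key] -= 1

    # The response key is spelled "InstanceIDs", as in the format above.
    return {"InstanceIDs": selected}


# Hand-written event with placeholder instance IDs.
event = {
    "CapacityToTerminate": [
        {"AvailabilityZone": "us-east-1b", "Capacity": 1, "InstanceMarketOption": "OnDemand"},
        {"AvailabilityZone": "us-east-1c", "Capacity": 1, "InstanceMarketOption": "OnDemand"},
    ],
    "Instances": [
        {"AvailabilityZone": "us-east-1b", "InstanceId": "i-aaaa", "InstanceMarketOption": "OnDemand"},
        {"AvailabilityZone": "us-east-1c", "InstanceId": "i-bbbb", "InstanceMarketOption": "OnDemand"},
        {"AvailabilityZone": "us-east-1c", "InstanceId": "i-cccc", "InstanceMarketOption": "OnDemand"},
    ],
    "Cause": "SCALE_IN",
}
print(select_candidates(event))  # {'InstanceIDs': ['i-aaaa', 'i-bbbb']}
```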

With these formats in mind, let's implement the policy. In this example, the function selects the instances in one specific AZ for termination.

import json

def lambda_handler(event, context):
    instance_list = []
    
    for instance in event['Instances']:
        if instance['AvailabilityZone'] == 'ap-northeast-1a':
            instance_list.append(instance['InstanceId'])

    response = {'InstanceIDs': instance_list}
    
    print(response)

    return response

Save this code to a file and package it as a zip archive.

LAMBDA_CODE_FILE_NAME='lambda_function.py'

cat << EOF > ${LAMBDA_CODE_FILE_NAME}
import json

def lambda_handler(event, context):
    instance_list = []
    
    for instance in event['Instances']:
        if instance['AvailabilityZone'] == 'ap-northeast-1a':
            instance_list.append(instance['InstanceId'])

    response = {'InstanceIDs': instance_list}
    
    print(response)

    return response
EOF

zip test-func-asg-termination.zip ${LAMBDA_CODE_FILE_NAME}
  adding: lambda_function.py (deflated 44%)
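
Before uploading, the handler can be exercised locally with a hand-written event. The sketch below repeats the handler logic so that it runs on its own; the instance IDs are placeholders, and the event contains only the fields the handler reads plus a few from the documented format.

```python
def lambda_handler(event, context):
    # Same logic as lambda_function.py above: keep only ap-northeast-1a.
    instance_list = []
    for instance in event['Instances']:
        if instance['AvailabilityZone'] == 'ap-northeast-1a':
            instance_list.append(instance['InstanceId'])
    return {'InstanceIDs': instance_list}


# Hand-written sample event; the instance IDs are placeholders.
sample_event = {
    "AutoScalingGroupName": "test-asg-termination",
    "Instances": [
        {"AvailabilityZone": "ap-northeast-1a", "InstanceId": "i-aaaa",
         "InstanceType": "t3.nano", "InstanceMarketOption": "OnDemand"},
        {"AvailabilityZone": "ap-northeast-1c", "InstanceId": "i-cccc",
         "InstanceType": "t3.nano", "InstanceMarketOption": "OnDemand"},
    ],
    "Cause": "SCALE_IN",
}

result = lambda_handler(sample_event, None)
print(result)  # {'InstanceIDs': ['i-aaaa']}
```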

Create the execution role for the Lambda function.

TRUSTED_POLICY_FILE_NAME='trusted-policy.json'

cat << EOF > ${TRUSTED_POLICY_FILE_NAME}
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

aws iam create-role \
    --role-name asg-termination-execution-role \
    --assume-role-policy-document file://${TRUSTED_POLICY_FILE_NAME}
{
    "Role": {
        "Path": "/",
        "RoleName": "asg-termination-execution-role",
        "RoleId": "AROAXS3RGICAXZBV2PQNA",
        "Arn": "arn:aws:iam::xxxxxxxxxxxx:role/asg-termination-execution-role",
        "CreateDate": "2021-08-06T10:57:31+00:00",
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "Service": "lambda.amazonaws.com"
                    },
                    "Action": "sts:AssumeRole"
                }
            ]
        }
    }
}

The role is only granted permission to write logs (adjust this as needed).

INLINE_POLICY_FILE_NAME='inline-policy.json'

cat << EOF > ${INLINE_POLICY_FILE_NAME}
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "logs:CreateLogGroup",
            "Resource": "arn:aws:logs:ap-northeast-1:xxxxxxxxxxxx:*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:ap-northeast-1:xxxxxxxxxxxx:log-group:/aws/lambda/test-func-asg-termination:*"
            ]
        }
    ]
}
EOF

aws iam put-role-policy \
    --role-name asg-termination-execution-role \
    --policy-name asg-termination-execution-policy \
    --policy-document file://${INLINE_POLICY_FILE_NAME}

aws iam get-role-policy \
    --role-name asg-termination-execution-role \
    --policy-name asg-termination-execution-policy
{
    "RoleName": "asg-termination-execution-role",
    "PolicyName": "asg-termination-execution-policy",
    "PolicyDocument": {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "logs:CreateLogGroup",
                "Resource": "arn:aws:logs:ap-northeast-1:xxxxxxxxxxxx:*"
            },
            {
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogStream",
                    "logs:PutLogEvents"
                ],
                "Resource": [
                    "arn:aws:logs:ap-northeast-1:xxxxxxxxxxxx:log-group:/aws/lambda/test-func-asg-termination:*"
                ]
            }
        ]
    }
}
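
If the function name or region changes, the two log ARNs in this policy have to change with it. As a sketch under that assumption, the policy document could be generated rather than hand-edited; `build_logging_policy` is a hypothetical helper and `123456789012` is a placeholder account ID.

```python
import json


def build_logging_policy(region, account_id, function_name):
    """Return the inline policy above as a dict, parameterised by function name."""
    log_group = (
        f"arn:aws:logs:{region}:{account_id}:"
        f"log-group:/aws/lambda/{function_name}:*"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "logs:CreateLogGroup",
                "Resource": f"arn:aws:logs:{region}:{account_id}:*",
            },
            {
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": [log_group],
            },
        ],
    }


policy = build_logging_policy(
    "ap-northeast-1", "123456789012", "test-func-asg-termination"
)
# Print the document; redirect it to inline-policy.json if desired.
print(json.dumps(policy, indent=4))
```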

Now that the required resources exist, create the Lambda function.

aws lambda create-function \
    --function-name test-func-asg-termination \
    --runtime python3.8 \
    --zip-file fileb://test-func-asg-termination.zip \
    --handler lambda_function.lambda_handler \
    --role arn:aws:iam::xxxxxxxxxxxx:role/asg-termination-execution-role
{
    "FunctionName": "test-func-asg-termination",
    "FunctionArn": "arn:aws:lambda:ap-northeast-1:xxxxxxxxxxxx:function:test-func-asg-termination",
    "Runtime": "python3.8",
    "Role": "arn:aws:iam::xxxxxxxxxxxx:role/asg-termination-execution-role",
    "Handler": "lambda_function.lambda_handler",
    "CodeSize": 371,
    "Description": "",
    "Timeout": 3,
    "MemorySize": 128,
    "LastModified": "2021-08-06T11:05:45.097+0000",
    "CodeSha256": "I/rnY7IgxIpG52ZNJwIB5lKfBJII18aGLTEW5wz8ZFI=",
    "Version": "$LATEST",
    "TracingConfig": {
        "Mode": "PassThrough"
    },
    "RevisionId": "cbeb2eb3-47dd-4ce7-8cf9-3b9618e93892",
    "State": "Active",
    "LastUpdateStatus": "Successful",
    "PackageType": "Zip"
}

Publish the first version.

aws lambda publish-version \
    --function-name test-func-asg-termination
{
    "FunctionName": "test-func-asg-termination",
    "FunctionArn": "arn:aws:lambda:ap-northeast-1:xxxxxxxxxxxx:function:test-func-asg-termination:1",
    "Runtime": "python3.8",
    "Role": "arn:aws:iam::xxxxxxxxxxxx:role/asg-termination-execution-role",
    "Handler": "lambda_function.lambda_handler",
    "CodeSize": 371,
    "Description": "",
    "Timeout": 3,
    "MemorySize": 128,
    "LastModified": "2021-08-06T11:05:45.097+0000",
    "CodeSha256": "I/rnY7IgxIpG52ZNJwIB5lKfBJII18aGLTEW5wz8ZFI=",
    "Version": "1",
    "TracingConfig": {
        "Mode": "PassThrough"
    },
    "RevisionId": "5fd7acc5-b4cc-4b49-9be1-2f88fcf687ff",
    "State": "Active",
    "LastUpdateStatus": "Successful",
    "PackageType": "Zip"
}

Add a permission to the Lambda function so that the EC2 Auto Scaling service-linked role can invoke it.

aws lambda add-permission \
    --function-name test-func-asg-termination:1 \
    --action lambda:InvokeFunction \
    --statement-id AllowInvokeByAutoScaling \
    --principal arn:aws:iam::xxxxxxxxxxxx:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling
{
    "Statement": "{\"Sid\":\"AllowInvokeByAutoScaling\",\"Effect\":\"Allow\",\"Principal\":{\"AWS\":\"arn:aws:iam::xxxxxxxxxxxx:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling\"},\"Action\":\"lambda:InvokeFunction\",\"Resource\":\"arn:aws:lambda:ap-northeast-1:xxxxxxxxxxxx:function:test-func-asg-termination:1\"}"
}
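
The `Statement` field in this output is itself a JSON-encoded string, which makes it hard to read at a glance. A small sketch for decoding it and checking who may invoke which version might look like the following; `summarize_permission` is a hypothetical helper and the account ID is a placeholder.

```python
import json


def summarize_permission(add_permission_output):
    """Decode the JSON-encoded Statement string and pull out the key fields."""
    statement = json.loads(add_permission_output["Statement"])
    return {
        "principal": statement["Principal"]["AWS"],
        "action": statement["Action"],
        "resource": statement["Resource"],
    }


# Stand-in for the CLI output above (account ID is a placeholder).
output = {
    "Statement": json.dumps({
        "Sid": "AllowInvokeByAutoScaling",
        "Effect": "Allow",
        "Principal": {
            "AWS": "arn:aws:iam::123456789012:role/aws-service-role/"
                   "autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling"
        },
        "Action": "lambda:InvokeFunction",
        "Resource": "arn:aws:lambda:ap-northeast-1:123456789012:"
                    "function:test-func-asg-termination:1",
    })
}

summary = summarize_permission(output)
# The permission was added to version 1, so the resource ARN ends in ":1".
print(summary["resource"].rsplit(":", 1)[-1])
```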

Updating the Auto Scaling group (setting the termination policy)

Finally, specify the Lambda function as the termination policy of the Auto Scaling group.

aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name test-asg-termination \
    --termination-policies arn:aws:lambda:ap-northeast-1:xxxxxxxxxxxx:function:test-func-asg-termination:1

The configuration is now complete. (TerminationPolicies contains the ARN of the Lambda function.)

aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names test-asg-termination
{
    "AutoScalingGroups": [
        {
            "AutoScalingGroupName": "test-asg-termination",
            "AutoScalingGroupARN": "arn:aws:autoscaling:ap-northeast-1:xxxxxxxxxxxx:autoScalingGroup:4c38d16b-ae38-4dd9-a1c5-54beb344474b:autoScalingGroupName/test-asg-termination",
            "LaunchTemplate": {
                "LaunchTemplateId": "lt-0ef4832eb585a8e0b",
                "LaunchTemplateName": "test-lt-asg-termination"
            },
            "MinSize": 0,
            "MaxSize": 6,
            "DesiredCapacity": 3,
            "DefaultCooldown": 300,
            "AvailabilityZones": [
                "ap-northeast-1a",
                "ap-northeast-1c",
                "ap-northeast-1d"
            ],
            "LoadBalancerNames": [],
            "TargetGroupARNs": [],
            "HealthCheckType": "EC2",
            "HealthCheckGracePeriod": 0,
            "Instances": [
                {
                    "InstanceId": "i-061725f0f93ea3d82",
                    "InstanceType": "t3.nano",
                    "AvailabilityZone": "ap-northeast-1c",
                    "LifecycleState": "InService",
                    "HealthStatus": "Healthy",
                    "LaunchTemplate": {
                        "LaunchTemplateId": "lt-0ef4832eb585a8e0b",
                        "LaunchTemplateName": "test-lt-asg-termination",
                        "Version": "1"
                    },
                    "ProtectedFromScaleIn": false
                },
                {
                    "InstanceId": "i-0f66b306bf8d98267",
                    "InstanceType": "t3.nano",
                    "AvailabilityZone": "ap-northeast-1d",
                    "LifecycleState": "InService",
                    "HealthStatus": "Healthy",
                    "LaunchTemplate": {
                        "LaunchTemplateId": "lt-0ef4832eb585a8e0b",
                        "LaunchTemplateName": "test-lt-asg-termination",
                        "Version": "1"
                    },
                    "ProtectedFromScaleIn": false
                },
                {
                    "InstanceId": "i-0ff32820dd47545d9",
                    "InstanceType": "t3.nano",
                    "AvailabilityZone": "ap-northeast-1a",
                    "LifecycleState": "InService",
                    "HealthStatus": "Healthy",
                    "LaunchTemplate": {
                        "LaunchTemplateId": "lt-0ef4832eb585a8e0b",
                        "LaunchTemplateName": "test-lt-asg-termination",
                        "Version": "1"
                    },
                    "ProtectedFromScaleIn": false
                }
            ],
            "CreatedTime": "2021-08-06T10:25:47.821000+00:00",
            "SuspendedProcesses": [],
            "VPCZoneIdentifier": "subnet-03267eb84db1ec870,subnet-0b33c6ab5dbe1dda5,subnet-09166ccaa28459f22",
            "EnabledMetrics": [],
            "Tags": [],
            "TerminationPolicies": [
                "arn:aws:lambda:ap-northeast-1:xxxxxxxxxxxx:function:test-func-asg-termination:1"
            ],
            "NewInstancesProtectedFromScaleIn": false,
            "ServiceLinkedRoleARN": "arn:aws:iam::xxxxxxxxxxxx:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling"
        }
    ]
}
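
When this output is saved as JSON, the two facts that matter here (which termination policies are Lambda ARNs, and which instances sit in a given AZ) can be pulled out programmatically. The helpers below are hypothetical names, and the sample is a trimmed copy of the output above with a placeholder account ID. The sketch assumes the standard `arn:aws:lambda:` prefix.

```python
def find_group(describe_output, group_name):
    """Return the group dict with the given name, or None if absent."""
    for group in describe_output["AutoScalingGroups"]:
        if group["AutoScalingGroupName"] == group_name:
            return group
    return None


def lambda_termination_policies(group):
    """Return the termination policies that look like Lambda function ARNs."""
    return [p for p in group["TerminationPolicies"]
            if p.startswith("arn:aws:lambda:")]


def instances_in_zone(group, zone):
    """List instance IDs of the group running in the given Availability Zone."""
    return [i["InstanceId"] for i in group["Instances"]
            if i["AvailabilityZone"] == zone]


# Trimmed copy of the describe-auto-scaling-groups output above.
sample = {
    "AutoScalingGroups": [{
        "AutoScalingGroupName": "test-asg-termination",
        "Instances": [
            {"InstanceId": "i-061725f0f93ea3d82", "AvailabilityZone": "ap-northeast-1c"},
            {"InstanceId": "i-0f66b306bf8d98267", "AvailabilityZone": "ap-northeast-1d"},
            {"InstanceId": "i-0ff32820dd47545d9", "AvailabilityZone": "ap-northeast-1a"},
        ],
        "TerminationPolicies": [
            "arn:aws:lambda:ap-northeast-1:123456789012:function:test-func-asg-termination:1"
        ],
    }]
}

group = find_group(sample, "test-asg-termination")
print(lambda_termination_policies(group))
print(instances_in_zone(group, "ap-northeast-1a"))  # ['i-0ff32820dd47545d9']
```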

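Verifying the behavior

Now let's verify the behavior. DesiredCapacity is currently 3, and manually setting it to 2 triggers a scale-in.

The Lambda function implemented here returns the list of instances in ap-northeast-1a, so "i-0ff32820dd47545d9" should be terminated (the instance ID can be confirmed in the output of the command run just above).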

aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name test-asg-termination \
    --desired-capacity 2

Check the scaling activity. We can confirm that the expected instance was terminated.

aws autoscaling describe-scaling-activities \
    --auto-scaling-group-name test-asg-termination \
    --query "sort_by(Activities, &StartTime)[-1]"
{
    "ActivityId": "ca55ec6d-157e-071e-5f53-4bb4518c83c9",
    "AutoScalingGroupName": "test-asg-termination",
    "Description": "Terminating EC2 instance: i-0ff32820dd47545d9",
    "Cause": "At 2021-08-06T11:42:16Z an instance was taken out of service in response to a difference between desired and actual capacity, shrinking the capacity from 3 to 2.  At 2021-08-06T11:42:16Z instance i-0ff32820dd47545d9 was selected for termination.",
    "StartTime": "2021-08-06T11:42:16.449000+00:00",
    "EndTime": "2021-08-06T11:42:58+00:00",
    "StatusCode": "Successful",
    "Progress": 100,
    "Details": "{\"Subnet ID\":\"subnet-09166ccaa28459f22\",\"Availability Zone\":\"ap-northeast-1a\"}",
    "AutoScalingGroupARN": "arn:aws:autoscaling:ap-northeast-1:xxxxxxxxxxxx:autoScalingGroup:4c38d16b-ae38-4dd9-a1c5-54beb344474b:autoScalingGroupName/test-asg-termination"
}
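
The instance ID and the AZ are spread across the `Description` string and the JSON-encoded `Details` string of the activity record. As a sketch, they could be extracted like this; `summarize_activity` is a hypothetical helper, and the regular expression assumes the "Terminating EC2 instance: ..." wording shown in the output above.

```python
import json
import re


def summarize_activity(activity):
    """Extract the terminated instance ID and its Availability Zone."""
    match = re.search(
        r"Terminating EC2 instance: (i-[0-9a-f]+)", activity["Description"]
    )
    details = json.loads(activity["Details"])  # Details is a JSON string
    return {
        "instance_id": match.group(1) if match else None,
        "availability_zone": details.get("Availability Zone"),
        "status": activity["StatusCode"],
    }


# Fields copied from the activity record above.
activity = {
    "Description": "Terminating EC2 instance: i-0ff32820dd47545d9",
    "StatusCode": "Successful",
    "Details": "{\"Subnet ID\":\"subnet-09166ccaa28459f22\","
               "\"Availability Zone\":\"ap-northeast-1a\"}",
}

summary = summarize_activity(activity)
print(summary)
```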

The response data is also as expected.

aws logs get-log-events \
    --log-group-name /aws/lambda/test-func-asg-termination \
    --log-stream-name '2021/08/06/[2]3bac35a4c16a4fbdae3a3853b973d40a'
{
    "events": [
        {
            "timestamp": 1628250136258,
            "message": "START RequestId: c6b451f4-92c7-44ca-9eca-e69ca22ae6cb Version: 2\n",
            "ingestionTime": 1628250138227
        },
        {
            "timestamp": 1628250136259,
            "message": "{'InstanceIDs': ['i-0ff32820dd47545d9']}\n",
            "ingestionTime": 1628250138227
        },
        {
            "timestamp": 1628250136260,
            "message": "END RequestId: c6b451f4-92c7-44ca-9eca-e69ca22ae6cb\n",
            "ingestionTime": 1628250138227
        },
        {
            "timestamp": 1628250136260,
            "message": "REPORT RequestId: c6b451f4-92c7-44ca-9eca-e69ca22ae6cb\tDuration: 1.35 ms\tBilled Duration: 2 ms\tMemory Size: 128 MB\tMax Memory Used: 51 MB\tInit Duration: 116.42 ms\t\n",
            "ingestionTime": 1628250138227
        }
    ],
    "nextForwardToken": "f/36311191408207027883956676175990766424938000306047090691",
    "nextBackwardToken": "b/36311191408162426393559614929707694988392703583035129856"
}
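
Because the handler logs its response with `print()`, the log line is a Python dict repr rather than JSON. A sketch for recovering the response from the `get-log-events` output could use `ast.literal_eval`; `extract_responses` is a hypothetical helper, and the sample is abbreviated from the events above.

```python
import ast


def extract_responses(log_events_output):
    """Return every logged dict that contains an 'InstanceIDs' key."""
    responses = []
    for event in log_events_output["events"]:
        message = event["message"].strip()
        if not message.startswith("{"):
            continue  # skip START / END / REPORT lines
        try:
            # print() wrote a Python dict repr, so parse it as a literal.
            value = ast.literal_eval(message)
        except (ValueError, SyntaxError):
            continue
        if isinstance(value, dict) and "InstanceIDs" in value:
            responses.append(value)
    return responses


# Abbreviated from the log events above.
log_output = {
    "events": [
        {"message": "START RequestId: c6b451f4-92c7-44ca-9eca-e69ca22ae6cb Version: 2\n"},
        {"message": "{'InstanceIDs': ['i-0ff32820dd47545d9']}\n"},
        {"message": "END RequestId: c6b451f4-92c7-44ca-9eca-e69ca22ae6cb\n"},
    ]
}

responses = extract_responses(log_output)
print(responses)  # [{'InstanceIDs': ['i-0ff32820dd47545d9']}]
```

Logging the response with `json.dumps` inside the handler would make it parseable as JSON instead, at the cost of changing the function shown earlier.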

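That concludes the verification.

Note that the documentation lists several considerations and limitations that deserve attention. Be sure to review them before using this feature.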

Considerations when using a custom termination policy

Limitations

Summary

Scale-in for Auto Scaling can now be operated more flexibly than before. That said, settings of this kind are tedious to maintain, so use them responsibly and in moderation.