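Building a job net for starting and stopping a web system with AWS Step Functions and SSM RunCommand
Hello, this is のんピ.
Have you ever wanted to break free from your job management system? I always do.
A job management system treats each unit of work, such as a batch process or an OS startup, as a job, and controls and operates those jobs. Using one brings benefits such as automating routine work.
However, I think the very convenience of a job management system also brings its own pain. Here are the pain points I feel:
- Because the job management system manages the job nets of every system, maintaining the job management system itself is hard
- If the job management system stops, every system is affected, so it needs high availability
- Some job management systems require a license for each agent, so the cost grows when there are many clients
With those pain points in mind, I tested whether AWS Step Functions and SSM RunCommand can replace a job management system.
AWS Step Functions and SSM RunCommand may be able to replace a job management system, but migrating takes real commitment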
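What I liked about the AWS Step Functions and SSM RunCommand setup
- With AWS Step Functions and SSM RunCommand, processing such as controlling scripts inside EC2 instances can be done serverlessly
- The status of a state machine (the equivalent of a job net in a job management system) can be checked in the console
- Multiple processes can run in parallel
- Execution results can be written to CloudWatch Logs
- EventBridge can send a notification when a Step Functions execution ends normally or abnormally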
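As a rough illustration of the EventBridge point, the sketch below builds an event pattern that matches Step Functions execution status changes. The helper name `buildExecutionStatusPattern` and the state machine ARN are placeholders I made up for this example; they are not taken from the article's stacks.

```typescript
// Event pattern for Step Functions execution status changes.
// The helper name and the ARN below are hypothetical placeholders.
interface ExecutionStatusPattern {
  source: string[];
  "detail-type": string[];
  detail: { status: string[]; stateMachineArn: string[] };
}

function buildExecutionStatusPattern(
  stateMachineArn: string,
  statuses: string[]
): ExecutionStatusPattern {
  return {
    source: ["aws.states"],
    "detail-type": ["Step Functions Execution Status Change"],
    detail: { status: statuses, stateMachineArn: [stateMachineArn] },
  };
}

// Match both normal and abnormal endings of one state machine.
const pattern = buildExecutionStatusPattern(
  "arn:aws:states:ap-northeast-1:123456789012:stateMachine:ExampleStopWebApp",
  ["SUCCEEDED", "FAILED", "TIMED_OUT", "ABORTED"]
);
console.log(JSON.stringify(pattern, null, 2));
```

A rule with a pattern like this could target a notification Lambda function such as the Slack one used later in this post.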
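What I found difficult about the AWS Step Functions and SSM RunCommand setup
- When designing a state machine, you cannot lay out parts in a GUI as you would in a job management system; it has to be written as code such as JSON or YAML
- When a task (the equivalent of a job in a job management system) fails, execution cannot be resumed from the middle, so an idempotent design is important
- Parameters passed to tasks must be defined in JSON beforehand, and you need to understand which parameters are handed from task to task within the state machine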
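To make the "written as code" and "parameters in JSON" points concrete, here is a minimal sketch of an Amazon States Language definition expressed as a TypeScript object. The Lambda ARN, the state names, and the `findDanglingTransitions` helper are assumptions for illustration, not the definitions used in this post.

```typescript
// Minimal Amazon States Language (ASL) definition as a TypeScript object.
// The Lambda ARN and state names are hypothetical placeholders.
type State =
  | {
      Type: "Task";
      Resource: string;
      Parameters?: Record<string, unknown>;
      ResultPath?: string;
      Next: string;
    }
  | { Type: "Succeed" };

interface StateMachineDefinition {
  Comment: string;
  StartAt: string;
  States: Record<string, State>;
}

const definition: StateMachineDefinition = {
  Comment: "Stop httpd on the given instances via a Lambda function",
  StartAt: "StopHttpd",
  States: {
    StopHttpd: {
      Type: "Task",
      Resource: "arn:aws:lambda:ap-northeast-1:123456789012:function:ExampleRunCommand",
      // A key ending in ".$" takes its value from the state input via JSONPath.
      Parameters: {
        "InstanceIds.$": "$.InstanceIds",
        Parameters: { commands: ["/usr/local/sbin/stopHttpd.sh"] },
      },
      // Keep the original input and attach the task result under this key.
      ResultPath: "$.SendCommandResult",
      Next: "Done",
    },
    Done: { Type: "Succeed" },
  },
};

// Returns transition targets that point at states that do not exist.
function findDanglingTransitions(def: StateMachineDefinition): string[] {
  const names = new Set(Object.keys(def.States));
  const targets = [def.StartAt];
  for (const state of Object.values(def.States)) {
    if (state.Type === "Task") targets.push(state.Next);
  }
  return targets.filter((t) => !names.has(t));
}

console.log(JSON.stringify(definition, null, 2));
console.log(findDanglingTransitions(definition));
```

Writing down `Parameters` and `ResultPath` for each task like this is one way to keep track of what each task receives and what it hands to the next one.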
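Since the goal is to replace a job management system, let me first summarize the features such systems implement. A typical job management system implements the following:
- Job scheduling
  - Scheduled execution
  - Event-driven execution
- Job status monitoring
- Storing logs of job execution results
Some systems can also automatically re-run a job when it fails, but this time I will consider replacement feasible if all of the features listed above can be covered.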
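Here is what I had AWS Step Functions and SSM RunCommand do.
- Stop processing for the web system
- Start processing for the web system
Scenario 1. Stop processing for the web system
- Set a maintenance page on the ALB
- Run the following on the two EC2 instances in parallel
  - Stop Apache running on the EC2 instance
  - After Apache has stopped, stop the EC2 instance
  - After the EC2 instance has stopped, create an AMI of the EC2 instance
- After AMI creation has started for both EC2 instances, create a snapshot of the DB cluster
- After the DB cluster snapshot has been created, stop the DB cluster
Note: if a step fails along the way (for example, Apache fails to stop), end the execution abnormally.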
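The stop scenario above can be sketched as an Amazon States Language skeleton. This is only a structural sketch under my own assumptions: `Pass` states stand in for the real tasks, and every state name and instance ID is hypothetical rather than taken from the actual state machine.

```typescript
// Skeleton of the stop flow in Amazon States Language (ASL).
// Pass states stand in for the real tasks; all names are hypothetical.
type AslState = Record<string, unknown>;

// One branch per EC2 instance: stop Apache, stop the instance, start AMI creation.
function buildInstanceBranch(instanceId: string): {
  StartAt: string;
  States: Record<string, AslState>;
} {
  return {
    StartAt: "StopHttpd",
    States: {
      StopHttpd: { Type: "Pass", Result: { instanceId }, Next: "StopInstance" },
      StopInstance: { Type: "Pass", Next: "CreateAmi" },
      CreateAmi: { Type: "Pass", End: true },
    },
  };
}

function buildStopDefinition(instanceIds: string[]) {
  return {
    StartAt: "SetMaintenancePage",
    States: {
      SetMaintenancePage: { Type: "Pass", Next: "StopWebServers" },
      StopWebServers: {
        Type: "Parallel",
        Branches: instanceIds.map((id) => buildInstanceBranch(id)),
        // An error in any branch routes the execution to the Fail state.
        Catch: [{ ErrorEquals: ["States.ALL"], Next: "AbnormalEnd" }],
        Next: "CreateDbSnapshot",
      },
      CreateDbSnapshot: { Type: "Pass", Next: "StopDbCluster" },
      StopDbCluster: { Type: "Pass", Next: "NormalEnd" },
      NormalEnd: { Type: "Succeed" },
      AbnormalEnd: {
        Type: "Fail",
        Error: "StopFlowFailed",
        Cause: "A step in the stop flow failed",
      },
    },
  };
}

const stopDefinition = buildStopDefinition([
  "i-0123456789abcdef0",
  "i-0fedcba9876543210",
]);
console.log(JSON.stringify(stopDefinition, null, 2));
```

In a real definition the tasks outside the `Parallel` state would need their own `Catch` as well, so that a failed snapshot or DB stop also ends in the `Fail` state.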
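Scenario 2. Start processing for the web system
- Start the DB cluster
- Run the following on the two EC2 instances in parallel
  - Start the EC2 instance
  - After the EC2 instance has started, start Apache
- After Apache has started on both EC2 instances, check the health check of the target group
  - If it is healthy, delete the maintenance page
  - If it is anything other than healthy, check the health check again
Note: if a step fails along the way (for example, Apache fails to start), end the execution abnormally.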
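The health check branch at the end of the start scenario boils down to a small decision. The sketch below assumes the input has the shape of an ELBv2 DescribeTargetHealth response; the `...Like` interfaces, the `decideNextStep` function, and the step names are hypothetical names for this illustration.

```typescript
// Relevant part of an ELBv2 DescribeTargetHealth response.
// The "...Like" interfaces are local stand-ins, not the SDK types.
interface TargetHealthDescriptionLike {
  Target?: { Id: string; Port?: number };
  TargetHealth?: { State?: string };
}
interface DescribeTargetHealthOutputLike {
  TargetHealthDescriptions?: TargetHealthDescriptionLike[];
}

// Hypothetical names for the two possible next steps.
type NextStep = "DeleteMaintenanceRule" | "WaitAndRecheck";

// Delete the maintenance page only when every registered target is healthy.
function decideNextStep(output: DescribeTargetHealthOutputLike): NextStep {
  const targets = output.TargetHealthDescriptions ?? [];
  const allHealthy =
    targets.length > 0 &&
    targets.every((t) => t.TargetHealth?.State === "healthy");
  return allHealthy ? "DeleteMaintenanceRule" : "WaitAndRecheck";
}

const stillStarting: DescribeTargetHealthOutputLike = {
  TargetHealthDescriptions: [
    { Target: { Id: "i-0123456789abcdef0", Port: 80 }, TargetHealth: { State: "healthy" } },
    { Target: { Id: "i-0fedcba9876543210", Port: 80 }, TargetHealth: { State: "initial" } },
  ],
};
console.log(decideNextStep(stillStarting));
```

In a state machine the same decision would typically be a Choice state followed by a Wait state that loops back to the health check.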
As usual, I used AWS CDK to build both of the following:
- The web system
- Step Functions
The AWS CDK directory structure is as follows.
> tree
.
├── .gitignore
├── .npmignore
├── .vscode
│   └── launch.json
├── README.md
├── bin
│   └── app.ts
├── cdk.context.json
├── cdk.json
├── jest.config.js
├── lib
│   ├── lambda-function-stack.ts
│   ├── webapp-stack.ts
│   ├── webapp-start-statemachine-stack.ts
│   └── webapp-stop-statemachine-stack.ts
├── package-lock.json
├── package.json
├── src
│   ├── cloudWatch
│   │   └── AmazonCloudWatch-linux.json
│   ├── ec2
│   │   └── userDataAmazonLinux2.sh
│   ├── html
│   │   └── maintenance.html
│   └── lambda
│       └── functions
│           ├── create-alb-rule.ts
│           ├── delete-alb-rule.ts
│           ├── describe-dbcluster.ts
│           ├── describe-ec2instance.ts
│           ├── describe-health-target.ts
│           ├── describe-result-runcommand.ts
│           ├── describe-status-ec2instance.ts
│           ├── notice-slack.ts
│           ├── package-lock.json
│           ├── package.json
│           ├── runcommand-ec2instance.ts
│           ├── start-backup-job.ts
│           ├── start-dbcluster.ts
│           ├── start-ec2instance.ts
│           ├── stop-dbcluster.ts
│           └── stop-ec2instance.ts
├── test
│   ├── app.test.ts
└── tsconfig.json
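The four stacks defined under lib are:
- WebAppStack: the stack for the web system itself
- LambdaFunctionsStack: the stack that groups the Lambda functions
- WebAppStopStateMachineStack: the stack that defines the state machine for stopping the web system
- WebAppStartStateMachineStack: the stack that defines the state machine for starting the web system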
#!/usr/bin/env node import * as cdk from "@aws-cdk/core"; import { WebAppStack } from "../lib/webapp-stack"; import { LambdaFunctionsStack } from "../lib/lambda-function-stack"; import { WebAppStartStateMachineStack } from "../lib/webapp-start-statemachine-stack"; import { WebAppStopStateMachineStack } from "../lib/webapp-stop-statemachine-stack"; const app = new cdk.App(); const webAppStack = new WebAppStack(app, "WebAppStack", { env: { region: process.env.CDK_DEPLOY_REGION || process.env.CDK_DEFAULT_REGION, }, }); const lambdaFunctionsStack = new LambdaFunctionsStack( app, "LambdaFunctionsStack" ); new WebAppStartStateMachineStack(app, "WebAppStartStateMachineStack", { startEc2InstanceFunction: lambdaFunctionsStack.startEc2InstanceFunction, describeStatusEc2InstanceFunction: lambdaFunctionsStack.describeStatusEc2InstanceFunction, runCommandEc2InstanceFunction: lambdaFunctionsStack.runCommandEc2InstanceFunction, describeResultRunCommandFunction: lambdaFunctionsStack.describeResultRunCommandFunction, startDbClusterFunction: lambdaFunctionsStack.startDbClusterFunction, describeDbClusterFunction: lambdaFunctionsStack.describeDbClusterFunction, describeHealthTargetFunction: lambdaFunctionsStack.describeHealthTargetFunction, deleteAlbRuleFunction: lambdaFunctionsStack.deleteAlbRuleFunction, noticeSlackFunction: lambdaFunctionsStack.noticeSlackFunction, }); new WebAppStopStateMachineStack(app, "WebAppStopStateMachineStack", { stopEc2InstanceFunction: lambdaFunctionsStack.stopEc2InstanceFunction, describeEc2InstanceFunction: lambdaFunctionsStack.describeEc2InstanceFunction, runCommandEc2InstanceFunction: lambdaFunctionsStack.runCommandEc2InstanceFunction, describeResultRunCommandFunction: lambdaFunctionsStack.describeResultRunCommandFunction, stopDbClusterFunction: lambdaFunctionsStack.stopDbClusterFunction, describeDbClusterFunction: lambdaFunctionsStack.describeDbClusterFunction, createAlbRuleFunction: lambdaFunctionsStack.createAlbRuleFunction, startBackupJobFunction: 
lambdaFunctionsStack.startBackupJobFunction, noticeSlackFunction: lambdaFunctionsStack.noticeSlackFunction, }); cdk.Tags.of(webAppStack).add("stackId", new cdk.ScopedAws(webAppStack).stackId); cdk.Tags.of(lambdaFunctionsStack).add( "stackId", new cdk.ScopedAws(lambdaFunctionsStack).stackId );
{ "name": "app", "version": "0.1.0", "bin": { "app": "bin/app.js" }, "scripts": { "build": "tsc", "watch": "tsc -w", "test": "jest", "cdk": "cdk" }, "devDependencies": { "@aws-cdk/assert": "1.104.0", "@types/jest": "^26.0.10", "@types/node": "10.17.27", "aws-cdk": "1.104.0", "jest": "^26.4.2", "ts-jest": "^26.2.0", "ts-node": "^9.0.0", "typescript": "~3.9.7" }, "dependencies": { "@aws-cdk/aws-ec2": "^1.103.0", "@aws-cdk/aws-elasticloadbalancingv2": "^1.104.0", "@aws-cdk/aws-events": "^1.103.0", "@aws-cdk/aws-events-targets": "^1.104.0", "@aws-cdk/aws-iam": "^1.103.0", "@aws-cdk/aws-lambda": "^1.103.0", "@aws-cdk/aws-lambda-nodejs": "^1.103.0", "@aws-cdk/aws-logs": "^1.103.0", "@aws-cdk/aws-rds": "^1.104.0", "@aws-cdk/aws-s3": "^1.104.0", "@aws-cdk/aws-secretsmanager": "^1.104.0", "@aws-cdk/aws-ssm": "^1.103.0", "@aws-cdk/aws-stepfunctions": "^1.103.0", "@aws-cdk/aws-stepfunctions-tasks": "^1.103.0", "@aws-cdk/core": "1.104.0", "fs": "^0.0.1-security", "source-map-support": "^0.5.16" } }
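Web system stack: WebAppStack
- Create an S3 bucket for ALB access logs
- Create a CloudWatch Logs log group for VPC Flow Logs
- Create an IAM policy and IAM role for VPC Flow Logs
- Create an IAM role for SSM
- Create the VPC
  - Create two each of Public Subnet, Private Subnet, and Isolated Subnet
  - Create two NAT Gateways
  - Enable VPC Flow Logs on the created VPC
- Create Security Groups
  - For the ALB
  - For the web servers
  - For the DB
- Create the ALB
  - Configure ALB access logs
  - Create the target group and listener
- Create the EC2 instances
  - Place the EC2 instances in a Multi-AZ configuration
  - Use User Data to install Apache and PHP and to create scripts for starting, stopping, and checking the status of Apache
  - Add the created EC2 instances to the target group
- Generate the credentials passed to the DB cluster with Secrets Manager
- Create the DB cluster
- Upload the CloudWatch Agent configuration file for the EC2 instances to SSM Parameter Store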
import * as cdk from "@aws-cdk/core"; import * as s3 from "@aws-cdk/aws-s3"; import * as ec2 from "@aws-cdk/aws-ec2"; import * as logs from "@aws-cdk/aws-logs"; import * as iam from "@aws-cdk/aws-iam"; import * as elbv2 from "@aws-cdk/aws-elasticloadbalancingv2"; import * as rds from "@aws-cdk/aws-rds"; import * as secretsmanager from "@aws-cdk/aws-secretsmanager"; import * as ssm from "@aws-cdk/aws-ssm"; import * as fs from "fs"; export class WebAppStack extends cdk.Stack { constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) { super(scope, id, props); // Create S3 Bucket for ALB access log const albAccessLogBucket = new s3.Bucket(this, "AlbAccessLogBucket", { encryption: s3.BucketEncryption.S3_MANAGED, blockPublicAccess: new s3.BlockPublicAccess({ blockPublicAcls: true, blockPublicPolicy: true, ignorePublicAcls: true, restrictPublicBuckets: true, }), }); console.log(albAccessLogBucket.bucketRegionalDomainName); // Create CloudWatch Logs for VPC Flow Logs const flowLogsLogGroup = new logs.LogGroup(this, "FlowLogsLogGroup", { retention: logs.RetentionDays.ONE_WEEK, }); // Create VPC Flow Logs IAM role const flowLogsIamrole = new iam.Role(this, "FlowLogsIamrole", { assumedBy: new iam.ServicePrincipal("vpc-flow-logs.amazonaws.com"), }); // Create SSM IAM role const ssmIamRole = new iam.Role(this, "SsmIamRole", { assumedBy: new iam.ServicePrincipal("ec2.amazonaws.com"), managedPolicies: [ iam.ManagedPolicy.fromAwsManagedPolicyName( "AmazonSSMManagedInstanceCore" ), iam.ManagedPolicy.fromAwsManagedPolicyName("AmazonSSMPatchAssociation"), iam.ManagedPolicy.fromAwsManagedPolicyName( "CloudWatchAgentAdminPolicy" ), ], }); // Create VPC Flow Logs IAM Policy const flowLogsIamPolicy = new iam.Policy(this, "FlowLogsIamPolicy", { statements: [ new iam.PolicyStatement({ effect: iam.Effect.ALLOW, actions: ["iam:PassRole"], resources: [flowLogsIamrole.roleArn], }), new iam.PolicyStatement({ effect: iam.Effect.ALLOW, actions: [ "logs:CreateLogStream", 
"logs:PutLogEvents", "logs:DescribeLogStreams", ], resources: [flowLogsLogGroup.logGroupArn], }), ], }); // Atach VPC Flow Logs IAM Policy flowLogsIamrole.attachInlinePolicy(flowLogsIamPolicy); // Create VPC const vpc = new ec2.Vpc(this, "Vpc", { cidr: "", enableDnsHostnames: true, enableDnsSupport: true, natGateways: 2, maxAzs: 2, subnetConfiguration: [ { name: "Public", subnetType: ec2.SubnetType.PUBLIC, cidrMask: 24 }, { name: "Private", subnetType: ec2.SubnetType.PRIVATE, cidrMask: 24 }, { name: "Isolated", subnetType: ec2.SubnetType.ISOLATED, cidrMask: 24 }, ], }); // Setting VPC Flow Logs new ec2.CfnFlowLog(this, "FlowLogToLogs", { resourceId: vpc.vpcId, resourceType: "VPC", trafficType: "ALL", deliverLogsPermissionArn: flowLogsIamrole.roleArn, logDestination: flowLogsLogGroup.logGroupArn, logDestinationType: "cloud-watch-logs", logFormat: "${version} ${account-id} ${interface-id} ${srcaddr} ${dstaddr} ${srcport} ${dstport} ${protocol} ${packets} ${bytes} ${start} ${end} ${action} ${log-status} ${vpc-id} ${subnet-id} ${instance-id} ${tcp-flags} ${type} ${pkt-srcaddr} ${pkt-dstaddr} ${region} ${az-id} ${sublocation-type} ${sublocation-id} ${pkt-src-aws-service} ${pkt-dst-aws-service} ${flow-direction} ${traffic-path}", maxAggregationInterval: 60, }); // Create Security Group // Security Group for ALB const albSg = new ec2.SecurityGroup(this, "AlbSg", { allowAllOutbound: true, vpc: vpc, }); albSg.addIngressRule( ec2.Peer.anyIpv4(), ec2.Port.tcp(80), "Allow inbound HTTP" ); // Security Group for Web const webSg = new ec2.SecurityGroup(this, "WebSg", { allowAllOutbound: true, vpc: vpc, }); webSg.addIngressRule(albSg, ec2.Port.tcp(80), "Allow web access from alb"); // Security Group for DB const dbSg = new ec2.SecurityGroup(this, "DbSg", { allowAllOutbound: true, vpc: vpc, }); dbSg.addIngressRule( webSg, ec2.Port.tcp(3306), "Allow db access from web server" ); // Create ALB const alb = new elbv2.ApplicationLoadBalancer(this, "Alb", { vpc: vpc, vpcSubnets: 
vpc.selectSubnets({ subnetGroupName: "Public" }), internetFacing: true, securityGroup: albSg, }); alb.logAccessLogs(albAccessLogBucket); // Create ALB Target group const targetGroup = new elbv2.ApplicationTargetGroup(this, "TargetGroup", { vpc: vpc, port: 80, protocol: elbv2.ApplicationProtocol.HTTP, targetType: elbv2.TargetType.INSTANCE, healthCheck: { path: "/phpinfo.php", healthyHttpCodes: "200", healthyThresholdCount: 2, interval: cdk.Duration.seconds(30), timeout: cdk.Duration.seconds(5), unhealthyThresholdCount: 2, }, }); // Create ALB listener const listener = alb.addListener("Listener", { port: 80, defaultTargetGroups: [targetGroup], }); // User data for Amazon Linux const userDataParameter = fs.readFileSync( "./src/ec2/userDataAmazonLinux2.sh", "utf8" ); const userDataAmazonLinux2 = ec2.UserData.forLinux({ shebang: "#!/bin/bash", }); userDataAmazonLinux2.addCommands(userDataParameter); // Create EC2 instance // AmazonLinux 2 vpc .selectSubnets({ subnetGroupName: "Private" }) .subnets.forEach((subnet, index) => { const ec2Instance = new ec2.Instance(this, `Ec2Instance${index}`, { machineImage: ec2.MachineImage.latestAmazonLinux({ generation: ec2.AmazonLinuxGeneration.AMAZON_LINUX_2, }), instanceType: new ec2.InstanceType("t3.micro"), vpc: vpc, keyName: this.node.tryGetContext("key-pair"), role: ssmIamRole, vpcSubnets: vpc.selectSubnets({ subnetGroupName: "Private", availabilityZones: [vpc.availabilityZones[index]], }), securityGroup: webSg, userData: userDataAmazonLinux2, }); targetGroup.addTarget( new elbv2.InstanceTarget(ec2Instance.instanceId, 80) ); }); // Create secrets const dbSecret = new secretsmanager.Secret(this, "DBSecret", { secretName: "WebApp/DBLoginInfo", generateSecretString: { excludeCharacters: ':@/" ', generateStringKey: "password", secretStringTemplate: '{"username": "admin"}', }, }); // Create DB Cluster new rds.DatabaseCluster(this, "DBCluster", { engine: rds.DatabaseClusterEngine.auroraMysql({ version: 
rds.AuroraMysqlEngineVersion.VER_2_09_1, }), instanceProps: { instanceType: ec2.InstanceType.of( ec2.InstanceClass.BURSTABLE3, ec2.InstanceSize.SMALL ), vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE, }, vpc: vpc, securityGroups: [dbSg], }, credentials: rds.Credentials.fromSecret(dbSecret), defaultDatabaseName: "testDB", storageEncrypted: true, cloudwatchLogsExports: ["error", "general", "slowquery", "audit"], }); // Read CloudWatch parameters for Linux const cloudWatchParameter = fs.readFileSync( "./src/cloudWatch/AmazonCloudWatch-linux.json", "utf8" ); // Create a new SSM Parameter new ssm.StringParameter(this, "CloudWatchParameter", { description: "CloudWatch parameters for Linux", parameterName: "AmazonCloudWatch-linux", stringValue: cloudWatchParameter, }); } }
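User Data
WebAppStack loads User Data and the CloudWatch Agent configuration from separate files. First, User Data.
User Data does the following:
- Update packages with yum
- Install collectd, which the CloudWatch Agent needs for collecting metrics
- Install Apache and PHP
- Change permissions so that Apache can read files under the document root
- Create phpinfo.php
- Create the Apache start script
  - If Apache started normally, write a message saying so to /var/log/messages and end with exit 0
  - If Apache could not be started normally, write a message saying so to /var/log/messages and end with exit 1
- Create the Apache stop script
  - If Apache stopped normally, write a message saying so to /var/log/messages and end with exit 0
  - If Apache could not be stopped normally, write a message saying so to /var/log/messages and end with exit 1
- Create the Apache status check script
  - If Apache is stopped, write a message saying so to /var/log/messages and end with exit 3
  - If Apache is running, write a message saying so to /var/log/messages and end with exit 0
  - If the status could not be checked normally, write a message saying so to /var/log/messages and end with exit 1
- Change permissions so that the created scripts can be executed
A couple of notes on the scripts:
- RunCommand treats anything other than exit 0 as an abnormal end, so the exit code is varied according to the state of Apache, which lets the Step Functions side tell the states apart by exit code.
- The heredoc delimiter is quoted ("EOF") so that $, which denotes shell script variables, is written into the script files literally instead of being expanded when User Data runs.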
# Install the necessary packages. yum update -y yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm yum install -y collectd yum install -y httpd amazon-linux-extras install -y php8.0 # Add ec2-user to the apache group. usermod -a -G apache ec2-user # Change the group ownership of /var/www and its contents to the apache group. chown -R ec2-user:apache /var/www # Change the directory permissions for /var/www and its subdirectories to set write permission for the group, and set the group ID for future subdirectories. chmod 2775 /var/www find /var/www -type d -exec chmod 2775 {} \; # Repeatedly change the file permissions for /var/www and its subdirectories to add group write permissions. find /var/www -type f -exec chmod 0664 {} \; echo "<?php phpinfo(); ?>" > /var/www/html/phpinfo.php # Script for starting httpd cat - << "EOF" > /usr/local/sbin/startHttpd.sh #!/bin/bash EXIT_CODE=0 TIME=`date +"%Y-%m-%dT%H:%M:%S.%3NZ"` systemctl start httpd EXIT_CODE=$? if [ $EXIT_CODE != 0 ] then echo "$TIME ERROR Failed to start httpd service by startHttpd.sh." >> /var/log/messages exit 1 else echo "$TIME INFO As a result of running startHttpd.sh, httpd.service started successfully." >> /var/log/messages exit 0 fi EOF # Script to stop httpd cat - << "EOF" > /usr/local/sbin/stopHttpd.sh #!/bin/bash EXIT_CODE=0 TIME=`date +"%Y-%m-%dT%H:%M:%S.%3NZ"` systemctl stop httpd EXIT_CODE=$? if [ $EXIT_CODE != 0 ] then echo "$TIME ERROR Stopping httpd.service by stopHttpd.sh failed." >> /var/log/messages exit 1 else echo "$TIME INFO As a result of running stopHttpd.sh, httpd.service was stopped successfully." >> /var/log/messages exit 0 fi EOF # Script for checking httpd status cat - << "EOF" > /usr/local/sbin/checkHttpd.sh #!/bin/bash EXIT_CODE=0 TIME=`date +"%Y-%m-%dT%H:%M:%S.%3NZ"` systemctl status httpd EXIT_CODE=$? if [ $EXIT_CODE = 3 ]; then echo "$TIME INFO The result of running checkHttpd.sh is that httpd.service is stopped." 
>> /var/log/messages exit 3 elif [ $EXIT_CODE != 0 ]; then echo "$TIME ERROR checkHttpd.sh could not be executed successfully." >> /var/log/messages exit 1 else echo "$TIME INFO As a result of running checkHttpd.sh, httpd.service was started." >> /var/log/messages exit 0 fi EOF # Grant execution privileges to shell scripts. chmod 744 /usr/local/sbin/startHttpd.sh chmod 744 /usr/local/sbin/stopHttpd.sh chmod 744 /usr/local/sbin/checkHttpd.sh
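The exit code convention of checkHttpd.sh (0 for running, 3 for stopped, anything else for a failed check) could be interpreted on the caller side roughly as follows. The type and function names here are hypothetical and not part of the article's Lambda functions.

```typescript
// Hypothetical helper that maps the exit code of checkHttpd.sh to a state.
type HttpdState = "running" | "stopped" | "error";

function interpretCheckHttpdExitCode(exitCode: number): HttpdState {
  if (exitCode === 0) return "running"; // systemctl status reported an active service
  if (exitCode === 3) return "stopped"; // the script exits 3 when httpd is not running
  return "error"; // any other code means the check itself failed
}

for (const code of [0, 3, 1]) {
  console.log(`exit ${code}: ${interpretCheckHttpdExitCode(code)}`);
}
```

Because RunCommand reports every non-zero exit code as a failure, a mapping like this is what lets a workflow treat "stopped" as an expected state rather than an error.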
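CloudWatch Agent configuration
Next, the CloudWatch Agent configuration.
The scripts created in User Data above write their logs to /var/log/messages when they run.
Logging in to an EC2 instance every time I want to check the logs is tedious, so /var/log/messages is shipped to CloudWatch Logs.
/var/log/messages can only be read with root privileges, so the CloudWatch Agent is configured to run as root.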
{ "agent": { "metrics_collection_interval": 60, "run_as_user": "root" }, "logs": { "logs_collected": { "files": { "collect_list": [{ "file_path": "/var/log/messages", "log_group_name": "/var/log/messages", "log_stream_name": "{instance_id}" }, { "file_path": "/var/log/httpd/access_log", "log_group_name": "/var/log/httpd/access_log", "log_stream_name": "{instance_id}" }, { "file_path": "/var/log/httpd/error_log", "log_group_name": "/var/log/httpd/error_log", "log_stream_name": "{instance_id}" } ] } } }, "metrics": { "append_dimensions": { "AutoScalingGroupName": "${aws:AutoScalingGroupName}", "ImageId": "${aws:ImageId}", "InstanceId": "${aws:InstanceId}", "InstanceType": "${aws:InstanceType}" }, "metrics_collected": { "collectd": { "metrics_aggregation_interval": 60 }, "cpu": { "measurement": [ "cpu_usage_idle", "cpu_usage_iowait", "cpu_usage_user", "cpu_usage_system" ], "metrics_collection_interval": 60, "resources": [ "*" ], "totalcpu": false }, "disk": { "measurement": [ "used_percent", "inodes_free" ], "metrics_collection_interval": 60, "resources": [ "*" ] }, "diskio": { "measurement": [ "io_time", "write_bytes", "read_bytes", "writes", "reads" ], "metrics_collection_interval": 60, "resources": [ "*" ] }, "mem": { "measurement": [ "mem_used_percent" ], "metrics_collection_interval": 60 }, "netstat": { "measurement": [ "tcp_established", "tcp_time_wait" ], "metrics_collection_interval": 60 }, "statsd": { "metrics_aggregation_interval": 60, "metrics_collection_interval": 10, "service_address": ":8125" }, "swap": { "measurement": [ "swap_used_percent" ], "metrics_collection_interval": 60 } } } }
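Lambda function stack: LambdaFunctionsStack
This stack defines the Lambda functions used by the state machines that start and stop the web system. The code itself is about 500 lines long, but broken down, what it does is very simple:
- Create the IAM roles and IAM policies used by the Lambda functions
- Define the Lambda functions
Also, I wanted to write the Lambda functions in TypeScript, the same language as AWS CDK, so I used aws-lambda-nodejs, which transpiles and bundles the TypeScript as part of cdk deploy.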
import * as cdk from "@aws-cdk/core"; import * as iam from "@aws-cdk/aws-iam"; import * as lambda from "@aws-cdk/aws-lambda"; import * as nodejs from "@aws-cdk/aws-lambda-nodejs"; export class LambdaFunctionsStack extends cdk.Stack { public readonly stopEc2InstanceFunction: nodejs.NodejsFunction; public readonly startEc2InstanceFunction: nodejs.NodejsFunction; public readonly describeStatusEc2InstanceFunction: nodejs.NodejsFunction; public readonly describeEc2InstanceFunction: nodejs.NodejsFunction; public readonly stopDbClusterFunction: nodejs.NodejsFunction; public readonly startDbClusterFunction: nodejs.NodejsFunction; public readonly describeDbClusterFunction: nodejs.NodejsFunction; public readonly runCommandEc2InstanceFunction: nodejs.NodejsFunction; public readonly describeResultRunCommandFunction: nodejs.NodejsFunction; public readonly createAlbRuleFunction: nodejs.NodejsFunction; public readonly deleteAlbRuleFunction: nodejs.NodejsFunction; public readonly describeHealthTargetFunction: nodejs.NodejsFunction; public readonly startBackupJobFunction: nodejs.NodejsFunction; public readonly noticeSlackFunction: nodejs.NodejsFunction; constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) { super(scope, id, props); // Declare AWS account ID and region. const { accountId, region } = new cdk.ScopedAws(this); // Create an IAM role for Lambda functions to operate EC2 instances. const lambdaEc2IamRole = new iam.Role(this, "LambdaEc2IamRole", { assumedBy: new iam.ServicePrincipal("lambda.amazonaws.com"), managedPolicies: [ iam.ManagedPolicy.fromAwsManagedPolicyName( "service-role/AWSLambdaBasicExecutionRole" ), ], }); // Create an IAM role for Lambda functions to operate DB cluster. 
const lambdaDbIamRole = new iam.Role(this, "LambdaDbIamRole", { assumedBy: new iam.ServicePrincipal("lambda.amazonaws.com"), managedPolicies: [ iam.ManagedPolicy.fromAwsManagedPolicyName( "service-role/AWSLambdaBasicExecutionRole" ), ], }); // Create an IAM role for the Lambda function to operate the SSM. const lambdaSsmIamRole = new iam.Role(this, "LambdaSsmIamRole", { assumedBy: new iam.ServicePrincipal("lambda.amazonaws.com"), managedPolicies: [ iam.ManagedPolicy.fromAwsManagedPolicyName( "service-role/AWSLambdaBasicExecutionRole" ), ], }); // Create an IAM role for the Lambda function to operate the ALB rule. const lambdaAlbIamRole = new iam.Role(this, "LambdaAlbIamRole", { assumedBy: new iam.ServicePrincipal("lambda.amazonaws.com"), managedPolicies: [ iam.ManagedPolicy.fromAwsManagedPolicyName( "service-role/AWSLambdaBasicExecutionRole" ), ], }); // Create an IAM role for the Lambda function to operate the AWS Backup. const lambdaBackupIamRole = new iam.Role(this, "LambdaBackupIamRole", { assumedBy: new iam.ServicePrincipal("lambda.amazonaws.com"), managedPolicies: [ iam.ManagedPolicy.fromAwsManagedPolicyName( "service-role/AWSLambdaBasicExecutionRole" ), ], }); // Create IAM policy for Lambda to operate EC2 instances. const lambdaEc2IamPolicy = new iam.Policy(this, "LambdaEc2IamPolicy", { statements: [ new iam.PolicyStatement({ effect: iam.Effect.ALLOW, actions: ["iam:PassRole"], resources: [lambdaEc2IamRole.roleArn], }), new iam.PolicyStatement({ effect: iam.Effect.ALLOW, actions: ["ec2:StartInstances", "ec2:StopInstances"], resources: [`arn:aws:ec2:${region}:${accountId}:instance/*`], }), new iam.PolicyStatement({ effect: iam.Effect.ALLOW, actions: ["ec2:DescribeInstances", "ec2:DescribeInstanceStatus"], resources: ["*"], }), ], }); // Attach an IAM policy to the IAM role for Lambda to operate EC2 instances. lambdaEc2IamRole.attachInlinePolicy(lambdaEc2IamPolicy); // Create IAM policy for Lambda to operate DB cluster. 
const lambdaDbIamPolicy = new iam.Policy(this, "LambdaDbIamPolicy", { statements: [ new iam.PolicyStatement({ effect: iam.Effect.ALLOW, actions: ["iam:PassRole"], resources: [lambdaDbIamRole.roleArn], }), new iam.PolicyStatement({ effect: iam.Effect.ALLOW, actions: [ "rds:StartDBCluster", "rds:StopDBCluster", "rds:DescribeDBClusters", ], resources: [`arn:aws:rds:${region}:${accountId}:cluster:*`], }), ], }); // Attach an IAM policy to the IAM role for Lambda to operate DB cluster. lambdaDbIamRole.attachInlinePolicy(lambdaDbIamPolicy); // Create an IAM policy for the Lambda function to operate the SSM. const lambdaSsmIamPolicy = new iam.Policy(this, "LambdaSsmIamPolicy", { statements: [ new iam.PolicyStatement({ effect: iam.Effect.ALLOW, actions: ["iam:PassRole"], resources: [lambdaSsmIamRole.roleArn], }), new iam.PolicyStatement({ effect: iam.Effect.ALLOW, actions: ["ssm:SendCommand"], resources: [ `arn:aws:ssm:${region}:${accountId}:managed-instance/*`, `arn:aws:ssm:${region}:*:document/*`, `arn:aws:ec2:${region}:${accountId}:instance/*`, ], }), new iam.PolicyStatement({ effect: iam.Effect.ALLOW, actions: ["ssm:ListCommandInvocations"], resources: ["*"], }), ], }); // Attach an IAM policy to the IAM policy for the Lambda function to operate the SSM. lambdaSsmIamRole.attachInlinePolicy(lambdaSsmIamPolicy); // Create an IAM policy for the Lambda function to operate the ALB. 
const lambdaAlbIamPolicy = new iam.Policy(this, "LambdaAlbIamPolicy", { statements: [ new iam.PolicyStatement({ effect: iam.Effect.ALLOW, actions: ["iam:PassRole"], resources: [lambdaAlbIamRole.roleArn], }), new iam.PolicyStatement({ effect: iam.Effect.ALLOW, actions: [ "elasticloadbalancing:DeleteRule", "elasticloadbalancing:CreateRule", ], resources: [ `arn:aws:elasticloadbalancing:${region}:${accountId}:listener-rule/net/*/*/*/*`, `arn:aws:elasticloadbalancing:${region}:${accountId}:listener-rule/app/*/*/*/*`, `arn:aws:elasticloadbalancing:${region}:${accountId}:listener/net/*/*/*`, `arn:aws:elasticloadbalancing:${region}:${accountId}:listener/app/*/*/*`, ], }), new iam.PolicyStatement({ effect: iam.Effect.ALLOW, actions: [ "elasticloadbalancing:DeleteRule", "elasticloadbalancing:DescribeTargetHealth", ], resources: ["*"], }), ], }); // Attach an IAM policy to the IAM policy for the Lambda function to operate the ALB. lambdaAlbIamRole.attachInlinePolicy(lambdaAlbIamPolicy); // Create an IAM policy for the Lambda function to operate the ALB. const lambdaBackupIamPolicy = new iam.Policy( this, "LambdaBackupIamPolicy", { statements: [ new iam.PolicyStatement({ effect: iam.Effect.ALLOW, actions: ["iam:PassRole"], resources: [lambdaAlbIamRole.roleArn], }), new iam.PolicyStatement({ effect: iam.Effect.ALLOW, actions: ["iam:PassRole"], resources: [ `arn:aws:iam::${accountId}:role/service-role/AWSBackupDefaultServiceRole`, ], }), new iam.PolicyStatement({ effect: iam.Effect.ALLOW, actions: ["backup:StartBackupJob"], resources: [`arn:aws:backup:${region}:${accountId}:backup-vault:*`], }), ], } ); // Attach an IAM policy to the IAM policy for the Lambda function to operate the ALB. 
lambdaBackupIamRole.attachInlinePolicy(lambdaBackupIamPolicy); // Declaring Lambda functions // Lambda function for stopping EC2 instances this.stopEc2InstanceFunction = new nodejs.NodejsFunction( this, "StopEc2InstanceFunction", { entry: "src/lambda/functions/stop-ec2instance.ts", runtime: lambda.Runtime.NODEJS_14_X, bundling: { minify: true, }, environment: { ACCOUNT_ID: accountId, }, role: lambdaEc2IamRole, } ); // Lambda function for starting EC2 instances this.startEc2InstanceFunction = new nodejs.NodejsFunction( this, "StartEc2InstanceFunction", { entry: "src/lambda/functions/start-ec2instance.ts", runtime: lambda.Runtime.NODEJS_14_X, bundling: { minify: true, }, environment: { ACCOUNT_ID: accountId, }, role: lambdaEc2IamRole, } ); // Lambda function for checking the status of EC2 instances this.describeStatusEc2InstanceFunction = new nodejs.NodejsFunction( this, "DescribeStatusEc2InstanceFunction", { entry: "src/lambda/functions/describe-status-ec2instance.ts", runtime: lambda.Runtime.NODEJS_14_X, bundling: { minify: true, }, environment: { ACCOUNT_ID: accountId, }, role: lambdaEc2IamRole, } ); // Lambda function for checking the status of EC2 instances this.describeEc2InstanceFunction = new nodejs.NodejsFunction( this, "DescribeEc2InstanceFunction", { entry: "src/lambda/functions/describe-ec2instance.ts", runtime: lambda.Runtime.NODEJS_14_X, bundling: { minify: true, }, environment: { ACCOUNT_ID: accountId, }, role: lambdaEc2IamRole, } ); // Lambda function for stopping DB cluster this.stopDbClusterFunction = new nodejs.NodejsFunction( this, "StopDbClusterFunction", { entry: "src/lambda/functions/stop-dbcluster.ts", runtime: lambda.Runtime.NODEJS_14_X, bundling: { minify: true, }, environment: { ACCOUNT_ID: accountId, }, role: lambdaDbIamRole, } ); // Lambda function for starting DB cluster this.startDbClusterFunction = new nodejs.NodejsFunction( this, "StartDbClusterceFunction", { entry: "src/lambda/functions/start-dbcluster.ts", runtime: 
lambda.Runtime.NODEJS_14_X, bundling: { minify: true, }, environment: { ACCOUNT_ID: accountId, }, role: lambdaDbIamRole, } ); // Lambda function for checking the status of DB cluster this.describeDbClusterFunction = new nodejs.NodejsFunction( this, "DescribeDbClusterFunction", { entry: "src/lambda/functions/describe-dbcluster.ts", runtime: lambda.Runtime.NODEJS_14_X, bundling: { minify: true, }, environment: { ACCOUNT_ID: accountId, }, role: lambdaDbIamRole, } ); // Lambda Function for executing RunCommand this.runCommandEc2InstanceFunction = new nodejs.NodejsFunction( this, "RunCommandEc2InstanceFunction", { entry: "src/lambda/functions/runcommand-ec2instance.ts", runtime: lambda.Runtime.NODEJS_14_X, bundling: { minify: true, }, environment: { ACCOUNT_ID: accountId, }, role: lambdaSsmIamRole, } ); // Lambda Function for checking the result of RunCommand execution. this.describeResultRunCommandFunction = new nodejs.NodejsFunction( this, "DescribeResultRunCommandFunction", { entry: "src/lambda/functions/describe-result-runcommand.ts", runtime: lambda.Runtime.NODEJS_14_X, bundling: { minify: true, }, environment: { ACCOUNT_ID: accountId, }, role: lambdaSsmIamRole, } ); // Lambda function to create ALB rules. this.createAlbRuleFunction = new nodejs.NodejsFunction( this, "CreateAlbRuleFunction", { entry: "src/lambda/functions/create-alb-rule.ts", runtime: lambda.Runtime.NODEJS_14_X, bundling: { minify: true, }, environment: { ACCOUNT_ID: accountId, }, role: lambdaAlbIamRole, } ); // Lambda function to delete ALB rules. this.deleteAlbRuleFunction = new nodejs.NodejsFunction( this, "DeleteAlbRuleFunction", { entry: "src/lambda/functions/delete-alb-rule.ts", runtime: lambda.Runtime.NODEJS_14_X, bundling: { minify: true, }, environment: { ACCOUNT_ID: accountId, }, role: lambdaAlbIamRole, } ); // Lambda function to describe ALB Target health. 
this.describeHealthTargetFunction = new nodejs.NodejsFunction( this, "DescribeHealthTargetFunction", { entry: "src/lambda/functions/describe-health-target.ts", runtime: lambda.Runtime.NODEJS_14_X, bundling: { minify: true, }, environment: { ACCOUNT_ID: accountId, }, role: lambdaAlbIamRole, } ); // Lambda function to start AWS Backup job. this.startBackupJobFunction = new nodejs.NodejsFunction( this, "StartBackupJobFunction", { entry: "src/lambda/functions/start-backup-job.ts", runtime: lambda.Runtime.NODEJS_14_X, bundling: { minify: true, }, environment: { ACCOUNT_ID: accountId, }, role: lambdaBackupIamRole, } ); // Lambda Function for notifying Slack of the execution results of Step Functions. this.noticeSlackFunction = new nodejs.NodejsFunction( this, "NoticeSlackFunction", { entry: "src/lambda/functions/notice-slack.ts", runtime: lambda.Runtime.NODEJS_14_X, bundling: { minify: true, }, environment: { ACCOUNT_ID: accountId, HOOK_URL: this.node.tryGetContext("hook-url"), SLACK_USER_ID: this.node.tryGetContext("slack-user-id"), }, } ); } }
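The Lambda functions under src/lambda/functions are as follows.
- create-alb-rule.ts: Lambda function that creates an ALB listener rule
- delete-alb-rule.ts: Lambda function that deletes an ALB listener rule
- describe-dbcluster.ts: Lambda function that lists DB clusters
- describe-ec2instance.ts: Lambda function that lists EC2 instances
- describe-health-target.ts: Lambda function that checks the state of the ALB target group
- describe-result-runcommand.ts: Lambda function that gets the result of an SSM RunCommand execution
- describe-status-ec2instance.ts: Lambda function that checks the status of EC2 instances
- notice-slack.ts: Lambda function that notifies Slack of the normal or abnormal end of a state machine
- runcommand-ec2instance.ts: Lambda function that executes SSM RunCommand
- start-backup-job.ts: Lambda function that runs an AWS Backup on-demand job
- start-dbcluster.ts: Lambda function that starts the DB cluster
- start-ec2instance.ts: Lambda function that starts EC2 instances
- stop-dbcluster.ts: Lambda function that stops the DB cluster
- stop-ec2instance.ts: Lambda function that stops EC2 instances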
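Most of them, like create-alb-rule.ts and describe-dbcluster.ts shown next, simply call an AWS SDK function with the input they receive and return the result to the caller as-is.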
// src/lambda/functions/create-alb-rule.ts
import { ELBv2, AWSError } from "aws-sdk";
import { Context, Callback } from "aws-lambda";

const elbv2 = new ELBv2();

export function handler(
  event: ELBv2.CreateRuleInput,
  context: Context,
  callback: Callback
): ELBv2.CreateRuleOutput | AWSError | void {
  console.log(event);
  // Pass the input straight to the SDK and return the result unchanged.
  elbv2.createRule(event, (error, data) => {
    if (error) {
      console.log(error, error.stack);
      callback(error);
      return;
    } else {
      console.log(data);
      callback(null, data);
      return;
    }
  });
}
import { RDS, AWSError } from "aws-sdk";
import { Context, Callback } from "aws-lambda";

const rds = new RDS();

export function handler(
  event: RDS.DescribeDBClustersMessage,
  context: Context,
  callback: Callback
): RDS.DBClusterMessage | AWSError | void {
  console.log(event);

  // Pass the event straight to the SDK and hand the result back to the caller.
  rds.describeDBClusters(event, (error, data) => {
    if (error) {
      console.log(error, error.stack);
      callback(error);
      return;
    } else {
      console.log(data);
      callback(null, data);
      return;
    }
  });
}
This Lambda function runs SSM RunCommand.
Parameters are passed from Step Functions to the Lambda function as JSON.
import { SSM, AWSError } from "aws-sdk";
import { Context, Callback } from "aws-lambda";

const ssm = new SSM();

export function handler(
  event: SSM.SendCommandRequest,
  context: Context,
  callback: Callback
): SSM.SendCommandResult | AWSError | void {
  console.log(event);

  // Only InstanceIds and Parameters come from the event;
  // the common parameters are fixed here in the function.
  const sendCommandRequest: SSM.SendCommandRequest = {
    InstanceIds: event.InstanceIds,
    DocumentName: "AWS-RunShellScript",
    MaxConcurrency: "1",
    Parameters: event.Parameters,
    TimeoutSeconds: 60,
  };

  ssm.sendCommand(sendCommandRequest, (error, data) => {
    if (error) {
      console.log(error, error.stack);
      callback(error);
      return;
    } else {
      console.log(data);
      callback(null, data);
      return;
    }
  });
}
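As a rough illustration of the JSON this function receives, the sketch below builds an event with the two keys the handler reads. The helper name `buildRunCommandInput`, the instance ID, and the `systemctl` command are placeholders assumed for this example, not values taken from the original state machine input; `commands` is the parameter key that the AWS-RunShellScript document accepts.

```typescript
// Shape of the event the RunCommand Lambda function reads:
// only InstanceIds and Parameters are taken from the event.
interface RunCommandInput {
  InstanceIds: string[];
  Parameters: { [name: string]: string[] };
}

// Hypothetical helper: builds the JSON for one instance and a list of shell commands.
function buildRunCommandInput(
  instanceId: string,
  commands: string[]
): RunCommandInput {
  return {
    InstanceIds: [instanceId],
    // AWS-RunShellScript expects the shell commands under the "commands" key.
    Parameters: { commands: commands },
  };
}

// Example payload for a "stop httpd" task (placeholder instance ID and command).
const stopHttpdInput = buildRunCommandInput("i-0123456789abcdef0", [
  "systemctl stop httpd",
]);
```

Keeping the document name and timeout inside the Lambda function means the state machine input only has to carry what differs per task.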
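This Lambda function outputs the result of SSM RunCommand.
SSM RunCommand does not return its result immediately; the result only becomes available some time after the command is run. For that reason I created a separate function that checks the SSM RunCommand result.
As an option, the function sets Details: true. The default is Details: false; with Details: true, the call returns the response and the command output of the command run through SSM RunCommand.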
import { SSM, AWSError } from "aws-sdk";
import { Context, Callback } from "aws-lambda";

const ssm = new SSM();

export function handler(
  event: SSM.SendCommandResult,
  context: Context,
  callback: Callback
): SSM.ListCommandInvocationsResult | AWSError | void {
  console.log(event);

  // Look up the invocations of the command started by the previous task.
  const listCommandInvocationsRequest: SSM.ListCommandInvocationsRequest = {
    CommandId: event.Command?.CommandId,
    Details: true,
  };

  ssm.listCommandInvocations(listCommandInvocationsRequest, (error, data) => {
    if (error) {
      console.log(error, error.stack);
      callback(error);
      return;
    } else {
      console.log(data);
      callback(null, data);
      return;
    }
  });
}
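Because the result arrives later, the state machines poll this function and branch on the invocation status. The sketch below restates that branching as a plain function so the rule is easy to read in one place. The names `decideNextAction` and `NextAction` are hypothetical, and treating an empty invocation list as a failure is an assumption of this sketch.

```typescript
// Minimal subset of the ListCommandInvocations response used for the decision.
interface CommandInvocationSummary {
  Status: string;
}

interface ListCommandInvocationsLike {
  CommandInvocations: CommandInvocationSummary[];
}

type NextAction = "proceed" | "wait" | "fail";

// Hypothetical helper mirroring the Choice states in the stacks:
// Success -> next task, Pending/InProgress -> wait and poll again, anything else -> fail.
function decideNextAction(result: ListCommandInvocationsLike): NextAction {
  const invocation = result.CommandInvocations[0];
  if (invocation === undefined) {
    // Assumption of this sketch: no invocation is treated as a failure.
    return "fail";
  }
  switch (invocation.Status) {
    case "Success":
      return "proceed";
    case "Pending":
    case "InProgress":
      return "wait";
    default:
      return "fail";
  }
}
```

In the state machines the same rule is expressed with a Choice state plus a Wait state that loops back to the result-check task.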
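This Lambda function notifies Slack of the Step Functions result.
- If the execution status is SUCCEEDED, it sends the result of the executed process
- Otherwise (the execution failed), it sends the ARN of the failed state machine
- It mentions the Slack ID defined in advance
- For the following reasons, it sends at most 2,000 characters and cuts off the rest
- Messages that are too long do not get read
- Slack's character limit is around 3,000 characters to begin with, and trying to send a longer message results in 400 Bad Request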
import { Context, Callback } from "aws-lambda";
import { StepFunctions } from "aws-sdk";
import * as request from "request";

// Number of characters to send to Slack; anything beyond this is cut off.
const characterLimit = 2000;

export function handler(
  event: any,
  context: Context,
  callback: Callback
): request.RequestCallback | void {
  console.log(event);

  const describeExecutionOutput: StepFunctions.DescribeExecutionOutput = {
    executionArn: event.detail.executionArn,
    stateMachineArn: event.detail.stateMachineArn,
    startDate: event.detail.startDate,
    status: event.detail.status,
    output: event.detail.output,
  };

  const hookUrl = <string>process.env["HOOK_URL"];
  const slackUserId = <string>process.env["SLACK_USER_ID"];
  const sfnStatus = describeExecutionOutput.status;
  const sfnExecArn = describeExecutionOutput.stateMachineArn;
  const statusMessage = `status: *${sfnStatus}*\n`;
  let resultMessage: string = "";

  // Determine the execution result status of Step Functions
  if (sfnStatus == "SUCCEEDED") {
    const sfnOutput = JSON.parse(<string>describeExecutionOutput.output);
    if (sfnOutput instanceof Array && sfnOutput.length > 1) {
      sfnOutput.forEach((sfnResult: any, index: number) => {
        resultMessage += `result ${index + 1}:\n \`\`\`${JSON.stringify(
          sfnResult.Output.Payload,
          null,
          2
        )}\`\`\`\n`;
      });
    } else if (sfnOutput instanceof Array && sfnOutput.length == 1) {
      resultMessage = `result:\n \`\`\`${JSON.stringify(
        sfnOutput[0].Output.Payload,
        null,
        2
      )}\`\`\``;
    } else {
      resultMessage = `result:\n \`\`\`${JSON.stringify(
        sfnOutput.Output.Payload,
        null,
        2
      )}\`\`\``;
    }
  } else {
    // On failure, report the ARN of the failed state machine.
    resultMessage = `result:\n \`\`\`${sfnExecArn}\`\`\``;
  }

  // If the message exceeds the character limit, omit the end.
  if (resultMessage.length > characterLimit) {
    resultMessage = resultMessage.substring(0, characterLimit - 1);
    // If the truncated text contains an odd number of three-back-quote markers,
    // a code block was left open by the cut, so close it again.
    if ((resultMessage.match(/`{3}/g) || []).length % 2 != 0) {
      resultMessage += `\`\`\`\n\n~~~The following is omitted~~~`;
    } else {
      resultMessage += `\n\n~~~The following is omitted~~~`;
    }
  }

  const slackMessage = {
    blocks: [
      {
        type: "section",
        text: {
          type: "mrkdwn",
          text: `<@${slackUserId}>`,
        },
      },
      {
        type: "divider",
      },
      {
        type: "section",
        text: {
          type: "mrkdwn",
          text: statusMessage,
        },
      },
      {
        type: "section",
        text: {
          type: "mrkdwn",
          text: resultMessage,
        },
      },
    ],
  };

  // Request parameters
  const options: request.RequiredUriUrl & request.CoreOptions = {
    url: hookUrl,
    headers: {
      "Content-type": "application/json",
    },
    body: slackMessage,
    json: true,
  };

  request.post(options, (error, response, body) => {
    if (!error && response.statusCode == 200) {
      callback(null, body);
    } else {
      // response is undefined when the request itself failed, so guard the access.
      console.log("error: " + (response ? response.statusCode : error));
      callback(error || new Error("Slack returned " + response.statusCode));
    }
  });
}
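The truncation rule can be looked at in isolation. The following is a minimal sketch, assuming a hypothetical helper named `truncateForSlack`; it follows the same idea as the handler (cut at the limit, then re-close a code block if the cut left one open), but it is not the function's actual code.

```typescript
// Three back-quotes open or close a code block in Slack's mrkdwn.
const FENCE = "\u0060\u0060\u0060";
const OMITTED_NOTE = "\n\n~~~The following is omitted~~~";

// Hypothetical helper: cut the message at the limit and re-close an open code block.
function truncateForSlack(message: string, limit: number): string {
  if (message.length <= limit) {
    return message;
  }
  let truncated = message.substring(0, limit - 1);
  // split() yields (marker count + 1) parts, so subtract one to count the markers.
  const fenceCount = truncated.split(FENCE).length - 1;
  // An odd count means the cut happened inside a code block.
  if (fenceCount % 2 !== 0) {
    truncated += FENCE;
  }
  return truncated + OMITTED_NOTE;
}
```

Checking the remainder of the count divided by two is what distinguishes an open block from a closed one.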
This Lambda function runs an AWS Backup on-demand job.
As with the SSM RunCommand function above, the common parameters needed to run an AWS Backup on-demand job are set on the Lambda function side.
import { Backup, AWSError } from "aws-sdk";
import { Context, Callback } from "aws-lambda";

const backup = new Backup();

export function handler(
  event: Backup.StartBackupJobInput,
  context: Context,
  callback: Callback
): Backup.StartBackupJobOutput | AWSError | void {
  console.log(event);

  // Only ResourceArn comes from the event; the vault, role and lifecycle are fixed here.
  const startBackupJobInput: Backup.StartBackupJobInput = {
    BackupVaultName: "Default",
    IamRoleArn: `arn:aws:iam::${process.env.ACCOUNT_ID}:role/service-role/AWSBackupDefaultServiceRole`,
    ResourceArn: event.ResourceArn,
    Lifecycle: {
      DeleteAfterDays: 3,
    },
  };

  backup.startBackupJob(startBackupJobInput, (error, data) => {
    if (error) {
      console.log(error, error.stack);
      callback(error);
      return;
    } else {
      console.log(data);
      callback(null, data);
      return;
    }
  });
}
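Stack that defines the state machine for stopping the web system: WebAppStopStateMachineStack
- Create the CloudWatch Logs log group that stores the state machine results
- Define the Lambda functions created earlier, and the conditional branches, as tasks
- Connect the defined tasks to one another to define the state machine
- Describe the values passed to each task in JSON format under State Input
- To keep $.Input from being overwritten, write each task's result under State Output ($.Output in the code)
- Because there are multiple stop targets (two EC2 instances), the stop processing runs in parallel (sfn.Map in the code)
- After leaving the parallel processing, State Input is reformatted into JSON by a dedicated task (the RefleshInput Pass state in the code)
- After leaving the parallel processing, the following tasks reference the execution input ($$.Execution.Input in the code)
- While the DB cluster status indicates a backup in progress (backing-up, and likewise backtracking or maintenance in the code), wait and check the status again
For how to connect tasks to one another with AWS CDK, I referred to the following article.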
import * as cdk from "@aws-cdk/core";
import * as logs from "@aws-cdk/aws-logs";
import * as sfn from "@aws-cdk/aws-stepfunctions";
import * as tasks from "@aws-cdk/aws-stepfunctions-tasks";
import * as nodejs from "@aws-cdk/aws-lambda-nodejs";
import * as events from "@aws-cdk/aws-events";
import * as targets from "@aws-cdk/aws-events-targets";

interface WebAppStopStateMachineProps extends cdk.StackProps {
  stopEc2InstanceFunction: nodejs.NodejsFunction;
  describeEc2InstanceFunction: nodejs.NodejsFunction;
  runCommandEc2InstanceFunction: nodejs.NodejsFunction;
  describeResultRunCommandFunction: nodejs.NodejsFunction;
  stopDbClusterFunction: nodejs.NodejsFunction;
  describeDbClusterFunction: nodejs.NodejsFunction;
  createAlbRuleFunction: nodejs.NodejsFunction;
  startBackupJobFunction: nodejs.NodejsFunction;
  noticeSlackFunction: nodejs.NodejsFunction;
}

export class WebAppStopStateMachineStack extends cdk.Stack {
  constructor(
    scope: cdk.Construct,
    id: string,
    props: WebAppStopStateMachineProps
  ) {
    super(scope, id, props);

    // Get the string after the stack name in the stack id to append to the end of the Log Group name to make it unique.
    const stackId = new cdk.ScopedAws(this).stackId;
    const stackIdAfterStackName = cdk.Fn.select(2, cdk.Fn.split("/", stackId));

    // Create CloudWatch Logs for Step Functions
    const webAppStopStateMachineLogGroup = new logs.LogGroup(
      this,
      "WebAppStopStateMachineLogGroup",
      {
        logGroupName: `/aws/vendedlogs/states/webAppStopStateMachineLogGroup-${stackIdAfterStackName}`,
        retention: logs.RetentionDays.ONE_WEEK,
      }
    );

    // Step Functions Task
    // Task for stopping EC2 instances
    const stopEc2InstanceState = new tasks.LambdaInvoke(
      this,
      "StopEc2InstanceState",
      {
        inputPath: "$.InstanceId",
        resultPath: "$.Output",
        lambdaFunction: props.stopEc2InstanceFunction,
      }
    );

    // Task for checking the status of EC2 instances
    const describeStatusEc2InstanceState = new tasks.LambdaInvoke(
      this,
      "DescribeStatusEc2InstanceState",
      {
        inputPath: "$.InstanceId",
        resultPath: "$.Output",
        lambdaFunction: props.describeEc2InstanceFunction,
      }
    );

    // Task to run a RunCommand for "stop httpd"
    const stopHttpdState = new tasks.LambdaInvoke(this, "StopHttpdState", {
      inputPath: "$.StopHttpd",
      resultPath: "$.Output",
      lambdaFunction: props.runCommandEc2InstanceFunction,
    });

    // Task to check the status of RunCommand for "stop httpd"
    const stopHttpdResultState = new tasks.LambdaInvoke(
      this,
      "StopHttpdResultState",
      {
        inputPath: "$.Output.Payload",
        resultPath: "$.Output",
        lambdaFunction: props.describeResultRunCommandFunction,
      }
    );

    // Task to run a RunCommand for "status httpd"
    const describeHttpdState = new tasks.LambdaInvoke(
      this,
      "DescribeHttpdState",
      {
        inputPath: "$.CheckHttpdStatus",
        resultPath: "$.Output",
        lambdaFunction: props.runCommandEc2InstanceFunction,
      }
    );

    // Task to check the status of RunCommand for "status httpd".
    const describeHttpdResultState = new tasks.LambdaInvoke(
      this,
      "DescribeHttpdResultState",
      {
        inputPath: "$.Output.Payload",
        resultPath: "$.Output",
        lambdaFunction: props.describeResultRunCommandFunction,
      }
    );

    // Task for stopping DB Cluster
    const stopDbClusterState = new tasks.LambdaInvoke(
      this,
      "StopDbClusterState",
      {
        inputPath:
          "$$.Execution.Input.Input.TargetDbCluster.DBClusterIdentifier",
        resultPath: "$.Output",
        lambdaFunction: props.stopDbClusterFunction,
      }
    );

    // Task for checking the status of the DB Cluster
    const describeStatusDbClusterState = new tasks.LambdaInvoke(
      this,
      "DescribeStatusDbClusterState",
      {
        inputPath:
          "$$.Execution.Input.Input.TargetDbCluster.DBClusterIdentifier",
        resultPath: "$.Output",
        lambdaFunction: props.describeDbClusterFunction,
      }
    );

    // Task to check if DB Cluster status is available.
    const isCompletedBackupDbClusterState = new tasks.LambdaInvoke(
      this,
      "IsCompletedBackupDbClusterState",
      {
        inputPath:
          "$$.Execution.Input.Input.TargetDbCluster.DBClusterIdentifier",
        resultPath: "$.Output",
        lambdaFunction: props.describeDbClusterFunction,
      }
    );

    // Task for creating ALB rules
    const createAlbRuleState = new tasks.LambdaInvoke(
      this,
      "CreateAlbRuleState",
      {
        inputPath: "$.Input.TargetRule",
        resultPath: "$.Output",
        lambdaFunction: props.createAlbRuleFunction,
      }
    );

    // Task for AWS Backup jobs for EC2 instance
    const startBackupJobEC2instanceState = new tasks.LambdaInvoke(
      this,
      "StartBackupJobEC2instanceState",
      {
        inputPath: "$.ResourceArn",
        resultPath: "$.Output",
        lambdaFunction: props.startBackupJobFunction,
      }
    );

    // Task for AWS Backup jobs for DB cluster
    const startBackupJobDbClusterState = new tasks.LambdaInvoke(
      this,
      "StartBackupJobDbClusterState",
      {
        inputPath: "$$.Execution.Input.Input.TargetDbCluster.ResourceArn",
        resultPath: "$.Output",
        lambdaFunction: props.startBackupJobFunction,
      }
    );

    // Task for 30-second wait
    const waitStopEc2Instance30Sec = new sfn.Wait(
      this,
      "WaitStopEc2Instance30Sec",
      {
        time: sfn.WaitTime.duration(cdk.Duration.seconds(30)),
      }
    );

    // Task for 60-second wait (duration aligned with the state name and this comment)
    const waitStopDbCluster60Sec = new sfn.Wait(
      this,
      "waitStopDbCluster60Sec",
      {
        time: sfn.WaitTime.duration(cdk.Duration.seconds(60)),
      }
    );

    // Task for 10-second wait
    const waitStopHttpd10Sec = new sfn.Wait(this, "WaitStopHttpd10Sec", {
      time: sfn.WaitTime.duration(cdk.Duration.seconds(10)),
    });

    // Task for 10-second wait
    const waitDescribeHttpd10Sec = new sfn.Wait(
      this,
      "WaitDescribeHttpd10Sec",
      {
        time: sfn.WaitTime.duration(cdk.Duration.seconds(10)),
      }
    );

    // Task for 60-second wait (duration aligned with the state name and this comment)
    const waitBackupJobDbCluster60Sec = new sfn.Wait(
      this,
      "WaitBackupJobDbCluster60Sec",
      {
        time: sfn.WaitTime.duration(cdk.Duration.seconds(60)),
      }
    );

    // Step Functions choice
    // Declaring the EC2 instance status check conditional branch
    const isStoppedEc2InstanceState = new sfn.Choice(
      this,
      "IsStoppedEc2InstanceState"
    );

    // If the EC2 instance is not stopped yet, wait 30 seconds.
    isStoppedEc2InstanceState.otherwise(waitStopEc2Instance30Sec);

    // Declare a conditional branch of the httpd stop command execution state
    const isCompletedStopHttpdState = new sfn.Choice(
      this,
      "IsCompletedStopHttpdState"
    );

    // If httpd fails to stop, it will exit abnormally.
    isCompletedStopHttpdState.otherwise(new sfn.Fail(this, "FailedStopHttpd"));

    // Declare httpd status check conditional branch
    const isStoppedHttpdState = new sfn.Choice(this, "IsStoppedHttpdState");

    // If the httpd status check does not show httpd as stopped, it will exit abnormally.
    isStoppedHttpdState.otherwise(new sfn.Fail(this, "FailedDescribeHttpd"));

    // Declaring the DB Cluster status check conditional branch (after the stop request)
    const describeStatusDbClusterAfterStartBackupJobState = new sfn.Choice(
      this,
      "DescribeStatusDbClusterAfterStartBackupJobState"
    );

    // If the DB cluster fails to stop, it will exit abnormally.
    describeStatusDbClusterAfterStartBackupJobState.otherwise(
      new sfn.Fail(this, "FailedStopDbCluster")
    );

    // Declaring the DB Cluster status check conditional branch (after the backup job starts)
    const isAvailableDbClusterState = new sfn.Choice(
      this,
      "IsAvailableDbClusterState"
    );

    // If the DB cluster is in an unexpected status, it will exit abnormally.
    isAvailableDbClusterState.otherwise(
      new sfn.Fail(this, "FailedStartBackupJobDbCluster")
    );

    // Declarations for concurrently executing operations on EC2 instances.
    const stopEc2InstancesMapState = new sfn.Map(
      this,
      "StopEc2InstancesMapState",
      {
        maxConcurrency: 2,
        itemsPath: sfn.JsonPath.stringAt("$.Input.TargetEc2Instances"),
      }
    );

    // If the EC2 instance is stopped (state code 80), start its AWS Backup job.
    isStoppedEc2InstanceState.when(
      sfn.Condition.numberEquals(
        "$.Output.Payload.Reservations[0].Instances[0].State.Code",
        80
      ),
      startBackupJobEC2instanceState.next(new sfn.Pass(this, "Passed"))
    );

    // If the DB cluster is up and running, proceed to the next step.
    isAvailableDbClusterState.when(
      sfn.Condition.stringEquals(
        "$.Output.Payload.DBClusters[0].Status",
        "available"
      ),
      stopDbClusterState
        .next(waitStopDbCluster60Sec)
        .next(describeStatusDbClusterState)
        .next(describeStatusDbClusterAfterStartBackupJobState)
    );

    // If the DB cluster is backing up, backtracking, or in maintenance, wait 60 seconds.
    isAvailableDbClusterState.when(
      sfn.Condition.or(
        sfn.Condition.stringEquals(
          "$.Output.Payload.DBClusters[0].Status",
          "backing-up"
        ),
        sfn.Condition.stringEquals(
          "$.Output.Payload.DBClusters[0].Status",
          "backtracking"
        ),
        sfn.Condition.stringEquals(
          "$.Output.Payload.DBClusters[0].Status",
          "maintenance"
        )
      ),
      waitBackupJobDbCluster60Sec
    );

    // If the DB cluster is stopped, complete the process.
    describeStatusDbClusterAfterStartBackupJobState.when(
      sfn.Condition.stringEquals(
        "$.Output.Payload.DBClusters[0].Status",
        "stopped"
      ),
      new sfn.Succeed(this, "Succeed")
    );

    // If the DB cluster is still stopping, wait 60 seconds.
    describeStatusDbClusterAfterStartBackupJobState.when(
      sfn.Condition.stringEquals(
        "$.Output.Payload.DBClusters[0].Status",
        "stopping"
      ),
      waitStopDbCluster60Sec
    );

    // After creating a maintenance page in ALB, stop the EC2 instance.
    createAlbRuleState.next(stopEc2InstancesMapState);

    // After stopping httpd, stop the EC2 instance.
    // Then, use AWS Backup to create an AMI for the EC2 instance.
    // Then, create a snapshot of the DB Cluster in AWS Backup.
    // After the snapshot creation is complete, stop the DB Cluster.
    stopEc2InstancesMapState
      .iterator(
        stopHttpdState
          .next(waitStopHttpd10Sec)
          .next(stopHttpdResultState)
          .next(isCompletedStopHttpdState)
      )
      .next(
        new sfn.Pass(this, "RefleshInput", {
          inputPath: "$",
          parameters: {
            "Output.$": "$",
          },
          outputPath: "$",
        })
      )
      .next(startBackupJobDbClusterState)
      .next(waitBackupJobDbCluster60Sec)
      .next(isCompletedBackupDbClusterState)
      .next(isAvailableDbClusterState);

    // If the stop command succeeded, check the httpd status.
    isCompletedStopHttpdState.when(
      sfn.Condition.stringEquals(
        "$.Output.Payload.CommandInvocations[0].Status",
        "Success"
      ),
      describeHttpdState
        .next(waitDescribeHttpd10Sec)
        .next(describeHttpdResultState)
        .next(isStoppedHttpdState)
    );

    // In case of Pending or InProgress, wait 10 seconds.
    isCompletedStopHttpdState.when(
      sfn.Condition.or(
        sfn.Condition.stringEquals(
          "$.Output.Payload.CommandInvocations[0].Status",
          "Pending"
        ),
        sfn.Condition.stringEquals(
          "$.Output.Payload.CommandInvocations[0].Status",
          "InProgress"
        )
      ),
      waitStopHttpd10Sec
    );

    // If httpd is stopped (response code 3), stop the EC2 instance.
    isStoppedHttpdState.when(
      sfn.Condition.numberEquals(
        "$.Output.Payload.CommandInvocations[0].CommandPlugins[0].ResponseCode",
        3
      ),
      stopEc2InstanceState
        .next(waitStopEc2Instance30Sec)
        .next(describeStatusEc2InstanceState)
        .next(isStoppedEc2InstanceState)
    );

    // In case of Pending or InProgress, wait 10 seconds.
    isStoppedHttpdState.when(
      sfn.Condition.or(
        sfn.Condition.stringEquals(
          "$.Output.Payload.CommandInvocations[0].Status",
          "Pending"
        ),
        sfn.Condition.stringEquals(
          "$.Output.Payload.CommandInvocations[0].Status",
          "InProgress"
        )
      ),
      waitDescribeHttpd10Sec
    );

    const systemStopStateMachine = new sfn.StateMachine(
      this,
      "SystemStopStateMachine",
      {
        definition: createAlbRuleState,
        logs: {
          destination: webAppStopStateMachineLogGroup,
          level: sfn.LogLevel.ALL,
        },
      }
    );

    // EventBridge Rule for notifying Slack of the execution results of Step Functions.
    const eventsRule = new events.Rule(this, "put event rule", {
      description:
        "Rules for notifying Slack of the execution results of Step Functions",
      eventPattern: {
        source: ["aws.states"],
        detailType: ["Step Functions Execution Status Change"],
        detail: {
          status: ["SUCCEEDED", "FAILED"],
          stateMachineArn: [systemStopStateMachine.stateMachineArn],
        },
      },
    });

    // Set the Slack notification Lambda function as the target of the EventBridge Rule
    eventsRule.addTarget(new targets.LambdaFunction(props.noticeSlackFunction));
  }
}
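The input paths used in this stack imply a particular shape for the execution input, and `resultPath: "$.Output"` is what keeps `$.Input` intact from task to task. The sketch below is a simplified model of both. The key names are taken from the `inputPath` and `itemsPath` values in the stack, while the value shapes, the type names, and the helper `applyResultPath` are assumptions made for illustration.

```typescript
// Per-instance item iterated by the Map state. Key names come from the inputPath
// values in the stack; the value shapes are deliberately left open.
interface Ec2Target {
  InstanceId: unknown;
  StopHttpd: unknown;
  CheckHttpdStatus: unknown;
  ResourceArn: unknown;
}

// Execution input implied by "$.Input.*" and "$$.Execution.Input.Input.*".
interface StopExecutionInput {
  Input: {
    TargetRule: unknown;
    TargetEc2Instances: Ec2Target[];
    TargetDbCluster: { DBClusterIdentifier: unknown; ResourceArn: unknown };
  };
}

type StateData = { [key: string]: unknown };

// Simplified model of a result path such as "$.Output": the task result is stored
// under one top-level key and every other key of the state input is kept.
function applyResultPath(
  stateInput: object,
  key: string,
  result: unknown
): StateData {
  return { ...stateInput, [key]: result };
}

// Placeholder execution input (empty objects stand in for the real values).
const executionInput: StopExecutionInput = {
  Input: {
    TargetRule: {},
    TargetEc2Instances: [],
    TargetDbCluster: { DBClusterIdentifier: {}, ResourceArn: {} },
  },
};

// Two consecutive tasks: Output is replaced each time, Input is never touched.
const afterFirstTask = applyResultPath(executionInput, "Output", {
  Payload: { step: 1 },
});
const afterSecondTask = applyResultPath(afterFirstTask, "Output", {
  Payload: { step: 2 },
});
```

Because each task only ever writes under `Output`, a later task can still read its parameters from `Input`.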
Stack that defines the state machine for starting the web system: WebAppStartStateMachineStack
import * as cdk from "@aws-cdk/core";
import * as logs from "@aws-cdk/aws-logs";
import * as sfn from "@aws-cdk/aws-stepfunctions";
import * as tasks from "@aws-cdk/aws-stepfunctions-tasks";
import * as nodejs from "@aws-cdk/aws-lambda-nodejs";
import * as events from "@aws-cdk/aws-events";
import * as targets from "@aws-cdk/aws-events-targets";

interface WebAppStartStateMachineProps extends cdk.StackProps {
  startEc2InstanceFunction: nodejs.NodejsFunction;
  describeStatusEc2InstanceFunction: nodejs.NodejsFunction;
  runCommandEc2InstanceFunction: nodejs.NodejsFunction;
  describeResultRunCommandFunction: nodejs.NodejsFunction;
  startDbClusterFunction: nodejs.NodejsFunction;
  describeDbClusterFunction: nodejs.NodejsFunction;
  deleteAlbRuleFunction: nodejs.NodejsFunction;
  describeHealthTargetFunction: nodejs.NodejsFunction;
  noticeSlackFunction: nodejs.NodejsFunction;
}

export class WebAppStartStateMachineStack extends cdk.Stack {
  constructor(
    scope: cdk.Construct,
    id: string,
    props: WebAppStartStateMachineProps
  ) {
    super(scope, id, props);

    // Get the string after the stack name in the stack id to append to the end of the Log Group name to make it unique.
    const stackId = new cdk.ScopedAws(this).stackId;
    const stackIdAfterStackName = cdk.Fn.select(2, cdk.Fn.split("/", stackId));

    // Create CloudWatch Logs for Step Functions
    const webAppStartStateMachineLogGroup = new logs.LogGroup(
      this,
      "WebAppStartStateMachineLogGroup",
      {
        logGroupName: `/aws/vendedlogs/states/webAppStartStateMachineLogGroup-${stackIdAfterStackName}`,
        retention: logs.RetentionDays.ONE_WEEK,
      }
    );

    // Step Functions Task
    // Task for starting EC2 instances
    const startEc2InstanceState = new tasks.LambdaInvoke(
      this,
      "StartEc2InstanceState",
      {
        inputPath: "$.InstanceId",
        resultPath: "$.Output",
        lambdaFunction: props.startEc2InstanceFunction,
      }
    );

    // Task for checking the status of an EC2 instance
    const describeStatusEc2InstanceState = new tasks.LambdaInvoke(
      this,
      "DescribeStatusEc2InstanceState",
      {
        inputPath: "$.InstanceId",
        resultPath: "$.Output",
        lambdaFunction: props.describeStatusEc2InstanceFunction,
      }
    );

    // Task to run a RunCommand for "start httpd"
    const startHttpdState = new tasks.LambdaInvoke(this, "StartHttpdState", {
      inputPath: "$.StartHttpd",
      resultPath: "$.Output",
      lambdaFunction: props.runCommandEc2InstanceFunction,
    });

    // Task to check the status of RunCommand for "start httpd"
    const startHttpdResultState = new tasks.LambdaInvoke(
      this,
      "StartHttpdResultState",
      {
        inputPath: "$.Output.Payload",
        resultPath: "$.Output",
        lambdaFunction: props.describeResultRunCommandFunction,
      }
    );

    // Task to run a RunCommand for "status httpd"
    const describeHttpdState = new tasks.LambdaInvoke(
      this,
      "DescribeHttpdState",
      {
        inputPath: "$.CheckHttpdStatus",
        resultPath: "$.Output",
        lambdaFunction: props.runCommandEc2InstanceFunction,
      }
    );

    // Task to check the status of RunCommand for "status httpd".
    const describeHttpdResultState = new tasks.LambdaInvoke(
      this,
      "DescribeHttpdResultState",
      {
        inputPath: "$.Output.Payload",
        resultPath: "$.Output",
        lambdaFunction: props.describeResultRunCommandFunction,
      }
    );

    // Task for starting DB Cluster
    const startDbClusterState = new tasks.LambdaInvoke(
      this,
      "StartDbClusterState",
      {
        inputPath: "$.Input.TargetDbCluster",
        resultPath: "$.Output",
        lambdaFunction: props.startDbClusterFunction,
      }
    );

    // Task for checking the status of the DB Cluster (after the start request)
    const describeStatusDbClusterState = new tasks.LambdaInvoke(
      this,
      "DescribeStatusDbClusterState",
      {
        inputPath: "$.Input.TargetDbCluster",
        resultPath: "$.Output",
        lambdaFunction: props.describeDbClusterFunction,
      }
    );

    // Task for checking the status of the DB Cluster (first state of the state machine)
    const describeDbClusterState = new tasks.LambdaInvoke(
      this,
      "DescribeDbClusterState",
      {
        inputPath: "$.Input.TargetDbCluster",
        resultPath: "$.Output",
        lambdaFunction: props.describeDbClusterFunction,
      }
    );

    // Task for deleting ALB rules
    const deleteAlbRuleState = new tasks.LambdaInvoke(
      this,
      "DeleteAlbRuleState",
      {
        inputPath: "$$.Execution.Input.Input.TargetAlb.TargetRule",
        resultPath: "$.Output",
        lambdaFunction: props.deleteAlbRuleFunction,
      }
    );

    // Task to check the health status of the ALB target.
    const describeHealthTargetState = new tasks.LambdaInvoke(
      this,
      "DescribeHealthTargetState",
      {
        inputPath: "$$.Execution.Input.Input.TargetAlb.TargetGroup",
        resultPath: "$.Output",
        lambdaFunction: props.describeHealthTargetFunction,
      }
    );

    // Task for 30-second wait
    const waitStartEc2Instance30Sec = new sfn.Wait(
      this,
      "WaitEc2Instance30Sec",
      {
        time: sfn.WaitTime.duration(cdk.Duration.seconds(30)),
      }
    );

    // Task for 30-second wait
    const waitStartDbCluster30Sec = new sfn.Wait(
      this,
      "WaitStartDbCluster30Sec",
      {
        time: sfn.WaitTime.duration(cdk.Duration.seconds(30)),
      }
    );

    // Task for 10-second wait
    const waitStartHttpd10Sec = new sfn.Wait(this, "WaitStartHttpd10Sec", {
      time: sfn.WaitTime.duration(cdk.Duration.seconds(10)),
    });

    // Task for 10-second wait
    const waitDescribeHttpd10Sec = new sfn.Wait(
      this,
      "WaitDescribeHttpd10Sec",
      {
        time: sfn.WaitTime.duration(cdk.Duration.seconds(10)),
      }
    );

    // Task for 10-second wait
    const waitDescribeHealthTarget10Sec = new sfn.Wait(
      this,
      "WaitDescribeHealthTarget10Sec",
      {
        time: sfn.WaitTime.duration(cdk.Duration.seconds(10)),
      }
    );

    // Step Functions choice
    // Declaring the EC2 instance status check conditional branch
    const isRunningEc2InstanceState = new sfn.Choice(
      this,
      "IsRunningEc2InstanceState"
    );

    // If the EC2 instance is not running, wait 30 seconds.
    isRunningEc2InstanceState.otherwise(waitStartEc2Instance30Sec);

    // Declare a conditional branch of the httpd startup command execution state
    const isCompletedStartHttpdState = new sfn.Choice(
      this,
      "IsCompletedStartHttpdState"
    );

    // If httpd fails to start, it will exit abnormally.
    isCompletedStartHttpdState.otherwise(
      new sfn.Fail(this, "FailedStartHttpd")
    );

    // Declare httpd status check conditional branch
    const isRunningHttpdState = new sfn.Choice(this, "IsRunningHttpdState");

    // If the httpd status check fails, it will exit abnormally.
    isRunningHttpdState.otherwise(new sfn.Fail(this, "FailedDescribeHttpd"));

    // Declaring the DB Cluster status check conditional branch (before starting)
    const IsStoppedDbClusterState = new sfn.Choice(
      this,
      "ChoiceIsStoppedDbClusterState"
    );

    // If the DB cluster is in any other status, it will exit abnormally.
    IsStoppedDbClusterState.otherwise(
      new sfn.Fail(this, "CannotStartDbCluster")
    );

    // Declaring the DB Cluster status check conditional branch (after the start request)
    const isAvailableDbClusterState = new sfn.Choice(
      this,
      "IsAvailableDbClusterState"
    );

    // If the DB Cluster is not running, wait 30 seconds.
    isAvailableDbClusterState.otherwise(waitStartDbCluster30Sec);

    // Declaring the ALB target health check conditional branch
    const choiceLoopDescibeHealthTargetState = new sfn.Choice(
      this,
      "ChoiceLoopDescibeHealthTargetState"
    );

    // If the target is not healthy, wait 10 seconds and check again.
    choiceLoopDescibeHealthTargetState.otherwise(waitDescribeHealthTarget10Sec);

    // When the EC2 instance is running (state code 16) and both status checks are ok, start httpd.
    isRunningEc2InstanceState.when(
      sfn.Condition.and(
        sfn.Condition.numberEquals(
          "$.Output.Payload.InstanceStatuses[0].InstanceState.Code",
          16
        ),
        sfn.Condition.stringEquals(
          "$.Output.Payload.InstanceStatuses[0].InstanceStatus.Status",
          "ok"
        ),
        sfn.Condition.stringEquals(
          "$.Output.Payload.InstanceStatuses[0].SystemStatus.Status",
          "ok"
        )
      ),
      startHttpdState
        .next(waitStartHttpd10Sec)
        .next(startHttpdResultState)
        .next(isCompletedStartHttpdState)
    );

    // If the start command succeeded, check the httpd status.
    isCompletedStartHttpdState.when(
      sfn.Condition.stringEquals(
        "$.Output.Payload.CommandInvocations[0].Status",
        "Success"
      ),
      describeHttpdState
        .next(waitDescribeHttpd10Sec)
        .next(describeHttpdResultState)
        .next(isRunningHttpdState)
    );

    // In case of Pending or InProgress, wait 10 seconds.
    isCompletedStartHttpdState.when(
      sfn.Condition.or(
        sfn.Condition.stringEquals(
          "$.Output.Payload.CommandInvocations[0].Status",
          "Pending"
        ),
        sfn.Condition.stringEquals(
          "$.Output.Payload.CommandInvocations[0].Status",
          "InProgress"
        )
      ),
      waitStartHttpd10Sec
    );

    // If httpd is running, complete the process for this instance.
    isRunningHttpdState.when(
      sfn.Condition.stringEquals(
        "$.Output.Payload.CommandInvocations[0].Status",
        "Success"
      ),
      new sfn.Pass(this, "Passed")
    );

    // In case of Pending or InProgress, wait 10 seconds.
    isRunningHttpdState.when(
      sfn.Condition.or(
        sfn.Condition.stringEquals(
          "$.Output.Payload.CommandInvocations[0].Status",
          "Pending"
        ),
        sfn.Condition.stringEquals(
          "$.Output.Payload.CommandInvocations[0].Status",
          "InProgress"
        )
      ),
      waitDescribeHttpd10Sec
    );

    // Declarations for concurrently executing operations on EC2 instances.
    const startEc2InstancesMapState = new sfn.Map(
      this,
      "StartEc2InstancesMapState",
      {
        maxConcurrency: 2,
        itemsPath: sfn.JsonPath.stringAt("$.Input.TargetEc2Instances"),
      }
    );

    // Start each EC2 instance, then check the ALB target health.
    startEc2InstancesMapState
      .iterator(
        startEc2InstanceState
          .next(waitStartEc2Instance30Sec)
          .next(describeStatusEc2InstanceState)
          .next(isRunningEc2InstanceState)
      )
      .next(
        new sfn.Pass(this, "RefleshInput", {
          inputPath: "$",
          parameters: {
            "Output.$": "$",
          },
          outputPath: "$",
        })
      )
      .next(waitDescribeHealthTarget10Sec)
      .next(describeHealthTargetState)
      .next(choiceLoopDescibeHealthTargetState);

    // When the DB Cluster is available, start the EC2 instances.
    isAvailableDbClusterState.when(
      sfn.Condition.stringEquals(
        "$.Output.Payload.DBClusters[0].Status",
        "available"
      ),
      startEc2InstancesMapState
    );

    describeDbClusterState.next(IsStoppedDbClusterState);

    // If the DB Cluster is stopped, start it.
    IsStoppedDbClusterState.when(
      sfn.Condition.stringEquals(
        "$.Output.Payload.DBClusters[0].Status",
        "stopped"
      ),
      // Start DB Cluster
      startDbClusterState
        .next(waitStartDbCluster30Sec)
        .next(describeStatusDbClusterState)
        .next(isAvailableDbClusterState)
    );

    // If the DB Cluster is already available or backing up, skip to starting the EC2 instances.
    IsStoppedDbClusterState.when(
      sfn.Condition.or(
        sfn.Condition.stringEquals(
          "$.Output.Payload.DBClusters[0].Status",
          "available"
        ),
        sfn.Condition.stringEquals(
          "$.Output.Payload.DBClusters[0].Status",
          "backing-up"
        )
      ),
      // Start EC2
      startEc2InstancesMapState
    );

    // If the target is healthy, delete the maintenance rule and complete the process.
    choiceLoopDescibeHealthTargetState.when(
      sfn.Condition.stringEquals(
        "$.Output.Payload.TargetHealthDescriptions[0].TargetHealth.State",
        "healthy"
      ),
      deleteAlbRuleState.next(new sfn.Succeed(this, "Succeed"))
    );

    const systemStartStateMachine = new sfn.StateMachine(
      this,
      "SystemStartStateMachine",
      {
        definition: describeDbClusterState,
        logs: {
          destination: webAppStartStateMachineLogGroup,
          level: sfn.LogLevel.ALL,
        },
      }
    );

    // EventBridge Rule for notifying Slack of the execution results of Step Functions.
    const eventsRule = new events.Rule(this, "put event rule", {
      description:
        "Rules for notifying Slack of the execution results of Step Functions",
      eventPattern: {
        source: ["aws.states"],
        detailType: ["Step Functions Execution Status Change"],
        detail: {
          status: ["SUCCEEDED", "FAILED"],
          stateMachineArn: [systemStartStateMachine.stateMachineArn],
        },
      },
    });

    // Set the Slack notification Lambda function as the target of the EventBridge Rule
    eventsRule.addTarget(new targets.LambdaFunction(props.noticeSlackFunction));
  }
}
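The first branch of this state machine is a small example of building for re-runs: the DB cluster is started only when it is actually stopped, so running the state machine again after a partial failure does not break on an already running cluster. The sketch below restates that branch as a plain function. The names `decideDbClusterAction` and `DbClusterAction` are hypothetical; the status strings are the ones compared in the stack.

```typescript
type DbClusterAction = "startDbCluster" | "skipToEc2" | "fail";

// Hypothetical helper mirroring the first Choice state of the start state machine.
function decideDbClusterAction(status: string): DbClusterAction {
  switch (status) {
    case "stopped":
      // The cluster needs to be started before the EC2 instances.
      return "startDbCluster";
    case "available":
    case "backing-up":
      // Already usable, so go straight to starting the EC2 instances.
      return "skipToEc2";
    default:
      // Any other status ends the execution abnormally.
      return "fail";
  }
}
```

Checking the current status before acting is one straightforward way to make a task safe to repeat.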
Even when there are multiple stacks, using cdk deploy --all deploys them all with a single cdk deploy run.
As the log output also shows, transpiling is done inside a container that is pulled from ECR Public.
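- Job scheduling
- Scheduled execution
- Event-driven execution
- Job status monitoring
- Saving logs of job execution results
This time I was able to confirm all of these functions except scheduled execution. As for scheduled execution, which I did not check, the article below shows that Step Functions can be run on a schedule.
This suggests that a configuration combining Step Functions and SSM RunCommand can replace a job management system.
However, as also noted earlier, I struggled with the following difficult points of the AWS Step Functions and SSM RunCommand configuration, and building it took quite a long time.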
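- When designing a state machine, you cannot lay out parts in a GUI as you would in a job management system; you have to write it as code such as JSON or YAML
- When a task (a job, in job management system terms) fails, you cannot re-run from the middle, so it is important to build an idempotent configuration
- The parameters passed to tasks must be defined in JSON in advance, and you need to understand which parameters are handed over between tasks within the state machine
That's all from のんピ (@non____97) of the Consulting Department, AWS Business Division!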