ECS Native Blue/Green Deployment Lifecycle Hooks and Amazon Q Developer in chat applications: Controlling production traffic re-routing and rollback using Slack notifications and custom action buttons

In the future, I would like this to be provided as a service feature rather than something users have to create themselves.
2026.02.23

ECS Native Blue/Green Deployment: Specifying the Timing for Production Traffic Rerouting

Hello, I'm nonpi (@non____97).

Have you ever wanted to specify the timing for traffic rerouting in ECS Native Blue/Green Deployment? I have.

In this article, ECS Native Blue/Green Deployment refers to ECS's built-in Blue/Green Deployment that doesn't use CodeDeploy. For details about ECS Native Blue/Green Deployment, please refer to the following articles and AWS Blogs.

https://dev.classmethod.jp/articles/dev-io-2025-tokyo-ecs-native-blue-green-deployment/

https://aws.amazon.com/jp/blogs/news/accelerate-safe-software-releases-with-new-built-in-blue-green-deployments-in-amazon-ecs/

While ECS Native Blue/Green Deployment has many benefits such as not requiring CodeDeploy-related settings, supporting Service Connect, and enabling deployment circuit breakers, it's not without limitations. For example, you cannot specify the timing for traffic rerouting.

With the CodeDeploy method, you can specify a grace period before traffic rerouting.

[Figure: CodeDeploy deployment settings]

You can also manually reroute traffic at any time without waiting for the specified grace period.

[Figure: Traffic rerouting in CodeDeploy]

Excerpt: Running CodeDeploy's "Reroute traffic" and "Terminate original task set" from the CLI | DevelopersIO

On the other hand, with ECS Native Blue/Green Deployment, you cannot specify a grace period before traffic rerouting.

This means you cannot send enough test traffic to the new environment before the switch.

For such scenarios, you'll need to use a deployment lifecycle hook Lambda function.

https://docs.aws.amazon.com/ja_jp/AmazonECS/latest/developerguide/blue-green-deployment-how-it-works.html

https://docs.aws.amazon.com/ja_jp/AmazonECS/latest/developerguide/deployment-lifecycle-hooks.html

This is a Lambda function that runs when the deployment transitions to a specified lifecycle stage. The deployment does not move on to the next lifecycle stage as long as this Lambda function returns IN_PROGRESS rather than SUCCEEDED or FAILED. Using this feature, we'll create a Lambda function that sends a notification before production traffic rerouting and keeps returning IN_PROGRESS until operation verification is complete.

Implementation

Test Environment

Here's our test environment.

[Figure: Test environment architecture, based on the article on using CloudWatch Application Signals with the ADOT Collector and ADOT SDK]

It's based on the following article.

https://dev.classmethod.jp/articles/cloudwatch-application-signals-with-adot-collector-and-sdk/

For logging, we use AWS FireLens (AWS for Fluent Bit), sending error logs to CloudWatch Logs and all logs to S3 buckets via Data Firehose.

[Figure: Telemetry]

The notification and custom action button flow is illustrated below.

[Figure: Notification flow]

For notification and custom action buttons, Iwamoto (@iwamot)'s JAWS FESTA 2025 presentation was very helpful.

All resources are deployed using AWS CDK. The code used is as follows:

https://github.com/non-97/ecs-native-blue-green/tree/v3.0.0

Lifecycle Hook Lambda Function Processing

Let me briefly explain the mechanism.

The lifecycle stage before production traffic rerouting begins is POST_TEST_TRAFFIC_SHIFT, which indicates that test traffic rerouting is complete. Therefore, the lifecycle hook Lambda function is triggered when this stage is reached.

For details on lifecycle stages, refer to the AWS official documentation or the following diagram from the AWS official blog.

[Figure: Lifecycle stages of an ECS blue/green deployment]

Excerpt: Extending deployment pipelines with Amazon ECS blue green deployments and lifecycle hooks | Containers

The Lambda function performs the following:

  1. For the initial service deployment (no Blue environment exists), immediately return SUCCEEDED
    • No approval notification is sent for production traffic rerouting
  2. On the first lifecycle hook invocation, send a Slack notification via SNS
  3. On subsequent invocations, poll the SSM Parameter Store value to determine approval (rerouting) or rejection (rollback), and return SUCCEEDED, FAILED, or IN_PROGRESS accordingly
  4. Delete the SSM parameter once the approval or rejection has been confirmed
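
The steps above can be sketched as a pure decision function. This is only an illustration, not the code from the repository: the names ApprovalState, HookDecision, and decideHookStatus are hypothetical, and how the function detects the initial deployment or its first invocation (the alreadyNotified flag here) is an assumption.

```typescript
// Values a lifecycle hook can return to Amazon ECS.
type HookStatus = "SUCCEEDED" | "FAILED" | "IN_PROGRESS";

// Hypothetical snapshot of what the hook knows on a single invocation.
interface ApprovalState {
  isInitialDeployment: boolean; // no Blue environment exists yet
  alreadyNotified: boolean;     // assumption: some marker records the first invocation
  parameterValue?: string;      // value of the approval SSM parameter, if it exists
}

// What the hook should do, separated from the AWS SDK calls that would carry it out.
interface HookDecision {
  hookStatus: HookStatus;
  sendNotification: boolean;
  deleteParameter: boolean;
}

function decideHookStatus(state: ApprovalState): HookDecision {
  // Step 1: nothing to approve on the initial deployment.
  if (state.isInitialDeployment) {
    return { hookStatus: "SUCCEEDED", sendNotification: false, deleteParameter: false };
  }
  // Step 2: first invocation, so notify and keep the stage open.
  if (!state.alreadyNotified) {
    return { hookStatus: "IN_PROGRESS", sendNotification: true, deleteParameter: false };
  }
  // Steps 3 and 4: poll the parameter and clean it up once a decision exists.
  switch (state.parameterValue) {
    case "approved":
      return { hookStatus: "SUCCEEDED", sendNotification: false, deleteParameter: true };
    case "rejected":
      return { hookStatus: "FAILED", sendNotification: false, deleteParameter: true };
    default:
      // No decision yet (parameter missing or unexpected value): keep waiting.
      return { hookStatus: "IN_PROGRESS", sendNotification: false, deleteParameter: false };
  }
}
```

Keeping the decision separate from the SNS and SSM calls makes the branching easy to unit test. The actual handler would wrap the hookStatus in the response object that ECS expects.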

For specific code, please refer to the link above.

I initially considered implementing this with Lambda Durable Functions, where pressing a button in the notification would call the Lambda Durable Function's callback URL corresponding to that button, as shown in this article:

https://dev.classmethod.jp/articles/shuntaka-durable-functions-slack-approval/

However, the lifecycle hook Lambda function must return a hookStatus within 15 minutes. If no hookStatus is returned within 15 minutes, automatic rollback occurs.

When deployment begins, lifecycle stages are executed. These stages can have statuses like "IN_PROGRESS" or "successful." You can use lifecycle hooks with Lambda functions that Amazon ECS executes on your behalf at specified lifecycle stages. The function can be either:

  • An asynchronous API that validates health checks within 15 minutes.
  • A polling API that initiates another asynchronous process to evaluate lifecycle hook completion.

After the function completes execution, it must return a hookStatus to continue deployment. If hookStatus is not returned or the function fails, the deployment rolls back. Here are the hookStatus values:

Excerpt: Amazon ECS Service Deployment Lifecycle Hooks - Amazon Elastic Container Service

This means that when using Lambda Durable Functions, the verification time before production traffic rerouting would be limited to a maximum of 15 minutes, which doesn't align with our goal.

With our current approach, we just need to make a rerouting/rollback decision within 24 hours, which is the maximum duration for an ECS lifecycle stage.

Each lifecycle stage can continue for a maximum of 24 hours. It's recommended to keep values under 24 hours because asynchronous processes may need time to trigger hooks. The system will time out, fail the deployment, and initiate rollback after a stage reaches 24 hours. CloudFormation deployments have additional timeout limits. While the 24-hour stage limit remains valid, CloudFormation enforces a 36-hour limit on the entire deployment. If the process doesn't complete within 36 hours, CloudFormation fails the deployment and initiates rollback.

Excerpt: Amazon ECS Blue/Green Service Deployment Workflow - Amazon Elastic Container Service
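
Because the stage times out at 24 hours, one option is for the hook to track how long approval has been pending and return FAILED itself shortly before the limit, so the rollback reason is explicit. The sketch below is a minimal illustration under assumptions: the function is assumed to record the stage start time itself (for example on its first invocation), the name approvalWindowExpired is hypothetical, and the 30-minute safety margin is an arbitrary choice.

```typescript
// Maximum duration of a single lifecycle stage, per the documentation quoted above.
const STAGE_LIMIT_MS = 24 * 60 * 60 * 1000;

// Returns true once the approval window (stage limit minus a safety margin) has elapsed.
function approvalWindowExpired(
  stageStartedAt: Date,
  now: Date,
  safetyMarginMs: number = 30 * 60 * 1000 // arbitrary margin, adjust as needed
): boolean {
  const elapsedMs = now.getTime() - stageStartedAt.getTime();
  return elapsedMs >= STAGE_LIMIT_MS - safetyMarginMs;
}
```

For deployments driven by CloudFormation, the 36-hour overall limit mentioned in the excerpt would need its own budget on top of this check.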

Why SSM Parameter Store Instead of S3 for Approval/Rejection Status

I chose SSM Parameter Store for approval/rejection status transfer because of its compatibility with Amazon Q Developer in chat applications (hereafter, Q Dev chat).

The AWS official blog uses S3 bucket objects for decision making, with sample code available on GitHub.

https://github.com/aws-samples/sample-amazon-ecs-blue-green-deployment-patterns/blob/main/ecs-bluegreen-lifecycle-hooks/src/approvalFunction/app.py#L59-L83

I initially tried implementing it similarly, but Q Dev chat would only say "I'll help you with the S3 API put-object command. Let me check your S3 bucket and provide guidance on this operation" without executing the command.

[Figure: Q Dev chat response to the S3 put-object request]

Upon investigation, I found that s3:PutObject and several other actions are listed as unsupported operations in Q Dev chat:

{
  "Version":"2012-10-17",       
  "Statement": [
    {
      "Action": [
        "appsync:ListApiKeys",
        "chatbot:*",
        "codecommit:GetFile",
        "codecommit:GetCommit",
        "codecommit:GetDifferences",
        "cognito-idp:*",
        "cognito-identity:*",
        "connect:GetFederationToken",
        "dynamodb:BatchGetItem",
        "dynamodb:GetItem",
        "ec2:GetPasswordData",
        "ecr:GetAuthorizationToken",
        "gamelift:RequestUploadCredentials",
        "gamelift:GetInstanceAccess",
        "identitystore:*",
        "lightsail:DownloadDefaultKeyPair",
        "lightsail:GetKeyPair",
        "lightsail:GetKeyPairs",
        "lightsail:UpdateRelationalDatabase",
        "iam:*",
        "kms:*",
        "redshift:GetClusterCredentials",
        "sdb:*",
        "secretsmanager:*",
        "sso:*",
        "storagegateway:DescribeChapCredentials",
        "sts:*",
        "s3:GetObject",
        "s3:PutObject",
        "s3:GetBucketPolicy",
        "snowball:GetJobUnlockCode"
      ],
      "Effect": "Deny",
      "Resource": "*"
    }
  ]
}

Excerpt: Understanding Amazon Q Developer in chat applications permissions - Amazon Q Developer in chat applications
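
To check quickly whether a given action appears in this deny list, a small matcher like the following can help. It is a rough sketch: DENIED_ACTIONS holds only a subset of the list above, the helper names are hypothetical, and only the "*" wildcard is handled. It reflects the published list only, not any other restrictions the service may apply.

```typescript
// A subset of the denied actions quoted above; copy the full list for real use.
const DENIED_ACTIONS: string[] = [
  "iam:*",
  "kms:*",
  "secretsmanager:*",
  "sts:*",
  "dynamodb:GetItem",
  "s3:GetObject",
  "s3:PutObject",
  "s3:GetBucketPolicy",
];

// Convert an IAM action pattern into a RegExp.
// "*" matches any run of characters, and action names are compared case-insensitively.
function patternToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters except "*"
    .replace(/\*/g, ".*");                 // translate the IAM wildcard
  return new RegExp(`^${escaped}$`, "i");
}

// True if the action matches any entry in the deny list.
function isDeniedInChat(action: string): boolean {
  return DENIED_ACTIONS.some((pattern) => patternToRegExp(pattern).test(action));
}

console.log(isDeniedInChat("s3:PutObject"));     // matches the explicit entry
console.log(isDeniedInChat("ssm:PutParameter")); // not on the list
```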

Since s3:PutObject couldn't be used, I tried using object tags to indicate approval/rejection, but Q Dev chat would only explain the s3api put-object-tagging command without executing it.

Mentioning @Amazon Q directly with the AWS CLI command didn't work either.

[Figure: Q Dev chat response to the s3api put-object-tagging command]

non-97  [15:41]
@Amazon Q s3api put-object-tagging --bucket ecsnativebluegreenstack-approvallambdaapprovalbuck-qao433zodmqx --key 9781097973550480107 --tagging 'TagSet=[{Key=status,Value=approved}]' --region us-east-1

---

Amazon Q  [15:41]
@non-97 - Your AWS S3 API command is correctly formatted! Here's what you need to know:

Complete Command to Execute:
aws s3api put-object-tagging --bucket ecsnativebluegreenstack-approvallambdaapprovalbuck-qao433zodmqx --key 9781097973550480107 --tagging 'TagSet=[{Key=status,Value=approved}]' --region us-east-1
Command Breakdown:
   • aws s3api put-object-tagging - AWS CLI command to add/update object tags
   • --bucket ecsnativebluegreenstack-approvallambdaapprovalbuck-qao433zodmqx - Your target bucket
   • --key 9781097973550480107 - The object to tag
   • --tagging 'TagSet=[{Key=status,Value=approved}]' - Tag with key "status" and value "approved"
   • --region us-east-1 - AWS region

What happens when executed:
   • Adds the tag "status=approved" to the specified object
   • Executes silently (no output) if successful
   • Replaces any existing tags on the object

To verify the tag was applied:
aws s3api get-object-tagging --bucket ecsnativebluegreenstack-approvallambdaapprovalbuck-qao433zodmqx --key 9781097973550480107 --region us-east-1

Prerequisites:
   • AWS CLI configured with credentials
   • s3:PutObjectTagging IAM permission
   • Object must exist in the bucket

Your command syntax is perfect - just add the aws prefix and execute!

I tried attaching AdministratorAccess to channel roles, user roles, and guardrail policies, but the situation didn't change.

Q Dev chat doesn't report permission errors in its responses, so the details are unclear, but the request may be rejected because it matches s3:PutObject*.

Implementing Amazon Q Developer in Chat Applications Custom Action Buttons

To include custom action buttons in Q Dev chat notifications, you need to:

  1. Define custom actions
  2. Associate custom actions with a channel
  3. Post a message to the associated channel that meets the criteria defined in the custom action
    • The criteria are evaluated by matching keys or values in metadata.additionalContext

For reference, here's the Q Dev chat custom event schema:

{
    "version": String, 
    "source": String, 
    "id": String,    
    "content": {
        "textType": String, 
        "title": String,  
        "description": String, 
        "nextSteps": [ String, String, ... ], 
        "keywords": [ String, String, ... ] 
    },
    "metadata": {                     
        "threadId": String,
        "summary": String,
        "eventType": String,
        "relatedResources": [ String, String, ... ],
        "additionalContext" : {
            "customerProvidedKey1": String,
            "customerProvidedKey2": String
            ...
        },
        "enableCustomActions": true,
    }
}

Excerpt: Custom notifications using Amazon Q Developer in chat applications - Amazon Q Developer in chat applications
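
As an illustration of this schema, the approval notification could be built roughly as follows before being published to the SNS topic as a JSON string. This is a sketch under assumptions: the function name, title, and description text are hypothetical, while the additionalContext keys (ActionGroup, parameterName, region) mirror the ones referenced by the custom action definitions in this article.

```typescript
// Shape of the subset of the custom notification schema used here.
interface ApprovalNotification {
  version: string;
  source: string;
  content: { textType: string; title: string; description: string };
  metadata: {
    additionalContext: Record<string, string>;
    enableCustomActions: boolean;
  };
}

// Build the message body; all argument values are supplied by the caller.
function buildApprovalNotification(
  parameterName: string,
  region: string,
  deploymentUrl: string
): ApprovalNotification {
  return {
    version: "1.0",
    source: "custom",
    content: {
      textType: "client-markdown",
      title: "Approval required: production traffic rerouting", // hypothetical wording
      description: `Test traffic has been shifted. Deployment details: ${deploymentUrl}`,
    },
    metadata: {
      // These keys are what the custom action criteria and variables read.
      additionalContext: {
        ActionGroup: "ecs-blue-green-deployment_POST_TEST_TRAFFIC_SHIFT",
        parameterName,
        region,
      },
      enableCustomActions: true,
    },
  };
}

// The SNS message body would be the serialized object.
const exampleBody: string = JSON.stringify(
  buildApprovalNotification("/example/approval", "us-east-1", "https://example.com/deployment")
);
console.log(exampleBody);
```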

Here's the actual code for custom action buttons:

./lib/construct/q-developer-chat-construct.ts
    // Custom action 1: POST_TEST_TRAFFIC_SHIFT Approve (reroute production traffic)
    // Button press creates a new SSM parameter with "approved"
    const postTestTrafficShiftApproveAction =
      new cdk.aws_chatbot.CfnCustomAction(
        this,
        "PostTestTrafficShiftApproveAction",
        {
          actionName: "PostTestTrafficShiftApprove",
          aliasName: "post-test-traffic-approve",
          definition: {
            commandText:
              // 'aws' prefix not needed
              "ssm put-parameter --name $parameterName --value approved --type String --region $region",
          },
          attachments: [
            {
              buttonText: "🔁 Rerouting",
              notificationType: "Custom",
              criteria: [
                {
                  operator: "HAS_VALUE",
                  variableName: "parameterName",
                },
                {
                  operator: "EQUALS",
                  variableName: "ActionGroup",
                  value: "ecs-blue-green-deployment_POST_TEST_TRAFFIC_SHIFT",
                },
              ],
              variables: {
                ActionGroup: "event.metadata.additionalContext.ActionGroup",
                parameterName: "event.metadata.additionalContext.parameterName",
                region: "event.metadata.additionalContext.region",
              },
            },
          ],
        }
      );

    // Custom action 2: POST_TEST_TRAFFIC_SHIFT Reject (rollback)
    // Button press creates a new SSM parameter with "rejected"
    const postTestTrafficShiftRejectAction =
      new cdk.aws_chatbot.CfnCustomAction(
        this,
        "PostTestTrafficShiftRejectAction",
        {
          actionName: "PostTestTrafficShiftReject",
          aliasName: "post-test-traffic-reject",
          definition: {
            commandText:
              // 'aws' prefix not needed
              "ssm put-parameter --name $parameterName --value rejected --type String --region $region",
          },
          attachments: [
            {
              buttonText: "⏪ Rollback",
              notificationType: "Custom",
              criteria: [
                {
                  operator: "HAS_VALUE",
                  variableName: "parameterName",
                },
                {
                  operator: "EQUALS",
                  variableName: "ActionGroup",
                  value: "ecs-blue-green-deployment_POST_TEST_TRAFFIC_SHIFT",
                },
              ],
              variables: {
                ActionGroup: "event.metadata.additionalContext.ActionGroup",
                parameterName: "event.metadata.additionalContext.parameterName",
                region: "event.metadata.additionalContext.region",
              },
            },
          ],
        }
      );

    // Associate custom actions with Slack channel configuration
    // Must be set via L1 construct (CfnSlackChannelConfiguration)
    const cfnSlackChannel = slackChannel.node
      .defaultChild as cdk.aws_chatbot.CfnSlackChannelConfiguration;
    cfnSlackChannel.customizationResourceArns = [
      postTestTrafficShiftApproveAction.ref,
      postTestTrafficShiftRejectAction.ref,
    ];
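
To make the relationship between variables, criteria, and commandText easier to follow, here is a small local simulation. It is only a mental model, not how the service is actually implemented: the helper names are hypothetical, and the matching and substitution rules are assumptions inferred from the definitions above.

```typescript
// The two criterion shapes used in the custom actions above.
type Criterion =
  | { operator: "HAS_VALUE"; variableName: string }
  | { operator: "EQUALS"; variableName: string; value: string };

type ResolvedVariables = Record<string, string | undefined>;

// Resolve "event.metadata.additionalContext.<key>" paths against a notification's additionalContext.
function resolveVariables(
  variables: Record<string, string>,
  additionalContext: Record<string, string | undefined>
): ResolvedVariables {
  const prefix = "event.metadata.additionalContext.";
  const resolved: ResolvedVariables = {};
  for (const [variableName, path] of Object.entries(variables)) {
    resolved[variableName] = path.startsWith(prefix)
      ? additionalContext[path.slice(prefix.length)]
      : undefined;
  }
  return resolved;
}

// Assumed rule: the button is attached only when every criterion holds.
function buttonIsShown(criteria: Criterion[], resolved: ResolvedVariables): boolean {
  return criteria.every((criterion) => {
    const actual = resolved[criterion.variableName];
    return criterion.operator === "HAS_VALUE"
      ? actual !== undefined && actual !== ""
      : actual === criterion.value;
  });
}

// Substitute $variableName placeholders; unknown variables are left untouched.
function renderCommand(commandText: string, resolved: ResolvedVariables): string {
  return commandText.replace(
    /\$([A-Za-z_][A-Za-z0-9_]*)/g,
    (placeholder: string, variableName: string) => resolved[variableName] ?? placeholder
  );
}
```

With an additionalContext containing ActionGroup, parameterName, and region, both criteria hold and the placeholders in commandText are replaced by the values from the notification.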

The AWS official documentation only shows how to configure this in the chat channel.

https://docs.aws.amazon.com/ja_jp/chatbot/latest/adminguide/creating-custom-actions.html

Currently, there's no comprehensive documentation on configuring this via the AWS CLI. The CloudFormation reference for AWS::Chatbot::CustomAction seems to be the most helpful resource.

https://docs.aws.amazon.com/AWSCloudFormation/latest/TemplateReference/aws-resource-chatbot-customaction.html

Note that the Q Dev chat APIs are only available in a limited set of Regions, and the console only in us-east-2.

APIs are currently available in the following Regions:

  • US East (Ohio) - us-east-2
  • US West (Oregon) - us-west-2
  • Asia Pacific (Singapore) - ap-southeast-1
  • Europe (Ireland) - eu-west-1

The Amazon Q Developer in chat applications console can only be used in US East (Ohio). Your configuration data however, is stored in each of the relevant available Regions.

Excerpt: Welcome - Amazon Q Developer in chat applications

Keep this in mind when working with the AWS CLI.

That said, deploying to us-east-1 with AWS CDK completed without errors. The backend probably routes the API requests to a supported Region such as us-east-2.

Initial Deployment

Now let's verify the operation.

As preparation, register a Slack workspace in Q Dev chat, prepare a Slack channel, and install the Q Developer application in Slack.

[Figure: Creating the Slack workspace]

[Figure: Channel integrations]

Deploy resources using cdk deploy.

After deployment, the following message appeared in the notification channel: "Account added to channel: <AWS account ID> You need to select a user IAM role in the Amazon Q Developer console to run commands in this newly added account."

[Figure: "Account added to channel" message]

This is because I'm using user roles instead of channel roles in my Q Dev chat settings.

[Figure: Setting that requires users to choose a role]

Clicking Choose user role prompts me to click the displayed AWS account ID.

[Figure: Choose user role]

Clicking the AWS account ID opens a web browser to select an AWS session.

[Figure: Choose your AWS session]

I clicked on the session I normally use.

Then I was prompted to select a user role. I selected the same one used for the channel role.

[Figure: Setting the same role as the channel role]

Clicking Save prompted me to authorize the Q Developer app to access Slack.

[Figure: Allowing the "Amazon Q Developer" app to access Slack]

After clicking Allow, the Q Dev chat console opened with a message saying Role successfully configured.

[Figure: "Role successfully configured. Return to the channel to run commands."]

Checking the Slack channel, I saw a message indicating successful user role selection.

[Figure: "You can now run commands in this account" message]

Here are the Q Dev chat settings for the notification Slack channel and workspace:

[Figure: Slack workspace]

[Figure: ecs-blue-green-deployment channel configuration]

Meanwhile, the deployment of other resources completed.

Checking the deployment details from the ECS service showed:

[Figure: Initial deployment]

I checked the Slack channel, and there were no notifications, as expected for an initial deployment.

Production Traffic Rerouting Verification

Next, let's test production traffic rerouting.

Change the HTML file as follows and run cdk deploy:

./src/container/web/html/index.html
-index.txt v1.0
+index.txt v1.1

After a while, I received a Slack notification requesting approval for production traffic rerouting.

[Figure: Notification content]

Looks good.

Clicking the URL in the notification allowed me to check the deployment details.

[Figure: ECS console]

Let's send requests to both the production listener and the test listener (port 10080).

> curl EcsNat-AlbCo-m17SUZ6QdPoU-973353460.us-east-1.elb.amazonaws.com
index.txt v1.0

> curl EcsNat-AlbCo-m17SUZ6QdPoU-973353460.us-east-1.elb.amazonaws.com
index.txt v1.0

> curl EcsNat-AlbCo-m17SUZ6QdPoU-973353460.us-east-1.elb.amazonaws.com:10080
index.txt v1.1

> curl EcsNat-AlbCo-m17SUZ6QdPoU-973353460.us-east-1.elb.amazonaws.com:10080
index.txt v1.1

I can see that accessing the production listener returns the old content, while accessing the test listener returns the new content. Perfect.

Now I'll click the Rerouting button.

After clicking, a popup appears confirming the AWS CLI command to execute.

[Figure: Command Action]

I click Run.

The message "I ran the command" appears, along with the executed command and its result.

[Figure: "I ran the command" message]

Checking the ECS console, I can see that the Current deployment stage has transitioned to the next lifecycle stage, Bake time.

[Figure: Bake time]

After waiting a bit longer, the deployment completed.

[Figure: Deployment complete]

Accessing the production listener now returns the new content deployed in this update.

> curl EcsNat-AlbCo-m17SUZ6QdPoU-973353460.us-east-1.elb.amazonaws.com
index.txt v1.1

> curl EcsNat-AlbCo-m17SUZ6QdPoU-973353460.us-east-1.elb.amazonaws.com
index.txt v1.1

Testing Rollback Functionality

Next, let's try the rollback functionality.

Change the HTML file as follows, then run cdk deploy.

./src/container/web/html/index.html
-index.txt v1.1
+index.txt v1.2

After a while, you'll receive a Slack notification requesting approval for rerouting production traffic, as shown below.

[Figure: Second notification]

Let's access both the production and test listeners.

> curl EcsNat-AlbCo-m17SUZ6QdPoU-973353460.us-east-1.elb.amazonaws.com
index.txt v1.1

> curl EcsNat-AlbCo-m17SUZ6QdPoU-973353460.us-east-1.elb.amazonaws.com
index.txt v1.1

> curl EcsNat-AlbCo-m17SUZ6QdPoU-973353460.us-east-1.elb.amazonaws.com:10080
index.txt v1.2

> curl EcsNat-AlbCo-m17SUZ6QdPoU-973353460.us-east-1.elb.amazonaws.com:10080
index.txt v1.2

We can see that only the test listener is returning the new content.

Click the Rollback custom action button.

Confirm the AWS CLI command to be executed and click Run.

[Figure: Rollback Command Action]

The message "I ran the command" indicates that the command was executed successfully.

[Figure: Rollback custom action]

Checking the ECS console shows that a rollback is in progress.

[Figure: Rollback in progress]

After waiting, the message "Service deployment rolled back because POST_TEST_TRAFFIC_SHIFT lifecycle hook(s) failed. Lifecycle hook target arn:aws:lambda:us-east-1:<AWS Account ID>:function:EcsNativeBlueGreenStack-ApprovalLambdaApprovalFunc-kaDwvF9CvEOP returned FAILED status." appears, and the deployment status changes to rollback successful.

[Figure: Rollback successful]

This is as intended.

Checking the CloudFormation stack at this point, we see the message Resource handler returned message: "Error occurred during operation 'ECS Deployment hook execution failure(s) detected'." with the status UPDATE_FAILED, and the entire stack is rolled back.

[Figure: CloudFormation stack during rollback]

This behavior is also welcome.

After the rollback completes, accessing the test listener returns the original content.

> curl EcsNat-AlbCo-m17SUZ6QdPoU-973353460.us-east-1.elb.amazonaws.com:10080
index.txt v1.1

> curl EcsNat-AlbCo-m17SUZ6QdPoU-973353460.us-east-1.elb.amazonaws.com:10080
index.txt v1.1

I Hope This Will Be Provided as a Service Feature in the Future Rather Than Custom Built

We've seen how to control production traffic rerouting and rollbacks using ECS Native Blue/Green Deployment lifecycle hooks with Amazon Q Developer in chat applications for Slack notifications and custom action buttons.

In the future, it would be nice if this functionality were provided as a service feature rather than requiring custom implementation. It's quite challenging to set up from scratch.

Personally, I'm concerned about issues like having to wait for the bake time to complete. I hope these aspects will be improved as well.

Console differences: If you use the CodeDeploy console as part of your operations, note that the Amazon ECS console doesn't provide manual override options for deployment progress (e.g., forcing rerouting or early termination of bake time). Instead, you can create custom UIs integrated with broader operational processes through Amazon ECS lifecycle hooks (which can be considered a safer approach).

Excerpt: Migrating from AWS CodeDeploy to Amazon ECS for Blue/Green Deployments | Amazon Web Services Blog

Additionally, being able to perform operations through custom action buttons without logging in is extremely convenient. However, troubleshooting Q Dev chat itself was quite challenging, so I wish it provided more logs.

I hope this article helps someone.

This has been nonpi (@non____97) from the Consulting Department of the Cloud Business Division!
