I Tried Using ARM64 / FARGATE_SPOT / ECS Exec, Which Are Not Supported in ECS Express Mode
ECS Express Mode is a convenient feature that allows you to build environments quickly with minimal configuration, but it does not allow setting up features such as ARM64 (Graviton), Fargate Spot, and ECS Exec at creation time.
Recently, I had the opportunity to enable these features in a development environment built on ECS Express Mode, in order to reduce costs and make debugging easier, so I'll introduce the procedure.
Prerequisites
- A production container image built for the ARM64 (aarch64) architecture has already been pushed to ECR
- The initial setup stack for ECS Express Mode (TaskExecutionRole, InfrastructureRole, etc.) has already been deployed
Challenge: Express Mode Limitations
ECS Express Mode (AWS::ECS::ExpressGatewayService) is a high-level component that automatically generates and manages low-level resources such as task definitions, services, ALB, and Auto Scaling internally. Therefore, properties such as runtimePlatform and capacityProviderStrategy that can be specified in normal CloudFormation do not exist in Express Mode templates.
The architecture is fixed to x86_64 and cannot be changed through the update-express-gateway-service API.
To work around this limitation, we implemented a two-stage deployment:
- Dummy Deployment: Create a stack with an x86 dummy image that has the same ports and paths as production (with task count 0 to suppress startup).
- Architecture Conversion: Edit the automatically generated task definition to ARM64 via CLI and apply it with update-service.
Step 1: Initial Stack Creation with CloudFormation
First, we deployed resources normally under Express Mode management using an x86 dummy image.
Verification Code (CloudFormation Template)
- To use ECS Exec, the ssmmessages permissions it requires were granted to the task role in the initial template.
- MinTaskCount: 0 was specified so that no task starts with the dummy image. With MinTaskCount: 1, a dummy image that fails health checks could trigger the circuit breaker and cause a rollback loop.
Full Initial Template
AWSTemplateFormatVersion: '2010-09-09'
Description: 'ECS Express Mode - Initial Setup (x86 dummy)'
Parameters:
  ServiceName:
    Type: String
    Default: 'my-app-dev'
  InitialStackName:
    Type: String
    Default: 'ecs-express-initial'
Resources:
  ECSLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: !Sub '/aws/ecs/default/${ServiceName}-service'
      RetentionInDays: 14
  TaskRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Sub '${ServiceName}-task-role'
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: ecs-tasks.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: ECSExecPolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - ssmmessages:CreateControlChannel
                  - ssmmessages:CreateDataChannel
                  - ssmmessages:OpenControlChannel
                  - ssmmessages:OpenDataChannel
                Resource: '*'
  ExpressModeService:
    Type: AWS::ECS::ExpressGatewayService
    DependsOn: ECSLogGroup
    Properties:
      ServiceName: !Sub '${ServiceName}-service'
      Cluster: 'default'
      ExecutionRoleArn:
        Fn::ImportValue: !Sub '${InitialStackName}-TaskExecutionRoleArn'
      InfrastructureRoleArn:
        Fn::ImportValue: !Sub '${InitialStackName}-InfrastructureRoleArn'
      TaskRoleArn: !GetAtt TaskRole.Arn
      PrimaryContainer:
        Image: !Sub '${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/my-app:dummy'
        ContainerPort: 8000
        AwsLogsConfiguration:
          LogGroup: !Sub '/aws/ecs/default/${ServiceName}-service'
          LogStreamPrefix: 'ecs'
      NetworkConfiguration:
        Subnets:
          Fn::Split:
            - ','
            - Fn::ImportValue: !Sub '${InitialStackName}-PrivateSubnets'
      Cpu: '256'
      Memory: '512'
      HealthCheckPath: '/health'
      ScalingTarget:
        MinTaskCount: 0
        MaxTaskCount: 1
        AutoScalingMetric: 'AVERAGE_CPU'
        AutoScalingTargetValue: 70
Outputs:
  ServiceEndpoint:
    Value: !GetAtt ExpressModeService.Endpoint
After deploying this template, we confirmed that the task definition's runtimePlatform was generated with the default x86_64. No tasks are launched at this point due to MinTaskCount: 0.
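One way to run this check from the CLI is sketched below. The helper name `td_architecture` is hypothetical, and the family name in the usage comment assumes the `default-<service name>` pattern that the Step 2 script relies on; adjust it if your generated family differs.

```shell
# Hypothetical helper: print the cpuArchitecture of the latest ACTIVE
# revision of a task definition family.
#   $1 = task definition family (assumed: default-<service name>)
#   $2 = AWS region
td_architecture() {
  aws ecs describe-task-definition \
    --task-definition "$1" \
    --region "$2" \
    --query 'taskDefinition.runtimePlatform.cpuArchitecture' \
    --output text
}

# Usage (family name is an assumption, see above):
# td_architecture "default-my-app-dev-service" "ap-northeast-1"
```

If the task definition was generated as described above, this should report the x86_64 architecture; running the same helper after Step 2 is a quick way to confirm the switch to ARM64.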
Step 1.5: Disable Bake Time
To shorten the deployment wait during the conversion, we set the deployment bake time to 0 in the management console.
- ECS → Clusters → Service → Deployment Settings → Bake Time: 0 minutes
Step 2: Converting Task Definition to ARM64 and Applying
We retrieved the task definition revision automatically generated by Express Mode, changed the cpuArchitecture to ARM64, replaced the image with the production ARM image, and updated the CPU/memory to production specifications.
SERVICE_NAME="my-app-dev-service"
TD_FAMILY="default-${SERVICE_NAME}"
REGION="ap-northeast-1"
ARM_IMAGE="123456789012.dkr.ecr.$REGION.amazonaws.com/my-app:latest"

# 1. Get current task definition
LATEST=$(aws ecs list-task-definitions --family-prefix "$TD_FAMILY" --sort DESC \
  --region "$REGION" --query 'taskDefinitionArns[0]' --output text)
aws ecs describe-task-definition --task-definition "$LATEST" \
  --region "$REGION" --query 'taskDefinition' > /tmp/td.json

# 2. Change to ARM64 + production image (including metadata removal)
jq --arg img "$ARM_IMAGE" '
  .runtimePlatform.cpuArchitecture = "ARM64" |
  .containerDefinitions[0].image = $img |
  .cpu = "512" | .memory = "1024" |
  .containerDefinitions[0].cpu = 512 |
  .containerDefinitions[0].memoryReservation = 1024 |
  del(.taskDefinitionArn, .revision, .status, .requiresAttributes,
      .compatibilities, .registeredAt, .registeredBy, .enableFaultInjection)
' /tmp/td.json > /tmp/td_arm.json

# 3. Register new revision
aws ecs register-task-definition \
  --cli-input-json file:///tmp/td_arm.json \
  --region "$REGION"

# 4. Update service
NEW_REV=$(aws ecs list-task-definitions --family-prefix "$TD_FAMILY" --sort DESC \
  --region "$REGION" --query 'taskDefinitionArns[0]' --output text)
aws ecs update-service \
  --cluster default \
  --service "$SERVICE_NAME" \
  --task-definition "$NEW_REV" \
  --force-new-deployment \
  --region "$REGION"
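The script above lists the family a second time to find the revision it just registered. As a variation (not what we ran), the ARN can be read straight from the `register-task-definition` response, which avoids picking up a revision that something else registered in between. The helper name `register_task_definition_arn` is hypothetical.

```shell
# Hypothetical helper: register a task definition from a JSON file and
# print only the ARN of the revision that was just created.
#   $1 = absolute path to the task definition JSON
#   $2 = AWS region
register_task_definition_arn() {
  aws ecs register-task-definition \
    --cli-input-json "file://$1" \
    --region "$2" \
    --query 'taskDefinition.taskDefinitionArn' \
    --output text
}

# Usage, replacing steps 3 and 4 of the script:
# NEW_REV=$(register_task_definition_arn /tmp/td_arm.json "$REGION")
# aws ecs update-service --cluster default --service "$SERVICE_NAME" \
#   --task-definition "$NEW_REV" --force-new-deployment --region "$REGION"
```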
Behavior Check During update-stack Execution
After switching to ARM64 via CLI, we executed CloudFormation's update-stack and changed the image in the template to the production ARM64 image. This reduces the risk of exec format error if a rollback occurs during subsequent CloudFormation operations, as the template will reference the ARM64 image.
- A new task definition revision was generated by executing update-stack.
- We confirmed that the cpuArchitecture in the runtimePlatform remained as ARM64 in the new revision.
- We confirmed that the update to the image specified in the template was reflected in the new revision.
We verified that during Express Mode stack updates, the architecture settings are inherited from the currently running task definition.
Step 3: Enabling FARGATE_SPOT and ECS Exec
After converting to ARM64, we further enabled cost optimization and debugging features.
aws ecs update-service \
  --cluster default \
  --service my-app-dev-service \
  --capacity-provider-strategy capacityProvider=FARGATE_SPOT,base=0,weight=1 \
  --enable-execute-command \
  --force-new-deployment \
  --region ap-northeast-1
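Before launching a task, it can be worth confirming that the service itself now carries both settings. The following is a minimal sketch using `describe-services`; the helper name `service_exec_and_capacity` and the `exec` / `strategy` output keys are made up for illustration.

```shell
# Hypothetical helper: show whether ECS Exec is enabled on a service and
# which capacity provider strategy it currently uses.
#   $1 = cluster, $2 = service name, $3 = AWS region
service_exec_and_capacity() {
  aws ecs describe-services \
    --cluster "$1" \
    --services "$2" \
    --region "$3" \
    --query 'services[0].{exec: enableExecuteCommand, strategy: capacityProviderStrategy}' \
    --output json
}

# Usage:
# service_exec_and_capacity default my-app-dev-service ap-northeast-1
```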
Verification
After applying the settings, we launched a task to verify each setting.
aws ecs update-service \
  --cluster default \
  --service my-app-dev-service \
  --desired-count 1 \
  --region ap-northeast-1
Once the task started, we confirmed it was running on FARGATE_SPOT.
aws ecs describe-tasks \
  --cluster default \
  --tasks <task-id> \
  --query 'tasks[0].capacityProviderName' \
  --region ap-northeast-1
# => "FARGATE_SPOT"
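The `<task-id>` placeholder in these commands has to be filled in by hand. One way to look it up is sketched below with `list-tasks`; the helper name `first_running_task` is hypothetical, and it simply takes the first task returned, which is enough for a service capped at one task as in this setup. Both `describe-tasks` and `execute-command` accept the full task ARN that it prints.

```shell
# Hypothetical helper: print the ARN of the first RUNNING task of a service.
# Prints "None" if the service has no running task yet.
#   $1 = cluster, $2 = service name, $3 = AWS region
first_running_task() {
  aws ecs list-tasks \
    --cluster "$1" \
    --service-name "$2" \
    --desired-status RUNNING \
    --region "$3" \
    --query 'taskArns[0]' \
    --output text
}

# Usage:
# TASK_ARN=$(first_running_task default my-app-dev-service ap-northeast-1)
```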
We logged into the container using ECS Exec and confirmed it was running on ARM64. Since the container name is automatically set to Main in Express Mode, we specified --container Main.
aws ecs execute-command \
  --cluster default \
  --task <task-id> \
  --container Main \
  --interactive \
  --command "/bin/sh"
# Executed inside the container
$ uname -m
aarch64
The display of aarch64 confirmed that it was running on ARM64 (Graviton).
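If `execute-command` fails to connect, a useful first check is whether the ECS Exec agent inside the task has started. The sketch below reads the managed agent status from `describe-tasks`; the helper name `exec_agent_status` is hypothetical, and the query assumes the agent is reported under the name `ExecuteCommandAgent`.

```shell
# Hypothetical helper: print the last reported status of the ECS Exec agent
# for every container in a task.
#   $1 = cluster, $2 = task ID or ARN, $3 = AWS region
exec_agent_status() {
  aws ecs describe-tasks \
    --cluster "$1" \
    --tasks "$2" \
    --region "$3" \
    --query "tasks[0].containers[].managedAgents[?name=='ExecuteCommandAgent'].lastStatus | []" \
    --output text
}

# Usage:
# exec_agent_status default <task-id> ap-northeast-1
```

A status of RUNNING suggests the agent is ready; an empty result would point to a task that was started before ECS Exec was enabled on the service.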
Summary
Even with ECS Express Mode, we were able to use ARM64, Fargate Spot, and ECS Exec through a two-stage deployment process. In the initial stack, setting MinTaskCount: 0 prevented dummy image task launch, and changing to the necessary task count after ARM64 conversion proved to be a safe deployment pattern.
However, this procedure is not described in the official documentation, so its behavior may change with future updates. Because the changes are made outside IaC, they can also cause drift or conflict with what Express Mode manages. We recommend limiting it to development and testing environments, and using it at your own risk.
