I tried the new "Zone-Aware Routing" feature of ECS Service Connect for prioritizing same-AZ service-to-service communication
Introduction
On July 1, 2026, Zone-Aware routing functionality was added to ECS Service Connect.
This feature reduces cross-AZ data transfer costs and latency by preferentially routing requests to endpoints within the same AZ for communication between ECS services in a multi-AZ configuration.
Previously, Service Connect distributed traffic evenly across all AZs using round-robin, so cross-AZ communication was unavoidable in multi-AZ configurations. Multi-AZ configurations are necessary for availability, but data transfer between AZs incurs additional charges and adds latency.
With this update, same-AZ priority routing is enabled by default.
| Item | Traditional Service Connect | After Zone-Aware Support |
|---|---|---|
| Routing method | Round-robin (even distribution across all AZs) | Same-AZ priority |
| Cross-AZ data transfer | Inevitably occurs in multi-AZ configurations | Reduced due to priority routing within the same AZ |
| Availability | Automatic failover on AZ failure | Unchanged (traffic is automatically redistributed to other AZs when same-AZ capacity is insufficient) |
| Configuration change | — | Not required (enabled by default) |
| Application to existing services | — | Enabled with one redeployment |
| Additional charges | — | None |
How It Works
Zone-Aware routing uses the zone-aware routing feature of the Service Connect proxy (Envoy sidecar).
The flow of operation is as follows.
- Endpoint discovery: The proxy identifies all endpoints of the destination service and their AZ placement
- Same-AZ priority: Requests are preferentially routed to endpoints in the same AZ as the request source
- Residual capacity routing: Traffic that cannot be accommodated within the same AZ is distributed based on the residual capacity of other AZs
- Fallback: If endpoints in the same AZ are unhealthy or insufficient, routing automatically switches to another AZ
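The steps above can be sketched roughly as follows. This is a simplified model of how Envoy's zone-aware load balancing is generally described as splitting traffic, not the Service Connect proxy's actual code. The function name `zone_weights` and the dictionary inputs (task counts per AZ) are assumptions made for illustration.

```python
def zone_weights(local_zone, client_counts, upstream_counts):
    """Return the share of requests a client in local_zone sends to each zone.

    client_counts / upstream_counts map zone name -> number of healthy tasks.
    Simplified model for illustration; the real proxy logic may differ.
    """
    total_clients = sum(client_counts.values())
    total_upstream = sum(upstream_counts.values())
    if total_clients == 0 or total_upstream == 0:
        raise ValueError("need at least one client and one upstream endpoint")

    zones = set(client_counts) | set(upstream_counts)
    local_pct = {z: client_counts.get(z, 0) / total_clients for z in zones}
    upstream_pct = {z: upstream_counts.get(z, 0) / total_upstream for z in zones}
    if local_pct.get(local_zone, 0) == 0:
        raise ValueError("local_zone must contain at least one client")

    # Same-AZ priority: the local zone has enough upstream share to take everything.
    if upstream_pct[local_zone] >= local_pct[local_zone]:
        return {z: (1.0 if z == local_zone else 0.0) for z in zones}

    # Otherwise keep what the local zone can absorb ...
    local_share = upstream_pct[local_zone] / local_pct[local_zone]
    # ... and spread the rest by the residual capacity of the other zones.
    residual = {
        z: max(upstream_pct[z] - local_pct[z], 0.0)
        for z in zones
        if z != local_zone
    }
    total_residual = sum(residual.values())
    weights = {
        z: (1.0 - local_share) * r / total_residual for z, r in residual.items()
    }
    weights[local_zone] = local_share
    return weights
```

With one client and two servers in each of two AZs, this model keeps all traffic in the local AZ. With no servers left in the local AZ, it sends everything to the other AZ.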
Threshold Conditions for Activation
For Zone-Aware routing to take effect, the destination service must have at least 2 × (number of AZs) endpoints.
- 2-AZ configuration: Minimum 4 endpoints
- 3-AZ configuration: Minimum 6 endpoints
If this threshold is not met, routing falls back to normal load balancing without AZ consideration, and Zone-Aware routing is automatically re-enabled once the number of endpoints increases. This mechanism prevents a single AZ from being overloaded.
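The threshold rule is simple arithmetic and can be written as a small check. The helper name `zone_aware_eligible` is hypothetical. It only restates the condition above and does not query ECS.

```python
def zone_aware_eligible(endpoint_count, az_count):
    """Return True when the endpoint count meets the 2 x AZ-count threshold."""
    if az_count < 1:
        raise ValueError("az_count must be at least 1")
    return endpoint_count >= 2 * az_count


# 2-AZ configuration: 4 endpoints qualify, 2 do not.
print(zone_aware_eligible(4, 2))  # True
print(zone_aware_eligible(2, 2))  # False
```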
Comparison with Similar Concepts
The approach of prioritizing the same AZ to reduce cross-AZ transfers also exists in other AWS services.
| Mechanism | Target communication | How same-AZ priority is achieved |
|---|---|---|
| Regional NAT Gateway | Outbound communication | AZ affinity via workload detection (dynamic) |
| Service Connect Zone-Aware | Communication between ECS services | Envoy proxy weighting (dynamic) |
| ALB cross-zone disabled | Client → target | ALB node distribution settings |
Verification
We confirmed the behavior of Zone-Aware routing on an actual ECS cluster with a 2-AZ configuration.
Verification Configuration
ECS cluster (Service Connect namespace: "test-sc-zone-aware")
├── Service A (client role, 2 tasks)
│   ├── AZ-a: 1 task (ECS Exec enabled)
│   └── AZ-c: 1 task (ECS Exec enabled)
└── Service B (server role, 4 tasks)
    ├── AZ-a: 2 tasks
    └── AZ-c: 2 tasks
Service B is a simple HTTP server that includes its own task's AZ information in the response. Requests are sent from Service A via Service Connect, and the AZ information in the response is used to verify which AZ's task the request was routed to. Service B has 4 tasks to meet the threshold condition (2 × number of AZs).
Service B Application
import os

import requests
from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/")
def az():
    # ECS sets this variable to the task metadata endpoint (v4).
    meta_uri = os.environ.get("ECS_CONTAINER_METADATA_URI_V4", "")
    if meta_uri:
        # The /task document reports the AZ this task is running in.
        task_meta = requests.get(meta_uri + "/task", timeout=2).json()
        return jsonify({"az": task_meta.get("AvailabilityZone", "unknown")})
    return jsonify({"az": "no-metadata-uri"})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
Service Connect Configuration
Specify name and appProtocol in the port mapping in Service B's task definition, and configure the server-side Service Connect settings when creating the service.
{
  "portMappings": [
    {
      "containerPort": 8080,
      "protocol": "tcp",
      "name": "http",
      "appProtocol": "http"
    }
  ]
}
Service Connect configuration when creating the service:
{
  "enabled": true,
  "namespace": "test-sc-zone-aware",
  "services": [
    {
      "portName": "http",
      "discoveryName": "service-b",
      "clientAliases": [
        {
          "port": 8080,
          "dnsName": "service-b"
        }
      ]
    }
  ]
}
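The `portName` in the Service Connect configuration has to match a `name` in the task definition's port mappings. Below is a small sketch of a pre-deployment check for that. The helper `missing_port_names` is hypothetical and works only on local dictionaries shaped like the JSON above. It does not call any AWS API.

```python
def missing_port_names(port_mappings, service_connect_config):
    """Return Service Connect portName values with no matching port mapping name."""
    defined = {pm["name"] for pm in port_mappings if pm.get("name")}
    return [
        svc["portName"]
        for svc in service_connect_config.get("services", [])
        if svc["portName"] not in defined
    ]


port_mappings = [
    {"containerPort": 8080, "protocol": "tcp", "name": "http", "appProtocol": "http"}
]
config = {
    "enabled": True,
    "namespace": "test-sc-zone-aware",
    "services": [{"portName": "http", "discoveryName": "service-b"}],
}
print(missing_port_names(port_mappings, config))  # []
```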
Service A (client side) is configured in client mode only:
{
  "enabled": true,
  "namespace": "test-sc-zone-aware"
}
No additional configuration is required for Zone-Aware routing. It is enabled by default.
Verification 1: Confirming Zone-Aware Routing Behavior
We connected to tasks in each AZ of Service A via ECS Exec and sent 20 requests to service-b via Service Connect.
# Request from AZ-a task
aws ecs execute-command --cluster test-sc-zone-aware --task <task-id> \
--container app --interactive \
--command 'sh -c "for i in $(seq 1 20); do curl -s http://service-b:8080/; echo; done"'
Executed from the AZ-a (ap-northeast-1a) task:
{"az":"ap-northeast-1a"}
{"az":"ap-northeast-1a"}
{"az":"ap-northeast-1a"}
...(all the same below)
20/20 (100%) were routed to endpoints in the same AZ (ap-northeast-1a).
Executed from the AZ-c (ap-northeast-1c) task:
{"az":"ap-northeast-1c"}
{"az":"ap-northeast-1c"}
{"az":"ap-northeast-1c"}
...(all the same below)
Here too, 20/20 (100%) were routed to the same AZ (ap-northeast-1c). This confirms that when endpoints are evenly distributed across AZs, all requests are handled within the same AZ.
Verification 2: Confirming Fallback Behavior
We manually stopped both tasks on the AZ-a side of Service B and confirmed the behavior when requests were made from the AZ-a client.
# Stop service-b tasks in AZ-a
aws ecs stop-task --cluster test-sc-zone-aware --task <az-a-task-id-1> --reason "test fallback"
aws ecs stop-task --cluster test-sc-zone-aware --task <az-a-task-id-2> --reason "test fallback"
Requests from the Service A task in AZ-a (sent 20 seconds after stopping the tasks):
{"az":"ap-northeast-1c"}
{"az":"ap-northeast-1c"}
{"az":"ap-northeast-1c"}
{"az":"ap-northeast-1c"}
{"az":"ap-northeast-1c"}
{"az":"ap-northeast-1c"}
{"az":"ap-northeast-1c"}
{"az":"ap-northeast-1c"}
{"az":"ap-northeast-1c"}
{"az":"ap-northeast-1c"}
10/10 (100%) were routed to AZ-c. With the endpoints on the AZ-a side absent, all requests were sent to the healthy endpoints in AZ-c.
Note that ECS launches new tasks to maintain the desired count. Once healthy endpoints are again present in the same AZ and the conditions for Zone-Aware routing are met, routing reverts to same-AZ priority, as the documentation describes.
Supplementary Note: Behavior Below the Threshold
For reference, the results of the same test conducted when Service B had 2 tasks (one in each AZ) are shown below.
{"az":"ap-northeast-1a"}
{"az":"ap-northeast-1a"}
{"az":"ap-northeast-1c"}
{"az":"ap-northeast-1c"}
{"az":"ap-northeast-1a"}
{"az":"ap-northeast-1c"}
...(14 requests in total)
| AZ | Count | Ratio |
|---|---|---|
| ap-northeast-1a | 8 | 57% |
| ap-northeast-1c | 6 | 43% |
Traffic was distributed nearly evenly between AZ-a and AZ-c, confirming that same-AZ priority behavior was not in effect. As described in the documentation, when the number of endpoints is below the threshold (2 × number of AZs = 4), the conditions for Zone-Aware routing are not met, and it falls back to normal load balancing without AZ consideration.
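For reference, the counts and ratios in a table like this can be derived from the captured response lines with a few lines of Python. The function name `az_ratio` is an assumption for illustration. It expects one JSON object per line, as returned by Service B, and skips anything else.

```python
import json
from collections import Counter


def az_ratio(lines):
    """Map each AZ to (count, rounded percentage) from Service B response lines."""
    counts = Counter(
        json.loads(line)["az"]
        for line in lines
        if line.strip().startswith("{")  # skip blank lines and other output
    )
    total = sum(counts.values())
    return {az: (n, round(100 * n / total)) for az, n in counts.items()}


# 8 responses from ap-northeast-1a and 6 from ap-northeast-1c, as measured above.
sample = ['{"az":"ap-northeast-1a"}'] * 8 + ['{"az":"ap-northeast-1c"}'] * 6
print(az_ratio(sample))
# {'ap-northeast-1a': (8, 57), 'ap-northeast-1c': (6, 43)}
```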
Notes
A supplementary note on monitoring in Fargate environments.
The documentation describes how to check the status of Zone-Aware routing using Envoy statistics. The metrics used for verification are lb_zone_routing_cross_zone and lb_zone_cluster_too_small. However, this procedure assumes that you connect to an EC2 instance via SSM Session Manager and run docker exec.
In Fargate environments, direct access to the Service Connect agent container is not possible, so Envoy statistics could not be verified. As a means of checking cross-AZ communication trends in Fargate, using VPC Flow Logs (with the az-id field) is an option.
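As one possible way to use flow logs for this, the sketch below classifies already-parsed flow records as same-AZ or cross-AZ by looking up source and destination addresses in a subnet-to-AZ map. The subnet CIDRs, the `SUBNET_AZ` mapping, and the helper names are all hypothetical. Parsing of the flow log files themselves is omitted, and records are assumed to be reduced to (srcaddr, dstaddr, bytes) tuples.

```python
import ipaddress

# Hypothetical subnet layout; replace with the subnets of your own VPC.
SUBNET_AZ = {
    "10.0.1.0/24": "ap-northeast-1a",
    "10.0.2.0/24": "ap-northeast-1c",
}


def az_of(ip, subnet_az):
    """Return the AZ whose subnet contains ip, or None if no subnet matches."""
    addr = ipaddress.ip_address(ip)
    for cidr, az in subnet_az.items():
        if addr in ipaddress.ip_network(cidr):
            return az
    return None


def split_bytes_by_az(records, subnet_az):
    """Sum bytes into (same_az, cross_az) for records of (srcaddr, dstaddr, bytes)."""
    same = cross = 0
    for src, dst, nbytes in records:
        src_az, dst_az = az_of(src, subnet_az), az_of(dst, subnet_az)
        if src_az is None or dst_az is None:
            continue  # traffic outside the mapped subnets is ignored
        if src_az == dst_az:
            same += nbytes
        else:
            cross += nbytes
    return same, cross


records = [
    ("10.0.1.10", "10.0.1.20", 100),  # AZ-a to AZ-a
    ("10.0.1.10", "10.0.2.20", 50),   # AZ-a to AZ-c
    ("10.0.1.10", "8.8.8.8", 999),    # outside the VPC, skipped
]
print(split_bytes_by_az(records, SUBNET_AZ))  # (100, 50)
```

Comparing the cross-AZ total before and after redeploying a service gives a rough view of whether same-AZ routing is taking effect.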
Summary
With Zone-Aware routing in ECS Service Connect, endpoints within the same AZ are now prioritized for inter-service communication in multi-AZ configurations. No additional configuration is required, and it is enabled by default for Service Connect services that meet the conditions.
In this verification configuration, we confirmed that 100% of traffic was routed to the same AZ when endpoints were evenly distributed across each AZ. Even when endpoints in the same AZ became unavailable, automatic fallback to healthy endpoints in another AZ was confirmed.
In multi-AZ environments that use ECS Service Connect, you can expect less cross-AZ communication, along with lower data transfer costs and latency.
