How I fixed a failed CloudFormation deployment of a 3-AZ configuration (!GetAZs doesn't return all AZs)
When I tried to deploy a VPC with a 3-AZ configuration in the Tokyo region (ap-northeast-1) using CloudFormation, the following error occurred:
Template error: Fn::Select cannot select nonexistent value at index 2.
In the template, I was referencing the third AZ (index 2) from the AZ list obtained with !GetAZs.
Resources:
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
  Subnet3:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.2.0/24
      AvailabilityZone: !Select [2, !GetAZs ''] # ← Error here
!GetAZs should normally return 3 AZs for the Tokyo region, but here it appeared to be returning only 2. I investigated the cause and am sharing what I found.
Comparison with EC2 API
To be sure, I checked the available AZs in the Tokyo region using the EC2 API (AWS CLI).
$ aws ec2 describe-availability-zones --region ap-northeast-1 \
--query 'AvailabilityZones[*].ZoneName' --output table
-------------------------------
|DescribeAvailabilityZones |
+---------------------------+
| ap-northeast-1a |
| ap-northeast-1c |
| ap-northeast-1d |
+---------------------------+
Via the API, all 3 AZs (1a, 1c, 1d) were returned without issue.
So there was a discrepancy: the EC2 API showed 3 AZs, but CloudFormation's !GetAZs returned only 2.
Root Cause
After investigation, I found that the state of default subnets was affecting this behavior. I confirmed with the following command:
$ aws ec2 describe-subnets \
--filters "Name=default-for-az,Values=true" \
--region ap-northeast-1 \
--query 'Subnets[*].[AvailabilityZone,SubnetId]' \
--output table
-----------------------------------------
| DescribeSubnets |
+------------------+-------------------+
| ap-northeast-1a | subnet-xxxxxxxx |
| ap-northeast-1c | subnet-yyyyyyyy |
+------------------+-------------------+
# ap-northeast-1d is missing
As shown in the output, there was no default subnet for ap-northeast-1d.
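The manual comparison of the two outputs can also be scripted. The following is a minimal sketch rather than part of the original investigation: `azs_without_default_subnet` is a hypothetical helper name, and it assumes both lists are passed in as whitespace-separated AZ names.

```shell
#!/bin/sh
# Hypothetical helper: print every AZ from the first list that does not
# appear in the second list (the AZs that own a default subnet).
# Both arguments are whitespace-separated AZ names.
azs_without_default_subnet() {
  all_azs=$1
  # Collapse tabs/newlines into single spaces so the match below is reliable.
  default_azs=" $(echo $2) "
  for az in $all_azs; do
    case "$default_azs" in
      *" $az "*) ;;                 # a default subnet exists, skip
      *) printf '%s\n' "$az" ;;     # no default subnet in this AZ
    esac
  done
}

# Sample data copied from the two command outputs in this article.
azs_without_default_subnet \
  "ap-northeast-1a ap-northeast-1c ap-northeast-1d" \
  "ap-northeast-1a ap-northeast-1c"
```

With live data, the two arguments could come from the same two commands, using `--query 'AvailabilityZones[*].ZoneName' --output text` and `--query 'Subnets[*].AvailabilityZone' --output text` respectively.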
This account had been in use since before ap-northeast-1d was released in 2018. One possibility is that the default subnet for 1d was left out because the AMIs we used at the time weren't supported in 1d.
Reference: A fourth Availability Zone (ap-northeast-1d) has been added to the Tokyo region! | DevelopersIO
The AWS CloudFormation official documentation states the following about the behavior of Fn::GetAZs:
The Fn::GetAZs function returns only Availability Zones that have a default subnet unless none of the Availability Zones has a default subnet; in that case, all Availability Zones are returned.
In summary, the behavior of !GetAZs is as follows:
- If at least one default subnet exists
→ Returns only AZs that have default subnets (this was our case)
- If no default subnets exist
→ Returns all AZs in that region
In our case, "only some AZs (1a, 1c) had default subnets," so 1d was omitted from the list.
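The rule can be expressed as a small shell function. This is only a simplified model of the documented behavior, not how CloudFormation implements it. `predict_getazs` is a hypothetical name, and the inputs are assumed to be whitespace-separated AZ names.

```shell
#!/bin/sh
# Simplified model of the documented Fn::GetAZs rule (an illustration,
# not CloudFormation's actual implementation).
# $1: all AZs in the region, $2: AZs that have a default subnet.
predict_getazs() {
  all_azs=$(echo $1)        # normalize whitespace
  default_azs=$(echo $2)
  if [ -n "$default_azs" ]; then
    # At least one default subnet exists: only those AZs are returned.
    printf '%s\n' $default_azs | sort -u
  else
    # No default subnets at all: every AZ is returned.
    printf '%s\n' $all_azs | sort -u
  fi
}

echo "Some default subnets (the case in this article):"
predict_getazs "ap-northeast-1a ap-northeast-1c ap-northeast-1d" "ap-northeast-1a ap-northeast-1c"
echo "No default subnets:"
predict_getazs "ap-northeast-1a ap-northeast-1c ap-northeast-1d" ""
```

In the first call the list has only two entries, so selecting index 2 from it would fail in the same way as the template error. The output is sorted only to keep it deterministic.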
Solutions
The AWS account where this problem occurred was a test environment.
Since we might use the default VPC for hands-on exercises or services like EC2 Image Builder, we addressed it by adding a default subnet.
Solution 1: Create a default subnet
Create a default subnet in the missing AZ (in this case, ap-northeast-1d).
$ aws ec2 create-default-subnet --availability-zone ap-northeast-1d --region ap-northeast-1
After this, every AZ has a default subnet, so !GetAZs returns all 3 AZs. This approach is effective if you want to keep the existing default VPC.
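If more than one AZ is missing a default subnet, the same command has to be repeated per AZ. Here is a minimal sketch that only prints the commands so they can be reviewed before running. `print_create_default_subnet_cmds` is a hypothetical helper name, and the list of AZs is assumed to have been determined beforehand.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) one create-default-subnet
# command per AZ, so the commands can be reviewed before execution.
# $1: region, remaining arguments: AZ names lacking a default subnet.
print_create_default_subnet_cmds() {
  region=$1
  shift
  for az in "$@"; do
    printf 'aws ec2 create-default-subnet --availability-zone %s --region %s\n' "$az" "$region"
  done
}

# The single missing AZ from this article.
print_create_default_subnet_cmds ap-northeast-1 ap-northeast-1d
```

Printing instead of executing is a deliberate choice here: creating subnets changes the account, so it seems safer to check the generated lines first.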
Solution 2: Delete the default VPC
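Delete the default VPC itself. Note that with the CLI, resources attached to the VPC (for example, its subnets and internet gateway) generally have to be deleted or detached first.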
aws ec2 delete-vpc --vpc-id vpc-xxxxx
As described above, when no default subnets exist, all AZs are returned, so this also makes !GetAZs return all 3 AZs.
Conclusion
CloudFormation's !GetAZs doesn't simply return "all AZs in a region"; its behavior changes based on the existence of default VPCs and default subnets. If you can't obtain the AZs shown in the EC2 API or VPC dashboard, consider checking for missing default subnets.
Also, outside of development and test environments, consider deleting the default VPC. In production environments it can be a risk factor.
