[Update] AWS FIS now enables resilience testing for S3 Express One Zone
Introduction
AWS FIS (Fault Injection Service) now supports resilience testing for S3 Express One Zone. S3 Express One Zone is a storage class designed for workloads requiring high-speed access to S3. With this update, you can use AWS FIS to verify your application's behavior during AZ failures in advance.
Update Overview
The AWS FIS aws:network:disrupt-connectivity action can now block connections to S3 Express One Zone directory buckets. For example, you can now perform the following validations in advance:
- Whether your application's failover mechanism works correctly
- Whether recovery processes function properly
- Whether your monitoring system can detect access failures to S3 Express One Zone
My Use Case
Here's an example that uses AWS ParallelCluster with Spot Instances. When a Spot interruption occurs, you might implement a mechanism that evacuates locally generated intermediate files, with S3 Express One Zone as the evacuation destination. The goal is to preserve intermediate computation files so that processing can be resumed later.
Intermediate files often consist of many small files, and the evacuation is pointless unless all of them can be copied within the 2-minute interruption notice. S3 Express One Zone directory buckets are a good fit for such cases.
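As a rough illustration of that evacuation step, here is a minimal sketch that uploads many small files in parallel and reports which ones did not finish before a deadline. The function name `evacuate`, the `upload_one` callable, and the 100-second deadline (a margin inside the 2-minute notice) are assumptions for illustration, not part of my actual setup.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def evacuate(paths, upload_one, deadline_seconds=100.0, max_workers=32):
    """Upload paths in parallel; return (uploaded, not_uploaded) at the deadline.

    upload_one is any callable that uploads a single path and raises on error
    (hypothetical; in practice it could wrap s3.put_object).
    """
    pool = ThreadPoolExecutor(max_workers=max_workers)
    futures = {pool.submit(upload_one, path): path for path in paths}

    # Block until everything finishes or the deadline expires.
    done, not_done = wait(futures, timeout=deadline_seconds)

    # Drop queued work that has not started yet (requires Python 3.9+).
    pool.shutdown(wait=False, cancel_futures=True)

    uploaded = sorted(futures[f] for f in done if f.exception() is None)
    failed = sorted(futures[f] for f in done if f.exception() is not None)
    pending = sorted(futures[f] for f in not_done)
    return uploaded, failed + pending
```

Running this while an FIS experiment blocks the bucket is one way to see how many files would be left behind when the destination is unreachable.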
Test Environment
I prepared an environment where an EC2 instance accesses an S3 Express One Zone directory bucket.
### Operation Verification
Verify that file uploads are working correctly with AWS CLI.
# Upload
$ aws s3 cp ./test-file-1mb.dat s3://mountpoint-for-s3--apne1-az4--x-s3/
upload: ./test-file-1mb.dat to s3://mountpoint-for-s3--apne1-az4--x-s3/test-file-1mb.dat
# Check file
$ aws s3 ls s3://mountpoint-for-s3--apne1-az4--x-s3/ --recursive
2025-08-30 01:18:10 1048576 test-file-1mb.dat
Also verify operation with a script using boto3.
#!/usr/bin/env python3
import boto3
import uuid
from datetime import datetime
# Configuration
BUCKET_NAME = 'mountpoint-for-s3--apne1-az4--x-s3'
FILE_PATH = 'test-file-1mb.dat'
# Generate random prefix
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
random_id = str(uuid.uuid4())[:8]
prefix = f"{timestamp}_{random_id}"
# Upload file
s3 = boto3.client('s3', region_name='ap-northeast-1')
s3_key = f"{prefix}/test-file-1mb.dat"
with open(FILE_PATH, 'rb') as f:
    s3.put_object(
        Bucket=BUCKET_NAME,
        Key=s3_key,
        Body=f,
        StorageClass='EXPRESS_ONEZONE'
    )
print(f"Uploaded: s3://{BUCKET_NAME}/{s3_key}")
Both methods successfully upload files.
# Upload
$ ./upload.py
Uploaded: s3://mountpoint-for-s3--apne1-az4--x-s3/20250830_012702_9a514a75/test-file-1mb.dat
# Check file
$ aws s3 ls s3://mountpoint-for-s3--apne1-az4--x-s3/ --recursive
2025-08-30 01:27:02 1048576 20250830_012702_9a514a75/test-file-1mb.dat
2025-08-30 01:18:10 1048576 test-file-1mb.dat
AWS FIS Reference Scenario Template
The AWS FIS console provides a scenario called "AZ Availability: Power Interruption". This scenario reportedly includes S3 Express One Zone failure. Let's check the details.
The S3 Express One Zone failure simulation settings were found in the Pause-network-connectivity action.
### Official Documentation
According to the documentation, the aws:network:disrupt-connectivity action interrupts the connection between subnets in an AZ and S3 Express One Zone. This causes data plane API operations to time out.
Disrupt connectivity to S3 Express One Zone directory buckets
During an AZ power interruption, data stored in S3 Express One Zone directory buckets in the AZ is not accessible. AZ Availability: Power Interruption includes aws:network:disrupt-connectivity to disrupt connectivity between subnets and One Zone directory buckets in the affected AZ for the duration of the experiment, resulting in timeouts to Zonal endpoint data plane API operations. Use this action to test disruption when compute is co-located with storage in an AZ. This action targets subnets. By default, it targets subnets with a tag named AzImpairmentPower with a value of DisruptSubnet. You can add this tag to your subnets or replace the default tag with your own tag in the experiment template. By default, if no valid subnets are found this action will be skipped.
Source: AZ Availability: Power Interruption - AWS Fault Injection Service
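The default tag described in the quote can be applied to subnets with boto3. The following is a minimal sketch: the helper name `tag_subnets_for_fis` and the subnet ID in the usage note are hypothetical, while the tag key and value come from the documentation quoted above.

```python
TAG_KEY = "AzImpairmentPower"
TAG_VALUE = "DisruptSubnet"


def tag_subnets_for_fis(ec2_client, subnet_ids):
    """Add the scenario's default target tag to the given subnets.

    ec2_client is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    tags = [{"Key": TAG_KEY, "Value": TAG_VALUE}]
    ec2_client.create_tags(Resources=list(subnet_ids), Tags=tags)
    return tags
```

For example, `tag_subnets_for_fis(boto3.client("ec2"), ["subnet-0123456789abcdef0"])` would tag a single (hypothetical) subnet so that the scenario picks it up without changing the template.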
Creating an FIS Experiment Template
I'm creating an experiment in AWS FIS to disrupt connectivity to S3 Express One Zone.
### Setting up an Experiment Template
I will explain the procedure for creating an experiment template.
The configuration items are as follows:
- Action: aws:network:disrupt-connectivity
- Scope: S3 Express
- Target: Subnet (detailed settings described later)
Next, I edited the subnet configured as the target.
I specified the target by resource ID and selected the subnet that hosts the EC2 instance used for verification.
I created a new IAM role for FIS experiment execution.
The experiment template is now complete.
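For reference, the same template could probably be created through the API instead of the console. The sketch below only builds the request body. The names `TargetSubnet` and `DisruptS3Express` and the helper `build_template` are hypothetical, and I am assuming that the console's "S3 Express" scope corresponds to the API value `s3express`, so please check the action reference before relying on it.

```python
import uuid


def build_template(subnet_arn, role_arn, duration="PT5M"):
    """Build keyword arguments for the FIS create_experiment_template call."""
    return {
        "clientToken": str(uuid.uuid4()),
        "description": "Disrupt connectivity to S3 Express One Zone",
        "roleArn": role_arn,
        # No stop condition; the experiment ends when the duration elapses.
        "stopConditions": [{"source": "none"}],
        "targets": {
            "TargetSubnet": {  # hypothetical target name
                "resourceType": "aws:ec2:subnet",
                "resourceArns": [subnet_arn],
                "selectionMode": "ALL",
            }
        },
        "actions": {
            "DisruptS3Express": {  # hypothetical action name
                "actionId": "aws:network:disrupt-connectivity",
                # Scope value is an assumption; see the lead-in.
                "parameters": {"duration": duration, "scope": "s3express"},
                "targets": {"Subnets": "TargetSubnet"},
            }
        },
    }
```

The result can then be passed as `boto3.client("fis").create_experiment_template(**build_template(subnet_arn, role_arn))`, where both ARNs are your own values.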
Experiment Execution and Results
We start the experiment from the created experiment template.
While the experiment is in progress, its status shows Running.
After the specified 5 minutes passed, it completed.
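If you prefer to watch the status from a script rather than the console, a polling loop along these lines should work. This is a minimal sketch: `wait_for_experiment` is a hypothetical helper, the 15-second interval is arbitrary, and `fis_client` is assumed to be a boto3 FIS client.

```python
import time

# Statuses after which the experiment no longer changes.
TERMINAL_STATUSES = {"completed", "stopped", "failed", "cancelled"}


def wait_for_experiment(fis_client, experiment_id, poll_seconds=15.0, sleep=time.sleep):
    """Poll get_experiment until the experiment reaches a terminal status."""
    while True:
        response = fis_client.get_experiment(id=experiment_id)
        status = response["experiment"]["state"]["status"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
```

The experiment ID is the one returned when the experiment is started, for example in `response["experiment"]["id"]` from `start_experiment`.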
### Operational Check During the Experiment
During the experiment, we attempt to access S3 Express One Zone.
For AWS CLI
$ aws s3 cp ./test-file-1mb.dat s3://mountpoint-for-s3--apne1-az4--x-s3/
# No result returned
# Interrupted with Ctrl+C
For Python script
$ ./upload.py
# No result returned
# After the experiment was completed, the transfer succeeded
Uploaded: s3://mountpoint-for-s3--apne1-az4--x-s3/20250830_014328_04ba0f27/test-file-1mb.dat
During the experiment, neither client could connect to S3 Express One Zone, and the commands hung without returning. After the experiment ended, connectivity recovered and the Python script's pending upload succeeded through retry processing.
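Because the default behavior is to hang and keep retrying, an application that must react quickly (for example, by switching to another destination) may want shorter timeouts. Here is a minimal sketch using botocore's `Config`; the helper names and the specific timeout values are assumptions for illustration and were not part of this test.

```python
def fail_fast_settings(connect_timeout=3, read_timeout=5, total_attempts=2):
    """Return botocore Config keyword arguments that give up quickly."""
    return {
        "connect_timeout": connect_timeout,  # seconds to establish a connection
        "read_timeout": read_timeout,        # seconds to wait for a response
        # total_max_attempts counts the initial request plus retries.
        "retries": {"total_max_attempts": total_attempts, "mode": "standard"},
    }


def make_s3_client(region_name="ap-northeast-1", **overrides):
    """Create an S3 client that raises instead of hanging during a disruption."""
    # Imported lazily so the settings helper works without boto3 installed.
    import boto3
    from botocore.config import Config

    config = Config(**fail_fast_settings(**overrides))
    return boto3.client("s3", region_name=region_name, config=config)
```

With a client like this, `put_object` should raise a connection or timeout error after a short while during the experiment, which gives monitoring and failover logic something concrete to react to. The AWS CLI has comparable `--cli-connect-timeout` and `--cli-read-timeout` options.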
Summary
With AWS FIS for S3 Express One Zone resilience testing, we can now verify behavior during AZ failures. This allows testing of monitoring system detection and failover mechanisms. Recovery processes can also be verified before actual failures occur.
Use it to improve the availability of systems that rely on S3 Express One Zone.
Conclusion
Mountpoint for Amazon S3 has settings to use S3 Express One Zone as a cache. I'd like to check the behavior during AZ failures with these settings. I'll try this next time.