Benchmarking Amazon S3's New Checksum Algorithms on the Latest Generation of Intel/AMD/Graviton4 EC2 Instances
On April 23, 2026, Amazon S3 added support for 5 new checksum algorithms, bringing the total to 10.
This article benchmarks the latest generation of three EC2 architectures (Intel Granite Rapids / AMD EPYC / AWS Graviton4) to examine how algorithm selection and instance type affect performance.
What has changed
| Category | Previously Available | Newly Added |
|---|---|---|
| CRC family | CRC32, CRC32C, CRC64NVME | ― |
| XXHash family | ― | XXHash3, XXHash64, XXHash128 |
| SHA family | SHA-1, SHA-256 | SHA-512 |
| Others | ― | MD5 |
The new algorithms are available in 37 regions. The default remains CRC64NVME.
Algorithm Characteristics
Non-cryptographic Hashes (Fast, for integrity checks)
| Algorithm | Bit Length | Features |
|---|---|---|
| CRC32 / CRC32C | 32 bits | Lightweight. Hardware acceleration supported |
| CRC64NVME | 64 bits | S3 default. Highly optimized in AWS CRT |
| XXHash64 | 64 bits | Fast general-purpose hash. Widely used in data pipelines |
| XXHash3 | 64 bits | Latest version of XXHash. SIMD optimized |
| XXHash128 | 128 bits | 128-bit variant of XXHash3. Even lower collision probability |
Cryptographic Hashes (For security and compliance)
| Algorithm | Bit Length | Features |
|---|---|---|
| MD5 | 128 bits | Practical collision attacks demonstrated. For legacy compatibility |
| SHA-1 | 160 bits | Collisions demonstrated. Not recommended |
| SHA-256 | 256 bits | Currently secure. Widely used |
| SHA-512 | 512 bits | Longer hash value than SHA-256. Can be more efficient on 64-bit CPUs |
Non-cryptographic hashes are sufficient for detecting data corruption during transfers. For protecting against data tampering on S3, IAM policies and S3 Object Lock are more appropriate than checksums.
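As a small illustration of why a non-cryptographic checksum is enough to catch transfer corruption, the sketch below flips a single bit in a payload and compares CRC32 values. The payload, the helper names `crc32_of` and `flip_bit`, and the bit position are all hypothetical choices for this example. It uses Python's standard `zlib.crc32`, not the S3 implementation.

```python
import zlib


def crc32_of(data: bytes) -> int:
    """Return the CRC32 of data as an unsigned 32-bit integer."""
    return zlib.crc32(data) & 0xFFFFFFFF


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted, simulating corruption in transit."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


# Hypothetical payload; any byte string works.
original = b"example payload " * 1024
damaged = flip_bit(original, 12345)

# A CRC always changes when exactly one bit changes, so the mismatch is detected.
print("checksums match:", crc32_of(original) == crc32_of(damaged))
```

This only shows detection of accidental damage. Someone who can modify the object can also recompute a matching checksum, which is why access controls and Object Lock are the tools for tamper protection.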
Benchmarks
Test Environment
Testing was conducted on three instance types in the Tokyo region (ap-northeast-1).
| Item | c8i.4xlarge | c8a.4xlarge | c8g.4xlarge |
|---|---|---|---|
| CPU | Intel Xeon 6975P-C (Granite Rapids) | AMD EPYC 9R45 | AWS Graviton4 |
| Architecture | x86_64 | x86_64 | aarch64 |
| vCPU / Physical cores | 16 / 8 (with SMT) | 16 / 16 (no SMT) | 16 / 16 (no SMT) |
| Memory | 32GB DDR5 | 32GB DDR5 | 32GB DDR5 |
| Network bandwidth | Up to 12.5 Gbps | Up to 15 Gbps | Up to 12.5 Gbps |
| On-demand price | $0.4718/h | $0.5426/h | $0.4003/h |
Test data (5 GB) was placed on tmpfs (a RAM disk) to eliminate storage I/O bottlenecks. Checksum calculation speed was measured on in-memory data, on a single thread, using Python's xxhash package (a C extension) and hashlib. S3 transfer speed was measured using aws s3 cp with AWS CLI 2.34.35. All results are averages of three runs.
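The article does not include its measurement script, so the following is only a minimal sketch of how a single-thread, in-memory throughput measurement of this kind could look. The chunk size, the 16 MiB sample (standing in for the 5 GB file), the helper names, and the use of `zlib.crc32` for CRC32 are assumptions made for illustration. The `xxhash` import is optional so the sketch still runs where the package is not installed.

```python
import hashlib
import os
import time
import zlib

try:
    import xxhash  # third-party C extension (pip install xxhash); optional here
except ImportError:
    xxhash = None

CHUNK_SIZE = 8 * 1024 * 1024    # assumed read size, not taken from the article
SAMPLE_SIZE = 16 * 1024 * 1024  # small stand-in for the article's 5 GB file
RUNS = 3                        # the article averages three runs


class Crc32:
    """Adapter giving zlib.crc32 the same update()/digest() shape as hashlib."""

    def __init__(self):
        self._value = 0

    def update(self, chunk):
        self._value = zlib.crc32(chunk, self._value)

    def digest(self):
        return self._value.to_bytes(4, "big")


def available_algorithms():
    """Map display names to zero-argument hasher factories."""
    algorithms = {
        "SHA-256": hashlib.sha256,
        "SHA-512": hashlib.sha512,
        "MD5": hashlib.md5,
        "CRC32": Crc32,
    }
    if xxhash is not None:
        algorithms["XXHash3_64"] = xxhash.xxh3_64
        algorithms["XXHash3_128"] = xxhash.xxh3_128
        algorithms["XXHash64"] = xxhash.xxh64
    return algorithms


def throughput_gb_per_s(factory, data, chunk_size=CHUNK_SIZE):
    """Hash data once on the calling thread and return GB/s (1 GB = 1e9 bytes)."""
    view = memoryview(data)  # slicing a memoryview avoids copying each chunk
    hasher = factory()
    start = time.perf_counter()
    for offset in range(0, len(view), chunk_size):
        hasher.update(view[offset:offset + chunk_size])
    hasher.digest()
    elapsed = time.perf_counter() - start
    return len(view) / elapsed / 1e9


def average_throughput(factory, data, runs=RUNS):
    """Average the throughput over several runs."""
    return sum(throughput_gb_per_s(factory, data) for _ in range(runs)) / runs


if __name__ == "__main__":
    payload = os.urandom(SAMPLE_SIZE)
    for name, factory in available_algorithms().items():
        print(f"{name:12s} {average_throughput(factory, payload):6.2f} GB/s")
```

With a sample this small the numbers will be noisier than a 5 GB run; raising `SAMPLE_SIZE` brings the sketch closer to the conditions described above.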
Checksum Calculation Speed (5GB, Single-thread, Average of 3 runs)
| Algorithm | c8i (Intel) | c8a (AMD) | c8g (Graviton4) |
|---|---|---|---|
| XXHash3_64 | 14.8 GB/s | 36.7 GB/s | 23.4 GB/s |
| XXHash3_128 | 14.8 GB/s | 36.8 GB/s | 23.3 GB/s |
| XXHash64 | 7.0 GB/s | 22.0 GB/s | 18.3 GB/s |
| SHA-256 | 1.7 GB/s | 2.1 GB/s | 1.7 GB/s |
| CRC32 | 1.3 GB/s | 1.7 GB/s | 1.0 GB/s |
| SHA-512 | 0.8 GB/s | 1.2 GB/s | 1.0 GB/s |
| MD5 | 0.8 GB/s | 0.9 GB/s | 0.6 GB/s |
Variation between runs was extremely small (coefficient of variation under 1%), indicating highly reproducible results.
AMD was fastest across all algorithms. XXHash3 reached 36.7 GB/s, about 2.5 times the Intel result and 1.6 times the Graviton4 result. This probably reflects the AVX-512 optimizations in the xxhash library. On the XXHash family, Graviton4 also significantly outperformed Intel, reaching 23.4 GB/s with XXHash3 and 18.3 GB/s with XXHash64, which suggests the ARM NEON optimizations are effective.
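The ratios quoted above can be reproduced from the table. The short sketch below copies the published figures into a dictionary (the variable and function names are just for this example) and prints how much faster c8a was than the other two instance types for each algorithm.

```python
# Single-thread throughput in GB/s, copied from the table above.
throughput = {
    "XXHash3_64":  {"c8i": 14.8, "c8a": 36.7, "c8g": 23.4},
    "XXHash3_128": {"c8i": 14.8, "c8a": 36.8, "c8g": 23.3},
    "XXHash64":    {"c8i": 7.0,  "c8a": 22.0, "c8g": 18.3},
    "SHA-256":     {"c8i": 1.7,  "c8a": 2.1,  "c8g": 1.7},
    "CRC32":       {"c8i": 1.3,  "c8a": 1.7,  "c8g": 1.0},
    "SHA-512":     {"c8i": 0.8,  "c8a": 1.2,  "c8g": 1.0},
    "MD5":         {"c8i": 0.8,  "c8a": 0.9,  "c8g": 0.6},
}


def speedup(algorithm: str, fast: str, slow: str) -> float:
    """Return how many times faster `fast` was than `slow` for one algorithm."""
    return throughput[algorithm][fast] / throughput[algorithm][slow]


for algorithm in throughput:
    vs_intel = speedup(algorithm, "c8a", "c8i")
    vs_graviton = speedup(algorithm, "c8a", "c8g")
    print(f"{algorithm:12s} c8a vs c8i: {vs_intel:.1f}x   c8a vs c8g: {vs_graviton:.1f}x")
```

Because the inputs are rounded to one decimal place, the ratios for the slower algorithms (MD5, SHA-512) are only approximate.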
S3 Transfer Speed (5GB, tmpfs → S3, Average of 3 runs)
Upload throughput was measured with aws s3 cp. The 5 GB file is processed as a multipart upload, with parts transferred in parallel.
| Algorithm | c8i (Intel) | c8a (AMD) | c8g (Graviton4) |
|---|---|---|---|
| CRC32C | 507 MB/s | 494 MB/s | 425 MB/s |
| CRC32 | 470 MB/s | 405 MB/s | 413 MB/s |
| XXHASH3 | 461 MB/s | 411 MB/s | 428 MB/s |
| SHA-256 | 458 MB/s | 382 MB/s | 450 MB/s |
| CRC64NVME | 455 MB/s | 421 MB/s | 421 MB/s |
| XXHASH128 | 432 MB/s | 480 MB/s | 443 MB/s |
| XXHASH64 | 423 MB/s | 434 MB/s | 480 MB/s |
| SHA-1 | 403 MB/s | 469 MB/s | 491 MB/s |
| SHA-512 | 373 MB/s | 355 MB/s | 395 MB/s |
While checksum calculation speeds differed by up to about 40x between algorithms (XXHash3 vs MD5 on c8a), S3 transfer speeds across all three environments fell within a much narrower range of 350-510 MB/s, so the differences between algorithms largely disappeared.
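To make the narrowness of that range concrete, the sketch below takes the transfer figures from the table above and computes the slowest cell, the fastest cell, and the ratio between them. The dictionary layout and variable names are illustrative only.

```python
# Upload throughput in MB/s, copied from the table above (c8i, c8a, c8g).
transfer = {
    "CRC32C":    (507, 494, 425),
    "CRC32":     (470, 405, 413),
    "XXHASH3":   (461, 411, 428),
    "SHA-256":   (458, 382, 450),
    "CRC64NVME": (455, 421, 421),
    "XXHASH128": (432, 480, 443),
    "XXHASH64":  (423, 434, 480),
    "SHA-1":     (403, 469, 491),
    "SHA-512":   (373, 355, 395),
}

# Flatten every (algorithm, instance) cell into one list of measurements.
all_cells = [value for row in transfer.values() for value in row]

lowest = min(all_cells)
highest = max(all_cells)
spread = highest / lowest

print(f"slowest cell: {lowest} MB/s")
print(f"fastest cell: {highest} MB/s")
print(f"fastest / slowest: {spread:.2f}x")
```

The fastest and slowest transfers differ by well under a factor of two, compared with the tens-of-times gap in raw checksum speed.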
Analysis
Algorithm choice has limited impact on S3 transfers
With 4xlarge instances that have sufficient network bandwidth, and with tmpfs eliminating storage I/O bottlenecks, all three environments achieved throughput of around 400-500 MB/s regardless of algorithm. Even though checksum calculation speed differed by tens of times, the impact on overall S3 transfer throughput was minimal. The most likely explanation is that network I/O and SDK overhead dominate, with checksum calculation accounting for only a small portion of the overall process.
Small differences between the three environments
There was no clear advantage among the three architectures in S3 transfer speeds. However, there were differences in run-to-run variation (standard deviation), with c8i showing the largest (average SD 79.5 MB/s) and c8a showing the smallest (average SD 41.4 MB/s).
In preliminary tests using .large (2 vCPU) instances, c8i.large (1 physical core with SMT) showed throughput drops to as low as 141 MB/s with some algorithms. This was not observed with c8a.large/c8g.large (2 physical cores) and was resolved when switching to 4xlarge for c8i, suggesting that limited physical cores can bottleneck parallel processing. For .large class instances, EBS throughput and network bandwidth constraints also contribute, making it difficult to identify a single cause. For environments heavily using multipart uploads, choosing instance sizes with sufficient physical cores is important.
Architecture differences are pronounced in checksum calculation speed
For pure checksum calculation speed, AMD (c8a) was significantly faster. This would be beneficial for workloads with high checksum calculation requirements or environments with very high network bandwidth (25 Gbps or higher) for multipart uploads.
Architecture Selection Guidelines
| Priority | Recommendation |
|---|---|
| Checksum calculation speed | c8a (AMD) is fastest |
| S3 transfer speed | All three are comparable (minor differences) |
| Cost performance | c8g (Graviton4) is most economical |
c8g is 15% cheaper than c8i and 26% cheaper than c8a. Given the minimal differences in S3 transfer speeds, c8g (Graviton4) is suitable for cost-conscious workloads, while c8a (AMD) is better for performance-focused needs.
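The percentages above follow directly from the on-demand prices in the test environment table. A minimal check, with hypothetical variable and function names:

```python
# On-demand hourly prices in USD, copied from the test environment table.
hourly_price_usd = {
    "c8i.4xlarge": 0.4718,
    "c8a.4xlarge": 0.5426,
    "c8g.4xlarge": 0.4003,
}


def percent_cheaper(candidate: str, baseline: str) -> float:
    """Return how much cheaper `candidate` is than `baseline`, in percent."""
    return (1 - hourly_price_usd[candidate] / hourly_price_usd[baseline]) * 100


for baseline in ("c8i.4xlarge", "c8a.4xlarge"):
    saving = percent_cheaper("c8g.4xlarge", baseline)
    print(f"c8g.4xlarge is {saving:.1f}% cheaper than {baseline}")
```

The prices are those listed for the Tokyo region in this article; other regions and purchase options will give different ratios.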
Recommended Algorithms by Use Case
| Use Case | Recommendation | Reason |
|---|---|---|
| General purpose | CRC64NVME (default) | No change needed. Adequate speed and accuracy |
| Integration with existing pipelines | XXHash64 / XXHash3 | Compatibility with Spark, ClickHouse, etc. |
| Large-scale data lakes | XXHash128 | 128bit with extremely low collision risk |
| Compliance requirements | SHA-256 / SHA-512 | FIPS compliance, financial/medical/government sectors |
| Legacy system compatibility | MD5 | Usable only by specifying a pre-computed value in the header |
Important Notes
CLI / SDK Versions
Using the new algorithms requires an up-to-date CLI/SDK. Support was confirmed with AWS CLI 2.34.35.
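```bash
# Check the installed AWS CLI version
aws --version
# Download and install the latest AWS CLI v2 (x86_64 build; on Graviton use awscli-exe-linux-aarch64.zip)
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip && sudo ./aws/install --update
```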
The existing algorithms (CRC32, CRC32C, SHA-1, SHA-256) are also available in older CLI/SDK versions. CRC64NVME was added in early 2025, so versions released before then cannot use it.
MD5
MD5 is supported at the S3 API level, but the SDK/CLI cannot calculate it automatically via --checksum-algorithm MD5. A pre-calculated value must be specified in the x-amz-checksum-md5 header. For multipart uploads, MD5 can be calculated afterward using the Compute checksum operation in S3 Batch Operations.
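As a rough sketch of the pre-calculation step, the function below streams data through `hashlib.md5` and base64-encodes the binary digest. The encoding is an assumption: it mirrors how S3's other `x-amz-checksum-*` headers and `Content-MD5` carry their values, and the article does not state the format for `x-amz-checksum-md5` explicitly. The function name, chunk size, and sample payload are hypothetical.

```python
import base64
import hashlib
import io


def md5_checksum_header_value(stream, chunk_size: int = 1024 * 1024) -> str:
    """Read a binary stream in chunks and return its MD5 digest, base64-encoded.

    Assumption: the header expects the base64 form of the raw 16-byte digest,
    not the 32-character hex string.
    """
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


# Hypothetical payload; in practice this would be an open file in "rb" mode.
value = md5_checksum_header_value(io.BytesIO(b"hello world"))
headers = {"x-amz-checksum-md5": value}
print(headers)
```

Reading in chunks keeps memory use flat for large objects. How the header is attached to the request depends on the client in use and is not shown here.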
Directory Buckets
Directory buckets (S3 Express One Zone) do not support specifying checksum algorithms.
Conclusion
With this announcement, S3's checksum algorithm options have expanded from 5 to 10. The addition of XXHash algorithms simplifies integration with existing data pipelines, while SHA-512 and MD5 provide more options for compliance and legacy compatibility.
For most workloads, the default CRC64NVME provides sufficient speed and integrity. There's no need to deliberately change algorithms.
However, our benchmarks confirmed clear differences in checksum calculation speed depending on the combination of CPU architecture and algorithm. The newly added algorithms are worth trying for workloads that need to integrate with existing data pipelines or that need faster checksum computation.