Performance Measurement of Amazon S3's New Checksum Algorithm on the Latest Generation of Intel/AMD/Graviton4 EC2

S3's checksum algorithms have expanded to 10 types. This article summarizes the characteristics of the newly added XXHash3/XXHash64/XXHash128/SHA-512/MD5 and measures checksum calculation speed and S3 transfer speed on c8i (Intel), c8a (AMD), and c8g (Graviton4) 4xlarge instances.
2026.04.24

On April 23, 2026, Amazon S3 added support for 5 new checksum algorithms, bringing the total to 10.

https://aws.amazon.com/jp/about-aws/whats-new/2026/04/s3-five-additional-checksum-algorithms/

This article benchmarks the latest generation of three EC2 architectures (Intel Granite Rapids / AMD EPYC / AWS Graviton4) to examine how algorithm choice and instance type affect performance.

What has changed

| Category | Previously available | Newly added |
|---|---|---|
| CRC family | CRC32, CRC32C, CRC64NVME | - |
| XXHash family | - | XXHash3, XXHash64, XXHash128 |
| SHA family | SHA-1, SHA-256 | SHA-512 |
| Others | - | MD5 |

The new algorithms are available in 37 regions. The default algorithm remains CRC64NVME.

Algorithm Characteristics

Non-cryptographic Hashes (Fast, for integrity checks)

| Algorithm | Bit length | Features |
|---|---|---|
| CRC32 / CRC32C | 32-bit | Lightweight; hardware acceleration available |
| CRC64NVME | 64-bit | S3 default; highly optimized in the AWS CRT |
| XXHash64 | 64-bit | Fast general-purpose hash; widely used in data pipelines |
| XXHash3 | 64-bit | Newest XXHash variant; SIMD-optimized |
| XXHash128 | 128-bit | 128-bit variant of XXHash3; lower collision probability than the 64-bit hashes |
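The CRC variants in the table differ only in width and polynomial. The sketch below is a slow, pure-Python reference and not the AWS CRT implementation. It builds CRC32C and CRC64NVME from those two parameters. The CRC-64/NVME parameters (polynomial 0xAD93D23594C93659, reflected, all-ones init and xorout) are taken from my reading of the CRC RevEng catalogue.

The block also encodes a result the way S3 checksum headers carry CRC values: the big-endian bytes, base64-encoded. `make_crc` and `s3_checksum_value` are hypothetical helper names.

```python
import base64
import zlib


def reflect(value: int, width: int) -> int:
    """Reverse the bit order of `value` within `width` bits."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def make_crc(poly: int, width: int):
    """Build a reflected, table-driven CRC with all-ones init and xorout.

    `poly` is given in normal (MSB-first) notation. CRC-32, CRC-32C and
    CRC-64/NVME all share this reflected form.
    """
    mask = (1 << width) - 1
    rpoly = reflect(poly, width)
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ rpoly if crc & 1 else crc >> 1
        table.append(crc)

    def crc(data: bytes) -> int:
        value = mask  # init: all ones
        for b in data:
            value = table[(value ^ b) & 0xFF] ^ (value >> 8)
        return value ^ mask  # xorout: all ones

    return crc


crc32c = make_crc(0x1EDC6F41, 32)             # Castagnoli polynomial
crc64nvme = make_crc(0xAD93D23594C93659, 64)  # CRC-64/NVME polynomial


def s3_checksum_value(crc_value: int, width: int) -> str:
    """Encode a CRC as big-endian bytes, then base64 (S3 header style)."""
    return base64.b64encode(crc_value.to_bytes(width // 8, "big")).decode("ascii")


if __name__ == "__main__":
    check = b"123456789"  # conventional CRC check string
    print(hex(zlib.crc32(check)), hex(crc32c(check)), hex(crc64nvme(check)))
    print(s3_checksum_value(crc64nvme(check), 64))
```

Plugging the CRC-32 polynomial (0x04C11DB7) into the same `make_crc` reproduces `zlib.crc32`. That is a quick way to confirm the table-driven loop is correct before trusting the other two variants.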

Cryptographic Hashes (For security and compliance)

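| Algorithm | Bit length | Features |
|---|---|---|
| MD5 | 128-bit | Practical collision attacks demonstrated; for legacy compatibility |
| SHA-1 | 160-bit | Practical collisions demonstrated; not recommended |
| SHA-256 | 256-bit | Currently considered secure; widely used |
| SHA-512 | 512-bit | Longer digest than SHA-256; can be faster than SHA-256 on 64-bit CPUs without SHA-256 hardware acceleration |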

Non-cryptographic hashes are sufficient for detecting accidental data corruption during transfers. To protect data on S3 against deliberate tampering, IAM policies and S3 Object Lock are more appropriate than checksums.
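The distinction between corruption and tampering can be shown in a few lines. The sketch below flips every single bit of a small buffer and confirms that CRC32 and SHA-256 both change. CRC32 is guaranteed to catch any single-bit error.

It then shows why neither checksum proves integrity against an adversary: whoever rewrites the body can recompute a matching checksum. `flip_bit`, `all_single_bit_flips_detected` and `upload_accepted` are hypothetical helpers.

```python
import hashlib
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Copy of `data` with one bit inverted, simulating corruption in transit."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def all_single_bit_flips_detected(data: bytes) -> bool:
    """True if CRC32 and SHA-256 both change for every possible one-bit flip."""
    crc_ref = zlib.crc32(data)
    sha_ref = hashlib.sha256(data).digest()
    for i in range(len(data) * 8):
        corrupted = flip_bit(data, i)
        if zlib.crc32(corrupted) == crc_ref or hashlib.sha256(corrupted).digest() == sha_ref:
            return False
    return True


def upload_accepted(body: bytes, claimed_crc32: int) -> bool:
    """Receiver-side check: does the body match the checksum sent with it?"""
    return zlib.crc32(body) == claimed_crc32


if __name__ == "__main__":
    print(all_single_bit_flips_detected(bytes(range(64))))
    # Someone who can rewrite the body can also recompute the checksum,
    # so the check still passes: it says nothing about who changed the data.
    tampered = b"tampered object body"
    print(upload_accepted(tampered, zlib.crc32(tampered)))
```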

Benchmarks

Test Environment

Testing was conducted on three instance types in the Tokyo region (ap-northeast-1).

| Item | c8i.4xlarge | c8a.4xlarge | c8g.4xlarge |
|---|---|---|---|
| CPU | Intel Xeon 6975P-C (Granite Rapids) | AMD EPYC 9R45 | AWS Graviton4 |
| Architecture | x86_64 | x86_64 | aarch64 |
| vCPUs / physical cores | 16 / 8 (with SMT) | 16 / 16 (no SMT) | 16 / 16 (no SMT) |
| Memory | 32 GB DDR5 | 32 GB DDR5 | 32 GB DDR5 |
| Network bandwidth | Up to 12.5 Gbps | Up to 15 Gbps | Up to 12.5 Gbps |
| On-demand price | $0.4718/h | $0.5426/h | $0.4003/h |

The 5GB test data was placed on tmpfs (a RAM-backed file system) to eliminate storage I/O bottlenecks. Checksum calculation speed was measured single-threaded on in-memory data using Python's xxhash package (a C extension) and hashlib. S3 transfer speed was measured with aws s3 cp using AWS CLI 2.34.35. All results are averages of three runs.
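A measurement loop along these lines is sketched below. It is not the author's script. The assumptions are:

- 1 GB is taken as 10^9 bytes.
- The buffer is smaller than 5GB.
- The third-party `xxhash` package is used only if it is installed.

The sketch times one call per run for each algorithm, averages three runs, and reports the coefficient of variation (standard deviation divided by mean), the statistic the article uses for reproducibility.

```python
import hashlib
import os
import statistics
import time
import zlib

try:
    import xxhash  # third-party C extension used in the article; optional here
except ImportError:
    xxhash = None


def build_algorithms():
    """Map algorithm names to one-shot functions taking a bytes buffer."""
    algos = {
        "SHA-256": lambda b: hashlib.sha256(b).digest(),
        "SHA-512": lambda b: hashlib.sha512(b).digest(),
        "MD5": lambda b: hashlib.md5(b).digest(),
        "CRC32": zlib.crc32,
    }
    if xxhash is not None:
        algos["XXHash3_64"] = lambda b: xxhash.xxh3_64(b).digest()
        algos["XXHash3_128"] = lambda b: xxhash.xxh3_128(b).digest()
        algos["XXHash64"] = lambda b: xxhash.xxh64(b).digest()
    return algos


def measure(fn, data: bytes, runs: int = 3):
    """Return (mean GB/s, coefficient of variation) over `runs` single-thread calls."""
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(data)
        elapsed = time.perf_counter() - start
        rates.append(len(data) / elapsed / 1e9)
    mean = statistics.mean(rates)
    cv = statistics.stdev(rates) / mean if runs > 1 else 0.0
    return mean, cv


if __name__ == "__main__":
    data = os.urandom(256 * 1024 * 1024)  # 256 MiB; the article used 5GB
    for name, fn in build_algorithms().items():
        mean, cv = measure(fn, data)
        print(f"{name:12s} {mean:6.2f} GB/s  CV {cv:.2%}")
```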

Checksum Calculation Speed (5GB, Single-thread, Average of 3 runs)

| Algorithm | c8i (Intel) | c8a (AMD) | c8g (Graviton4) |
|---|---|---|---|
| XXHash3_64 | 14.8 GB/s | 36.7 GB/s | 23.4 GB/s |
| XXHash3_128 | 14.8 GB/s | 36.8 GB/s | 23.3 GB/s |
| XXHash64 | 7.0 GB/s | 22.0 GB/s | 18.3 GB/s |
| SHA-256 | 1.7 GB/s | 2.1 GB/s | 1.7 GB/s |
| CRC32 | 1.3 GB/s | 1.7 GB/s | 1.0 GB/s |
| SHA-512 | 0.8 GB/s | 1.2 GB/s | 1.0 GB/s |
| MD5 | 0.8 GB/s | 0.9 GB/s | 0.6 GB/s |

Variation between runs was extremely small (coefficient of variation under 1%), indicating highly reproducible results.

AMD was the fastest for every algorithm. Its XXHash3 rate of 36.7 GB/s was 2.5 times Intel's and 1.6 times Graviton4's. This may be related to how the xxhash library's SIMD optimizations (such as AVX-512) perform on each microarchitecture. Graviton4 also clearly outperformed Intel, reaching 23.4 GB/s with XXHash3 and 18.3 GB/s with XXHash64, which suggests the library's ARM NEON optimizations are effective.

S3 Transfer Speed (5GB, tmpfs → S3, Average of 3 runs)

Upload throughput was measured with aws s3 cp. The CLI sends the 5GB file as a multipart upload and transfers the parts in parallel.
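With multipart uploads, a SHA-family checksum is computed per part. Per my understanding of the S3 documentation, the object-level value is a composite "checksum of checksums":

1. Hash each part.
2. Concatenate the raw part digests in part order.
3. Hash the concatenation.
4. Append `-<part count>`.

The sketch below reproduces that scheme for SHA-256. `composite_sha256` is a hypothetical helper. The part size must match the one actually used for the upload; the AWS CLI's default `multipart_chunksize` is 8MB, and the setting is configurable. CRC-based algorithms can instead use full-object checksums, which this sketch does not cover.

```python
import base64
import hashlib


def composite_sha256(data: bytes, part_size: int) -> str:
    """Composite ("checksum of checksums") SHA-256 value for a multipart upload.

    Each part is hashed on its own, the raw part digests are concatenated in
    part order, that concatenation is hashed again, and "-<part count>" is appended.
    """
    part_digests = [
        hashlib.sha256(data[offset:offset + part_size]).digest()
        for offset in range(0, len(data), part_size)
    ]
    combined = hashlib.sha256(b"".join(part_digests)).digest()
    return f"{base64.b64encode(combined).decode('ascii')}-{len(part_digests)}"


if __name__ == "__main__":
    body = bytes(20 * 1024 * 1024)              # 20 MiB of zeros
    print(composite_sha256(body, 8 * 1024 * 1024))  # three parts: 8 + 8 + 4 MiB
```

The composite value therefore depends on how the object was split. Re-uploading the same bytes with a different part size yields a different composite checksum.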

| Algorithm | c8i (Intel) | c8a (AMD) | c8g (Graviton4) |
|---|---|---|---|
| CRC32C | 507 MB/s | 494 MB/s | 425 MB/s |
| CRC32 | 470 MB/s | 405 MB/s | 413 MB/s |
| XXHASH3 | 461 MB/s | 411 MB/s | 428 MB/s |
| SHA-256 | 458 MB/s | 382 MB/s | 450 MB/s |
| CRC64NVME | 455 MB/s | 421 MB/s | 421 MB/s |
| XXHASH128 | 432 MB/s | 480 MB/s | 443 MB/s |
| XXHASH64 | 423 MB/s | 434 MB/s | 480 MB/s |
| SHA-1 | 403 MB/s | 469 MB/s | 491 MB/s |
| SHA-512 | 373 MB/s | 355 MB/s | 395 MB/s |

Checksum calculation speed differed by up to roughly 40x between algorithms on the same instance (XXHash3 vs MD5 on c8a). Even so, S3 transfer speeds in all three environments fell within a narrow range of about 350-510 MB/s, which shrank the gap between algorithms considerably.

Analysis

Algorithm choice has limited impact on S3 transfers

These tests used 4xlarge instances with ample network bandwidth and tmpfs to remove storage I/O bottlenecks. Under those conditions, all three environments reached roughly 350-510 MB/s regardless of algorithm. Checksum calculation speeds differed by more than tenfold, yet the effect on overall S3 transfer throughput was minimal. This suggests that network I/O and SDK/CLI overhead dominate, with checksum calculation accounting for only a small part of the total work.
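A back-of-the-envelope check supports this. Dividing the measured upload rate by the single-thread checksum rate gives the average number of cores that hashing would keep busy. The figures are copied from the two tables (c8i column).

The calculation assumes 1 GB = 1000 MB and that each byte is hashed exactly once. It also assumes the Python library rates are representative of the CLI's own implementations, which may differ.

```python
# Single-thread checksum rates (GB/s) and upload rates (MB/s), c8i (Intel) column.
CHECKSUM_GB_S = {"XXHASH3": 14.8, "SHA-256": 1.7, "CRC32": 1.3, "SHA-512": 0.8}
UPLOAD_MB_S = {"XXHASH3": 461, "SHA-256": 458, "CRC32": 470, "SHA-512": 373}
VCPUS = 16


def cores_needed(upload_mb_s: float, checksum_gb_s: float) -> float:
    """Average number of cores busy hashing to keep up with the upload rate."""
    return (upload_mb_s / 1000) / checksum_gb_s


if __name__ == "__main__":
    for name, rate in CHECKSUM_GB_S.items():
        busy = cores_needed(UPLOAD_MB_S[name], rate)
        print(f"{name:8s} {busy:.2f} cores ({busy / VCPUS:.1%} of {VCPUS} vCPUs)")
```

Even SHA-512, the slowest algorithm tested for transfer, works out to under half a core at 373 MB/s. That is a small slice of a 16-vCPU instance whose parts are hashed and sent in parallel.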

Small differences between the three environments

There was no clear advantage among the three architectures in S3 transfer speeds. However, there were differences in run-to-run variation (standard deviation), with c8i showing the largest (average SD 79.5 MB/s) and c8a showing the smallest (average SD 41.4 MB/s).

In preliminary tests on .large (2 vCPU) instances, c8i.large (1 physical core with SMT) dropped to as low as 141 MB/s with some algorithms. This did not occur on c8a.large or c8g.large (2 physical cores each), and the drop disappeared when c8i was moved to 4xlarge. This suggests that a small number of physical cores can bottleneck parallel processing. On .large instances, EBS throughput and network bandwidth limits also come into play, so a single cause could not be pinned down. For environments that rely heavily on multipart uploads, choose an instance size with enough physical cores.
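Because vCPU count alone hides SMT, it can help to check the physical core count on the instance itself. The sketch below reads the Linux sysfs file `thread_siblings_list` for each logical CPU. SMT siblings report the same string (for example "0-1" or "0,8"), so the number of distinct strings equals the number of physical cores. The helper names are hypothetical, and the live read works only on Linux.

```python
import glob
import os


def count_physical_cores(sibling_lists):
    """Distinct thread_siblings_list strings = distinct physical cores."""
    return len({s.strip() for s in sibling_lists})


def read_sibling_lists():
    """Read the sysfs topology file for every logical CPU (Linux only)."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    lists = []
    for path in glob.glob(pattern):
        with open(path) as f:
            lists.append(f.read())
    return lists


if __name__ == "__main__":
    cores = count_physical_cores(read_sibling_lists())
    print(f"logical CPUs: {os.cpu_count()}, physical cores: {cores}")
```

On a 2-vCPU instance with SMT, both logical CPUs would report the same sibling list, giving one physical core. That matches the c8i.large case described above.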

Architecture differences are pronounced in checksum calculation speed

In raw checksum calculation speed, AMD (c8a) was clearly the fastest. This advantage would matter most for workloads with heavy checksum computation, or for multipart uploads in environments with very high network bandwidth (25 Gbps or more), where checksum calculation becomes a larger share of the work.

Architecture Selection Guidelines

| Priority | Recommendation |
|---|---|
| Checksum calculation speed | c8a (AMD) is fastest |
| S3 transfer speed | All three are comparable (minor differences) |
| Cost performance | c8g (Graviton4) is most economical |

c8g is 15% cheaper than c8i and 26% cheaper than c8a. Given the minimal differences in S3 transfer speeds, c8g (Graviton4) is suitable for cost-conscious workloads, while c8a (AMD) is better for performance-focused needs.
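The percentages follow from the on-demand prices in the test-environment table. The sketch below recomputes them. As an illustrative extra, it also estimates the instance cost of uploading 1 TB at each instance's measured CRC64NVME throughput. That estimate assumes 1 TB = 10^6 MB, sustained throughput, and instance-hour cost only, ignoring S3 request and storage charges. The helper names are hypothetical.

```python
# On-demand prices (USD/hour) and CRC64NVME upload throughput (MB/s) from the tables above.
PRICES = {"c8i": 0.4718, "c8a": 0.5426, "c8g": 0.4003}
CRC64NVME_MB_S = {"c8i": 455, "c8a": 421, "c8g": 421}


def percent_cheaper(price: float, reference: float) -> float:
    """How much cheaper `price` is than `reference`, as a percentage of `reference`."""
    return (reference - price) / reference * 100


def usd_per_tb(price_per_hour: float, mb_per_s: float) -> float:
    """Instance cost to upload 1 TB (10**6 MB) at a sustained rate."""
    hours = 1_000_000 / mb_per_s / 3600
    return price_per_hour * hours


if __name__ == "__main__":
    print(f"c8g vs c8i: {percent_cheaper(PRICES['c8g'], PRICES['c8i']):.0f}% cheaper")
    print(f"c8g vs c8a: {percent_cheaper(PRICES['c8g'], PRICES['c8a']):.0f}% cheaper")
    for name, price in PRICES.items():
        print(f"{name}: ${usd_per_tb(price, CRC64NVME_MB_S[name]):.2f} per TB uploaded")
```

Under these assumptions c8g comes out at about $0.26 per TB, versus about $0.29 for c8i and $0.36 for c8a. Transfer speeds are similar, so the ranking simply follows the hourly price.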

| Use case | Recommendation | Reason |
|---|---|---|
| General purpose | CRC64NVME (default) | No change needed; adequate speed and accuracy |
| Integration with existing pipelines | XXHash64 / XXHash3 | Compatibility with Spark, ClickHouse, etc. |
| Large-scale data lakes | XXHash128 | 128-bit hash with extremely low collision risk |
| Compliance requirements | SHA-256 / SHA-512 | FIPS compliance; financial, medical, and government sectors |
| Legacy system compatibility | MD5 | Usable only by specifying a precomputed value in the request header |

Important Notes

CLI / SDK Versions

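Using the new algorithms requires an up-to-date CLI/SDK. Support was confirmed with CLI 2.34.35.

```bash
aws --version
# x86_64 installer; on Graviton (aarch64) instances such as c8g, use awscli-exe-linux-aarch64.zip instead
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip && sudo ./aws/install --update
```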

The previously existing algorithms (CRC32, CRC32C, SHA-1, SHA-256) have long been supported. CRC64NVME was added in early 2025, so CLI/SDK versions released before then cannot use it.
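When checking many hosts, a script can parse `aws --version` instead of eyeballing it. Versions must be compared as integer tuples, because a plain string comparison would rank "2.9.1" above "2.34.35".

The sketch below treats 2.34.35 only as the version the article tested, not as a confirmed minimum. It also checks stderr as a fallback, in case some older releases printed the version there. The helper names are hypothetical.

```python
import re
import subprocess

TESTED_VERSION = (2, 34, 35)  # version confirmed in the article, not necessarily the minimum


def parse_cli_version(output: str):
    """Extract (major, minor, patch) from output like 'aws-cli/2.34.35 Python/3.x ...'."""
    match = re.search(r"aws-cli/(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognized version string: {output!r}")
    return tuple(int(part) for part in match.groups())


def installed_cli_version():
    """Run `aws --version` and parse whichever stream carries the version."""
    result = subprocess.run(["aws", "--version"], capture_output=True, text=True, check=True)
    return parse_cli_version(result.stdout or result.stderr)


if __name__ == "__main__":
    version = installed_cli_version()
    status = "OK" if version >= TESTED_VERSION else "older than the tested version"
    print(".".join(map(str, version)), status)
```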

MD5

MD5 is supported at the S3 API level, but the SDK/CLI cannot calculate it automatically with --checksum-algorithm MD5. Instead, a precalculated value must be specified in the x-amz-checksum-md5 header. For objects uploaded via multipart upload, the MD5 can be calculated afterward with the Compute checksum operation in S3 Batch Operations.

Directory Buckets

Directory buckets (S3 Express One Zone) do not support specifying checksum algorithms.

Conclusion

With this announcement, S3's checksum algorithm options have expanded from 5 to 10. The addition of XXHash algorithms simplifies integration with existing data pipelines, while SHA-512 and MD5 provide more options for compliance and legacy compatibility.

For most workloads, the default CRC64NVME provides sufficient speed and integrity. There's no need to deliberately change algorithms.

However, our benchmarks confirmed clear differences in checksum calculation speed depending on the combination of CPU architecture and algorithm. The newly added algorithms are worth trying for workloads that need integration with existing data pipelines or faster checksum computation.
