Benchmarking M7g Instances
Hello. I'm akky from the Delivery Department, CX Business Division.
On February 13, M7g and R7g, new instances powered by Graviton3, became generally available (GA).
https://aws.amazon.com/jp/about-aws/whats-new/2023/02/amazon-ec2-m7g-r7g-instances/
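DevelopersIO has already published an article about them:
M7g (general purpose) and R7g (memory optimized): new EC2 instances powered by AWS Graviton3 have been released
Compared with Graviton2, Graviton3 is said to offer up to 25% better compute performance, 2x the floating-point performance, and 2x the cryptographic performance. But how does it perform in practice? I compared an M7g.large instance (Graviton3) with an M6g.large instance (Graviton2).
Both have 2 vCPUs and 8 GB of RAM. I ran the tests in the Oregon region (us-west-2).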
/proc/cpuinfo
M7g.large
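Compared with M6g, M7g reports many more extension flags. Graviton3 implements the Armv8.4-A instruction set while Graviton2 implements Armv8.2-A, and the difference shows up here.
The BogoMIPS values look odd (2100.00 versus 243.75). On arm64 Linux, BogoMIPS is generally derived from the system timer frequency rather than the CPU clock, so it should not be read as a measure of performance.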
$ cat /proc/cpuinfo
processor       : 0
BogoMIPS        : 2100.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs paca pacg dcpodp svei8mm svebf16 i8mm bf16 dgh rng
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x1
CPU part        : 0xd40
CPU revision    : 1

processor       : 1
BogoMIPS        : 2100.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs paca pacg dcpodp svei8mm svebf16 i8mm bf16 dgh rng
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x1
CPU part        : 0xd40
CPU revision    : 1
M6g.large
$ cat /proc/cpuinfo
processor       : 0
BogoMIPS        : 243.75
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x3
CPU part        : 0xd0c
CPU revision    : 1

processor       : 1
BogoMIPS        : 243.75
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x3
CPU part        : 0xd0c
CPU revision    : 1
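To see exactly which extension flags differ, the two Features lines can be compared as sets. The following is a minimal sketch; the variable names are my own, and the flag strings are copied from the /proc/cpuinfo output above.

```python
# Features lines copied from the /proc/cpuinfo output of each instance.
M7G_FEATURES = (
    "fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid "
    "asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm "
    "dit uscat ilrcpc flagm ssbs paca pacg dcpodp svei8mm svebf16 i8mm bf16 dgh rng"
)
M6G_FEATURES = (
    "fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid "
    "asimdrdm lrcpc dcpop asimddp ssbs"
)

m7g = set(M7G_FEATURES.split())
m6g = set(M6G_FEATURES.split())

# Flags reported only on one side.
m7g_only = sorted(m7g - m6g)
m6g_only = sorted(m6g - m7g)

print(f"M7g only ({len(m7g_only)}): {' '.join(m7g_only)}")
print(f"M6g only ({len(m6g_only)}): {' '.join(m6g_only) or '(none)'}")
```

Every flag reported by M6g is also present on M7g. The M7g-only set includes sve, sha512, sha3, bf16, and i8mm, among others.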
Unix Bench
This is a standard benchmark that measures CPU and OS performance.
M7g.large
========================================================================
   BYTE UNIX Benchmarks (Version 5.1.3)

   System: ip-172-31-5-119: GNU/Linux
   OS: GNU/Linux -- 5.15.0-1028-aws -- #32-Ubuntu SMP Mon Jan 9 12:29:05 UTC 2023
   Machine: aarch64 (aarch64)
   Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
   CPU 0: (2100.0 bogomips)
   CPU 1: (2100.0 bogomips)
   01:56:19 up 8 min, 1 user, load average: 0.09, 0.06, 0.01; runlevel 2023-02-15

------------------------------------------------------------------------
Benchmark Run: Wed Feb 15 2023 01:56:19 - 02:24:14
2 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       49993610.6 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     7679.0 MWIPS (9.9 s, 7 samples)
Execl Throughput                               2123.4 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        900104.6 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          247783.5 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       2661191.4 KBps  (30.0 s, 2 samples)
Pipe Throughput                             1391358.9 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 148827.8 lps   (10.0 s, 7 samples)
Process Creation                               5187.3 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   8162.8 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   1635.7 lpm   (60.0 s, 2 samples)
System Call Overhead                         874665.4 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   49993610.6   4283.9
Double-Precision Whetstone                       55.0       7679.0   1396.2
Execl Throughput                                 43.0       2123.4    493.8
File Copy 1024 bufsize 2000 maxblocks          3960.0     900104.6   2273.0
File Copy 256 bufsize 500 maxblocks            1655.0     247783.5   1497.2
File Copy 4096 bufsize 8000 maxblocks          5800.0    2661191.4   4588.3
Pipe Throughput                               12440.0    1391358.9   1118.5
Pipe-based Context Switching                   4000.0     148827.8    372.1
Process Creation                                126.0       5187.3    411.7
Shell Scripts (1 concurrent)                     42.4       8162.8   1925.2
Shell Scripts (8 concurrent)                      6.0       1635.7   2726.2
System Call Overhead                          15000.0     874665.4    583.1
                                                                   ========
System Benchmarks Index Score                                        1304.0

------------------------------------------------------------------------
Benchmark Run: Wed Feb 15 2023 02:24:14 - 02:52:10
2 CPUs in system; running 2 parallel copies of tests

Dhrystone 2 using register variables      100238134.4 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                    15359.6 MWIPS (9.9 s, 7 samples)
Execl Throughput                               3936.8 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks       1544096.3 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          436658.6 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       4763765.4 KBps  (30.0 s, 2 samples)
Pipe Throughput                             2781042.7 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 295290.1 lps   (10.0 s, 7 samples)
Process Creation                               9161.9 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                  12216.4 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   1660.8 lpm   (60.0 s, 2 samples)
System Call Overhead                        1747584.1 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0  100238134.4   8589.4
Double-Precision Whetstone                       55.0      15359.6   2792.7
Execl Throughput                                 43.0       3936.8    915.5
File Copy 1024 bufsize 2000 maxblocks          3960.0    1544096.3   3899.2
File Copy 256 bufsize 500 maxblocks            1655.0     436658.6   2638.4
File Copy 4096 bufsize 8000 maxblocks          5800.0    4763765.4   8213.4
Pipe Throughput                               12440.0    2781042.7   2235.6
Pipe-based Context Switching                   4000.0     295290.1    738.2
Process Creation                                126.0       9161.9    727.1
Shell Scripts (1 concurrent)                     42.4      12216.4   2881.2
Shell Scripts (8 concurrent)                      6.0       1660.8   2768.0
System Call Overhead                          15000.0    1747584.1   1165.1
                                                                   ========
System Benchmarks Index Score                                        2289.0
M6g.large
========================================================================
   BYTE UNIX Benchmarks (Version 5.1.3)

   System: ip-172-31-5-119: GNU/Linux
   OS: GNU/Linux -- 5.15.0-1028-aws -- #32-Ubuntu SMP Mon Jan 9 12:29:05 UTC 2023
   Machine: aarch64 (aarch64)
   Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
   CPU 0: (243.8 bogomips)
   CPU 1: (243.8 bogomips)
   04:46:43 up 1:40, 1 user, load average: 0.41, 1.35, 1.81; runlevel 2023-02-15

------------------------------------------------------------------------
Benchmark Run: Wed Feb 15 2023 04:46:43 - 05:14:40
2 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       39738207.9 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     7198.9 MWIPS (9.9 s, 7 samples)
Execl Throughput                               2057.5 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        870263.3 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          248097.6 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       2308360.1 KBps  (30.0 s, 2 samples)
Pipe Throughput                             1335004.2 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  88057.3 lps   (10.0 s, 7 samples)
Process Creation                               4962.1 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   7268.3 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   1478.5 lpm   (60.0 s, 2 samples)
System Call Overhead                         925608.8 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   39738207.9   3405.2
Double-Precision Whetstone                       55.0       7198.9   1308.9
Execl Throughput                                 43.0       2057.5    478.5
File Copy 1024 bufsize 2000 maxblocks          3960.0     870263.3   2197.6
File Copy 256 bufsize 500 maxblocks            1655.0     248097.6   1499.1
File Copy 4096 bufsize 8000 maxblocks          5800.0    2308360.1   3979.9
Pipe Throughput                               12440.0    1335004.2   1073.2
Pipe-based Context Switching                   4000.0      88057.3    220.1
Process Creation                                126.0       4962.1    393.8
Shell Scripts (1 concurrent)                     42.4       7268.3   1714.2
Shell Scripts (8 concurrent)                      6.0       1478.5   2464.2
System Call Overhead                          15000.0     925608.8    617.1
                                                                   ========
System Benchmarks Index Score                                        1172.9

------------------------------------------------------------------------
Benchmark Run: Wed Feb 15 2023 05:14:40 - 05:42:37
2 CPUs in system; running 2 parallel copies of tests

Dhrystone 2 using register variables       79409640.2 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                    14393.4 MWIPS (9.9 s, 7 samples)
Execl Throughput                               3770.3 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks       1475162.1 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          432270.4 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       3315535.3 KBps  (30.0 s, 2 samples)
Pipe Throughput                             2667690.5 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 323040.4 lps   (10.0 s, 7 samples)
Process Creation                               9770.6 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                  11168.5 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   1472.6 lpm   (60.0 s, 2 samples)
System Call Overhead                        1852372.5 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   79409640.2   6804.6
Double-Precision Whetstone                       55.0      14393.4   2617.0
Execl Throughput                                 43.0       3770.3    876.8
File Copy 1024 bufsize 2000 maxblocks          3960.0    1475162.1   3725.2
File Copy 256 bufsize 500 maxblocks            1655.0     432270.4   2611.9
File Copy 4096 bufsize 8000 maxblocks          5800.0    3315535.3   5716.4
Pipe Throughput                               12440.0    2667690.5   2144.4
Pipe-based Context Switching                   4000.0     323040.4    807.6
Process Creation                                126.0       9770.6    775.4
Shell Scripts (1 concurrent)                     42.4      11168.5   2634.1
Shell Scripts (8 concurrent)                      6.0       1472.6   2454.4
System Call Overhead                          15000.0    1852372.5   1234.9
                                                                   ========
System Benchmarks Index Score                                        2141.7
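Comparison
Single-threaded
On M7g, Dhrystone 2 scored about 126% of the M6g result and Pipe-based Context Switching about 169%. Most other tests were roughly 3-15% faster, File Copy 256 was essentially unchanged, and System Call Overhead was about 6% slower. The overall index score was 11% higher.
Multi-threaded
The overall trend was similar to the single-threaded run, with the overall index score about 7% higher. File Copy 4096 was about 44% faster, for reasons I could not determine. Unlike the single-threaded run, Pipe-based Context Switching was about 9% slower on M7g.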
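The per-test ratios can be recomputed from the INDEX columns of the Unix Bench output. The sketch below uses shortened test names and a dictionary layout of my own choosing; the numbers are copied from the results above. It also recomputes the overall score as the geometric mean of the individual indexes, which, as far as I can tell, is how Unix Bench derives the System Benchmarks Index Score.

```python
import math

# Shortened test names (my own labels), in the same order as the Unix Bench tables.
TESTS = [
    "Dhrystone 2", "Whetstone", "Execl Throughput", "File Copy 1024",
    "File Copy 256", "File Copy 4096", "Pipe Throughput", "Context Switching",
    "Process Creation", "Shell Scripts (1)", "Shell Scripts (8)", "System Call Overhead",
]

# INDEX column values keyed by (instance, number of parallel copies).
INDEX = {
    ("m7g", 1): [4283.9, 1396.2, 493.8, 2273.0, 1497.2, 4588.3,
                 1118.5, 372.1, 411.7, 1925.2, 2726.2, 583.1],
    ("m6g", 1): [3405.2, 1308.9, 478.5, 2197.6, 1499.1, 3979.9,
                 1073.2, 220.1, 393.8, 1714.2, 2464.2, 617.1],
    ("m7g", 2): [8589.4, 2792.7, 915.5, 3899.2, 2638.4, 8213.4,
                 2235.6, 738.2, 727.1, 2881.2, 2768.0, 1165.1],
    ("m6g", 2): [6804.6, 2617.0, 876.8, 3725.2, 2611.9, 5716.4,
                 2144.4, 807.6, 775.4, 2634.1, 2454.4, 1234.9],
}


def geometric_mean(values):
    """Geometric mean, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def ratios(copies):
    """M7g index divided by M6g index for each test."""
    pairs = zip(TESTS, INDEX[("m7g", copies)], INDEX[("m6g", copies)])
    return {name: new / old for name, new, old in pairs}


for copies in (1, 2):
    print(f"--- parallel copies: {copies} ---")
    for name, ratio in ratios(copies).items():
        print(f"{name:22s} {ratio * 100:6.1f}%")
    m7g_score = geometric_mean(INDEX[("m7g", copies)])
    m6g_score = geometric_mean(INDEX[("m6g", copies)])
    print(f"{'Overall (geomean)':22s} {m7g_score / m6g_score * 100:6.1f}%")
```

Because the inputs are the rounded INDEX values, the recomputed overall scores can differ from the reported ones in the last digit.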
OpenSSL speed
Cryptographic performance is also said to have improved, so I tested it with OpenSSL.
I compared SHA512 and AES-128-GCM.
M7g.large
$ openssl speed -evp sha512
Doing sha512 for 3s on 16 size blocks: 10558323 sha512's in 3.00s
Doing sha512 for 3s on 64 size blocks: 10609983 sha512's in 3.00s
Doing sha512 for 3s on 256 size blocks: 5556640 sha512's in 3.00s
Doing sha512 for 3s on 1024 size blocks: 2304995 sha512's in 3.00s
Doing sha512 for 3s on 8192 size blocks: 356516 sha512's in 3.00s
Doing sha512 for 3s on 16384 size blocks: 181427 sha512's in 3.00s
version: 3.0.2
built on: Mon Feb 6 17:57:17 2023 UTC
options: bn(64,64)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -ffile-prefix-map=/build/openssl-oZetzz/openssl-3.0.2=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_TLS_SECURITY_LEVEL=2 -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
CPUINFO: OPENSSL_armcap=0xff
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha512          56311.06k   226346.30k   474166.61k   786771.63k   973526.36k   990833.32k

$ openssl speed -evp aes-128-gcm
Doing AES-128-GCM for 3s on 16 size blocks: 119146247 AES-128-GCM's in 2.99s
Doing AES-128-GCM for 3s on 64 size blocks: 82901661 AES-128-GCM's in 3.00s
Doing AES-128-GCM for 3s on 256 size blocks: 33596767 AES-128-GCM's in 3.00s
Doing AES-128-GCM for 3s on 1024 size blocks: 12051388 AES-128-GCM's in 3.00s
Doing AES-128-GCM for 3s on 8192 size blocks: 1654743 AES-128-GCM's in 3.00s
Doing AES-128-GCM for 3s on 16384 size blocks: 833177 AES-128-GCM's in 3.00s
version: 3.0.2
built on: Mon Feb 6 17:57:17 2023 UTC
options: bn(64,64)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -ffile-prefix-map=/build/openssl-oZetzz/openssl-3.0.2=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_TLS_SECURITY_LEVEL=2 -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
CPUINFO: OPENSSL_armcap=0xff
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-128-GCM    637571.89k  1768568.77k  2866924.12k  4113540.44k  4518551.55k  4550257.32k
M6g.large
$ openssl speed -evp sha512
Doing sha512 for 3s on 16 size blocks: 5653728 sha512's in 3.00s
Doing sha512 for 3s on 64 size blocks: 5649812 sha512's in 3.00s
Doing sha512 for 3s on 256 size blocks: 2621488 sha512's in 3.00s
Doing sha512 for 3s on 1024 size blocks: 1001815 sha512's in 3.00s
Doing sha512 for 3s on 8192 size blocks: 148089 sha512's in 3.00s
Doing sha512 for 3s on 16384 size blocks: 75075 sha512's in 3.00s
version: 3.0.2
built on: Mon Feb 6 17:57:17 2023 UTC
options: bn(64,64)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -ffile-prefix-map=/build/openssl-oZetzz/openssl-3.0.2=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_TLS_SECURITY_LEVEL=2 -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
CPUINFO: OPENSSL_armcap=0xbf
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha512          30153.22k   120529.32k   223700.31k   341952.85k   404381.70k   410009.60k
$ openssl speed -evp aes-128-gcm
Doing AES-128-GCM for 3s on 16 size blocks: 64254877 AES-128-GCM's in 2.99s
Doing AES-128-GCM for 3s on 64 size blocks: 46072641 AES-128-GCM's in 3.00s
Doing AES-128-GCM for 3s on 256 size blocks: 18846950 AES-128-GCM's in 3.00s
Doing AES-128-GCM for 3s on 1024 size blocks: 6425855 AES-128-GCM's in 3.00s
Doing AES-128-GCM for 3s on 8192 size blocks: 901257 AES-128-GCM's in 3.00s
Doing AES-128-GCM for 3s on 16384 size blocks: 454285 AES-128-GCM's in 3.00s
version: 3.0.2
built on: Mon Feb 6 17:57:17 2023 UTC
options: bn(64,64)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -ffile-prefix-map=/build/openssl-oZetzz/openssl-3.0.2=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_TLS_SECURITY_LEVEL=2 -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
CPUINFO: OPENSSL_armcap=0xbf
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-128-GCM    343838.81k   982883.01k  1608273.07k  2193358.51k  2461032.45k  2481001.81k
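Comparison
SHA512 was 1.8x to 2.4x faster depending on block size, and AES-128-GCM was about 1.8x faster at every block size. This is roughly in line with the advertised 2x improvement in cryptographic performance.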
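These speedups can be reproduced from the throughput rows of the `openssl speed` output. The following is a minimal sketch: the function names and dictionary layout are my own, and the rows are copied from the results above (values are in 1000s of bytes per second).

```python
BLOCK_SIZES = [16, 64, 256, 1024, 8192, 16384]

# Final throughput rows copied from the `openssl speed` output of each instance.
RESULTS = {
    "sha512": {
        "m7g": "sha512 56311.06k 226346.30k 474166.61k 786771.63k 973526.36k 990833.32k",
        "m6g": "sha512 30153.22k 120529.32k 223700.31k 341952.85k 404381.70k 410009.60k",
    },
    "AES-128-GCM": {
        "m7g": "AES-128-GCM 637571.89k 1768568.77k 2866924.12k 4113540.44k 4518551.55k 4550257.32k",
        "m6g": "AES-128-GCM 343838.81k 982883.01k 1608273.07k 2193358.51k 2461032.45k 2481001.81k",
    },
}


def parse_throughput(row):
    """Drop the algorithm name and the trailing 'k' from each throughput field."""
    return [float(field.rstrip("k")) for field in row.split()[1:]]


def speedups(algorithm):
    """M7g throughput divided by M6g throughput for each block size."""
    m7g = parse_throughput(RESULTS[algorithm]["m7g"])
    m6g = parse_throughput(RESULTS[algorithm]["m6g"])
    return [new / old for new, old in zip(m7g, m6g)]


for algorithm in RESULTS:
    print(algorithm)
    for size, ratio in zip(BLOCK_SIZES, speedups(algorithm)):
        print(f"  {size:6d} bytes: {ratio:.2f}x")
```

For SHA512 the ratio grows with block size, while for AES-128-GCM it stays nearly constant.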
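Summary
I benchmarked Graviton3 to verify its performance improvements. M7g costs roughly 6% more than M6g, but the results confirm performance that justifies the price. I hope these instances will also come to the Tokyo and Osaka regions.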