OS Bypass in Elastic Fabric Adapter for HPC and Machine Learning

2021.11.24

What is EFA?

EFA (Elastic Fabric Adapter) is an ENA (Elastic Network Adapter) with added capabilities. It can handle normal IP traffic, and it also comes with what is known as OS Bypass. OS Bypass allows applications to skip the operating system and use the network interface directly, which improves efficiency by removing the overhead of the normal IP path.

The main candidates for EFA are HPC (High Performance Computing) and ML (Machine Learning) workloads, where hundreds or thousands of application processes communicate over a shared network and efficiency is key.

Architecture

Normally, when using an ENA that carries TCP/IP traffic, the architecture looks like the diagram below. HPC applications use MPI (Message Passing Interface), then the system's TCP/IP stack, and finally the ENA device driver to communicate. The TCP/IP stack is what gets skipped when using EFA, which reduces overhead.

With EFA, HPC applications use the MPI or NCCL interface on top of the Libfabric API. Libfabric bypasses the OS kernel and lets the application communicate directly with the Elastic Fabric Adapter.
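
To make the difference between the two paths concrete, here is a minimal sketch that models each stack as a list of layers. The layer names come from the description above; the `in_kernel` flags and the `kernel_layers` helper are illustrative assumptions made for this sketch, not part of any AWS or Libfabric API.

```python
# Illustrative model only: each layer is (name, in_kernel).
# The in_kernel flags are a simplification assumed for this sketch.
ENA_PATH = [
    ("HPC application", False),
    ("MPI", False),
    ("TCP/IP stack", True),
    ("ENA device driver", True),
]

EFA_PATH = [
    ("HPC application", False),
    ("MPI or NCCL", False),
    ("Libfabric", False),
    ("Elastic Fabric Adapter", False),
]


def kernel_layers(path):
    """Return the names of the layers in `path` that run inside the OS kernel."""
    return [name for name, in_kernel in path if in_kernel]


if __name__ == "__main__":
    print("ENA path goes through:", kernel_layers(ENA_PATH))
    print("EFA path goes through:", kernel_layers(EFA_PATH))
```

In this simplified model the EFA path has no kernel layers on the data path, which is what OS Bypass refers to.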

Benefits and Limitations

What does all that OS Bypass give us? In short, the benefits of EFA are as follows:

  • High data throughput
  • 100 Gbps network bandwidth
  • Rapid packet loss recovery
  • Lower and more consistent latency

The 100 Gbps bandwidth and lower latency come with constraints. EFA's OS Bypass capabilities work only within the same subnet. Cross-subnet communication is still possible, but it brings no performance benefit, meaning it behaves like regular TCP/IP traffic. Using EFA also requires a Security Group with a self-referential rule, that is, one that allows all inbound and outbound traffic to and from the Security Group itself.
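
The two constraints above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `same_subnet` and `self_referential_permission`, the IP addresses, the CIDR block, and the group ID are all made up for the example. The dictionary is shaped like an entry of the `IpPermissions` parameter that boto3's `authorize_security_group_ingress` and `authorize_security_group_egress` accept, but nothing here calls AWS.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, subnet_cidr: str) -> bool:
    """Return True if both addresses fall inside the given subnet CIDR.

    Hypothetical helper: only in this case would the two instances be
    expected to benefit from OS Bypass, per the constraint above.
    """
    subnet = ipaddress.ip_network(subnet_cidr)
    return (
        ipaddress.ip_address(ip_a) in subnet
        and ipaddress.ip_address(ip_b) in subnet
    )


def self_referential_permission(group_id: str) -> dict:
    """Build a rule allowing all protocols to/from the security group itself.

    "-1" means all protocols; the source/destination is the group's own ID.
    The same dictionary can be used for the inbound and the outbound rule.
    """
    return {
        "IpProtocol": "-1",
        "UserIdGroupPairs": [{"GroupId": group_id}],
    }


if __name__ == "__main__":
    # Made-up addresses and group ID, for illustration only.
    print(same_subnet("10.0.1.10", "10.0.1.20", "10.0.1.0/24"))
    print(same_subnet("10.0.1.10", "10.0.2.20", "10.0.1.0/24"))
    print(self_referential_permission("sg-0123456789abcdef0"))
```

With boto3, a permission like this would typically be passed as `IpPermissions=[...]` together with the same `GroupId`, once for ingress and once for egress.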

Conclusion

HPC and ML applications require high speeds and low latencies. EFA is a device that reduces TCP/IP stack overhead to achieve 100 Gbps bandwidth and lower, more consistent latency than ENA by using what is called OS Bypass. The biggest limitation is that EFA's capabilities work only within the same subnet; cross-subnet traffic behaves the same as with ENA.