HDR 200Gb/s InfiniBand is Leading the Race to Exascale

January 28, 2019

HPCwire | Sponsored Content by Mellanox Technologies

Technical computing systems for both High Performance Computing (HPC) and Artificial Intelligence (AI) applications use InfiniBand for multiple reasons. Beyond the fast data throughput, ultra-low latency, ease of deployment and cost-performance advantages of the latest HDR 200Gb/s InfiniBand technology, it offers a range of capabilities that make it the best interconnect solution for compute-intensive HPC and AI workloads.

Across generations of InfiniBand technology, its performance and application scalability have been proven in the world’s largest deployments. InfiniBand is also a standards-based interconnect, ensuring backward compatibility as well as forward compatibility across generations. In addition, recent developments in In-Network Computing have put InfiniBand at the forefront of pre-Exascale and Exascale systems.

Furthermore, advanced CPU-offload acceleration engines, GPUDirect™ RDMA, and support for a wide range of topologies, including the enhanced Dragonfly topology that debuted in 2017 at the University of Toronto, are among the reasons leading research centers and industry deployments keep choosing InfiniBand.

Today’s Research and Industry Moving to HDR 

Recent announcements from some of the world’s leading HPC research institutions building their next-generation HPC-AI systems with HDR 200Gb/s InfiniBand include the University of Michigan, Lawrence Livermore National Laboratory (LLNL), the Texas Advanced Computing Center (TACC) and the University of Stuttgart (HLRS). CSC, the Finnish IT Center for Science, and the Finnish Meteorological Institute have also selected 200Gb/s HDR InfiniBand to accelerate their multi-phase supercomputer program. These are just a few of many such announcements.

In-Network Computing with Mellanox SHARP

Figure: SHARP - Scalable Hierarchical Aggregation and Reduction Protocol

Among the advanced capabilities of the latest generation of HDR 200Gb/s InfiniBand, beyond the highest bandwidth and lowest latency available, is Mellanox Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)™ technology, which executes aggregation and reduction algorithms on data as it is being transferred through the network. Because switches combine data along the way, SHARP also reduces the amount of data that must traverse the network, providing the highest application performance and scalability.
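To make in-network aggregation concrete, the toy Python model below compares two ways of summing vectors contributed by hosts attached to leaf switches. It is a minimal sketch under simplifying assumptions, not the SHARP API or protocol: the two-level tree, the element-wise sum as the reduction, and the names `host_based_reduce` and `in_network_reduce` are hypothetical, and the broadcast phase that completes an allreduce is omitted.

```python
from typing import List, Tuple

Vector = List[float]


def add_vectors(vectors: List[Vector]) -> Vector:
    # Element-wise sum: the reduction operation assumed in this toy model.
    return [sum(column) for column in zip(*vectors)]


def host_based_reduce(leaves: List[List[Vector]]) -> Tuple[Vector, int]:
    # Every host's vector is sent up to a single reducing endpoint,
    # so one message per host crosses the leaf uplinks.
    all_vectors = [v for leaf in leaves for v in leaf]
    return add_vectors(all_vectors), len(all_vectors)


def in_network_reduce(leaves: List[List[Vector]]) -> Tuple[Vector, int]:
    # Each leaf switch sums its own hosts' vectors and forwards one
    # partial result, so one message per leaf crosses the uplinks.
    partials = [add_vectors(leaf) for leaf in leaves]
    return add_vectors(partials), len(partials)


if __name__ == "__main__":
    # 4 leaf switches x 8 hosts; host r contributes [r, 1.0].
    leaves = [[[float(s * 8 + h), 1.0] for h in range(8)] for s in range(4)]
    print(host_based_reduce(leaves))  # ([496.0, 32.0], 32)
    print(in_network_reduce(leaves))  # ([496.0, 32.0], 4)
```

Both approaches produce the same sum, but in the in-network version only one partial result per leaf switch crosses the uplinks (4 messages instead of 32 in this example), which is the data-reduction effect described above.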

Mellanox SHARP is one of the most prominent features showcasing the in-network computing capabilities of the Mellanox network. Measurements of SHARP performance on Oak Ridge National Laboratory’s Summit system demonstrated that In-Network Computing technology is essential to overcoming performance bottlenecks and improving scalability.

Figure: MPI Barrier Latency and MPI_Allreduce Latency

In addition to accelerating Message Passing Interface (MPI) applications, Mellanox SHARP greatly enhances AI performance, from small-scale to large-scale deployments. Initial testing with SHARP shows a performance boost of nearly 20 percent even with a small number of GPUs. As more GPUs are used, the performance advantage of InfiniBand with SHARP should continue to increase.
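AI training benefits because data-parallel training, as used for models such as ResNet50, typically synchronizes gradients across workers with an allreduce at every step, and allreduce is the collective that SHARP can execute in the network. The sketch below is illustrative only: it simulates workers as Python lists in a single process, and the names `allreduce_sum` and `average_gradients` are hypothetical rather than part of any framework.

```python
from typing import List

Gradient = List[float]


def allreduce_sum(local: List[Gradient]) -> List[Gradient]:
    # Allreduce semantics: every rank receives the element-wise sum
    # of all ranks' inputs (simulated here in a single process).
    total = [sum(column) for column in zip(*local)]
    return [list(total) for _ in local]


def average_gradients(local: List[Gradient]) -> List[Gradient]:
    # Data-parallel step: sum gradients across workers, then divide by
    # the worker count so all replicas apply the same update.
    n = len(local)
    return [[g / n for g in summed] for summed in allreduce_sum(local)]


if __name__ == "__main__":
    grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]  # 4 workers
    print(average_gradients(grads))  # [[4.0, 5.0], [4.0, 5.0], [4.0, 5.0], [4.0, 5.0]]
```

Because this collective runs on every training step, reducing its latency directly shortens each iteration, and the number of participants in the reduction grows with the number of GPUs.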

Figure: ResNet50 Performance

A Self-Healing Interconnect

Figure: SHIELD - Self-Healing Interconnect Technology

Another capability recently added to the Mellanox InfiniBand feature set is Mellanox SHIELD™ technology, which enables self-healing interconnect capabilities to deliver the highest network robustness and reliability. SHIELD is an innovative interconnect technology that improves data center fault recovery by a factor of 5,000 by enabling autonomous self-healing within the interconnect. SHIELD technology is available in Mellanox’s 100G EDR and 200G HDR InfiniBand solutions, allowing interconnect components to exchange real-time information and make immediate smart decisions that overcome issues and optimize data flows.
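As a rough, hypothetical illustration of switch-local recovery (not SHIELD's actual implementation), the toy model below assumes each switch holds a primary and a backup egress port per destination and falls back to the backup as soon as it learns a port is down, without waiting for a central component to recompute routes. The `Switch` class and its methods are invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Switch:
    # Hypothetical forwarding state: destination -> (primary port, backup port)
    routes: Dict[str, Tuple[int, int]]
    down_ports: Set[int] = field(default_factory=set)

    def port_down(self, port: int) -> None:
        # Purely local event: the switch records the failure itself,
        # with no central route recomputation in this model.
        self.down_ports.add(port)

    def port_up(self, port: int) -> None:
        # When the link recovers, traffic returns to the primary path.
        self.down_ports.discard(port)

    def egress(self, dest: str) -> int:
        # Prefer the primary port; fall back to the backup if it is down.
        primary, backup = self.routes[dest]
        if primary not in self.down_ports:
            return primary
        if backup not in self.down_ports:
            return backup
        raise RuntimeError(f"no live path to {dest}")


if __name__ == "__main__":
    sw = Switch(routes={"nodeA": (1, 2), "nodeB": (3, 4)})
    print(sw.egress("nodeA"))  # 1
    sw.port_down(1)
    print(sw.egress("nodeA"))  # 2 (rerouted locally)
    print(sw.egress("nodeB"))  # 3 (unaffected)
```

The design point the model captures is that the decision is made where the failure is observed, so affected traffic can move to a live path immediately.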

Introducing HDR100 for Ultimate Scalability

Figure: Mellanox Quantum

The Mellanox Quantum HDR InfiniBand switch also offers an HDR100 option, which enables ultimate scalability for data centers. For 100Gb/s connectivity, Quantum splits each of its 40 HDR 200Gb/s ports into two HDR100 ports, giving a switch radix of 80 ports and making it the densest and most efficient 100 gigabit switch available on the market for high performance applications.

The higher port count of HDR InfiniBand switches greatly reduces total cost of ownership, because fewer switches, and fewer cables between them, are needed to connect a given number of nodes. The advantages of HDR and HDR100 technology, together with high data throughput and extremely low latency, make InfiniBand the preferred interconnect for both pre-Exascale and Exascale compute and storage platforms.
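The effect of switch radix on scale can be sketched with the standard sizing formulas for non-blocking fat trees built from identical k-port switches: a two-level leaf-spine tree supports k²/2 hosts with 3k/2 switches, and a three-level k-ary fat tree supports k³/4 hosts with 5k²/4 switches. The functions below are a minimal sketch that assumes full, non-blocking builds and an even radix; real deployments often use partial or oversubscribed trees, and the function names are hypothetical.

```python
from typing import Tuple


def two_level_fat_tree(k: int) -> Tuple[int, int]:
    """Non-blocking leaf-spine tree of k-port switches.

    k leaf switches each use k/2 ports for hosts and k/2 for uplinks;
    k/2 spine switches each connect once to every leaf.
    Returns (max_hosts, switch_count).
    """
    if k % 2:
        raise ValueError("radix assumed even")
    return k * k // 2, k + k // 2


def three_level_fat_tree(k: int) -> Tuple[int, int]:
    """Classic k-ary fat tree: k pods of k/2 edge + k/2 aggregation
    switches, plus (k/2)^2 core switches. Returns (max_hosts, switch_count)."""
    if k % 2:
        raise ValueError("radix assumed even")
    return k ** 3 // 4, k * k + (k // 2) ** 2


if __name__ == "__main__":
    for radix in (40, 80):  # 40 ports vs. 80 HDR100 ports
        print(radix, two_level_fat_tree(radix), three_level_fat_tree(radix))
```

With k = 80, a two-level tree reaches 3,200 hosts at 100Gb/s using 120 switches, whereas a 40-port switch tops out at 800 hosts in two levels and needs a third switching layer to go further, adding switches, cables and hops.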