Purpose-Built for the Convergence of Simulation, Data Analytics, and AI

Massive datasets, exploding model sizes, and complex simulations require multiple GPUs with extremely fast interconnections and a fully accelerated software stack. The NVIDIA HGX AI supercomputing platform brings together the full power of NVIDIA GPUs, NVIDIA® NVLink®, NVIDIA InfiniBand networking, and a fully optimized NVIDIA AI and HPC software stack from the NVIDIA NGC catalog to provide the highest application performance. With its end-to-end performance and flexibility, NVIDIA HGX enables researchers and scientists to combine simulation, data analytics, and AI to drive scientific progress.

Unmatched End-to-End Accelerated Computing Platform

NVIDIA HGX combines NVIDIA A100 Tensor Core GPUs with high-speed interconnects to form the world’s most powerful servers. With 16 A100 GPUs, HGX has up to 1.3 terabytes (TB) of GPU memory and over 2 terabytes per second (TB/s) of memory bandwidth for unprecedented acceleration.
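The aggregate memory figure follows directly from the per-GPU capacity. A quick back-of-the-envelope check (an illustrative calculation, assuming the 80GB A100 variant, not an official specification):

```python
# Aggregate GPU memory for a 16-GPU HGX A100 node.
# Assumes 80 GB of HBM2e per GPU (the A100 80GB variant).
num_gpus = 16
memory_per_gpu_gb = 80

total_memory_tb = num_gpus * memory_per_gpu_gb / 1000  # decimal terabytes
print(f"Aggregate GPU memory: {total_memory_tb:.2f} TB")  # 1.28 TB, i.e. ~1.3 TB
```

The same multiply-out applies to bandwidth: per-GPU HBM2e bandwidth scales with the GPU count, which is why dense multi-GPU nodes reach memory-bandwidth figures no single processor can.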

Compared to previous generations, HGX provides up to a 20X AI speedup out of the box with Tensor Float 32 (TF32) and a 2.5X HPC speedup with FP64. NVIDIA HGX delivers a staggering 10 petaFLOPS, forming the world’s most powerful accelerated scale-up server platform for AI and HPC.

Deep Learning Performance

Up to 3X Higher AI Training on the Largest Models

Deep learning models are exploding in size and complexity, requiring a system with large amounts of memory, massive computing power, and fast interconnects for scalability. With NVIDIA NVSwitch providing high-speed, all-to-all GPU communications, HGX can handle the most advanced AI models. With A100 80GB GPUs, GPU memory is doubled, delivering up to 1.3TB of memory in a single HGX. Emerging workloads on the very largest models like deep learning recommendation models (DLRM), which have massive data tables, are accelerated up to 3X over HGX powered by A100 40GB GPUs.
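To see why recommender models in particular strain GPU memory, consider the footprint of a single embedding table. The sketch below uses hypothetical table sizes purely for illustration; real DLRM deployments hold many such tables:

```python
# Rough estimate of DLRM embedding-table memory, illustrating why
# recommender models benefit from large GPU memory.
# The table dimensions below are hypothetical, for illustration only.
BYTES_PER_FP32 = 4

def embedding_table_gb(num_rows: int, embedding_dim: int) -> float:
    """Memory footprint of one fp32 embedding table, in decimal GB."""
    return num_rows * embedding_dim * BYTES_PER_FP32 / 1e9

# A single table with 100 million categorical IDs and 128-dim vectors:
size_gb = embedding_table_gb(100_000_000, 128)
print(f"{size_gb} GB")  # 51.2 GB: larger than one 40GB A100, fits on one 80GB A100
```

One such table already exceeds a 40GB GPU, which is why doubling per-GPU memory (and pooling it across NVSwitch-connected GPUs) translates directly into speedups on these workloads.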

Machine Learning Performance

2X Faster than A100 40GB on Big Data Analytics

Machine learning models require loading, transforming, and processing extremely large datasets to glean critical insights. With up to 1.3TB of unified memory and all-to-all GPU communications with NVSwitch, HGX powered by A100 80GB GPUs has the capability to load and perform calculations on enormous datasets to derive actionable insights quickly.

On a big data analytics benchmark, A100 80GB delivered insights with 2X higher throughput over A100 40GB, making it ideally suited for emerging workloads with exploding dataset sizes.

HPC Performance

HPC applications need to perform an enormous amount of calculations per second. Increasing the compute density of each server node dramatically reduces the number of servers required, resulting in huge savings in cost, power, and space consumed in the data center. For simulations, high-dimension matrix multiplication requires a processor to fetch data from many neighbors for computation, making GPUs connected by NVIDIA NVLink ideal. HPC applications can also leverage TF32 in A100 to achieve up to 11X higher throughput for single-precision, dense matrix-multiply operations compared to GPUs from four years prior.
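The "enormous amount of calculations" in dense matrix multiplication is easy to quantify: multiplying an m×k matrix by a k×n matrix costs 2·m·n·k floating-point operations (k multiplies and k adds per output element). A minimal sketch, with sizes chosen only for illustration:

```python
# FLOP count of a dense matrix multiply C = A @ B, where A is m x k
# and B is k x n: each of the m*n outputs needs k multiplies and k adds.
def matmul_flops(m: int, n: int, k: int) -> int:
    return 2 * m * n * k

# A single 8192 x 8192 x 8192 multiply (an illustrative size):
flops = matmul_flops(8192, 8192, 8192)
print(f"{flops / 1e12:.1f} TFLOPs per multiply")  # ~1.1 TFLOPs
```

Simulations chain many such multiplies per timestep, so multiplying per-GPU TF32 throughput across a dense NVLink-connected node compounds directly into shorter runs.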

An HGX powered by A100 80GB GPUs delivers a 2X throughput increase over A100 40GB GPUs on Quantum ESPRESSO, a materials simulation, reducing time to insight.

NVIDIA HGX Specifications

NVIDIA HGX is available in single baseboards with four or eight H100 GPUs, each with 80GB of GPU memory, or with four or eight A100 GPUs, each with 40GB or 80GB of GPU memory. The 4-GPU configuration is fully interconnected with NVIDIA NVLink, and the 8-GPU configuration is interconnected with NVIDIA NVSwitch. Two HGX A100 8-GPU baseboards can be combined using an NVSwitch interconnect to create a powerful 16-GPU single node.
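"Fully interconnected" in the 4-GPU case means every GPU has a direct NVLink path to every other GPU; a quick count of the pairwise paths (this counts GPU pairs only, not the number of NVLink lanes per pair):

```python
# Count the direct GPU-to-GPU paths in a fully connected mesh:
# every unordered pair of GPUs shares a direct link.
from itertools import combinations

def mesh_paths(num_gpus: int) -> int:
    return len(list(combinations(range(num_gpus), 2)))

print(mesh_paths(4))  # 6 direct paths in the 4-GPU HGX configuration
```

At 8 GPUs, a full mesh would need 28 such paths, which is why the 8-GPU baseboard routes traffic through NVSwitch instead: the switch gives every GPU all-to-all bandwidth without point-to-point wiring to every peer.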

HGX is also available in a PCIe form factor for a modular, easy-to-deploy option, bringing the highest computing performance to mainstream servers.

This powerful combination of hardware and software lays the foundation for the ultimate AI supercomputing platform.


Accelerating HGX with NVIDIA Networking

With HGX, it’s also possible to include NVIDIA networking to accelerate and offload data transfers and ensure the full utilization of computing resources. Smart adapters and switches reduce latency, increase efficiency, enhance security, and simplify data center automation to accelerate end-to-end application performance.

The data center is the new unit of computing, and HPC networking plays an integral role in scaling application performance across the entire data center. NVIDIA InfiniBand is paving the way with software-defined networking, In-Network Computing acceleration, remote direct-memory access (RDMA), and the fastest speeds and feeds.