Nvidia’s H200 GPU To One-Up H100 With 141GB Of HBM3e As Memory Race Heats Up
The H200 features 141GB of HBM3e and a 4.8 TB/s memory bandwidth, a substantial step up from Nvidia’s flagship H100 data center GPU. ‘The integration of faster and more extensive memory will dramatically improve and accelerate HPC and AI applications,’ an Nvidia representative says.
Nvidia plans to release a successor to its powerful and popular H100 data center GPU next year with a significant upgrade to memory capacity and bandwidth, giving it the ability to handle massive generative AI and high-performance computing applications faster and more effectively.
At the Supercomputing 2023 conference, the AI computing giant announced on Monday that the H200 GPU will feature 141GB of HBM3e high-bandwidth memory and a 4.8 TB/s memory bandwidth. This is a substantial step up from the H100’s 80GB of HBM3 and 3.35 TB/s in memory capabilities.
Outside of the memory upgrade, the two chips are built on the same Hopper architecture.
“The integration of faster and more extensive memory will dramatically improve and accelerate HPC and AI applications,” said Dion Harris, director of product go-to-market for accelerated data center solutions at Nvidia, in a briefing with journalists.
The Santa Clara, Calif.-based company also revealed that the H200 is part of its previously announced second-generation Grace Hopper Superchip. The GH200 combines the H200 and its 141GB of HBM3e with Nvidia’s 72-core, Arm-based Grace CPU and 480GB of LPDDR5X memory on an integrated module.
H200 Arrives In Servers, Clouds In Q2 2024
Nvidia said the H200 will become available in systems and cloud instances starting in the second quarter of next year through HGX H200 server boards in four-GPU and eight-GPU configurations.
Those supporting the chip include server vendors such as Dell Technologies, Hewlett Packard Enterprise, Lenovo and Supermicro as well as cloud service providers Amazon Web Services, Google Cloud, Microsoft Azure and Oracle Cloud plus small but rising cloud players CoreWeave, Lambda and Vultr.
Existing HGX H100-based systems are software- and hardware-compatible with the H200, which will allow server and cloud vendors to easily update those systems with the new GPU, Harris said.
Memory Race Heats Up Among Rivals
Nvidia revealed the H200 details as rival AMD plans to push the memory envelope with its upcoming AI chip. Called the Instinct MI300X, the GPU will come with 192GB of HBM3 and a 5.2 TB/s memory bandwidth, which would put it above the H200 in both capacity and bandwidth.
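To put the gap between the two chips in proportion, the short Python sketch below works out how much larger the MI300X figures are than the H200 figures. It uses only the numbers quoted in this article; the `percent_more` helper and the variable names are illustrative, and shipping specifications could differ from what the vendors announced.

```python
# Figures as quoted in the article: (HBM capacity in GB, memory bandwidth in TB/s).
specs = {
    "H200": (141, 4.8),
    "MI300X": (192, 5.2),
}


def percent_more(new, base):
    """Return how much larger `new` is than `base`, as a percentage."""
    return (new - base) / base * 100


h200_gb, h200_bw = specs["H200"]
mi300x_gb, mi300x_bw = specs["MI300X"]

capacity_gap = percent_more(mi300x_gb, h200_gb)
bandwidth_gap = percent_more(mi300x_bw, h200_bw)

print(f"MI300X capacity advantage:  {capacity_gap:.0f}%")
print(f"MI300X bandwidth advantage: {bandwidth_gap:.0f}%")
```

On these announced figures, the MI300X lead works out to roughly 36 percent in capacity and roughly 8 percent in bandwidth.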
Intel, too, plans to ramp up the HBM capacity of its Gaudi AI chip, saying in a recent presentation that the third generation, due out next year, will increase to 144GB from the predecessor’s 96GB HBM2e capacity. The company’s other AI chip family, the Intel Max Series GPU, currently goes up to 128GB of HBM2e, but Intel plans to increase the chip’s capacity in future generations.
“When you look at what’s happening in the market, model sizes are rapidly expanding, demanding increased computational power, as well as faster and stronger memory subsystems,” Harris said.
How H200 Compares To H100 And A100
Nvidia is promoting the H200 as a big upgrade over both the H100, which debuted in 2022, and its predecessor, the A100, which debuted in 2020, when it comes to two popular large language models (LLMs) and a handful of high-performance computing workloads.
For running inference on the GPT-3 175B LLM, the H200 is 60 percent faster than the H100. For inference on the Llama2 70B LLM, the GPU is even faster, getting a 90 percent boost.
For HPC, Nvidia decided to compare the H200 to the A100, saying that the new GPU is two times faster on average across the CP2K, GROMACS, ICON, MILC, Chroma and Quantum Espresso benchmarks.
When eight H200s are combined in an HGX H200-based system, the system can provide “32 petaflops of FP8 deep learning compute and 1.1TB of aggregate high-bandwidth memory for the highest performance in generative AI and HPC applications,” according to Nvidia.
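The system-level figures follow from the per-GPU numbers, as the short Python check below shows. It assumes decimal units (1TB = 1,000GB) and an even split of the quoted FP8 figure across the eight GPUs; the constant names are illustrative and not Nvidia terminology.

```python
# Per-GPU and system figures as quoted in the article.
GPUS_PER_SYSTEM = 8
HBM_PER_GPU_GB = 141
SYSTEM_FP8_PETAFLOPS = 32

# Aggregate memory across all eight GPUs.
total_hbm_gb = GPUS_PER_SYSTEM * HBM_PER_GPU_GB
total_hbm_tb = total_hbm_gb / 1000  # assumption: decimal terabytes

# Implied FP8 throughput per GPU, assuming an even split.
fp8_per_gpu_petaflops = SYSTEM_FP8_PETAFLOPS / GPUS_PER_SYSTEM

print(f"Aggregate HBM3e: {total_hbm_gb}GB (about {total_hbm_tb:.1f}TB)")
print(f"Implied FP8 per GPU: {fp8_per_gpu_petaflops:.0f} petaflops")
```

Eight GPUs at 141GB each come to 1,128GB, which rounds to the 1.1TB Nvidia cites, and the 32-petaflop figure implies about 4 petaflops of FP8 compute per GPU.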
Nvidia is promising further performance improvements in the future through software updates, as it did with the recently released TensorRT-LLM software library.
“Our relentless pursuit of energy efficient performance improvements through hardware and software remain a key focus as we move forward,” Harris said.
The H200 is based on the same Hopper architecture that powers the H100. This means, outside the new memory capabilities, the H200 sports the same features, such as the Transformer Engine, which speeds up LLMs and other deep learning models based on the transformer architecture.
GH200: Cloud Access Today, In Servers Soon
While the H200 won’t arrive in systems and cloud services until several months from now, the GPU will be available in the second-generation Grace Hopper Superchip, the GH200, much sooner.
Nvidia said the GH200 is available now in early access from two cloud service providers, Lambda and Vultr. Other cloud companies expected to offer GH200-based instances include Oracle Cloud Infrastructure and CoreWeave, with the latter set to open instances in the first quarter of 2024.
As for server vendors, ASRock Rack, Asus, Gigabyte and Ingrasys plan to start shipping GH200-based systems by the end of the year.
The company said the Grace Hopper Superchip family has been adopted through early-access programs by more than 100 enterprises, organizations and government agencies so far, including the NASA Ames Research Center and global energy company TotalEnergies.