GenAI Boost: AWS To Get Nvidia DGX Cloud Service, EC2 Instance Powered By 32 Grace Hopper Chips

Amazon Web Services will be the first cloud service provider to use the Grace Hopper Superchip, Nvidia’s most advanced AI chip yet. The vendor plans to launch an EC2 instance powered by 32 Grace Hopper chips, each of which combines a CPU and GPU, in a single server rack to handle generative AI applications.

Amazon Web Services plans to offer Nvidia’s DGX Cloud AI supercomputing service and a new EC2 instance powered by 32 of the semiconductor giant’s Grace Hopper Superchips to support the development and deployment of large generative AI applications.

The plan was announced on stage by AWS CEO Adam Selipsky and Nvidia CEO Jensen Huang at the cloud service provider’s re:Invent 2023 event in Las Vegas Tuesday as part of an expanded collaboration between the two tech juggernauts, which also includes an AWS-hosted AI supercomputer coming online next year for Nvidia’s own research and development team.

Both DGX Cloud and the Grace Hopper-powered EC2 instances will become available on AWS next year.

Called a “cloud within the cloud” by one of Huang’s lieutenants, DGX Cloud is Nvidia’s new AI supercomputing service that comes with all the software elements needed to develop and deploy AI applications in the cloud, including the Nvidia AI Enterprise software suite.

“It’s the place where Nvidia directly engages with enterprises around the world to build custom AI models for their businesses. Customers can fine-tune, train, optimize their models in DGX Cloud and then deploy those containerized models and services anywhere,” said Ian Buck, Nvidia’s vice president of hyperscale and high-performance computing, in a briefing with journalists.

David Brown, vice president of compute and networking services at AWS, said the upcoming arrival of DGX Cloud to AWS is part of a “much deeper collaboration” that “brings together the best” products and services from both vendors to the “ultimate joint end customer.”

“It’s been great to see the teams working together and really working out how to bring the best of both offerings to our end customers,” he said.

The announcement was made as Nvidia, whose revenue tripled in its most recent quarter, seeks to continue its dominance in the AI computing space and cement itself as a “full-stack computing company” with a growing portfolio of chips, networking components, systems, software and services.

At the same time, AWS hopes to build upon its position as the world’s biggest cloud service provider to become a global superpower in the AI computing space with an expanding portfolio of generative AI services and a new generation of its Trainium AI chip.

AWS Is First Cloud Provider To Use GH200 For DGX Cloud

While Nvidia has announced DGX Cloud for other cloud service providers like Microsoft Azure and Oracle Cloud, the version coming to AWS is “unique” because it will be the first time the service has been implemented using Nvidia’s Grace Hopper Superchip, its most advanced AI chip yet.

Grace Hopper is a new kind of AI chip developed by Nvidia that brings together an Arm-based CPU and GPU in an integrated module, a major contrast to how the company traditionally designs discrete GPUs that are paired with x86 CPUs from Intel or AMD in a server.

To date, Azure, Oracle Cloud and other cloud service providers have offered DGX Cloud in tandem with Nvidia’s discrete A100 and H100 GPUs.

AWS’ implementation of DGX Cloud will use the second iteration of the chip, called GH200, which combines Nvidia’s 72-core Grace CPU with 480 GB of LPDDR5X memory and its H200 GPU with 141 GB of HBM3e high-bandwidth memory.

The cloud service provider will provide access to 32 GH200 chips as part of a single EC2 instance, thanks to a new “rack-scale” GPU architecture called the GH200 NVL32 multi-node platform that combines all 32 chips in one liquid-cooled server rack.

AWS will also use the GH200 NVL32 platform for its iteration of DGX Cloud.

“It is a new rack-scale GPU architecture for the era of generative AI,” Buck said.

With all 32 GH200 chips in a single rack connected with Nvidia’s NVLink interconnect technology, each rack will provide 128 petaflops of AI performance, or 128 quadrillion floating-point operations per second, and 20 TB of shared memory, the largest memory capacity yet for an AWS instance.
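The rack-level memory figure can be sanity-checked against the per-chip numbers quoted earlier. The following is a minimal Python sketch, not anything published by Nvidia or AWS: the constant and function names are made up for illustration, it assumes decimal units (1 TB = 1,000 GB), and the per-chip performance it prints is simply the rack figure divided by 32 rather than a number from the article.

```python
# Per-chip and per-rack figures as quoted in the article.
CHIPS_PER_RACK = 32
LPDDR5X_GB_PER_CHIP = 480   # memory attached to the Grace CPU
HBM3E_GB_PER_CHIP = 141     # high-bandwidth memory attached to the GPU
RACK_AI_PETAFLOPS = 128     # quoted rack-level AI performance


def rack_memory_tb(chips=CHIPS_PER_RACK):
    """Combined CPU and GPU memory in one rack, in decimal terabytes."""
    return chips * (LPDDR5X_GB_PER_CHIP + HBM3E_GB_PER_CHIP) / 1000


def implied_petaflops_per_chip(chips=CHIPS_PER_RACK):
    """Rack performance split evenly across chips (a derived value)."""
    return RACK_AI_PETAFLOPS / chips


print(rack_memory_tb())              # 19.872, which rounds to the quoted 20 TB
print(implied_petaflops_per_chip())  # 4.0
```

Summing both memory pools across 32 chips gives just under 20 TB, which matches the rounded figure cited for the rack.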

Compared with an eight-GPU server rack with the same GPU, the GH200 NVL32 can cut the cost of training so-called mixture-of-experts AI models with trillions of parameters in half and double the inference performance for large language models, according to Buck.

The Nvidia executive said this is possible because the rack design allows the system to split large AI models between the GPUs within each of the 32 GH200 chips. With NVLink enabling 57.6 TBps of chip-to-chip bandwidth, the rack essentially acts like one big GPU.

“This dramatically reduces the time to run a single inference query, making even the largest neural networks, the trillion-parameter models, the mixture-of-experts models run in real time,” he said.

Customers who want to work with more than 32 GH200 chips can scale up to thousands of the chips using AWS’ EC2 UltraClusters, enabled by the vendor’s Elastic Fabric Adapter interconnect.

The GH200-powered instances will also make use of AWS’ Nitro System, which offloads virtualization functions from the CPU to dedicated hardware.

‘World’s Fastest AI Supercomputer’ Hosted By AWS, For Nvidia’s Use Only

As part of AWS’ expanded collaboration with Nvidia, the two vendors said they plan to build the “world’s fastest AI supercomputer,” though it will only be used by the semiconductor company’s research and development team.

Called Project Ceiba, the supercomputer will use the GH200 NVL32 rack architecture and feature a total of 16,384 GH200 chips. This will enable the system to provide 65 exaflops, or 65 quintillion floating-point operations per second, of AI performance and 9.5 PB of total memory, according to Nvidia.
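As a rough cross-check, the system-level performance figure can be approximated by scaling the per-rack number. This Python sketch is illustrative only: the names are hypothetical, the rack count is derived rather than stated by either vendor, and it assumes performance scales linearly with rack count and ignores any interconnect overhead.

```python
# Figures quoted in the article.
CHIPS_PER_RACK = 32        # GH200 chips in one NVL32 rack
RACK_AI_PETAFLOPS = 128    # quoted AI performance per rack
CEIBA_CHIPS = 16_384       # total GH200 chips in Project Ceiba


def racks_needed(total_chips, chips_per_rack=CHIPS_PER_RACK):
    """Number of full racks required, using ceiling division."""
    return -(-total_chips // chips_per_rack)


def aggregate_exaflops(total_chips):
    """Linear scale-up of the per-rack figure (1 exaflop = 1,000 petaflops)."""
    return racks_needed(total_chips) * RACK_AI_PETAFLOPS / 1000


print(racks_needed(CEIBA_CHIPS))        # 512
print(aggregate_exaflops(CEIBA_CHIPS))  # 65.536
```

Under these assumptions, 512 racks land in the mid-60s of exaflops, close to the figure Nvidia quotes for the finished system.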

“By our calculations, it will be the world’s fastest AI supercomputer once assembled,” Buck said.