Intel, HPE Load Aurora Supercomputer With Over 63,000 GPUs, 21,000 CPUs

The semiconductor giant, working with Hewlett Packard Enterprise, has completed the installation of more than 10,000 compute blades in the U.S. Department of Energy’s long-delayed Aurora supercomputer, which, powered by Intel’s latest CPUs and GPUs, could become the world’s fastest when it goes online later this year as expected.

The U.S. Department of Energy’s long-delayed Aurora supercomputer is one step closer to going online and potentially becoming the world’s fastest after Intel and Hewlett Packard Enterprise finished installing more than 10,000 compute blades packed with the chipmaker’s latest data center CPUs and GPUs.

Intel announced the milestone Thursday, saying that Aurora’s 166 racks at the DOE’s Argonne National Lab are now loaded with 10,624 server blades, each of which weighs 70 pounds and contains two Intel Xeon CPU Max Series and six Intel Data Center GPU Max Series processors. This amounts to a total of 21,248 Intel Xeon CPUs and 63,744 Intel data center GPUs installed in a space that is equivalent to two professional basketball courts, according to Intel.
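
As a quick sanity check, the per-blade configuration accounts for those totals exactly. Here is the arithmetic as a short Python sketch:

```python
# Back-of-the-envelope check of Intel's stated chip totals for Aurora.
blades = 10_624
cpus_per_blade = 2  # Xeon CPU Max Series per blade
gpus_per_blade = 6  # Data Center GPU Max Series per blade

print(blades * cpus_per_blade)  # 21248 CPUs
print(blades * gpus_per_blade)  # 63744 GPUs
```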

[Related: AMD Eyes AI, Cloud Expansion With Instinct MI300X, EPYC 97X4 Chips]

The Santa Clara, Calif.-based company views Aurora as an important accomplishment in part because it’s the first to use its Max Series GPU, a product previously code-named Ponte Vecchio that Intel spent years developing to compete with GPU powerhouse Nvidia in high-performance computing and AI workloads.

“Aurora is the first deployment of Intel’s Max Series GPU, the biggest Xeon Max CPU-based system, and the largest GPU cluster in the world,” said Jeff McVeigh, head of Intel’s Super Compute Group, in a statement. “We’re proud to be part of this historic system and excited for the ground-breaking AI, science and engineering Aurora will enable.”

Aurora’s Performance Expectations

The chipmaker said Aurora, originally unveiled in 2015 before suffering a series of delays and a major architectural shift from Intel, is “expected to be the world’s first supercomputer to achieve a theoretical peak performance of more than 2 exaflops.” That amounts to more than 2 quintillion, or more than 2 billion billion, floating point operations per second.
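
To put that figure in perspective, a rough back-of-the-envelope calculation shows what each of Aurora’s GPUs would need to contribute on average to reach that peak. This is an illustrative estimate only; real per-device peaks depend on precision, clock speeds and how much the CPUs contribute:

```python
# Rough sense of scale for a 2-exaflop theoretical peak. Illustrative
# arithmetic, not an official per-device breakdown.
peak_flops = 2e18  # 2 exaflops = 2 quintillion operations per second
gpus = 63_744      # Aurora's GPU count, per Intel

# Implied average contribution per GPU if the CPUs are ignored entirely:
print(peak_flops / gpus / 1e12)  # ~31.4 teraflops per GPU
```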

Intel said it expects Aurora to hit this milestone later this year when Argonne submits performance data for Aurora to Top500, the organization that ranks the world’s fastest supercomputers. If verified by Top500, it would make the supercomputer the fastest in the world, surpassing the horsepower of the first publicly verified exascale system, AMD-powered Frontier at the DOE’s Oak Ridge National Laboratory.

Based on recent testing on Argonne’s Sunspot testbed system for Aurora, the lab found that Intel’s Max Series GPUs demonstrated up to two times the performance of AMD’s Instinct MI250X GPUs on the OpenMC Monte Carlo neutron and photon transport simulation code, according to the chipmaker.
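
OpenMC, the code used in that comparison, is an open-source Monte Carlo particle transport simulator with a Python API. As a rough illustration of what such a workload looks like, a minimal eigenvalue calculation might be set up as below. This is a toy model, not Argonne’s actual benchmark configuration, and it assumes OpenMC and a nuclear cross-section data library are already installed:

```python
# Minimal OpenMC eigenvalue run -- an illustrative toy model only, not the
# configuration Argonne used to benchmark Aurora's GPUs. Requires OpenMC and
# a nuclear data library (via OPENMC_CROSS_SECTIONS) to be configured.
import openmc

# A bare sphere of U-235, chosen purely for brevity
fuel = openmc.Material(name="fuel")
fuel.add_nuclide("U235", 1.0)
fuel.set_density("g/cm3", 10.0)

sphere = openmc.Sphere(r=10.0, boundary_type="vacuum")
cell = openmc.Cell(fill=fuel, region=-sphere)

settings = openmc.Settings()
settings.batches = 20        # total generations of neutrons to simulate
settings.inactive = 5        # generations discarded before tallying
settings.particles = 10_000  # neutrons tracked per generation

model = openmc.Model(
    geometry=openmc.Geometry([cell]),
    materials=openmc.Materials([fuel]),
    settings=settings,
)
model.run()  # tracks neutron histories and estimates k-effective
```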

Intel said it ran its own tests and found that, on average, the Max Series CPUs provide a 40 percent performance boost over AMD’s fourth-generation EPYC processors, previously code-named Genoa, across “many real-world workloads, such as earth systems modeling, energy and manufacturing.”

What Are Intel’s Max Series Chips?

Intel launched the Max Series CPUs and Max Series GPUs earlier this year in its latest effort to fight back against the growing influence of rivals AMD and Nvidia in the HPC space.

The Max Series CPUs use the same Sapphire Rapids microarchitecture as Intel’s fourth-generation Xeon Scalable processors, but the biggest difference is that the Max chips each come with up to 64 GB of HBM2e high-bandwidth memory. This gives a dual-socket server such as an individual Aurora compute blade a total of 128 GB of HBM2e to handle large data sets close to the CPU.

The Max Series GPUs, on the other hand, are Intel’s first data center GPUs focused on workloads at the convergence of HPC and AI. Each GPU packs more than 100 billion transistors on a single package and sports up to 128 of Intel’s Xe cores, each of which comes with Intel Xe Matrix Extensions for accelerating AI workloads alongside vector and matrix math capabilities in a single device. The GPU also comes with up to 128 ray tracing units to simulate realistic lighting in real time.
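
For a sense of how those matrix units get exercised in practice, here is a minimal sketch of dispatching a matrix multiply to an Intel GPU from PyTorch. It assumes Intel’s PyTorch extension (the intel_extension_for_pytorch package) and Intel GPU drivers are installed; importing the extension exposes PyTorch’s “xpu” device for Intel GPUs:

```python
# Sketch: running a matrix multiply on an Intel data center GPU via PyTorch.
# Assumes intel_extension_for_pytorch and Intel GPU drivers are installed;
# importing the extension registers PyTorch's "xpu" device.
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401

assert torch.xpu.is_available(), "No Intel GPU visible to PyTorch"

# bfloat16 matmuls are the kind of operation XMX units are built to speed up
a = torch.randn(4096, 4096, dtype=torch.bfloat16, device="xpu")
b = torch.randn(4096, 4096, dtype=torch.bfloat16, device="xpu")
c = a @ b
torch.xpu.synchronize()  # wait for the GPU to finish before reading results
print(c.shape)
```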

Aurora’s Planned Uses And Other Specs

Argonne plans to use the combined horsepower of Intel’s CPUs and GPUs for a variety of causes, “from tackling climate change to finding cures for deadly diseases,” according to the chipmaker. The lab also plans to make use of generative AI models to accelerate research efforts.

Rick Stevens, associate lab director at Argonne, said Aurora’s more than 60,000 Max Series GPUs, combined with a “very fast I/O system and an all solid-state mass storage system,” will create the “perfect environment to train large-scale, open-source generative AI models for science.”

Aurora, which is based on HPE’s Cray EX supercomputer design, comes with more than 1,024 storage nodes that use Intel’s distributed asynchronous object storage, also known as DAOS. This will give the system more than 220 PB of capacity at 32 TBps of total bandwidth.

Each compute node has a unified memory architecture between the two Intel CPUs and six Intel GPUs, and that memory capacity includes 512 GB of DDR5 in addition to the 128 GB of HBM2e per node. For a system interconnect, Aurora uses Slingshot 11 from HPE’s Cray unit.

Aurora Almost Online After Years Of Delays

With plans to go online later this year, the Aurora supercomputer is almost here after significant delays. Many of those delays were tied to the broader design and manufacturing challenges Intel faced over the past several years as it fell behind Asian contract chip manufacturers TSMC and Samsung, along with the chip designers those foundries enabled, such as AMD and Nvidia, on next-generation chip technologies.

Aurora was first unveiled in 2015, when supercomputer vendor Cray, several years before its acquisition by HPE, landed a deal to build the system. The original timeline called for completion in 2018, but that was pushed back to 2021 when Intel halted work on an earlier HPC accelerator chip, code-named Knights Hill, part of a larger family of now-discontinued processors called Xeon Phi.

When Intel ended work on Knights Hill in 2017, it began developing what eventually became the Xe GPU architecture, the foundation for its discrete GPU products, which debuted in 2020 with a low-end model for laptops.

Then, in mid-2020, Intel said it was experiencing manufacturing issues that delayed its 7-nanometer manufacturing process by 12 months; the company had planned to use that process for what became its Max Series CPUs and GPUs. This pushed Aurora’s target date to 2022. The timeline slipped further, beyond 2022, as Intel again delayed the rollout of its Sapphire Rapids CPUs to give the chips additional “platform and product validation time.”

Now, roughly five years past Aurora’s original completion date, the supercomputer is fully loaded with Intel’s CPUs and GPUs and nearly ready for researchers to finally tap into.

Broader Intel GPU Adoption Could Take Time, Partner Says

An engineering executive at an HPC systems integrator told CRN that while it’s good for Intel to finally complete its part of Aurora, it may take some time before the chipmaker’s GPUs filter down to smaller cluster projects at research universities and commercial organizations such as manufacturers.

“It takes sometimes years before it filters down to the folks that we deal with,” said Dominic Daninger, vice president of engineering at Burnsville, Minn.-based Nor-Tech.

For now, most of Nor-Tech’s customers are seeking out GPUs from Nvidia, according to Daninger.

“We’re just seeing a lot of demand for that,” he said.

Daninger added that Nor-Tech is starting to see some demand for AMD’s GPUs too.

If Intel wants to find broader adoption of its GPUs with HPC users, Daninger said, the company should focus on pushing GPU clusters at research universities.

“It never hurts to get those into the universities. Nvidia is certainly doing that. They got pretty significant education discounts on all their products,” he said. “Then you get those graduates. They get out of there and get out in industry and start influencing what the buyers will do. That’s often the path it takes.”