Next-Gen AMD EPYC, Radeon To Power 2-Exaflop Supercomputer

'Based on the performance that we expect the AMD processors to deliver to our actual workload, our decision was that they would provide by far the best value to the government,' a CTO at the U.S. Department of Energy says of the El Capitan supercomputer that will go online in 2023.

While AMD is only six months into the launch of its second-generation EPYC Rome server processors, the chipmaker is announcing today that its fourth-generation EPYC "Genoa" processors will power the U.S. Department of Energy's El Capitan supercomputer in 2023.

The Santa Clara, Calif.-based company announced the news on Monday with Hewlett Packard Enterprise, the supercomputer's system vendor, and the DOE's Lawrence Livermore National Laboratory in California, which plans to use the $600 million supercomputer to support the nation's nuclear security missions.

The supercomputer is expected to deliver two exaflops of computing performance, or two quintillion mathematical calculations per second, making it 10 times faster than the world's fastest supercomputer currently in use and more powerful than the world's top 200 fastest supercomputers combined. If every human on Earth performed one calculation per second, it would take the entire population eight years to do what El Capitan can do in one second.
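As a rough check on those figures, the arithmetic can be sketched in a few lines of Python. This is a back-of-the-envelope estimate, not anything published by AMD, HPE or the DOE: the world population of about 7.8 billion and the rate of one calculation per person per second are assumptions added here for illustration.

```python
# Back-of-the-envelope check of the El Capitan comparisons.
# Assumptions (not from the announcement): ~7.8 billion people,
# each performing one calculation per second.

EXA = 10**18   # one quintillion
PETA = 10**15

el_capitan_flops = 2 * EXA   # two exaflops, as stated in the article

# "10 times faster than the world's fastest supercomputer currently in use"
implied_current_fastest_pflops = el_capitan_flops / 10 / PETA

world_population = 7.8e9                  # assumed
calcs_per_person_per_second = 1           # assumed
seconds_per_year = 365.25 * 24 * 3600

# Time for everyone on Earth to match one second of El Capitan's work
seconds_needed = el_capitan_flops / (world_population * calcs_per_person_per_second)
years_needed = seconds_needed / seconds_per_year

print(f"Implied current fastest system: {implied_current_fastest_pflops:.0f} petaflops")
print(f"Years for the world's population: {years_needed:.1f}")
```

Under those assumptions the result comes out at a little over eight years, which is in line with the comparison quoted above.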

El Capitan will use AMD's fourth-generation EPYC Genoa processors, based on the chipmaker's Zen 4 architecture, and next-generation Radeon Instinct GPUs based on a "new compute-optimized architecture." The supercomputer will also use AMD's third-generation Infinity Fabric, which will enable a high-bandwidth, low-latency connection between the CPUs and GPUs.

The processors will be installed in HPE's Cray Shasta servers, which will use the HPC vendor's Slingshot interconnect. HPE acquired Cray for $1.3 billion last year as part of its push into HPC and AI.

Forrest Norrod, the head of AMD's Datacenter and Embedded Systems Group, said the EPYC Genoa processors will have single- and multi-core and heterogeneous compute leadership, as well as next-generation memory and throughput subsystems meant for high-performance computing and artificial intelligence. The Radeon Instinct GPUs, meanwhile, will have next-generation high-bandwidth memory and support multi-GPU scaling for maximum performance.

"The exciting thing about El Capitan is it allows us to showcase AMD CPU and AMD GPU technology working together hand in hand," he said.

Critical to enabling that heterogeneous compute architecture is AMD's third-generation Infinity Fabric, which will provide unified memory across the CPU and GPU, according to Norrod. With support for memory coherence, the interconnect technology will improve performance and simplify programming.

"It's evolutionary over what we're doing today," Norrod said.

Bronis R. de Supinski, CTO for the lab's Livermore Computing division, said his organization decided to go with AMD over competing vendors based on the performance AMD has promised for El Capitan's completion date in 2023.

"Based on the performance that we expect the AMD processors to deliver to our actual workload, our decision was that they would provide by far the best value to the government," he said.

When it goes online, El Capitan will conduct demanding simulations to "certify the nuclear stockpile is safe, secure and reliable," according to de Supinski.

"We require very complex simulations, and as the nuclear stockpile ages, the complexity of the simulations only increases, so we need to be able to use larger and larger systems to maintain the level of assurance the nation really needs," he said.

The DOE's plan to use AMD processors for its El Capitan supercomputer comes after the chipmaker announced last May that it would supply next-generation EPYC processors and Radeon Instinct GPUs for the DOE's Frontier supercomputer, one of the first two exascale systems expected to go online in the U.S., with a promised performance of 1.5 exaflops.