How Penguin Computing Helped Design Meta’s New AI Supercomputer

Dylan Martin

Thierry Pellegrino, the executive who oversees Penguin Computing at parent company Smart Global Holdings, talks to CRN about how the system builder designed and integrated the new AI supercomputer for Facebook parent company Meta.

What It Takes to Build The World’s ‘Fastest AI Supercomputer’

Any company with the same level of financial resources as Facebook parent company Meta could buy the same parts and systems Meta is using for its new AI supercomputer.

But connecting hundreds, and eventually thousands, of Nvidia’s DGX A100 systems along with storage systems takes the right kind of system integrator, one that can design and construct a high-performance computing cluster that meets the company’s requirements.

In the case of Meta’s new AI Research SuperCluster, the Menlo Park, Calif.-based tech giant turned to Penguin Computing to design and integrate the new AI supercomputer, which Meta said will become the world’s fastest later this year when the full cluster of 16,000 Nvidia A100 GPUs goes online.

In an interview with CRN, Thierry Pellegrino, the executive who oversees Fremont, Calif.-based Penguin Computing at parent company Smart Global Holdings, said the new AI supercomputer was the result of an “embedded collaboration” between Meta and Penguin Computing that began more than four years ago with a previous-generation AI supercomputer that used Nvidia’s older V100 GPUs.

“When you operate such a cluster, you have to understand the requirements around security, around performance, around availability and around all the unique aspects of a company like Meta that need to be taken into consideration. That’s been our value to Meta,” said Pellegrino, whose title is president and senior vice president of Smart Global Holdings’ Intelligent Platform Solutions.

In the interview, Pellegrino, who previously ran Dell EMC’s HPC business, talked about how Penguin Computing designed and integrated Meta’s AI Research SuperCluster, which includes storage systems made by Pure Storage and Penguin Computing itself. He also discussed how the company overcame storage and networking bottlenecks and whether the new AI supercomputer is a sign of things to come in the larger enterprise world. The transcript was lightly edited for clarity.
