How CSPs Can Scale AI Deployments With Supermicro And AMD

CSPs face the challenge of scaling their AI operations across thousands of GPUs while ensuring everything remains reliable, secure and compliant. Achieving this requires robust hardware and software solutions, combined with a deep understanding of how to optimize and manage these resources effectively.

Sachin Hindupur from AMD and Ben Lee from Supermicro discussed with Sydney Neely of The Channel Company how CSPs can confidently and securely deploy AI models on a massive scale.

Ben, what do you predict will be the biggest challenges for cloud service providers in the future, especially in terms of managing the increasing demands of AI and high-performance computing workloads?

I think the biggest challenge for CSPs in the future will likely be around scalability, efficiency and future-proofing their data infrastructure to handle the increased demand of AI and HPC workloads. Supermicro is always focused on advancing our offerings for what our customers need now.

In the second half of the year, we will have our latest next-generation GPU solution, which will be liquid-cooling ready. The coolant distribution unit (CDU) will double in capacity, to up to 200 kW, to accommodate these high-performing GPUs. We will continue to work with partners like AMD to enhance these features, so there will be a lot of good news for CSPs in the coming years.

How does AMD meet CSP customers' needs?

If you look at the top needs of CSPs' customers, they need to be able to support a wide variety of workloads, from SMBs all the way to large enterprises. They also need high performance and high energy efficiency to meet their sustainability goals, all at a low TCO.

I'll take you through the variety of workloads, which together account for close to 90 percent of all workloads:

Starting with general-purpose workloads, we have the Genoa family of processors. These have up to 96 cores and up to 384 MB of L3 cache, providing high performance on a per-socket and per-core basis, along with high energy efficiency and low TCO.

Next, if you look at HPC and technical compute workloads, these are essentially workloads for CAD, CAM, CAE, computational fluid dynamics, simulation, and more. In this case, you have the Genoa-X line of processors, which also has up to 96 cores. More importantly, it has 3D V-Cache with up to 1,152 MB of L3 cache, which is needed for getting the most performance for this kind of workload.

Then you have the infrastructure workloads, like virtualization and cloud-native workloads. Here you have the Bergamo line of processors, which has up to 128 cores and provides the highest performance, highest density and high energy efficiency.

You also have telco and edge workloads, which have unique needs such as space and power constraints. For these, we have the Sienna line of processors, which has up to 64 cores with six-channel DDR5 and a low power envelope of 70 watts TDP.

In summary, it's through a combination of multiple processors that we meet the variety of workloads and goals of our customers.

If you’d like to read more about AMD and Supermicro’s partnership, you can visit our Performance Intensive Computing site.