AMD Eyes AI, Cloud Expansion With Instinct MI300X, EPYC 97X4 Chips
The chip designer claims its upcoming Instinct MI300X GPU, which comes with an unprecedented 192GB of HBM3 memory, will allow organizations to run large language models on fewer GPUs compared to Nvidia’s flagship H100, resulting in a lower total cost of ownership.
AMD is swinging big for the AI, cloud and technical computing markets with fresh lineups of Instinct chips and EPYC CPUs that will help it compete against the likes of Nvidia, Intel and others.
The Santa Clara, Calif.-based chip designer went all out with superlatives for new and existing chips at its Data Center and AI Technology Premiere event in San Francisco Tuesday.
“Today we lead the industry with our EPYC processors, which are the highest-performing processors available,” AMD CEO Lisa Su said at the event’s keynote. “We also offer the industry’s broadest portfolio, which allows us to really optimize for the different workloads in the data center.”
The superlatives continued from there.
The new Instinct MI300X, for instance, is what AMD called the “world’s most advanced accelerator for generative AI.” The chip comes with up to 192GB of high-bandwidth HBM3 memory and uses AMD’s next-generation CDNA 3 architecture, allowing it to “provide the compute and memory efficiency needed for large language model inference and generative AI workloads.”
In the cloud market, AMD claimed its fourth-gen EPYC 97X4 processors, previously code-named Bergamo, will provide better virtual CPU density than Intel or Ampere Computing, a cloud chip startup, thanks to the top CPU’s dual-threaded 128 Zen 4c cores enabling 512 virtual CPUs in a dual-socket server. It will also have “industry leading performance” and “leadership energy efficiency.”
For technical computing, the chip designer said its new fourth-gen EPYC processors with AMD 3D V-Cache technology have the “world’s highest performance” for the workload category. This is because of the CPU’s maximum of 96 Zen 4 cores combined with a massive L3 cache of more than 1GB made possible by AMD’s 3D chip packaging technology.
AMD debuted the chips with support from executives at Amazon Web Services, Facebook parent company Meta and Microsoft Azure, who joined the company on stage to announce plans to support the latest EPYC processors with new and existing products and services.
The company’s stock price was down around 3.6 percent after the keynote concluded.
AMD Makes Biggest Change Yet To Zen For The Cloud
AMD introduced the new chips after making significant share gains against Intel in the x86 server CPU market over the past several years, growing from 0.8 percent at the end of 2017 to 34.6 percent in the second quarter 2023, according to figures from CPU tracker Mercury Research.
This data center CPU land grab, complemented by gains in the PC market, has fueled major revenue growth for AMD over the past several years and made it a darling on Wall Street, where the company’s stock price has grown more than 660 percent since mid-2018.
The chip designer owes much of this success to the Zen CPU architecture that has received multiple updates since the company departed from previous design methodologies with its debut in 2017.
On Tuesday, the chip designer revealed the biggest design departure in its Zen architecture yet with the cloud-focused EPYC 97X4 CPUs, code-named Bergamo. These processors use an offshoot of AMD’s Zen 4 architecture, which debuted in the data center last year with the general-purpose fourth-gen EPYC CPUs.
The architecture spin-off is called Zen 4c. Whereas Zen 4 focuses on performance per core, AMD designed Zen 4c to optimize for performance per watt by prioritizing the power and area of the core, Su said.
This allowed the company to make the Zen 4c core 35 percent smaller than the Zen 4 core, and it let AMD fit 16 Zen 4c cores on a core complex die, a key building block of its chiplet-based design approach. In turn, AMD was able to fit up to eight core complex dies on the EPYC 97X4 CPUs. By contrast, a vanilla fourth-gen EPYC can hold up to 12 core complex dies, each with eight Zen 4 cores.
As a result, AMD was able to provide a higher number of cores in its new EPYC 97X4 CPUs, which cap at 128 cores, compared to the 96-core limit of the regular EPYC 9004 processors.
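The core counts above follow from simple multiplication, and so does the 512-vCPU figure AMD cited for a dual-socket server. The short Python sketch below reproduces that arithmetic from the numbers in this article. The function names are illustrative, and the sketch assumes one virtual CPU per hardware thread, which is a common convention but not something AMD spelled out here.

```python
def total_cores(ccds: int, cores_per_ccd: int) -> int:
    """Cores per CPU = core complex dies x cores on each die."""
    return ccds * cores_per_ccd


def vcpus_per_server(cores_per_socket: int,
                     threads_per_core: int = 2,
                     sockets: int = 2) -> int:
    """Assumption: one vCPU is exposed per hardware thread."""
    return cores_per_socket * threads_per_core * sockets


# Figures taken from the article.
bergamo_cores = total_cores(ccds=8, cores_per_ccd=16)   # EPYC 97X4 (Zen 4c)
genoa_cores = total_cores(ccds=12, cores_per_ccd=8)     # general-purpose EPYC 9004 (Zen 4)
bergamo_vcpus = vcpus_per_server(bergamo_cores)

print(f"EPYC 97X4 cores per CPU: {bergamo_cores}")
print(f"EPYC 9004 cores per CPU: {genoa_cores}")
print(f"vCPUs in a dual-socket EPYC 97X4 server: {bergamo_vcpus}")
```

Note that the cloud chip reaches its higher core count with fewer dies: eight larger-capacity dies instead of 12.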
Su said AMD designed the cloud-optimized EPYC chips with higher core density and greater energy efficiency because those two attributes matter the most to cloud service providers.
“The optimum design point for these processors are actually different than general-purpose computing,” she said. “They’re actually very throughput-oriented and they benefit from the highest density and the best energy efficiency. So all of these factors actually drove the development of Bergamo.”
AMD said these optimizations allow the EPYC 97X4 CPUs to run cloud-native applications faster than Intel’s latest Xeon Scalable processors, previously known as Sapphire Rapids. According to benchmarks run by AMD, the company’s 128-core EPYC 9754 can run the NGINX web frontend 2.6 times faster than Intel’s 60-core Xeon Platinum 8490H. It can also run the Redis in-memory database 60 percent faster and a MySQL backend database 2.1 times faster.
The chip designer also claimed that its top cloud CPU can run 2.1 times more containers per server and provide 2 times better efficiency for Java applications than Intel’s Xeon Platinum 8490H.
Because Zen 4c is based on the Zen 4 design, Su said the EPYC 97X4 CPUs are 100 percent software compatible with the general-purpose EPYC 9004 processors. They are also socket-compatible with last year’s data center chips, which means customers can switch out existing chips for new ones.
Su said Bergamo chips are shipping to hyperscale customers now.
With the Zen 4c core, AMD is looking to build on growing support it has received in the cloud market from the likes of Amazon Web Services, Microsoft Azure, Google Cloud and others.
“Every major cloud provider has deployed EPYC for their internal workloads as well as their customer-facing instances. Today, there are more than 640 EPYC instances available globally with another 200 on track to launch by the end of the year,” Su said.
In looking to grow its cloud footprint, AMD is not competing with Intel alone. It’s also trying to fight against the growing trend of cloud service providers turning to Arm-based processors for high density and better efficiency, two of the most desired attributes for cloud computing.
Most famously, AWS is now three generations into its homegrown CPU, Graviton. Microsoft Azure and Google Cloud are reportedly designing their own custom Arm-based processors too. There is also a growing number of cloud providers, including Microsoft and Google, that are starting to use Arm-based chips from Ampere Computing, a startup founded by a former Intel executive.
Intel also plans to better compete with Arm-based chips with its own cloud-optimized Xeon chips, code-named Sierra Forest, which will sport up to 144 of its efficiency cores. Those processors are due in the first half of 2024, according to an Intel road map update from March.
AMD Seeks To Up AI Game Against Nvidia
While AMD has made significant headway with server CPUs, it’s still trying to play catch-up in the AI computing space with GPUs designed to accelerate such workloads. To date, GPUs have not been a significant contributor to AMD’s data center revenue.
At Tuesday’s event, the chip designer revealed the second chip to come from its Instinct MI300 series: the MI300X, which AMD said provides better efficiency and cost savings for running large language models—a type of AI model that has exploded in demand thanks to ChatGPT—than Nvidia’s flagship H100 data center GPU that fuels popular AI workloads today.
The improved efficiency and cost savings are made possible by the MI300X’s 192GB of maximum HBM3 memory, which is 2.4 times bigger than the 80GB HBM3 capacity of Nvidia’s H100, according to Su. The chip also has a memory bandwidth of 5.2 TB/s, which is 60 percent higher than that of the H100.
Su said this larger memory capacity allows users to fit entire large language models such as the 40 billion parameter Falcon-40B, one of the most popular models on the widely used machine learning platform Hugging Face, on a single MI300X. But she added that the GPU can handle models roughly double the size when using half-precision floating point, also known as FP16.
“It’s the first time a large language model of this size can be run entirely in memory on a single GPU,” Su said as AMD ran a live demo of the MI300X running inference on Falcon-40B.
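A back-of-the-envelope calculation shows why those model sizes are plausible. FP16 stores each parameter in 2 bytes, so the weights alone take roughly 2GB per billion parameters. The Python sketch below is only an estimate under stated assumptions: it counts model weights and ignores activations, the key-value cache and runtime overhead, it treats 1GB as 10^9 bytes, and the function name is illustrative. The article does not say how much memory AMD's demo actually used.

```python
BYTES_PER_PARAM_FP16 = 2   # half precision: 16 bits per parameter
MI300X_HBM_GB = 192        # maximum HBM3 capacity cited by AMD


def weights_gb(params_billions: float,
               bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes).

    (params_billions * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    simplifies to params_billions * bytes_per_param.
    """
    return params_billions * bytes_per_param


falcon_40b_gb = weights_gb(40)     # Falcon-40B at FP16
double_size_gb = weights_gb(80)    # a model roughly double the size

for name, size in [("40B params", falcon_40b_gb), ("80B params", double_size_gb)]:
    fits = size <= MI300X_HBM_GB
    print(f"{name}: about {size:.0f} GB of FP16 weights, "
          f"fits in {MI300X_HBM_GB} GB: {fits}")
```

By this rough measure, an 80 billion parameter model leaves about 32GB of the MI300X's memory for everything else a running model needs.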
The implication of fitting more AI models on a single GPU, according to Su, is that organizations will require fewer GPUs with the MI300X to perform the same amount of work as Nvidia’s H100. This, in turn, means organizations don’t have to spend as much money on their AI infrastructure, the cost of which can be exorbitant with large language models.
“What this means for cloud providers as well as enterprise users is we can run more inference jobs per GPU than we could before. And what that enables is for us to deploy MI300X at scale to power next-gen LLMs with lower total cost of ownership, really making the technology much more accessible to the broader ecosystem,” Su said.
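The "fewer GPUs" argument can be illustrated with a capacity-only estimate: divide a model's memory footprint by each GPU's memory and round up. The Python sketch below uses the 192GB and 80GB capacities cited by AMD and a hypothetical 160GB model (for example, 80 billion parameters at 2 bytes each). It is a lower bound on GPU count based on memory alone. It says nothing about throughput, and real deployments need extra headroom beyond the weights.

```python
import math

MI300X_GB = 192   # capacity cited by AMD
H100_GB = 80      # capacity cited by AMD for Nvidia's H100


def min_gpus_for_weights(weights_gb: float, gpu_mem_gb: float) -> int:
    """Fewest GPUs whose combined memory can hold the weights."""
    return math.ceil(weights_gb / gpu_mem_gb)


# Hypothetical model size, chosen for illustration only.
model_weights_gb = 160.0

h100_count = min_gpus_for_weights(model_weights_gb, H100_GB)
mi300x_count = min_gpus_for_weights(model_weights_gb, MI300X_GB)
capacity_ratio = MI300X_GB / H100_GB

print(f"Capacity ratio (MI300X / H100): {capacity_ratio:.1f}x")
print(f"GPUs needed for {model_weights_gb:.0f} GB of weights: "
      f"H100 = {h100_count}, MI300X = {mi300x_count}")
```

Whether that capacity advantage translates into a lower total cost of ownership depends on pricing and delivered performance, neither of which AMD disclosed at the event.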
To drive adoption of the new data center GPU, AMD has created what it calls the Infinity Architecture Platform, an “industry-standard” server reference design that uses eight MI300X GPUs for generative AI training and inference workloads.
Su said the Infinity Platform is based on specifications from the Open Compute Project, an organization founded by Facebook owner Meta that determines standard server designs for hyperscalers and other companies with massive data centers.
This means hyperscalers can use the Infinity Platform to put MI300X GPUs in existing OCP server racks, which Su said will help drive faster adoption.
“We’re actually accelerating customers’ time to market and reducing overall development costs, while making it really easy to deploy the MI300X into their existing AI ramp and server construction,” she said.
AMD revealed the first chip in the Instinct MI300 series earlier this year with the MI300A, which combines CDNA 3 cores, Zen 4 cores and memory in a single integrated design for AI and high-performance computing workloads.
Su said customers are sampling the MI300A now, while AMD plans to start offering samples of the MI300X to customers in the third quarter of this year. Both chips are expected to enter production in the fourth quarter.
When it comes to software support for the company’s growing portfolio of accelerator chips, AMD President Victor Peng said it has made “really great progress in building a powerful software stack that works with the open ecosystem of models, libraries, frameworks, and tools.”
“Our platforms are proven in deployments today” such as the U.S. Department of Energy’s Frontier exascale supercomputer, Peng said. “And the demand for our platform is simply gaining tremendous momentum, and we are ready to meet that.”
AMD Debuts Second Round Of EPYC For Technical Computing
Continuing its focus on technical computing, AMD debuted the second round of EPYC processors, code-named Genoa-X, to come with the company’s 3D V-Cache technology, which significantly expands the CPU’s L3 cache to handle data-intensive workloads.
This is another expansion of AMD’s fourth-gen EPYC processors that debuted last year, and it’s the massive 1.1GB of L3 cache in the new processors that makes them a better fit for various technical computing workloads such as electronic design automation and computational fluid dynamics.
The chip designer said the CPU’s large cache plays a big role in what makes it the “world’s highest performance x86 server CPU for technical computing.” The new fourth-gen EPYC chips with 3D V-Cache use the same Zen 4 core as the vanilla processors from last year.
Based on tests run by AMD, the lineup’s flagship CPU, the 96-core EPYC 9684X, can run roughly 2.9 times faster in Ansys Fluent, about 2.6 times faster in Ansys CFX, and approximately 2.2 times faster in OpenFOAM compared to Intel’s 60-core Xeon Platinum 8490H.
“Genoa-X helps unlock the potential of the world’s most important and demanding workloads in technical computing,” said Dan McNamara, head of AMD’s server business unit.
McNamara said servers with Genoa-X will be available in the third quarter.