The Biggest AMD Advancing AI News: From MI500 GPU To ROCm Enterprise AI
At its Advancing AI event, AMD reveals several details about the Instinct MI350 series GPUs and their corresponding rack-scale systems coming later this year, as well as the MI400 processors that will power double-wide racks to go against Nvidia’s Vera Rubin platform next year.
AMD on Thursday revealed Instinct GPUs and rack-scale AI systems that the company said will make it increasingly competitive against Nvidia.
At its Advancing AI event, the chip designer revealed several details about the MI350 series GPUs and their corresponding rack-scale systems coming later this year, as well as the MI400 processors that will power double-wide racks to go against Nvidia’s Vera Rubin platform next year.
[Related: 9 AMD Acquisitions Fueling Its AI Rivalry With Nvidia]
AMD also gave a teaser for the MI500 series GPUs and associated products that will go against Nvidia’s Rubin Ultra platform in 2027.
During her keynote, AMD CEO Lisa Su said seven of the 10 largest AI companies now use Instinct GPUs, including OpenAI, Meta and xAI. She also cited large-scale cloud and internal deployments of Instinct with Microsoft, Oracle and others.
“With the MI350 series, we’re delivering the largest generational performance leap in the history of Instinct, and we’re already deep in development of MI400 for 2026 that is really designed from the ground up as a rack-level solution,” she said.
The Santa Clara, Calif.-based company has made significant investments to develop data center GPUs and associated products that can go head-to-head with the fastest chips and systems from Nvidia, which earned more than double the revenue of AMD and Intel combined for the first quarter, according to a recent CRN analysis.
What follows is a roundup of the new details AMD shared about its Instinct MI350X, MI355X, MI400 and MI500 GPUs as well as the company’s upcoming ROCm Enterprise AI and distributed inference software releases.
Instinct MI350X, MI355X
AMD said its Instinct MI350 series GPUs provide greater memory capacity and similar or better performance compared with Nvidia’s fastest Blackwell-based chips, which the company called “significantly more expensive.”
Set to launch in the third quarter, the MI355X and MI350X GPUs will be supported by two dozen OEM and cloud partners, including Dell Technologies, Hewlett Packard Enterprise, Cisco Systems, Oracle and Supermicro. The company said it plans to announce more partners in the future.
“The list continues to grow as we continue to bring on significant new partners,” said Andrew Dieckmann, corporate vice president and general manager of data center GPU at AMD, in a Wednesday briefing with journalists.
Featuring 185 billion transistors, the MI350 series is built using TSMC’s 3-nanometer manufacturing process.
The MI355X and MI350X both feature 288 GB of HBM3e memory, which is higher than the 256-GB capacity of its MI325X and roughly 60 percent higher than the capacity of Nvidia’s B200 GPU and GB200 Superchip, according to AMD. The company said this allows each GPU to support an AI model with up to 520 billion parameters on a single chip. The memory bandwidth for both GPUs is 8 TBps, which it said is the same as the B200 and GB200.
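AMD did not spell out the math behind the 520-billion-parameter figure, but a rough, back-of-the-envelope check, assuming the model’s weights are stored at 4-bit precision, shows how a model of that size can fit in 288 GB:

```python
# Rough capacity check (assumption: weights stored at 4-bit precision, i.e. 0.5 bytes per parameter).
params = 520e9           # 520-billion-parameter model
bytes_per_param = 0.5    # FP4 weight storage
weight_gb = params * bytes_per_param / 1e9

hbm_gb = 288             # HBM3e capacity of the MI355X and MI350X
print(f"{weight_gb:.0f} GB of weights in {hbm_gb} GB of HBM3e")
# -> 260 GB of weights in 288 GB of HBM3e, leaving roughly 28 GB for KV cache and activations
```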
The MI355X—which has a thermal design power of up to 1,400 watts and is targeted for liquid-cooled servers—can provide up to 20 petaflops of peak 6-bit floating point (FP6) and 4-bit floating point (FP4) performance.
AMD claimed that the FP6 performance is two times higher than what is possible with the GB200 and more than double that of the B200. FP4 performance, on the other hand, is the same as the GB200 and 10 percent faster than the B200, according to the company.
The MI355X can also perform 10 petaflops of peak 8-bit floating point (FP8), which AMD said is on par with the GB200 but 10 percent faster than the B200; five petaflops of peak 16-bit floating point (FP16), which it said is on par with the GB200 but 10 percent faster than the B200; and 79 teraflops of 64-bit floating point (FP64), which it said is double that of the GB200 and B200.
The MI350X, on the other hand, can perform up to 18.4 petaflops of peak FP4 and FP6 math with a thermal design power of up to 1,000 watts, which the company said makes it suited for both air- and liquid-cooled servers.
AMD said the MI355X “delivers the highest inference throughput” for large models, with the GPU providing roughly 20 percent better performance for the DeepSeek R1 model and approximately 30 percent better performance for a 405-billion-parameter Llama 3.1 model than the B200.
Compared to the GB200, the company said the MI355X is on par when it comes to the same 405-billion-parameter Llama 3.1 model.
Dieckmann noted that AMD achieved the performance advantages using open-source frameworks like SGLang and vLLM in contrast to Nvidia’s TensorRT-LLM framework, which he claimed to be proprietary even though the rival has published it on GitHub under the open-source Apache License 2.0.
The MI355X’s inference advantage over the B200 allows the GPU to provide up to 40 percent more tokens per dollar, which Dieckmann called a “key value proposition” against Nvidia.
“We have very strong performance at economically advantaged pricing with our customers, which delivers a significantly cheaper cost of inferencing,” he said.
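The tokens-per-dollar claim folds the throughput and pricing arguments into a single metric. Below is a minimal illustration of how the figure combines them; the throughput ratio reflects AMD’s Llama 3.1 405B comparison, while the relative price is a purely hypothetical value chosen to show the arithmetic, since AMD did not disclose pricing:

```python
# Hypothetical illustration of the tokens-per-dollar metric.
# relative_throughput reflects AMD's ~30 percent Llama 3.1 405B claim vs. the B200;
# relative_price is an assumed, illustrative value -- AMD did not disclose pricing.
relative_throughput = 1.30
relative_price = 0.93

tokens_per_dollar_gain = relative_throughput / relative_price
print(f"Tokens per dollar vs. B200: {tokens_per_dollar_gain:.2f}x")  # ~1.40x, i.e. the claimed ~40 percent
```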
As for training, AMD said the MI355X is roughly on par with the B200 for 70-billion-parameter and 8-billion-parameter Llama 3 models.
But for fine-tuning, the company said the GPU is roughly 10 percent faster than the B200 for the 70-billion-parameter Llama 2 model and approximately 13 percent faster than the GB200 for the same model.
Compared to the MI300X that debuted in late 2023, the company said the MI355X has significantly higher performance across a broad range of AI inference use cases, running 4.2 times faster for an AI agent and chatbot, 2.9 times faster for content generation, 3.8 times faster for summarization and 2.6 times faster for conversational AI—all based on a 405-billion-parameter Llama 3.1 model.
AMD claimed that the MI355X is also roughly 3 times faster for the DeepSeek R1 model, 3.2 times faster for the 70-billion-parameter Llama 3.3 model and 3.3 times faster for the Llama 4 Maverick model—all for inference.
The company said the MI350 GPUs will be paired with its fifth-generation EPYC CPUs and Pollara NICs for rack-scale solutions.
The highest-performance rack solution will require direct liquid cooling and consist of 128 MI355X GPUs and 36 TB of HBM3e memory to perform 2.6 exaflops of FP4. A second liquid-cooled option will consist of 96 GPUs and 27 TB of HBM3e to perform 2 exaflops of FP4.
The air-cooled rack solution, on the other hand, will consist of 64 MI350X GPUs and 18 TB of HBM3e to perform 1.2 exaflops of FP4.
AMD did not say during the Wednesday briefing how much power these rack solutions will require.
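The rack-level figures track the per-GPU numbers above. A quick sanity check, assuming linear scaling across GPUs and the 1,024-GB-per-TB rounding that AMD’s memory totals appear to use:

```python
# Sanity check of the rack figures against the per-GPU specs (assumes linear scaling).
configs = {
    "128x MI355X (liquid)": (128, 20.0, 288),  # (GPUs, peak FP4 petaflops, HBM3e GB per GPU)
    "96x MI355X (liquid)":  (96, 20.0, 288),
    "64x MI350X (air)":     (64, 18.4, 288),
}
for name, (gpus, fp4_pflops, hbm_gb) in configs.items():
    exaflops = gpus * fp4_pflops / 1000
    hbm_tb = gpus * hbm_gb / 1024   # AMD's 36/27/18 TB totals match a 1,024 GB-per-TB conversion
    print(f"{name}: {exaflops:.2f} exaflops FP4, {hbm_tb:.0f} TB HBM3e")
# -> 2.56 exaflops / 36 TB, 1.92 exaflops / 27 TB, 1.18 exaflops / 18 TB
```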
Instinct MI400 Series
AMD revealed that its Instinct MI400-based, double-wide rack-scale solutions coming next year will provide 50 percent more memory capacity and bandwidth than Nvidia’s upcoming Vera Rubin platform while offering roughly the same AI compute performance.
Andrew Dieckmann, corporate vice president and general manager of data center GPU at AMD, said in the Wednesday briefing that the MI400 series “will be the highest-performing AI accelerator in 2026 for both large-scale training as well as inferencing.”
“It will bring leadership performance due to the memory capacity, memory bandwidth, scale-up bandwidth and scale-out bandwidth advantages,” he said of the MI400 series and the double-wide “Helios” AI server rack it will power.
Set to launch in 2026, AMD said the MI400 GPU series will be capable of performing 40 petaflops of 4-bit floating point (FP4) and 20 petaflops of 8-bit floating point (FP8), double that of the flagship MI355X landing this year.
Compared to the MI350 series, the MI400 series will increase memory capacity to 432 GB based on the HBM4 standard, which will give it a memory bandwidth of 19.6 TBps, more than double that of the previous generation. The MI400 series will also sport a scale-out bandwidth capacity of 300 GBps per GPU.
AMD plans to pair the MI400 series with its next-generation EPYC “Venice” CPU and Pensando “Vulcano” NIC to power the Helios AI rack.
The Helios rack will consist of 72 MI400 GPUs, giving it 31 TB of HBM4 memory capacity, 1.4 PBps of memory bandwidth and 260 TBps of scale-up bandwidth. This will make it capable of 2.9 exaflops of FP4 and 1.4 exaflops of FP8. The rack will also have a scale-out bandwidth of 43 TBps.
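As with the MI350 racks, the Helios figures follow from the per-GPU specs; a rough check assuming linear scaling across the 72 GPUs:

```python
# Rough check of the Helios rack figures (assumes linear scaling across 72 MI400 GPUs).
gpus = 72
fp4_pflops, fp8_pflops = 40, 20    # per-GPU peak FP4 / FP8 in petaflops
hbm_gb, hbm_bw_tbps = 432, 19.6    # per-GPU HBM4 capacity (GB) and bandwidth (TB/s)

print(f"FP4:    {gpus * fp4_pflops / 1000:.2f} exaflops")  # ~2.88, quoted as 2.9
print(f"FP8:    {gpus * fp8_pflops / 1000:.2f} exaflops")  # ~1.44, quoted as 1.4
print(f"HBM4:   {gpus * hbm_gb / 1000:.1f} TB")            # ~31.1, quoted as 31
print(f"Mem BW: {gpus * hbm_bw_tbps / 1000:.2f} PB/s")     # ~1.41, quoted as 1.4
```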
Compared to Nvidia’s Vera Rubin platform, which is set to launch next year, AMD said the Helios rack will come with the same number of GPUs, the same scale-up bandwidth and roughly the same FP4 and FP8 performance.
At the same time, the company said Helios will offer 50 percent greater HBM4 memory capacity, memory bandwidth and scale-out bandwidth.
In a rendering of the Helios, the rack appeared to be wider than Nvidia’s rack-scale solutions such as the GB200 NVL72 platform.
Dieckmann said Helios is a double-wide rack, which AMD and its key partners felt was the “right design point between complexity [and] reliability.”
The executive said the rack’s wider size was a “trade-off” to achieve Helios’ performance advantages, including the 50 percent greater memory bandwidth.
“That trade-off was deemed to be a very acceptable trade-off because these large data centers tend not to be square footage constrained. They tend to be megawatts constrained. And so we think this is the right design point for the market based upon what we’re delivering,” Dieckmann said.
Instinct MI500 Series
AMD revealed that it will release a next-generation AI server rack in 2027 using the company’s Instinct MI500 GPU series and EPYC “Verano” CPUs.
This rack, which appeared to be double-wide like the Helios offering, will also use the same Pensando “Vulcano” NIC as the previous generation.
“We have an ongoing road map of further rack-scale architectures in 2027 and beyond, with our next GPU architectures, our next EPYC CPU solutions and Pensando NIC solutions that we’re integrating into that solution,” said Andrew Dieckmann, corporate vice president and general manager of data center GPU at AMD.
ROCm Enterprise AI
AMD revealed ROCm Enterprise AI as an upcoming MLOps and cluster management platform that it said will enable enterprises to seamlessly operate Instinct-based data centers for AI applications.
The platform will be made available as part of the seventh version of its ROCm software stack, which will become “generally accessible” in the third quarter.
While AMD is expected to begin previewing the ROCm 7.0 release with developers on Thursday, early access for ROCm Enterprise AI is expected to begin in August, said Ramine Roane, corporate vice president of AMD’s AI solutions group, in the Wednesday briefing with journalists.
The platform will integrate with third-party software to help partners and customers build end-to-end applications.
These third parties include VMware Cloud Foundation, Red Hat’s OpenShift AI, Canonical, Clarifai, Rapt.ai, ClearML, Llama Stack and the Open Platform for Enterprise AI. The latter is a Linux Foundation group that was founded by Intel and includes several IT vendors as members.
The ROCm Enterprise AI platform will include “tools for model fine-tuning with industry-specific data and integration with both structured and unstructured workflows,” according to AMD.
In addition to providing MLOps capabilities, the platform will feature cluster management, AI workload and quota management as well as integration with Kubernetes and Slurm, the company said.
In tandem with AMD’s developer ecosystem, these tools will allow partners and customers to develop reference applications for chatbots and document summarization, among other things.
Roane said he “believes” ROCm Enterprise AI will be free, unlike the Nvidia AI Enterprise platform, which requires a paid license to use.
ROCm Distributed Inference
AMD said it plans to introduce distributed inference capabilities in the upcoming ROCm 7.0 release, which it called a response to Nvidia’s new Dynamo framework, a technology the rival has used to speed up AI inference workloads.
Ramine Roane, corporate vice president of AMD’s AI solutions group, said in the Wednesday briefing that the distributed inference feature was developed with the open-source community.
“There is a new open-source software called llm-d for distributed that basically does what Dynamo does,” he said.
The distributed inference feature will also take advantage of the open-source vLLM and SGLang frameworks.
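For a sense of what the underlying open-source stack looks like in practice, below is a minimal sketch of single-node inference with vLLM’s Python API, which supports ROCm on Instinct GPUs. The model name and tensor-parallel degree are illustrative, and the llm-d distributed layer Roane described, which coordinates multiple engine instances across nodes, is not shown:

```python
# Minimal single-node inference sketch with the open-source vLLM engine.
# Assumptions: a ROCm-enabled vLLM install, an illustrative model name,
# and an 8-GPU node; the llm-d distributed layer is not shown here.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # example model, not an AMD-specified one
    tensor_parallel_size=8,                      # shard the model across 8 Instinct GPUs
)
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["Summarize AMD's Helios rack announcement."], params)
for out in outputs:
    print(out.outputs[0].text)
```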
ROCm Coming To Windows
AMD announced that its ROCm software stack for GPU programming is coming to Windows later this year.
ROCm will initially come to Windows with support for the ONNX runtime in a July preview release, according to the company. It will then extend support to the PyTorch machine learning framework in the third quarter.
“Now ROCm will work on laptops, on workstations, on Linux, on Windows,” said Ramine Roane, corporate vice president of AMD’s AI solutions group, in the Wednesday briefing.
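AMD has not yet detailed the Windows preview’s programming interface. As a rough sketch, ONNX Runtime’s existing Python API looks like the following; the “ROCMExecutionProvider” name is borrowed from ONNX Runtime’s current ROCm backend on Linux and is an assumption here, as are the model path and input shape:

```python
# Illustrative ONNX Runtime inference sketch.
# Assumptions: "ROCMExecutionProvider" is the name of ONNX Runtime's existing ROCm
# backend on Linux and may differ in the Windows preview; model path and input
# shape are placeholders. A CPU fallback provider is listed second.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",
    providers=["ROCMExecutionProvider", "CPUExecutionProvider"],
)

input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)  # example image-shaped input
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)
```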