Analysis: Nvidia, AMD Give Partners New AI Selling Points For GPUs In PCs

A pair of software updates enable new large language model and machine learning capabilities in GPUs for PCs from Nvidia and AMD, giving channel partners fresh opportunities to sell GPU upgrades for existing PC fleets or new PCs equipped with discrete GPUs in the fast-growing AI computing market.

Nvidia and AMD have given channel partners a new set of AI selling points for GPUs in PCs thanks to a pair of software updates that enable fresh large language model and machine learning capabilities.

On Monday, AMD said it has enabled support for the PyTorch framework for machine learning (ML) training and inference on two desktop GPUs: its consumer-focused flagship, the Radeon RX 7900 XTX, and its workstation-oriented sibling, the Radeon Pro W7900.

AMD Brings ML Training And Inference To Radeon 7900 Series GPUs

In AMD’s Monday announcement, the chip designer said it is enabling support for the open-source PyTorch ML framework on the Radeon RX 7900 XTX and Radeon Pro W7900 GPUs through its ROCm software stack, which lets developers write software that runs on AMD GPUs.

The implementation, enabled by the latest version of ROCm, is currently available on PCs running the Ubuntu Linux operating system.
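For developers who want to confirm that such a setup is active, a minimal sketch of a device check follows. It assumes a ROCm build of PyTorch may be installed; such builds expose AMD GPUs through the same `torch.cuda` interface used for Nvidia hardware and set `torch.version.hip`. The function name `describe_accelerator` is illustrative and does not come from AMD's announcement.

```python
def describe_accelerator():
    """Report which backend PyTorch would use on this machine."""
    try:
        # Assumption: a ROCm (or CUDA) build of PyTorch may be installed.
        import torch
    except ImportError:
        return {"device": "cpu", "backend": "none (PyTorch not installed)"}

    if not torch.cuda.is_available():
        return {"device": "cpu", "backend": "cpu"}

    # ROCm builds set torch.version.hip to a version string;
    # on CUDA builds it is None.
    hip = getattr(torch.version, "hip", None)
    backend = f"ROCm/HIP {hip}" if hip else f"CUDA {torch.version.cuda}"
    return {
        "device": "cuda",  # ROCm builds still use the "cuda" device string
        "backend": backend,
        "name": torch.cuda.get_device_name(0),
    }


if __name__ == "__main__":
    print(describe_accelerator())
```

The function falls back to the CPU when PyTorch or a supported GPU is missing, so the same script can run on machines without a Radeon card.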

AMD said the move will give AI developers a “local, private and cost-effective workflow for ML training and inference” compared with the cloud-based solutions they use today.

“A local PC or workstation system with a Radeon 7900 series GPU presents a capable, yet affordable solution to address these growing workflow challenges thanks to large GPU memory sizes of 24 GB and even 48 GB,” AMD Product Marketing Manager David Diederichs wrote in a blog post.

This means a PC user will be able to train ML models on data sets, a key aspect of AI development, and then perform inference on those models to run applications on the computer’s own GPU rather than in the cloud.

While the Radeon RX 7900 XTX comes with a premium $999 price tag, it is less expensive than the $3,999 Radeon Pro W7900, and both are less expensive than data center GPUs that can cost thousands of dollars more. The main tradeoff is that these GPUs are less powerful than data center chips, but that’s not a problem for a wide swath of AI development activity that relies on smaller models.
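As a rough illustration of which models count as "smaller" here, the back-of-the-envelope sketch below estimates the memory needed for model weights alone. It assumes 16-bit weights (2 bytes per parameter), treats the 24 GB and 48 GB figures as GiB, and ignores activations, gradients and optimizer state, so real training needs more headroom. The helper names `weights_gib` and `fits` are hypothetical.

```python
GIB = 1024 ** 3  # bytes per GiB


def weights_gib(params_billion, bytes_per_param):
    """Approximate memory for model weights alone, in GiB."""
    return params_billion * 1e9 * bytes_per_param / GIB


def fits(params_billion, bytes_per_param, gpu_gb):
    """True if the weights alone fit in the given GPU memory.

    Assumption: the advertised GB figure is treated as GiB.
    """
    return weights_gib(params_billion, bytes_per_param) <= gpu_gb


if __name__ == "__main__":
    for size in (7, 13, 70):      # Llama 2 parameter counts, in billions
        for gpu in (24, 48):      # memory sizes cited by AMD
            need = weights_gib(size, 2)  # 2 bytes per parameter (16-bit)
            print(f"{size}B at 16-bit needs ~{need:.1f} GiB; "
                  f"fits in {gpu} GB: {fits(size, 2, gpu)}")
```

Under these assumptions, a 7-billion-parameter model fits on either card, a 13-billion-parameter model needs the 48 GB card and a 70-billion-parameter model fits on neither without further compression.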

Nvidia Accelerates LLMs On RTX-Powered Windows PCs

In Nvidia’s Tuesday announcement, the company said it will soon release on its website the open-source TensorRT-LLM software library for Windows PCs powered by most of its RTX GPUs.

The compatible chips consist of the second and third generations of Nvidia’s consumer-focused GeForce RTX GPUs and its workstation-oriented RTX GPUs for desktops and laptops, an Nvidia spokesperson told CRN. This means GeForce RTX 30 and 40 series GPUs such as the RTX 3060 and RTX 4090 are supported but not RTX 20 series GPUs like the RTX 2070.

By using TensorRT-LLM, RTX-powered Windows PCs will have the ability to run inference on the latest LLMs such as Llama 2 and Code Llama up to four times faster than before, Nvidia said.

“At higher batch sizes, this acceleration significantly improves the experience for more sophisticated LLM use—like writing and coding assistants that output multiple, unique auto-complete results at once,” Jesse Clayton, a director of product management at Nvidia, wrote in a blog post.

Clayton said TensorRT-LLM will also benefit scenarios where LLMs are integrated with other technologies such as retrieval-augmented generation, which allows an LLM to “deliver [more targeted] responses based on a specific dataset, like user emails or articles on a website.”
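The idea behind retrieval-augmented generation can be sketched in a few lines: find the most relevant text in a private dataset, then hand it to the LLM as context. The toy example below is not Nvidia's implementation; it uses simple word-overlap scoring in place of the embedding search that production systems typically rely on, and the function names and sample emails are made up for illustration.

```python
import re
from collections import Counter


def tokens(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, documents, k=1):
    """Return the k documents sharing the most words with the query (toy scoring)."""
    query_words = set(tokens(query))

    def score(doc):
        counts = Counter(tokens(doc))
        return sum(counts[word] for word in query_words)

    return sorted(documents, key=score, reverse=True)[:k]


def build_prompt(query, documents, k=1):
    """Assemble the text that would be sent to an LLM."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    emails = [  # hypothetical user emails
        "The quarterly budget review is moved to Thursday at 3 pm.",
        "Lunch menu for Friday: pasta and salad.",
    ]
    print(build_prompt("When is the budget review?", emails))
```

The prompt that reaches the model contains only the email about the budget review, which is how the response ends up targeted to the user's own data.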

To help developers, Nvidia has released tools for accelerating LLMs, including scripts for optimizing custom models with TensorRT-LLM, open-source models optimized for TensorRT-LLM and a developer reference project for demonstrating the speed and quality of LLM responses.

But Nvidia didn’t stop there in enabling new generative AI experiences on PCs. The company said its TensorRT software library has been integrated into a distribution of the popular image generation model Stable Diffusion that comes with a web-based user interface.

This allows the web-based Stable Diffusion interface, developed by AUTOMATIC1111, to generate images two times faster with an RTX GPU than previously possible, according to Nvidia. With this implementation, a GeForce RTX 4090 can run Stable Diffusion seven times faster than a Mac with Apple’s M2 Ultra, that company’s most powerful chip, Nvidia added.

New Software Updates ‘Democratize’ AI Development

The implications of these developments are substantial: They make it more feasible to develop and run AI applications on PCs with more affordable GPUs, including consumer-grade products.

This, in turn, will expand sales opportunities for solution providers who were already selling, and seeing heightened demand for, expensive workstation PCs for AI development, according to Randy Copeland, president and CEO of Richmond, Va.-based PC system builder Velocity Micro.

“I think that it’s going to open it up to a lot more hobbyists, a lot more people who just want to toy with it, who want to learn about it. It basically democratizes it so that you don’t have to have a $50,000 workstation to do AI development,” he said.