Nvidia Puts Groq LPU, Vera CPU And BlueField-4 DPU Into New Data Center Racks
Announced at Nvidia’s GTC 2026 event, the AI infrastructure giant’s new Groq-based inference server rack, called the Nvidia Groq 3 LPX, will be available alongside the Vera Rubin NVL72 rack, Vera CPU rack and BlueField-4 STX storage rack in the second half of the year.
Nvidia said Monday that it’s adding one more processor to the six-chip Vera Rubin platform it has heralded as the next big leap in AI computing: the Groq language processing unit.
At its GTC 2026 event in San Jose, Calif., the AI infrastructure giant revealed that it plans this year to release a server rack with a new generation of the language processing unit (LPU) it licensed from AI chip startup Groq as part of a non-exclusive deal last December.
[Related: Analysis: Nvidia’s AI Dominance Expands To Networking As It Makes Bigger CPU Push]
The Santa Clara, Calif.-based company also revealed three other new racks: a server rack packed with Nvidia’s custom Vera CPUs, a storage rack reference architecture featuring its BlueField-4 DPUs and a networking rack with its Spectrum-6 Ethernet switches.
However, Nvidia indicated that one previously announced Vera Rubin product, an NVL server rack powered by the Rubin CPX GPU, is on hold, at least for now.
The expanded platform, which Nvidia CEO Jensen Huang was expected to detail during his GTC keynote on Monday, is part of the vendor’s push to enable a new wave of AI agents that interact with each other to carry out complex tasks. And it’s coming as the company faces increased competition from pure-play rivals like AMD and Qualcomm as well as major customers, such as Amazon Web Services, that are developing their own AI chips.
Vera Rubin is the much-anticipated successor to the Grace Blackwell platform, which played a major role in the company finishing its 2026 fiscal year with a record $215.9 billion in revenue.
In a briefing with journalists, Ian Buck, Nvidia’s vice president of hyperscale and high-performance computing, said the underlying chips, including the Rubin GPUs in the Vera Rubin NVL72 rack, are “designed to operate together as one incredible AI supercomputer.”
The seven chips will “power every phase of AI, from massive scale pre-training to post-training, test-time scaling and real-time agentic inference,” the latter of which represents what the company now views as the fourth AI scaling law, Buck added.
The executive said the new Groq-based rack, called the Nvidia Groq 3 LPX, will be available alongside the Vera Rubin NVL72 in the second half of the year. Other products based on the Vera Rubin platform are expected to become available starting in that timeframe.
“The Vera Rubin platform is going to expand the entire AI factory revenue opportunity and open the next frontier in agentic AI, with seven new chips now in full production to scale across the world’s largest AI factories,” Buck said in the Sunday briefing.
Nvidia Uses Groq To Boost Premium AI Models
Buck said Nvidia will offer the Groq 3 LPX alongside its Vera Rubin NVL72 to boost inference performance for premium, trillion-parameter AI models by several orders of magnitude, significantly increasing the revenue AI model providers can generate.
During his presentation, the executive claimed that the two server racks combined can boost throughput for a 1-trillion-parameter GPT model by 35 times compared to the previous-generation Blackwell NVL72.
The claim was based on the combined racks enabling 300 tokens per second for every megawatt consumed by Nvidia’s rack-scale platforms while serving each user 500 tokens per second, a rate Buck said creates an opportunity for AI model providers to generate $45 for every million tokens.
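Those figures imply straightforward per-user economics. As a rough sketch using only the two numbers Buck cited, with everything else being simple arithmetic rather than an Nvidia projection:

```python
# Back-of-the-envelope math on the serving economics cited above.
# Only the two constants come from Buck's briefing.

TOKENS_PER_SEC_PER_USER = 500    # per-user serving rate Buck cited
USD_PER_MILLION_TOKENS = 45.0    # premium-serving price he cited

revenue_per_user_sec = TOKENS_PER_SEC_PER_USER / 1_000_000 * USD_PER_MILLION_TOKENS
print(f"per user: ${revenue_per_user_sec:.4f}/sec, "
      f"${revenue_per_user_sec * 3600:.2f}/hour")
# per user: $0.0225/sec, $81.00/hour
```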
With the Vera Rubin NVL72 and Groq 3 LPX racks enabling major improvements in performance and efficiency, the executive said this will enable AI model providers to generate 10 times more revenue from trillion-parameter models than the Blackwell NVL72.
“We’ll be working deeply with the AI labs and the AI frontier model builders who are deploying these trillion-parameter models to offer the next generation of premium and ultra-premium model serving,” Buck said.
Packing 256 Groq 3 LPUs, the Groq 3 LPX rack will be liquid cooled and connect through a custom Spectrum-X Ethernet interconnect to the Vera Rubin NVL72, which contains 36 Vera CPUs and 72 Rubin GPUs, to boost decode performance. Decode, the phase in which a model generates output tokens one at a time, is critical for agentic AI models producing complex, multi-step responses.
Between these two racks, the Groq 3 LPUs and Rubin GPUs will work together “at every layer of the AI model on every token,” according to Buck.
He said Nvidia turned to Groq’s chip technology because while GPUs feature large memory capacity and “amazing floating point performance” to offer high throughput for AI systems sold in volume, LPUs are “optimized strictly for that extreme low-latency token generation, offering token rates” of up to thousands of tokens per second.
The LPU’s low latency is made possible by its use of SRAM memory. While each chip only features 500 MB of SRAM in contrast to the 288 GB of HBM4 memory of the Rubin GPU, the LPU’s SRAM bandwidth is a whopping 150 TBps, which is seven times faster than the 22 TBps HBM4 bandwidth of the Rubin GPU, according to Buck.
With the Groq 3 LPX’s 256 LPUs, the rack will feature a total SRAM capacity of 128 GB and a total SRAM bandwidth of 40 PBps, according to Nvidia. Data centers using the platform will be able to scale to more than 1,000 LPUs across multiple racks.
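The rack-level figures follow from simple multiplication of the per-chip numbers. A quick check, assuming the totals are straight sums across the rack (Nvidia appears to round the bandwidth figure up):

```python
# Consistency check of the per-chip and rack-level memory figures
# quoted above; per-chip numbers are from Buck's briefing.

LPUS_PER_RACK = 256
LPU_SRAM_GB = 0.5         # 500 MB of SRAM per Groq 3 LPU
LPU_SRAM_TBPS = 150.0     # SRAM bandwidth per LPU
GPU_HBM4_TBPS = 22.0      # HBM4 bandwidth per Rubin GPU

print(f"LPU/GPU bandwidth ratio: {LPU_SRAM_TBPS / GPU_HBM4_TBPS:.1f}x")         # 6.8x, roughly 'seven times'
print(f"rack SRAM capacity: {LPUS_PER_RACK * LPU_SRAM_GB:.0f} GB")              # 128 GB, as quoted
print(f"rack SRAM bandwidth: {LPUS_PER_RACK * LPU_SRAM_TBPS / 1000:.1f} PBps")  # 38.4 PBps, quoted as 40
```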
“We offload parts of the computation for every token to the LPU, primarily the FFN [feed-forward network] layers, to take advantage of the high bandwidth that the LPU has to offer while the attention math and the rest of the model is still being run on the GPU,” he said.
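Nvidia has not published a programming model for that handoff, so the following is only a toy illustration of the split Buck described, with attention running against a GPU-resident KV cache and the feed-forward matmuls standing in for the LPU-offloaded work; all names and shapes are invented:

```python
# Toy decode loop mirroring the GPU/LPU split described above.
# Plain NumPy stands in for the real kernels; nothing here is an
# Nvidia or Groq API.
import numpy as np

D = 64                                   # toy hidden size
rng = np.random.default_rng(0)
wq, wk, wv = [rng.standard_normal((D, D)) * 0.1 for _ in range(3)]
w1 = rng.standard_normal((D, 4 * D)) * 0.1
w2 = rng.standard_normal((4 * D, D)) * 0.1
kv_cache = []                            # lives in GPU HBM in the real system

def attention_step(h):
    """Stays on the Rubin GPU: attends over the growing KV cache."""
    q, k, v = h @ wq, h @ wk, h @ wv
    kv_cache.append((k, v))
    ks = np.stack([k for k, _ in kv_cache])
    vs = np.stack([v for _, v in kv_cache])
    w = np.exp(ks @ q / np.sqrt(D))
    return h + (w / w.sum()) @ vs        # residual add

def ffn_step(h):
    """Offloaded to the LPU: two big matmuls, limited by how fast the
    weights can be streamed, which is where SRAM bandwidth pays off."""
    return h + np.maximum(h @ w1, 0.0) @ w2

h = rng.standard_normal(D)
for _ in range(3):                       # decode three tokens, one toy layer
    h = ffn_step(attention_step(h))
```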
Nvidia’s Groq-Based Server Rack Came Together Fast
Buck said Nvidia started developing the Groq 3 LPX after it gained a non-exclusive license to Groq’s technology last December and hired members of the startup’s team, including its founders, to implement that technology in Nvidia’s platforms. The deal was reportedly worth $20 billion, the most Nvidia has ever paid for technology and personnel.
To quickly build a rack around the Groq 3 LPU, Nvidia took advantage of its modular MGX rack architecture that the company uses for its NVL72 platforms, according to the executive.
“It’s been a real privilege to have them and their team join Nvidia, and the collaboration between the two teams has been excellent,” he said.
Asked by CRN about potential Groq 3 LPX availability from OEMs, Buck indicated that the company is focused on direct engagements with AI developers serving trillion-parameter models at high token rates and low latency.
“Those will be more focused and exciting opportunities that we’ll be able to share more later this year,” he said.
Nvidia Calls Vera The ‘Best CPU For Agentic AI Workloads’
With Buck calling Nvidia’s Vera the “best CPU for agentic AI workloads,” the company plans to offer the chip in its first CPU-only server rack in addition to the Vera Rubin NVL72.
The liquid-cooled CPU rack will contain 256 Vera CPUs, up to 400 TB of LPDDR5X memory capacity, 300 TBps of memory bandwidth and 64 BlueField-4 DPUs. The rack will be able to support more than 22,500 concurrent CPU environments, according to the company.
Compared to a rack with Nvidia’s previous-generation Grace CPU, the Vera CPU rack can deliver two times greater performance across various workloads, including scripting, text conversion, code compilation, data analytics and graph analytics, according to Nvidia.
Buck said customers are expected to deploy the CPU rack “at scale” alongside Nvidia’s NVL72 racks, storage racks and networking racks for agentic AI workloads.
The executive said CPUs are important for such workloads because GPUs rely on them to “do the tool calling, SQL query and compilation of code.”
“This sandbox execution is a critical part of both training and deploying agents across the data centers, and those CPUs need to be fast. We want to make sure that they could actually do the tool calling as quickly as possible to keep the GPU and the entire data center fully utilized,” Buck said.
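The loop Buck described is simple to picture: the GPU decodes until it emits a tool call, then waits on a CPU sandbox before it can continue. A minimal sketch, with all function names hypothetical rather than any Nvidia API:

```python
# Sketch of the agent serving loop described above: the GPU-side model
# emits tool calls that a CPU sandbox must execute before decoding can
# resume, so slow CPUs leave the GPUs idle. Everything here is a toy
# stand-in, not an Nvidia interface.
from concurrent.futures import ThreadPoolExecutor

def gpu_decode(context: str) -> str:
    """Stand-in for the NVL72 GPUs: emits a tool call, then an answer."""
    if "result:" in context:
        return "There are 42 orders."
    return "TOOL:sql SELECT count(*) FROM orders"

def cpu_sandbox(tool_call: str) -> str:
    """Stand-in for the Vera CPU rack: runs the call in a sandbox."""
    return "result: 42 rows"             # e.g. SQL output or a build log

def run_agent(task: str) -> str:
    context = task
    with ThreadPoolExecutor() as cpus:   # thousands of sandboxes per rack
        while (step := gpu_decode(context)).startswith("TOOL:"):
            # The GPU waits on this round-trip, which is why Buck
            # stresses single-threaded CPU speed for tool calling.
            context += " " + cpus.submit(cpu_sandbox, step).result()
    return step

print(run_agent("count the orders table"))   # -> There are 42 orders.
```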
From a competitive perspective, Vera has three times more memory bandwidth per core, double the energy efficiency and 50 percent more single-threaded performance than “today’s modern x86 CPUs,” the executive said without providing more specifics.
Marking Nvidia’s first server CPU to use custom, Arm-compatible cores, Vera features 88 of these custom cores, 176 threads with Nvidia’s new spatial multi-threading technology, 1.5 TB of system LPDDR5X memory, 1.2 TBps of memory bandwidth and confidential computing capabilities. It also features a 1.8 TBps NVLink chip-to-chip interconnect to support coherent memory with the GPUs.
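The per-chip specs line up almost exactly with the rack-level figures quoted earlier. A quick arithmetic check, assuming the rack totals are straight multiples of the per-chip numbers (any rounding is Nvidia’s):

```python
# Cross-check of the Vera CPU rack figures against per-chip specs.

CHIPS_PER_RACK = 256
CORES, MEM_TB, BW_TBPS = 88, 1.5, 1.2    # per Vera CPU

print(f"memory:    {CHIPS_PER_RACK * MEM_TB:.0f} TB (quoted: up to 400 TB)")    # 384 TB
print(f"bandwidth: {CHIPS_PER_RACK * BW_TBPS:.0f} TBps (quoted: 300 TBps)")     # 307 TBps
print(f"cores:     {CHIPS_PER_RACK * CORES:,} (quoted: >22,500 environments)")  # 22,528
# The '>22,500 concurrent CPU environments' figure works out to
# roughly one environment per physical core across the rack.
```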
Vera will become available in the second half of the year from a wide range of cloud service providers, including Lambda, Oracle Cloud and Nebius, as well as many OEMs, including Dell Technologies, HPE, Cisco, Lenovo and Supermicro.
Nvidia Details New BlueField-4, Spectrum-6 Racks
In addition to the Groq 3 LPX and Vera CPU racks, Nvidia revealed a storage rack reference architecture powered by BlueField-4 DPUs to speed up agentic AI workloads.
Called BlueField-4 STX, the modular reference architecture is designed to let storage providers build infrastructure solutions that significantly improve the rate at which data can be accessed by agentic AI applications.
“Agentic AI demands real-time access to data and contextual working memory to keep the conversations fast and coherent. And as that context grows and AIs get smarter, traditional storage and data paths can slow AI inference and reduce GPU utilization,” Buck said.
Nvidia said the first rack-scale implementation of STX will include the new Nvidia CMX context memory storage platform, which the company revealed in January. This platform “expands GPU memory with a high-performance context layer for scalable inference and agentic systems,” according to the company.
Nvidia claimed that this, in turn, will enable AI agents to provide up to five times more tokens per second compared with traditional storage.
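Nvidia has not detailed how CMX works internally, but the concept resembles a familiar tiering pattern: keep the hottest context in GPU memory and spill the rest to a fast intermediate layer instead of ordinary storage. A hypothetical sketch of that pattern, with all class and method names invented:

```python
# Illustrative KV-cache tiering in the spirit of a 'context memory'
# layer: evicted cache blocks go to a fast intermediate tier rather
# than taking a storage round-trip. This is not the CMX API, which
# Nvidia has not published; it only shows the general idea.
from collections import OrderedDict

class TieredKVCache:
    def __init__(self, hbm_blocks: int):
        self.hbm = OrderedDict()      # hot blocks in GPU HBM, LRU order
        self.cmx = {}                 # overflow in the context-memory tier
        self.capacity = hbm_blocks

    def put(self, block_id: str, kv) -> None:
        self.hbm[block_id] = kv
        self.hbm.move_to_end(block_id)
        if len(self.hbm) > self.capacity:        # HBM full: evict coldest
            cold_id, cold_kv = self.hbm.popitem(last=False)
            self.cmx[cold_id] = cold_kv          # spill to CMX, not disk

    def get(self, block_id: str):
        if block_id in self.hbm:                 # hot hit in HBM
            self.hbm.move_to_end(block_id)
            return self.hbm[block_id]
        kv = self.cmx.pop(block_id)              # warm hit: fast CMX fetch
        self.put(block_id, kv)                   # promote back into HBM
        return kv
```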
The company also said that the STX architecture provides four times greater energy efficiency than “traditional CPU architectures for high-performance storage” and can “ingest two times more pages per second for enterprise AI data.”
STX-based solutions are expected to become available in the second half of this year from storage vendors, including Dell, HPE, IBM, NetApp, Hitachi Vantara, DDN, Everpure, Nutanix, Cloudian, Weka, Vast Data and MinIO.
The Spectrum-6 SPX Ethernet rack, on the other hand, is “engineered to accelerate east-west traffic across AI factories,” taking advantage of Nvidia’s Spectrum-X Ethernet or Quantum-X800 InfiniBand switches to deliver “low-latency, high-throughput rack-to-rack connectivity at scale,” according to the company.
Rubin CPX Platform Not A Focus For Nvidia Right Now
Last September, Nvidia revealed a “new class of GPU” called Rubin CPX that was designed to speed up complex AI applications, including software coding and generative video.
When it was announced, the company said that the Rubin CPX and the associated Vera Rubin NVL144 CPX rack-scale platform would debut by the end of this year.
However, Nvidia did not mention the Rubin CPX in its Sunday briefing with journalists.
A statement from an Nvidia spokesperson indicated that the company has put Rubin CPX-based products on the back burner to focus on the Groq-based LPX platform.
“Delivering accelerated token generation with LPX into our portfolio and platform to optimize the decode is where we’re focused right now, and we’re excited to be bringing this to market in [the second half] of 2026,” the representative told CRN in an email.
The Rubin CPX was meant to speed up the performance of “massive context” AI applications by serving as the dedicated GPU for context and prefill computation, the first of two steps in Nvidia’s disaggregated inference serving process. The vanilla Rubin GPU, on the other hand, would handle the second step: generation and decode computation.
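The two phases have very different hardware profiles, which is the rationale for disaggregating them: prefill crunches the whole prompt in parallel and is compute-bound, while decode emits tokens one at a time and is bound by memory bandwidth. A toy sketch of that split, with illustrative function names rather than any Nvidia API:

```python
# Disaggregated inference serving in miniature: prefill and decode as
# separate stages that could run on different hardware pools. The
# 'model' is a toy rule; only the two-phase structure is the point.

def prefill(prompt_tokens: list[int]) -> dict:
    """Context phase: one big parallel pass over the full prompt.
    This is the step Rubin CPX was designed to accelerate."""
    return {"kv_cache": [t * 2 for t in prompt_tokens]}   # toy KV cache

def decode(state: dict, max_new: int) -> list[int]:
    """Generation phase: strictly sequential, one token per step,
    dominated by re-reading the KV cache and weights each step."""
    out = []
    for _ in range(max_new):
        tok = sum(state["kv_cache"]) % 50_000             # toy next-token rule
        out.append(tok)
        state["kv_cache"].append(tok)
    return out

state = prefill([101, 102, 103])    # would run on the prefill pool
print(decode(state, max_new=4))     # would run on the decode pool
```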
The GPU’s platform, the Vera Rubin NVL144 CPX, was expected to contain four Rubin CPX GPUs, four Rubin GPUs and two Vera CPUs in each of the rack’s 18 compute trays. The platform was named before Nvidia changed the way it counts GPUs in January, which resulted in the regular Vera Rubin platform’s suffix changing from NVL144 to NVL72.