Nvidia Touts New Storage Platform, Confidential Computing For Vera Rubin NVL72 Server Rack
Nvidia used CEO Jensen Huang’s CES 2026 keynote to launch its Rubin GPU platform, the highly anticipated follow-up to its fast-selling Blackwell Ultra products, with availability from partners set to begin in the second half of this year.
Nvidia on Monday revealed a new “context memory” storage platform, “zero downtime” maintenance capabilities, rack-scale confidential computing and other new features for its forthcoming Vera Rubin NVL72 server rack for AI data centers.
The AI infrastructure giant used the CES 2026 keynote by Nvidia CEO Jensen Huang to mark the launch of its Rubin GPU platform, the highly anticipated follow-up to its fast-selling Blackwell Ultra products. But while the company said Rubin is in “full production,” related products won’t be available from partners until the second half of this year.
[Related: The 10 Biggest Nvidia News Stories Of 2025]
Huang and other Nvidia officials in recent months have pushed back on fears that the massive AI data center build-out represents a bubble, saying the company expects to make $500 billion from Blackwell and Rubin products between the start of last year and the end of this year on the back of ongoing demand for generative, agentic and physical AI solutions.
In promoting Rubin, Nvidia touted support from a wide range of large and influential tech companies, including Amazon Web Services, Microsoft, Google Cloud, CoreWeave, Cisco, Dell Technologies, HPE, Lenovo and many more.
The Santa Clara, Calif.-based company plans to initially make Rubin available in two ways: through the Vera Rubin NVL72 rack-scale platform, which connects 72 Rubin GPUs and 36 of its custom, Arm-compatible Vera CPUs, and through the HGX Rubin NVL8 platform, which connects eight Rubin GPUs for servers running on x86-based CPUs.
Both of these platforms will be supported by Nvidia’s DGX SuperPod clusters.
The rack-scale platform was originally called Vera Rubin NVL144 when it was revealed at Nvidia’s GTC 2025 event last March, with the 144 figure meant to reflect the number of GPU dies in each server rack. But the company eventually decided against this, instead opting to stick with the NVL72 nomenclature used for the Grace Blackwell rack-scale platforms, which counts the number of GPU packages, each of which contains two GPU dies.
The GPU packages for Blackwell products also consist of two GPU dies, which are connected through a high-speed die-to-die interconnect.
“Essentially we’re just being consistent with how we’ve deployed and talked about it for Blackwell, and we’re carrying that forward for Vera Rubin as well,” said Dion Harris, senior director of high-performance computing and AI infrastructure solutions at Nvidia, in a briefing with journalists and analysts on Sunday.
Harris said the Rubin platform, with the Vera Rubin NVL72 rack as its flagship product, consists of the Rubin GPU, the Vera CPU—Nvidia’s first CPU with custom, Arm-compatible cores—and four other new chips the company has co-designed to “meet the needs of the most advanced models and drive down the cost of intelligence.”
Vera Rubin NVL72 Specs And Features
The company provided a litany of specs and features for the Rubin platform, some of which have been shared at previous events.
Each Vera CPU features 88 custom Olympus cores, 176 threads enabled by Nvidia’s new spatial multi-threading technology, 1.5 TB of system LPDDR5x memory, 1.2 TBps of memory bandwidth and confidential computing capabilities. It also features a 1.8 TBps NVLink chip-to-chip interconnect to support coherent memory with the GPUs.
In the briefing, Harris said the CPU’s confidential computing feature allows Vera Rubin to deliver the “first rack-scale Trusted Execution Environment, maintaining data security across CPU, GPU and the NVLink domain [to protect] the world’s largest proprietary models, training data and inference workloads.”
Compared with Nvidia’s Grace CPU, which is based on Arm’s off-the-shelf Neoverse V2 microarchitecture, Vera offers double the performance for data processing, compression and code compilation, according to the company.
The Rubin GPU, meanwhile, is capable of 50 petaflops of inference compute using Nvidia’s NVFP4 data format, which is five times faster than Blackwell, the company said. It can also hit 35 petaflops for NVFP4 training, 3.5 times faster than its predecessor. Its HBM4 memory delivers 22 TBps of bandwidth, 2.8 times faster, while NVLink bandwidth per GPU is 3.6 TBps, two times faster.
The platform also includes the liquid-cooled NVLink 6 Switch for scale-up networking. This switch features 400G SerDes, 3.6 TBps of per-GPU bandwidth for communication between all GPUs, a total bandwidth of 28.8 TBps and 14.4 teraflops of FP8 in-network computing.
In addition, the Rubin platform makes use of Nvidia’s ConnectX-9 SuperNIC and BlueField-4 DPU for scale-out networking, according to the company.
All of these parts go into the Vera Rubin NVL72 platform, which is capable of 3.6 exaflops of NVFP4 inference performance, five times greater than the Blackwell-based iteration, Nvidia said. Training performance with the NVFP4 format reaches a purported 2.5 exaflops, which is 3.5 times higher than the predecessor.
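Those rack-level numbers line up with the per-GPU figures cited above once multiplied across the 72 Rubin GPU packages in the rack:

```latex
\begin{align*}
72 \times 50~\text{PF} &= 3{,}600~\text{PF} = 3.6~\text{EF of NVFP4 inference}\\
72 \times 35~\text{PF} &= 2{,}520~\text{PF} \approx 2.5~\text{EF of NVFP4 training}
\end{align*}
```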
Vera Rubin also features 54 TB of LPDDR5x capacity, 2.5 times more than Blackwell, and 20.7 TB of HBM4 capacity, 50 percent more than the predecessor, Nvidia said. HBM4 bandwidth reaches 1.6 PBps, 2.8 times greater, while scale-up bandwidth hits 260 TBps, double that of the Blackwell NVL72 platform.
“That’s more bandwidth than the entire global internet,” Harris said.
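The capacity and bandwidth totals also check out against the per-chip figures, with the per-package HBM4 capacity inferred from the rack total rather than stated by Nvidia:

```latex
\begin{align*}
36 \times 1.5~\text{TB} &= 54~\text{TB of LPDDR5x (36 Vera CPUs)}\\
72 \times 22~\text{TBps} &= 1{,}584~\text{TBps} \approx 1.6~\text{PBps of HBM4 bandwidth (72 Rubin GPUs)}\\
20.7~\text{TB} \div 72 &\approx 288~\text{GB of HBM4 per GPU package (inferred)}
\end{align*}
```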
Nvidia Touts Third-Gen NVL72 Rack Resiliency Features
Nvidia said Vera Rubin also features the third generation of its NVL72 rack resiliency technologies, which includes a cable-free modular tray design that allows for 18 times faster assembly and service.
Other features include NVLink Intelligent Resiliency, which the company claims will allow for maintenance of servers with “zero downtime.”
“The NVLink switch trays now feature zero downtime maintenance and fault tolerance, allowing racks to remain operational while switch trays are removed or partially populated,” Harris said.
There’s also a second-generation RAS Engine for reliability, availability and serviceability needs, which Nvidia said will enable GPU diagnostics without taking the rack offline.
“All of these features increase system uptime and goodput, which further drives down the cost of training and inference,” Harris said.
Nvidia Reveals Inference Context Memory Storage Platform
With agentic AI workloads generating massive amounts of context data, Nvidia is introducing a new storage platform it said will provide a significant boost in inference performance and power efficiency for such applications.
The technology, called the Nvidia Inference Context Memory Storage Platform, uses BlueField-4 and Spectrum-X Ethernet to create “AI-native storage infrastructure for storing KV cache,” Harris said. The KV cache is a data structure that is key to optimizing how large language models generate tokens, or provide responses.
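For readers unfamiliar with the term, the rough Python sketch below shows why a KV cache matters: during generation, the model appends one new key/value pair per token and reuses everything older rather than recomputing it, so the cache grows with context length. This is a minimal illustration under assumed names and shapes, not Nvidia’s software.

```python
# Minimal, illustrative KV-cache sketch (assumed names and shapes, not Nvidia's API).
import numpy as np

def attention(q, K, V):
    """Single-head scaled dot-product attention for one new query vector."""
    scores = K @ q / np.sqrt(q.shape[-1])   # similarity of the query to every past token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                # softmax over the history
    return weights @ V                      # weighted sum of cached values

d = 64                                      # head dimension
K_cache = np.zeros((0, d))                  # keys of all previously generated tokens
V_cache = np.zeros((0, d))                  # values of all previously generated tokens

for step in range(8):                       # generate eight tokens
    # In a real model, q, k and v are projections of the newest token's hidden state.
    q, k, v = np.random.randn(3, d)
    # Append only the new token's key/value; everything older is reused as-is.
    K_cache = np.vstack([K_cache, k])
    V_cache = np.vstack([V_cache, v])
    out = attention(q, K_cache, V_cache)    # attends over the full history
```

Because that history grows with every chat turn or agent step, keeping it in a fast, shared storage tier rather than recomputing or discarding it, which is what the new platform targets, directly translates into more tokens per second.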
Compared to traditional network storage options for storing inference context data, this new platform delivers up to five times higher tokens per second, five times better performance per dollar and five times better power efficiency, according to Harris.
“That translates directly into higher throughput, lower latency and more predictable behavior,” he said. “And it really matters for the workloads we’ve been talking about: large-context applications like multi-turn chat, retrieval-augmented generation and agentic AI multi-step reasoning. These workloads stress how efficiently context can be stored, reused and shared across the entire system.”
Harris said Nvidia is “working closely with our storage partners to bring a new tier of inference context memory to the Rubin platform so customers can deploy it as part of a complete, integrated AI infrastructure.”