Nvidia Packs ‘New Class Of GPU’ Inside Vera Rubin NVL144 CPX Platform
Set to launch by the end of 2026, the Rubin CPX and the associated Vera Rubin NVL144 CPX platform will significantly improve the performance of complex software coding and generative video applications that can receive a million tokens or more as input from the user, Nvidia says.
Nvidia Tuesday revealed an alternative version of its upcoming Vera Rubin NVL144 computing platform that significantly increases the platform’s chip count by adding a “new class of GPU” designed to speed up complex AI applications.
Unveiled at the AI Infra Summit, the new GPU is called the Rubin CPX. Nvidia said it will enable AI systems to better handle software coding and generative video applications that can receive a million tokens or more as input from the user and retain that information in what are called “long-context windows” to carry out complex operations.
[Related: Nvidia CEO Sees ‘Significant Growth Opportunities Ahead’ Despite China-Related Hurdles]
The Rubin CPX and the associated Vera Rubin NVL144 CPX platform are set to launch by the end of next year after the vanilla Rubin GPU and its related Vera Rubin NVL144 platform debut sometime in the second half of 2026. Nvidia also plans to offer the Rubin CPX in “other flexible configurations for customers looking to reuse existing infrastructure.”
The Santa Clara, Calif.-based company said leading-edge AI companies are already evaluating Rubin CPX, including AI-powered code editing tool provider Cursor, generative video provider Runway and software engineering platform provider Magic.
“With Nvidia Rubin CPX, Cursor will be able to deliver lightning-fast code generation and developer insights, transforming software creation,” said Michael Truell, CEO of Cursor, in a statement provided by Nvidia. “This will unlock new levels of productivity and empower users to ship ideas once out of reach.”
In a briefing with journalists and analysts, Shar Narasimhan, director of data center product at Nvidia, said these large context windows of 1 million tokens or more enable AI agents to “move beyond simple bug fixes in code and support advanced software applications and systems development.” They also allow the generation of “contextually aware, temporally stable video,” he added.
Narasimhan said the Rubin CPX will allow Nvidia to significantly speed up the performance of these “massive context” AI applications by serving as the dedicated GPU for context and prefill computation, the first of two steps in the company’s disaggregated inference serving process. The vanilla Rubin GPU, on the other hand, will handle generation and decode computation, which is the second step.
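The two-step split Narasimhan described can be sketched in miniature. The toy code below is illustrative only, with invented names and stand-in arithmetic in place of real attention math: a compute-heavy prefill pass ingests the whole prompt once and builds a key/value cache (the job earmarked for Rubin CPX), and a separate decode loop then emits tokens one at a time while repeatedly reading that growing cache (the job assigned to the vanilla Rubin GPU).

```python
# Toy sketch of disaggregated inference serving (illustrative only, not
# Nvidia's implementation). All names and the arithmetic are invented.

def prefill(prompt_tokens):
    """Context phase: process the entire input prompt in one compute-heavy
    pass and build a key/value cache -- the work Rubin CPX would handle."""
    kv_cache = [("k", t) for t in prompt_tokens]  # stand-in for attention KV pairs
    first_token = len(prompt_tokens) % 100        # stand-in for the first output token
    return kv_cache, first_token

def decode(kv_cache, first_token, max_new_tokens):
    """Generation phase: emit tokens one at a time, rereading the cache on
    every step -- the bandwidth-heavy work the vanilla Rubin GPU would handle."""
    out = [first_token]
    for _ in range(max_new_tokens - 1):
        nxt = (out[-1] + len(kv_cache)) % 100     # stand-in for one decode step
        out.append(nxt)
        kv_cache.append(("k", nxt))               # cache grows with each new token
    return out

prompt = list(range(1_000))      # imagine a million-token context here
cache, tok = prefill(prompt)     # step 1: context/prefill on one class of GPU
print(decode(cache, tok, 5))     # step 2: generation/decode on the other
```

The point of the split is that the two phases stress different resources: prefill is dominated by raw compute over the full prompt, while decode is dominated by repeated reads of the cache, so each phase can run on hardware built for it.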
“It will dramatically increase the productivity and performance of AI factories,” he said.
The Vera Rubin NVL144 CPX platform will essentially double the number of discrete GPUs over the vanilla Vera Rubin NVL144 platform by adding four Rubin CPX GPUs to each of the platform’s 18 compute trays. (While the platform features 72 dual-reticle Rubin GPUs, Nvidia counts each reticle as one GPU to reach the 144 number—a change from how it treats each dual-reticle Blackwell and Blackwell Ultra GPU as a single GPU.)
The combination of four Rubin CPX GPUs, four Rubin GPUs and two Arm-based Vera CPUs in each compute tray will make the Vera Rubin NVL144 CPX platform capable of 8 exaflops of NVFP4 computation, according to Nvidia. NVFP4 is a new 4-bit floating-point format Nvidia recently introduced to give AI models a level of accuracy that is typically only possible with larger numerical formats.
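As a rough illustration of how a block-scaled 4-bit format can preserve accuracy, the toy code below scales a small block of values so its largest magnitude maps onto the top of the 4-bit grid, then snaps each value to the nearest representable number. This is a simplification written for this article, not Nvidia’s specification; the actual NVFP4 format reportedly adds details omitted here, such as FP8 per-block scale factors.

```python
# Toy block-scaled 4-bit quantization in the spirit of NVFP4 (simplified;
# not Nvidia's spec). FP4 E2M1 can represent these eight magnitudes:
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(values):
    """Scale the block so its largest magnitude maps to 6.0 (the FP4 max),
    then snap each value to the nearest representable FP4 number."""
    scale = max(abs(v) for v in values) / 6.0 or 1.0  # avoid 0 for all-zero blocks
    def snap(v):
        mag = min(E2M1_GRID, key=lambda g: abs(g - abs(v) / scale))
        return mag if v >= 0 else -mag
    return scale, [snap(v) for v in values]

def dequantize_block(scale, codes):
    """Recover approximate values by multiplying the shared scale back in."""
    return [c * scale for c in codes]

scale, codes = quantize_block([0.1, -0.3, 0.9, 1.2])
print(codes)  # [0.5, -1.5, 4.0, 6.0] -- the 4-bit values actually stored
```

Because the scale is chosen per small block rather than per tensor, an outlier in one block does not crush the precision of every other block, which is the basic idea behind keeping accuracy close to that of larger formats.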
The platform’s 8-exaflop mark is more than double the 3.6 exaflops of NVFP4 computation that the vanilla Vera Rubin NVL144 platform is set to deliver, according to Nvidia. It’s also set to be 7.5 times faster than the Blackwell Ultra-based GB300 NVL72 platform that came out this year.
The Vera Rubin NVL144 CPX will also feature 1.7 PBps of memory bandwidth and 100 TB of fast memory, up from the 1.4 PBps and 75 TB of the vanilla platform. These specs also represent three times greater bandwidth and 2.5 times higher capacity than the GB300 NVL72 platform, according to the company.
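A quick bit of arithmetic ties the quoted figures together. The “implied GB300 NVL72” lines below simply divide the stated 7.5x, 3x and 2.5x ratios back out of the CPX platform’s numbers; they are derived here for illustration, not independently sourced specs.

```python
# Arithmetic check on the comparison figures quoted above; all inputs come
# from the article's reported specs.
cpx = {"exaflops": 8.0, "bandwidth_pbps": 1.7, "memory_tb": 100}
vanilla = {"exaflops": 3.6, "bandwidth_pbps": 1.4, "memory_tb": 75}

print(round(cpx["exaflops"] / vanilla["exaflops"], 2))              # 2.22 -> "more than double"
print(round(cpx["bandwidth_pbps"] / vanilla["bandwidth_pbps"], 2))  # 1.21
print(round(cpx["memory_tb"] / vanilla["memory_tb"], 2))            # 1.33

# GB300 NVL72 figures implied by the stated 7.5x, 3x and 2.5x comparisons:
print(round(cpx["exaflops"] / 7.5, 2))      # 1.07 exaflops
print(round(cpx["bandwidth_pbps"] / 3, 2))  # 0.57 PBps
print(round(cpx["memory_tb"] / 2.5, 1))     # 40.0 TB
```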
Nvidia is expected to deliver a dual-rack offering that pairs a dedicated Rubin CPX rack with a Vera Rubin NVL144 rack, bringing the fast memory capacity to 150 TB. This offering is also expected to debut by the end of next year.
Under the hood, the Rubin CPX features 30 petaflops of NVFP4 computation, three times the exponent-operation throughput of the GB300 Superchip, and 128 GB of GDDR7 memory, along with four Nvidia video encoders and four Nvidia video decoders to aid generative video applications.
By contrast, the vanilla Rubin GPU comes with 288 GB of HBM4 high-bandwidth memory, which is more expensive than GDDR7. When the company announced Rubin in March, it said the GPU would be capable of 50 petaflops of FP4 computation, months before Nvidia revealed the NVFP4 format in June.