NVIDIA Unveils Rubin AI Platform: 72-GPU Super Rack, New Vera CPU, and 5x Blackwell Performance

NVIDIA has once again redefined the frontier of artificial intelligence computing with the official announcement of its Rubin AI platform at CES 2026. Succeeding the current Blackwell architecture, Rubin is not just a new GPU but a comprehensive six-chip platform designed to deliver what NVIDIA claims are “5x higher inference” and “3.5x higher training” performance. The star of the show is the monstrous Vera Rubin NVL72, a rack-scale system packing 72 next-generation GPUs and 36 custom CPUs.

The announcement underscores NVIDIA’s relentless pace of innovation: the platform is slated for mass production in Q1 2026, earlier than some expected, to meet the insatiable demand for more efficient and powerful AI infrastructure.


The Engine Room: Rubin AI GPU and Vera CPU

At the heart of the platform are two new flagship chips. The Rubin GPU, built on a cutting-edge process, features a staggering 336 billion transistors. It advances to HBM4 memory, supporting up to 288GB per GPU with a massive 22 TB/s of bandwidth. These specs translate directly into raw performance for large language models and complex AI training workloads.
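To put those memory figures in perspective, here is a rough back-of-envelope sketch. The FP4 weight size and the purely memory-bound decode model are our assumptions for illustration, not NVIDIA sizing guidance:

```python
# Rough back-of-envelope for one Rubin GPU's HBM4 figures.
# Assumptions (ours, for illustration): FP4 weights at 0.5 bytes/parameter,
# and a purely memory-bound decode that reads all weights once per token.

HBM4_GB = 288            # per-GPU capacity, per the announcement
HBM4_BW_TBPS = 22        # per-GPU bandwidth, per the announcement
BYTES_PER_PARAM = 0.5    # FP4 weights

# Weights-only capacity, before KV cache or runtime overhead.
max_params_b = HBM4_GB * 1e9 / BYTES_PER_PARAM / 1e9
print(f"Weights-only capacity: ~{max_params_b:.0f}B FP4 parameters")   # ~576B

# Memory-bound decode ceiling for a model filling all 288 GB.
tokens_per_sec = HBM4_BW_TBPS * 1e12 / (HBM4_GB * 1e9)
print(f"Single-stream decode ceiling: ~{tokens_per_sec:.0f} tokens/s")  # ~76
```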

Paired with the GPU is NVIDIA’s new Vera CPU. Built on custom-designed Arm “Olympus” cores, the Vera CPU introduces Spatial Multi-Threading to run 176 threads across 88 cores. It boasts a monumental 1.2 TB/s of memory bandwidth, more than double its predecessor’s, and is crucial for feeding data to the hungry Rubin GPUs without bottlenecks.
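A couple of quick derivations from those figures; note that the predecessor bandwidth is only implied by the “more than double” claim, not separately published here:

```python
# Simple arithmetic on the published Vera CPU figures.
CORES = 88
THREADS = 176
MEM_BW_TBPS = 1.2

print(f"Threads per core: {THREADS // CORES}")  # 2-way Spatial Multi-Threading
print(f"Bandwidth per core: ~{MEM_BW_TBPS * 1000 / CORES:.1f} GB/s")  # ~13.6 GB/s

# "More than double its predecessor" implies the prior part was under 0.6 TB/s.
print(f"Implied predecessor bandwidth: <{MEM_BW_TBPS / 2:.1f} TB/s")
```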


The Flagship: Vera Rubin NVL72 Rack Scale System

NVIDIA’s ultimate configuration is the Vera Rubin NVL72. This is a fully integrated rack containing 72 Rubin GPUs and 36 Vera CPUs, all interconnected with the company’s latest NVLink 6 technology. The numbers are astronomical:

  • 260 TB/s of scale-up bandwidth within the rack.
  • 3.6 ExaFLOPS of FP4 inference performance.
  • Over 20 Terabytes of total HBM4 memory.

This system is engineered for the world’s largest AI clusters, enabling researchers and companies to train next-generation models that are out of reach on today’s hardware.
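A quick sanity check shows the rack-level figures line up with the per-GPU specs above; the even per-GPU split of the FP4 number is our assumption, not an NVIDIA-published breakdown:

```python
# Cross-checking the NVL72 aggregates against per-GPU specs.
GPUS = 72
HBM4_PER_GPU_GB = 288
RACK_FP4_EXAFLOPS = 3.6

total_hbm_tb = GPUS * HBM4_PER_GPU_GB / 1000
print(f"Total HBM4: {total_hbm_tb:.1f} TB")  # ~20.7 TB, matching "over 20 TB"

per_gpu_pflops = RACK_FP4_EXAFLOPS * 1000 / GPUS
print(f"Implied FP4 per GPU: {per_gpu_pflops:.0f} PFLOPS")  # ~50 PFLOPS
```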


Real-World Impact: Slashing the Cost of AI

Beyond the breathtaking specs, NVIDIA framed Rubin’s advancements in terms of practical economics. The company claims the platform can deliver up to a 10x lower cost per inference token and require 4x fewer GPUs for training massive Mixture-of-Experts (MoE) models compared to Blackwell. This dramatic improvement in efficiency could significantly lower the barrier to developing and deploying advanced AI, making powerful models more accessible through cloud partners.
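To make those ratios concrete, here is an illustrative calculation. The dollar figure and cluster size below are hypothetical placeholders; only the 10x and 4x factors come from NVIDIA’s claims:

```python
# Illustrating NVIDIA's claimed economics with hypothetical baselines.
BLACKWELL_COST_PER_M_TOKENS = 2.00    # hypothetical USD baseline, not a real price
BLACKWELL_MOE_CLUSTER_GPUS = 10_000   # hypothetical cluster size

rubin_cost = BLACKWELL_COST_PER_M_TOKENS / 10  # claimed "10x lower cost per token"
rubin_gpus = BLACKWELL_MOE_CLUSTER_GPUS // 4   # claimed "4x fewer GPUs" for MoE

print(f"Inference: ${BLACKWELL_COST_PER_M_TOKENS:.2f} -> ${rubin_cost:.2f} per 1M tokens")
print(f"MoE training: {BLACKWELL_MOE_CLUSTER_GPUS:,} -> {rubin_gpus:,} GPUs")
```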

NVIDIA confirmed that the Rubin platform is already being integrated by major cloud providers, including AWS, Google Cloud, Microsoft Azure, and Oracle Cloud, with systems expected to become available to partners in the second half of 2026.

Source: NVIDIA
