Date: 2026-02-26

Nvidia is gearing up to roll out its next AI computing system, Vera Rubin, in the second half of 2026. According to a report by CNBC, Nvidia’s Vera Rubin will deliver around 10 times more performance per watt than its current Grace Blackwell platform while supporting far larger AI workloads.

What Vera Rubin is and what it does

Vera Rubin is a rack-scale AI platform built around a custom architecture that combines 72 Rubin GPUs and 36 Vera CPUs, assembled into a system with roughly 1.3 million components sourced globally.

According to the CNBC report, the rack weighs close to two tonnes and contains around 1,300 microchips — up from 864 in the Grace Blackwell system. It is designed as a modular platform, allowing individual superchips to slide out of one of the rack’s 18 compute trays for easier servicing. In the Blackwell system, comparable components were soldered onto boards.

Unlike consumer chips in everyday laptops or phones, Vera Rubin targets data centres and enterprise AI deployments, where many processors work together as a single unit. Vera Rubin is also Nvidia’s first AI system to be 100 per cent liquid cooled, dropping traditional cooling methods in favour of direct liquid cooling.

How does it compare to Grace Blackwell?

Grace Blackwell, which entered production in 2024, significantly expanded how much compute could be delivered within a single rack-scale system. It became central to Nvidia’s dominance in AI infrastructure and is widely deployed across major cloud providers. Vera Rubin takes that model further.

According to Nvidia, the new system delivers 10 times more performance per watt than Grace Blackwell. Although Vera Rubin consumes about twice as much power overall, it delivers substantially more compute for each unit of energy used. In AI infrastructure, performance per watt has become one of the most critical benchmarks as data centres grapple with power constraints.
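Taken together, the two ratios above imply a rough figure for total output per rack. The snippet below is a back-of-the-envelope sketch, not an Nvidia figure: it assumes both ratios apply to the same workload and that throughput is simply performance per watt multiplied by power drawn. The function name is illustrative.

```python
def relative_throughput(perf_per_watt_ratio: float, power_ratio: float) -> float:
    """Throughput of a new rack relative to an old one.

    Assumes throughput = (performance per watt) x (power drawn),
    so the ratio of throughputs is the product of the two ratios.
    """
    return perf_per_watt_ratio * power_ratio


# Ratios quoted in the article: ~10x performance per watt, ~2x total power.
rubin_vs_blackwell = relative_throughput(perf_per_watt_ratio=10.0, power_ratio=2.0)
print(f"Implied throughput per rack: about {rubin_vs_blackwell:.0f}x Grace Blackwell")
```

Under those assumptions a Vera Rubin rack would deliver on the order of 20 times the output of a Grace Blackwell rack, which is why it can draw more power and still be far more efficient.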

Why efficiency matters

AI models, especially large language models and multimodal systems, require enormous computing power. As deployments scale, electricity consumption and cooling capacity have become bottlenecks.

Improving performance per watt helps data-centre operators:

Lower cost per AI workload

Run more compute within existing power limits

Improve infrastructure utilisation
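To make the second point concrete, the sketch below compares how much compute fits inside a fixed power budget. All values are normalised to Grace Blackwell (power 1.0, throughput 1.0). The Vera Rubin figures are derived from the article's ratios (about 2x power and 10x performance per watt, hence roughly 20x throughput per rack). The budget of 10 units is a hypothetical number chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rack:
    name: str
    power: float       # relative to Grace Blackwell = 1.0
    throughput: float  # relative to Grace Blackwell = 1.0


def compute_within_budget(rack: Rack, power_budget: float) -> tuple[int, float]:
    """Return (number of whole racks that fit, their combined throughput)."""
    racks = int(power_budget // rack.power)
    return racks, racks * rack.throughput


def energy_per_unit_work(rack: Rack) -> float:
    """Relative energy needed per unit of work (lower is better)."""
    return rack.power / rack.throughput


blackwell = Rack("Grace Blackwell", power=1.0, throughput=1.0)
rubin = Rack("Vera Rubin", power=2.0, throughput=20.0)  # 2x power, 10x perf/watt

POWER_BUDGET = 10.0  # hypothetical facility limit, in Grace Blackwell rack units

for rack in (blackwell, rubin):
    count, total = compute_within_budget(rack, POWER_BUDGET)
    print(f"{rack.name}: {count} racks, total throughput {total:.0f}, "
          f"energy per unit of work {energy_per_unit_work(rack):.2f}")
```

With the same power limit, half as many racks deliver ten times the total compute, and each unit of work costs a tenth of the energy.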

What are the trade-offs?

Greater efficiency does not necessarily mean lower total energy use. Even with its tenfold improvement in performance per watt, Vera Rubin still draws roughly twice the total power of Grace Blackwell. As hyperscalers deploy more of these racks, overall electricity demand could continue to rise.

The system’s full shift to liquid cooling may improve thermal efficiency, but it also increases infrastructure complexity and reinforces the scale of AI data-centre buildouts.

There is also a broader industry trend, often described as a rebound effect: when computing becomes more efficient and cost-effective, usage typically expands. More efficient AI systems enable more models, more inference workloads and wider adoption, potentially accelerating total energy demand. Similar patterns have been observed with smartphones and other consumer technologies.
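The trade-off can be expressed as a simple ratio: total energy changes by the growth in demand divided by the gain in efficiency. The sketch below uses the article's tenfold efficiency figure. The demand-growth scenarios are hypothetical values picked for illustration, not forecasts.

```python
def total_energy_ratio(demand_growth: float, efficiency_gain: float) -> float:
    """Change in total energy use when workload demand and efficiency both change.

    energy = work / (work per unit of energy), so the ratio of new to old
    energy is demand_growth / efficiency_gain.
    """
    if efficiency_gain <= 0:
        raise ValueError("efficiency_gain must be positive")
    return demand_growth / efficiency_gain


EFFICIENCY_GAIN = 10.0  # performance-per-watt improvement cited in the article

# Hypothetical demand scenarios: 5x, 10x and 30x more AI work requested.
for demand_growth in (5.0, 10.0, 30.0):
    ratio = total_energy_ratio(demand_growth, EFFICIENCY_GAIN)
    print(f"Demand x{demand_growth:.0f} -> total energy x{ratio:.1f}")
```

Total energy use falls only while demand grows by less than the efficiency gain. At tenfold demand growth it breaks even, and beyond that it rises despite the more efficient hardware.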

Why this matters for AI and data centres

AI workloads, particularly large models and multi-step reasoning tasks, are extremely compute-intensive and can strain existing data-centre power and cooling infrastructure. Efficiency improvements such as those Nvidia is targeting with Vera Rubin help reduce operating costs, support larger models and make future AI applications more sustainable.

The energy cost of training and deploying complex models has become a key consideration for enterprises as AI adoption grows. By delivering significantly more compute per watt, Vera Rubin could reduce the electricity cost of each AI workload for cloud providers and improve performance at scale.