NVIDIA unveils next-generation Rubin platform, cutting inference costs by up to 10 times versus Blackwell, with shipments planned for the second half of the year

Wallstreetcn
2026.01.05 23:12

The Rubin platform delivers 3.5 times Blackwell's training performance and a fivefold improvement in performance running AI software, while cutting the number of GPUs needed to train mixture-of-experts models by a factor of four. Jensen Huang said all six Rubin chips have passed key tests showing they can be deployed as planned. NVIDIA announced that the platform is in full production, with cloud providers including Amazon AWS, Google Cloud, Microsoft, and Oracle Cloud among the first to deploy it.

NVIDIA launched its next-generation Rubin AI platform at CES, keeping to its annual update cadence in the artificial intelligence (AI) chip field. Through an integrated design of six new chips, the platform achieves a significant leap in inference cost and training efficiency, and it will reach its first customers in the second half of 2026.

On Monday, January 5, Eastern Time, NVIDIA CEO Jensen Huang said in Las Vegas that the six Rubin chips have come back from the manufacturing partner and passed key tests, keeping the platform on schedule. "The AI race has begun, and everyone is striving to reach the next level," he said. NVIDIA emphasized that systems based on Rubin will cost less to operate than their Blackwell counterparts because they deliver the same results with fewer components.

Microsoft and other large cloud computing providers will be among the first customers to deploy the new hardware in the second half of the year. Microsoft's next-generation Fairwater AI super factory will be equipped with NVIDIA Vera Rubin NVL72 rack-scale systems, scalable to hundreds of thousands of NVIDIA Vera Rubin superchips. CoreWeave will also be among the first providers to offer Rubin systems.

The launch of the platform comes at a time when some on Wall Street are concerned about increased competition for NVIDIA and are skeptical about whether spending in the AI field can maintain its current pace. However, NVIDIA maintains a long-term bullish forecast, believing that the total market size could reach trillions of dollars.

Performance Improvements Targeting Next-Generation AI Demands

According to NVIDIA's announcement, the Rubin platform delivers 3.5 times Blackwell's training performance, while performance running AI software improves fivefold. Compared with Blackwell, Rubin can cut inference token-generation costs by up to 10 times and reduce the number of GPUs required to train mixture-of-experts (MoE) models by a factor of four.
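To put the quoted multiples in concrete terms, the short Python sketch below applies them to a hypothetical Blackwell baseline. The baseline cost and cluster-size figures are illustrative placeholders, not published numbers; only the 10x, 4x, and 3.5x multiples come from NVIDIA's announcement.

```python
# Illustrative arithmetic only: applies the multiples NVIDIA quotes for Rubin
# vs. Blackwell to a hypothetical baseline. The two baseline figures below
# are made-up placeholders, not published numbers.

blackwell_cost_per_million_tokens = 1.00   # hypothetical baseline, $ per 1M tokens
blackwell_gpus_for_moe_training   = 1024   # hypothetical baseline cluster size

# Multiples quoted in the announcement
inference_cost_reduction = 10    # "up to 10x" lower token-generation cost
moe_gpu_reduction        = 4     # 4x fewer GPUs to train an MoE model
training_speedup         = 3.5   # 3.5x Blackwell training performance

rubin_cost_per_million_tokens = blackwell_cost_per_million_tokens / inference_cost_reduction
rubin_gpus_for_moe_training   = blackwell_gpus_for_moe_training // moe_gpu_reduction

print(f"Inference cost: ${blackwell_cost_per_million_tokens:.2f} -> "
      f"${rubin_cost_per_million_tokens:.2f} per 1M tokens (best case)")
print(f"MoE training GPUs: {blackwell_gpus_for_moe_training} -> {rubin_gpus_for_moe_training}")
print(f"Training throughput multiplier: {training_speedup}x")
```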

The new platform is equipped with the Vera CPU, which has 88 custom Olympus cores and delivers twice the performance of alternative processors. Designed for agentic inference, NVIDIA describes it as the most energy-efficient CPU for large-scale AI factories, with full Armv9.2 compatibility and ultra-fast NVLink-C2C connectivity.

Rubin GPUs are equipped with a third-generation Transformer Engine featuring hardware-accelerated adaptive compression, delivering 50 petaflops of NVFP4 compute for AI inference. Each GPU offers 3.6TB/s of bandwidth, while the Vera Rubin NVL72 rack provides 260TB/s in aggregate.
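The compute and bandwidth figures can be read together through a simple roofline calculation. The sketch below assumes the 3.6TB/s figure refers to per-GPU memory bandwidth, which is our reading of the announcement rather than a confirmed spec, and estimates the arithmetic intensity at which a Rubin GPU would shift from bandwidth-bound to compute-bound.

```python
# Back-of-the-envelope roofline check using the figures quoted above.
# Assumption: 3.6 TB/s is per-GPU memory bandwidth (not NVIDIA-confirmed).

nvfp4_flops   = 50e15    # 50 petaflops of NVFP4 compute per GPU
mem_bandwidth = 3.6e12   # 3.6 TB/s per GPU, in bytes per second

# Arithmetic intensity (FLOPs per byte moved) at which the GPU shifts from
# bandwidth-bound to compute-bound in the classic roofline model.
ridge_point = nvfp4_flops / mem_bandwidth
print(f"Roofline ridge point: ~{ridge_point:,.0f} FLOPs per byte")

# For comparison: decoding one token of a dense transformer streams every
# weight once, roughly 2 FLOPs per parameter at 0.5 bytes per parameter
# (4-bit weights), i.e. about 4 FLOPs/byte -- far below the ridge point.
# That is why token generation is bandwidth-bound, and why the per-GPU and
# rack-level (260 TB/s) bandwidth figures matter so much for inference cost.
decode_intensity = 2 / 0.5
print(f"Dense decode intensity: ~{decode_intensity:.0f} FLOPs per byte")
```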

Chip Testing Progressing Smoothly

Jensen Huang disclosed that all six Rubin chips have returned from manufacturing partners and passed key tests showing they can be deployed as planned. The statement signals that NVIDIA is holding its lead as a maker of AI accelerators.

The platform includes five major innovations: sixth-generation NVLink interconnect, the Transformer Engine, confidential computing, a RAS engine, and the Vera CPU. Among them, third-generation confidential computing makes the Vera Rubin NVL72 the first rack-scale platform to protect data across the CPU, GPU, and NVLink domains.

The second-generation RAS engine spans the GPU, CPU, and NVLink, providing real-time health checks, fault tolerance, and proactive maintenance to maximize system productivity. The rack adopts a modular, cable-free tray design, making assembly and maintenance 18 times faster than on Blackwell.

Extensive Ecosystem Support

NVIDIA stated that cloud partners including Amazon's AWS, Google Cloud, Microsoft, and Oracle Cloud will be the first to deploy Vera Rubin-based instances in 2026, with GPU cloud providers CoreWeave, Lambda, Nebius, and Nscale to follow.

OpenAI CEO Sam Altman said, "Intelligence scales with computation. As we add more computation, models become stronger, can solve harder problems, and have a greater impact on people. NVIDIA's Rubin platform helps us keep scaling this progress."

Dario Amodei, co-founder and CEO of Anthropic, noted that the efficiency improvements of NVIDIA's Rubin platform represent a foundational advancement that enables longer memory, better reasoning, and more reliable outputs.

Meta CEO Mark Zuckerberg stated that NVIDIA's Rubin platform is expected to bring a leap in performance and efficiency, which is necessary for deploying state-of-the-art models to billions of people.

NVIDIA also said that Cisco, Dell, Hewlett Packard Enterprise, Lenovo, and Supermicro are expected to launch a range of Rubin-based servers. AI labs including Anthropic, Cohere, Meta, Mistral AI, OpenAI, and xAI are looking to use the Rubin platform to train larger, more capable models.

Early Disclosure of Product Details

Commentators noted that NVIDIA disclosed new product details earlier than in previous years, part of its effort to keep the industry reliant on its hardware. The company typically provides in-depth product details at the GTC event it holds in San Jose, California, every spring.

For Jensen Huang, CES is just one stop in a marathon of appearances at which he announces products, partnerships, and investments, all intended to build momentum for AI system deployment.

The new hardware NVIDIA announced also includes networking and connectivity components, which will form part of the DGX SuperPod supercomputer and will also be sold as standalone products for customers who want a more modular approach. The performance gains matter because AI has shifted toward networks of specialized models, which must not only filter massive inputs but also solve specific problems through multi-stage processes.

NVIDIA is pushing AI applications across the entire economy, including robotics, healthcare, and heavy industry. As part of this effort, it announced a series of tools aimed at accelerating the development of autonomous vehicles and robotics.

For now, most spending on NVIDIA's computing hardware comes from the capital expenditure budgets of a handful of customers, including Microsoft, Alphabet's Google Cloud, and Amazon's AWS.