Track Hyper | Vera Rubin: NVIDIA Dominates the AI World

New architectures for CPU and GPU are on the way

AI has become the undisputed protagonist and focus of COMPUTEX (Taipei International Computer Show) 2024.

Among the chip designers at the show, including NVIDIA, Intel, AMD, Qualcomm, MediaTek, and Arm, NVIDIA undoubtedly drew the most attention. On June 5th, NVIDIA's stock price hit a new high, closing at $1224.40, up 5.16%, and its market value reached $3 trillion, on par with Apple's.

NVIDIA's co-founder and CEO, Jensen Huang, updated the AI GPU technology roadmap at the COMPUTEX 2024 conference: in 2025, NVIDIA will release Blackwell Ultra, introduce a new architecture Rubin in 2026, and launch Rubin Ultra in 2027.

At its investor conference in October 2023, NVIDIA shortened its chip update cycle from every two years to every year. At that meeting, Jensen Huang said the H200 and B100 GPUs were expected in 2024, with the NVIDIA X100 GPU to follow in 2025.

The strategic core of the AI chip plan NVIDIA presented in October 2023 is "One Architecture": a unified architecture that supports model training and deployment anywhere, whether in data centers (IDC) or on edge devices, and on both x86 and Arm.

In other words, NVIDIA's AI chip solutions are suitable for training tasks in ultra-large-scale data centers and can also meet the edge computing needs of enterprise users, covering x86 or emerging Arm architecture ecosystems.

Today, NVIDIA is a computing chip and system company with GPU, CPU, and DPU (Data Processing Unit) product lines. Through its NVLink, NVSwitch, and NVLink C2C technologies, NVIDIA flexibly connects CPUs and GPUs into a unified hardware architecture, which, together with CUDA, forms a complete software-hardware ecosystem.

A DPU is a processor designed specifically for data processing. Unlike general-purpose processors (CPUs) and graphics processors (GPUs), a DPU concentrates on data-handling tasks, which lets it carry them out more efficiently.

Jensen Huang's keynote included a good deal of soft advertising: a review of NVIDIA's past successes, its efforts to reduce the cost of computing power, the claim that "NVIDIA has facilitated the current AI era," and a renewed emphasis on the platform-level value of CUDA for AI. It is because of CUDA, he argued, that deep learning scientists around the world can make full use of its potential, thereby driving progress across the entire industry. Setting that aside, he also looked ahead to the generative AI era, which was a key focus of NVIDIA's strategic announcements at this conference.

Jensen Huang said, "We are not in a simple AI era now, but in a generative AI era. In this era, almost everything can be converted into tokens and optimized through generative AI processing."

"Generative AI will reshape every industry, driving the transformation of the entire $30 trillion IT industry into an AI factory, producing AI products for every industry," said Jensen Huang. "In the future, every PC equipped with an RTX graphics card will become an AI PC, capable of efficiently processing and generating various data."

Doesn't this essentially mean that Huang has defined the AI PC? "Every PC equipped with an RTX graphics card will become an AI PC."

The real centerpiece, though, was Jensen Huang's unveiling of NVIDIA's AI GPU technology roadmap: by 2027, NVIDIA will update its GPU and CPU architectures and introduce a CPU+GPU integrated super chip.

Jensen Huang stated that NVIDIA will continue to use a unified architecture to cover the entire data center GPU product line, with annual updates. At the core of this is NVIDIA's official announcement of the new-generation GPU architecture "Rubin," which will replace the "Blackwell" architecture.

As of now, NVIDIA's high-performance GPU architecture codenamed "Blackwell" has entered mass production, with related products including the B200/GB200 for the HPC/AI field and the RTX 50 series for gaming.

In 2025, Blackwell Ultra, an iteration of Blackwell, will be launched. In 2026, the new architecture "Rubin" will be introduced, equipped with next-generation HBM4 high-bandwidth memory (8-layer stack). In 2027, Rubin Ultra, an iteration of "Rubin," will follow, with its standard HBM4 memory upgraded from an 8-layer stack to a 12-layer stack.

One detail worth noting: the name "Rubin" is inspired by the American astronomer Vera Rubin.

On the CPU side, the "Vera" architecture will replace the Grace CPU super chip launched in March 2022. The name "Vera" is also taken from Vera Rubin: her first name goes to NVIDIA's new-generation CPU architecture, and her surname to its new-generation GPU architecture.

Previously, NVIDIA did not treat "Grace" as an independent CPU architecture in its AI roadmap, but instead placed it in the "Grace+GPU" SuperChip category.

The marketing may not have made this clear, but it reflects NVIDIA's "CPU+GPU integrated super chip" strategy. The official announcement of the "Vera Rubin" CPU+GPU architecture, in turn, shows competitors how dominant NVIDIA is in the AI field.

In the SuperChip architecture, the NVLink-C2C and NVLink interconnect technologies will continue to play a key role in NVIDIA's future AI chips. At this conference, Jensen Huang revealed the latest plan for the new-generation super chip composed of the Vera CPU and Rubin GPU: it adopts the sixth-generation NVLink interconnect bus, with a bandwidth of up to 3.6TB/s.

What is the role of NVLink-C2C interconnect technology?

NVIDIA uses it to build the GH200, GB200, and GX200 super chips. Through NVLink interconnect technology, two GH200, GB200, or GX200 chips can then be linked into GH200NVL, GB200NVL, and GX200NVL modules. From there, NVIDIA can form super nodes over an NVLink network, and finally use InfiniBand or Ethernet networks to build larger-scale AI clusters.

What role do InfiniBand and Ethernet play here? They are the basis of NVIDIA's network switching technology: the former targets the AI Factory, while the latter focuses on the AIGC Cloud. Compared with the NVLink bus-domain network, InfiniBand and Ethernet are traditional network technologies, and the bandwidth ratio between these traditional networks and the NVLink domain is approximately 1:9.

At this conference, Jensen Huang announced updated technology and product information for these two open interconnect routes, InfiniBand and Ethernet: the new-generation data center network card CX9 SuperNIC, with a maximum bandwidth of 1600Gbps (1.6Tbps, or 1.6 million megabits per second), paired with the new InfiniBand/Ethernet switch X1600.
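The unit conversion behind the 1600Gbps figure is easy to get wrong, so a short calculation may help. The following is a minimal sketch, assuming decimal (SI) prefixes as is usual for network line rates; the function name convert_gbps is made up for this illustration and does not come from any NVIDIA tool.

```python
# Decimal (SI) prefixes, the usual convention for network line rates.
MEGA = 10**6
GIGA = 10**9
TERA = 10**12


def convert_gbps(rate_gbps):
    """Express a line rate given in Gbps as Mbps, Tbps, and gigabytes per second.

    Hypothetical helper for illustration only.
    """
    bits_per_second = rate_gbps * GIGA
    return {
        "mbps": bits_per_second / MEGA,
        "tbps": bits_per_second / TERA,
        # 8 bits per byte; protocol overhead is ignored.
        "gigabytes_per_second": bits_per_second / 8 / GIGA,
    }


cx9 = convert_gbps(1600)
print(f"{cx9['mbps']:,.0f} Mbps")                    # 1,600,000 Mbps
print(f"{cx9['tbps']} Tbps")                         # 1.6 Tbps
print(f"{cx9['gigabytes_per_second']} GB/s")         # 200.0 GB/s
```

Under these assumptions, 1600Gbps works out to 1.6 million megabits per second, or 1.6Tbps, which is roughly 200 gigabytes per second of raw line rate.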

According to the NVIDIA investor conference held in October 2023, the InfiniBand-based Quantum series and the Ethernet-based Spectrum-X series will continue to be upgraded. Switch chips with 800G interfaces based on 100G SerDes are expected to be commercialized by 2024, and switch chips with 1.6T interfaces based on 200G SerDes are expected to reach the market by 2025.
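The relationship between interface speed and SerDes rate in this roadmap can be sketched as simple arithmetic. The snippet below is an illustration only: it assumes an interface is a plain aggregate of equal-rate SerDes lanes and ignores encoding and error-correction overhead. The names serdes_lanes and roadmap are hypothetical.

```python
def serdes_lanes(interface_gbps, serdes_gbps):
    """Number of SerDes lanes needed to reach the interface rate.

    Assumption: the interface is a simple aggregate of identical lanes.
    """
    if interface_gbps % serdes_gbps != 0:
        raise ValueError("interface rate is not a whole multiple of the lane rate")
    return interface_gbps // serdes_gbps


# Year -> (interface rate in Gbps, per-lane SerDes rate in Gbps), as given in the text.
roadmap = {
    2024: (800, 100),
    2025: (1600, 200),
}

lanes_by_year = {
    year: serdes_lanes(interface, serdes)
    for year, (interface, serdes) in roadmap.items()
}

for year, lanes in lanes_by_year.items():
    interface, serdes = roadmap[year]
    print(f"{year}: {interface}G interface = {lanes} lanes x {serdes}G SerDes")
```

Under this simplification, both generations use the same number of lanes per interface (eight), so the doubling of interface speed from 800G to 1.6T would come entirely from the faster SerDes.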