
Jensen Huang's full CES speech is here! Rubin is in full production, compute performance is up 5x, the barriers to intelligent driving are coming down, and NVIDIA is all in on the physical world

NVIDIA's Vera Rubin platform has fully entered production. Through "extreme collaborative design," it boosts AI inference performance 5x and cuts costs to one tenth, directly addressing the pain points of AI agents that are too expensive to run and unable to remember. Jensen Huang announced that AI has entered the second half, the era of "thinking," and proclaimed that "the ChatGPT moment of physical AI is near." Through the open-source autonomous driving reasoning model Alpamayo and a partnership with Siemens, NVIDIA laid out a full-stack picture from chips to robots.
At 5 a.m. Beijing time on the 6th, under the spotlight of the global "tech Spring Festival," the International Consumer Electronics Show (CES) in Las Vegas, NVIDIA CEO Jensen Huang took the stage in his iconic black crocodile-patterned jacket.
“The AI race has begun, and everyone is striving to reach the next level... If you don't engage in full-stack extreme collaborative design, you simply cannot keep up with the model's growth rate of ten times a year.” In response to the capital market's concerns about the "AI bubble" and anxieties over the failure of Moore's Law, Huang used a new architecture called Vera Rubin to demonstrate to the outside world that NVIDIA still holds the absolute power to define the future of AI.
This speech was different from past ones that simply announced new graphics cards. Although Huang brought no new GeForce products this time, he presented the capital markets with a complete picture, from atomic-level chip design to robots in the physical world, signaling that NVIDIA is all in on AI and all in on physical AI.
Three main themes of the speech:
At the infrastructure and compute level, NVIDIA has muscled past physical limits through "extreme collaborative design," rewriting the cost logic of the data center. Facing a bottleneck in which the transistor count grew only 1.6x, NVIDIA still pushed inference performance up 5x through the Vera Rubin platform, the NVLink 6 interconnect, and a BlueField-4 driven inference context memory storage platform, while cutting the cost of token generation to one tenth. The core goal at this level is to solve agentic AI's twin problems of being unable to afford compute and unable to remember (the "memory wall"), paving the way for AI's transition from training to large-scale inference.
At the model evolution level, NVIDIA formally established the paradigm shift from "generative AI" to "reasoning AI" (test-time scaling). Huang emphasized that AI is no longer a one-shot question-and-answer process; it requires a chain of thought built on multi-step reasoning and planning. By open-sourcing the Alpamayo (autonomous driving reasoning), Cosmos (physical world model), and Nemotron (agent) model families, NVIDIA is pushing AI toward logical reasoning and long-term memory, so it can handle unseen, complex long-tail scenarios.
At the physical implementation level, NVIDIA announced that "physical AI" has officially entered the commercial monetization phase, breaking the situation where AI only exists on screens. The speech clarified the timeline for Mercedes-Benz vehicles to be on the road in Q1 2026 and showcased deep full-stack cooperation with Siemens in the industrial metaverse. By integrating the Omniverse simulation environment, synthetic data generation, and robotic control models, NVIDIA is injecting AI capabilities from the "soft world" of the internet cloud into the "hard world" of cars, factories, and robots on a large scale.
Key Points of the Speech:
Vera Rubin Platform in Full Production: All six core chips of the next-generation AI computing platform have completed manufacturing and key testing and entered volume production. Despite the physical constraint of only a 1.6x increase in transistors, "extreme collaborative design" delivers a 5x gain in inference performance and a 3.5x gain in training performance. Microsoft's next-generation AI super factory will deploy hundreds of thousands of Vera Rubin chips.
Rubin Inference Cost Cut to One Tenth of Blackwell's: Responding directly to market concerns about the high cost of AI, Rubin lowers the cost of generating inference tokens to one tenth of Blackwell's, making expensive agentic AI commercially viable.
Solving the AI "Memory" Bottleneck: An inference context memory storage platform built on the BlueField-4 DPU adds 16TB of high-speed shared memory per GPU, addressing the long-context "memory wall."
Physical AI's Monetization Moment: NVIDIA launched Alpamayo, a reasoning-capable autonomous driving model, and stated plainly that it will be on the road in Mercedes-Benz vehicles in Q1 2026, kicking off the revenue cycle for physical AI.
Reconstruction of Energy Economics: The Rubin architecture supports 45°C warm water cooling, eliminating the need for chillers, directly saving 6% of electricity for global data centers.
Expansion of Open Source Ecosystem: Announced the expansion of its open-source model ecosystem, covering key areas such as physical AI, autonomous driving, robotics, and biomedicine, and providing supporting datasets and toolchains.
Industrial Metaverse Implementation: Achieved deep full-stack cooperation with Siemens, embedding NVIDIA AI technology into the underlying global industrial manufacturing, extending from "designing chips" to "designing factories."

New King Debuts: Rubin Platform in Full Production, Inference Cost Cut to One Tenth of Blackwell's
"Vera Rubin is in full production." At CES, Jensen Huang announced the next-generation Rubin AI platform, which achieves major leaps in inference cost and training efficiency through the integrated design of six new chips, with the first customers set to receive deliveries in the second half of 2026.
This is also the news the market cares about most. He described the Rubin GPU as "a giant ship" and laid out the underlying logic: "The inference cost of AI must fall 10x every year, while the number of tokens AI generates while 'thinking' (test-time scaling) grows 5x every year." Under the pull of these two forces, the iteration pace of traditional chips cannot keep up.
Jensen Huang used a vivid metaphor to explain the design philosophy of the new generation of AI chips: "This is not simply about building a better engine, but redesigning the entire car so that the engine, transmission, and chassis work together." "Its AI floating-point performance is five times that of Blackwell, but its transistor count is only 1.6 times higher." Huang emphasized that this leap, beyond what Moore's Law would normally allow, comes from "extreme collaborative design."
The “collaboration” he refers to encompasses a comprehensive reconstruction from CPU, GPU, network chips to the entire cooling system. The practical effect of this design is directly reflected in the market's most sensitive cost indicators: the inference cost can be reduced to as low as 1/10 of the Blackwell platform. Specifically:
Computing Power: At NVFP4 precision, the Rubin GPU delivers 50 PFLOPS of inference performance (5x Blackwell) and 35 PFLOPS of training performance (a 3.5x improvement over the previous generation). Each GPU packages 8 stacks of HBM4 memory, with bandwidth of up to 22 TB/s.
CPU Breakthrough: The brand-new Vera CPU integrates 88 custom Olympus Arm cores and employs a design called "spatial multi-threading," efficiently running 176 threads at once and addressing the pain point of CPUs lagging behind GPU throughput.
Interconnect: NVLink 6 boosts the communication bandwidth within a rack to an astonishing 240 TB/s, more than twice the total bandwidth of the global internet.
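For a rough sense of what "extreme collaborative design" buys, the figures quoted above can be compared directly. The short sketch below is illustrative arithmetic only, with Blackwell's NVFP4 inference figure back-derived from the stated 5x ratio rather than taken from an official benchmark; it shows that per-transistor inference throughput rises roughly 3x even though the transistor budget grows only 1.6x.

```python
# Illustrative arithmetic from the figures quoted above; not official benchmarks.
# Blackwell's NVFP4 inference figure is back-derived from the stated 5x ratio.
blackwell = {"nvfp4_inference_pflops": 10.0, "relative_transistors": 1.0}
rubin     = {"nvfp4_inference_pflops": 50.0, "relative_transistors": 1.6}

perf_ratio = rubin["nvfp4_inference_pflops"] / blackwell["nvfp4_inference_pflops"]   # 5.0x
transistor_ratio = rubin["relative_transistors"] / blackwell["relative_transistors"] # 1.6x
perf_per_transistor_gain = perf_ratio / transistor_ratio                             # ~3.1x

print(f"{perf_ratio:.1f}x performance, {transistor_ratio:.1f}x transistors, "
      f"{perf_per_transistor_gain:.1f}x per-transistor gain")
```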


The Second Half of AI: From “Memorization” to “Logical Thinking”
During the speech, Jensen Huang keenly captured the fundamental changes on the AI model side— Test-time Scaling.
"Inference is no longer a one-time answer, but a process of thinking." He pointed out that with the emergence of models like DeepSeek R1 and OpenAI o1, AI has begun to exhibit chain-of-thought capabilities. This means that before giving an answer, AI consumes a large amount of compute on multi-step reasoning, reflection, and planning.
For investors, this is a huge incremental signal: future compute consumption will shift massively from the training side to the inference side. To support this demand for "letting AI think a little longer," compute must be affordable enough. The core mission of the Rubin architecture is to cut the cost of generating inference tokens for MoE (Mixture of Experts) models to one tenth of Blackwell's. Only with lower costs can agentic AI, capable of handling complex tasks, become commercially viable.
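A back-of-the-envelope calculation, using the 1/10 per-token cost target above and the 5x growth in "thinking" tokens quoted earlier, shows why the two trends have to be read together. The token count below is a hypothetical placeholder; only the 10x and 5x figures come from the speech.

```python
# Hypothetical normalized numbers; only the 10x cost drop and the 5x token
# growth are taken from the speech.
cost_per_token_blackwell = 1.0          # normalized per-token cost
thinking_tokens_per_query = 1_000       # placeholder chain-of-thought length

cost_per_token_rubin = cost_per_token_blackwell / 10    # stated 1/10 cost
thinking_tokens_rubin = thinking_tokens_per_query * 5   # stated 5x more "thinking"

old_query_cost = cost_per_token_blackwell * thinking_tokens_per_query
new_query_cost = cost_per_token_rubin * thinking_tokens_rubin
print(new_query_cost / old_query_cost)  # 0.5: a query that thinks 5x longer still costs half as much
```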
Breaking the Bottleneck: How to Make AI "Remember" Longer Conversations
As AI transitions from simple Q&A to long-term complex reasoning, a new bottleneck emerges—memory.
In the era of agentic AI, agents need to remember lengthy conversation histories and complex contexts, which generates a massive KV Cache (key-value cache). The traditional solution is to cram this data into expensive HBM, but HBM capacity is limited and costly; this constraint is what is called the "memory wall."
Jensen Huang explained this issue in detail: “AI's working memory is stored in HBM memory. Every time a token is generated, it has to read the entire model and all working memory.” For AI agents that need to run for long periods and have persistent memory, this architecture is clearly unsustainable.
The solution is a brand new storage architecture. Jensen Huang revealed his secret weapon: the Inference Context Memory Storage Platform built on BlueField-4 DPU.

He pointed to the massive rack system on stage and explained: “On top of the original 1TB memory for each GPU, we have added an additional 16TB of 'thinking space' for each GPU through this platform.” This platform is placed as close to the computing units as possible and connected with a bandwidth of up to 200Gb/s, avoiding the latency bottleneck caused by traditional storage.
This design directly addresses market concerns about the large-scale deployment of AI applications: Without sufficiently large and fast memory, AI cannot truly become our long-term, personalized assistant.
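For a sense of scale, the sketch below estimates KV cache size under hypothetical model dimensions (80 layers, 8 KV heads, head size 128, FP16 cache); the numbers are illustrative only. Under those assumptions, a single million-token context already takes roughly a third of a terabyte, and a few dozen concurrent long-context sessions land in the same ballpark as the 16TB-per-GPU tier described above.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_elem=2):
    """Size of the KV cache: one key and one value vector per layer, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens

# Hypothetical model dimensions, FP16 cache (2 bytes per element).
per_session = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128, n_tokens=1_000_000)
print(f"{per_session / 1e12:.2f} TB for one 1M-token context")         # ~0.33 TB
print(f"{48 * per_session / 1e12:.1f} TB for 48 concurrent sessions")  # ~15.7 TB
```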
Physical AI Implementation: No Empty Promises, Q1 Smart Driving Cars Hit the Road
Jensen Huang focused the second part of his speech on a grander theme: “The ChatGPT moment for physical AI has arrived—machines are beginning to understand, reason, and act in the real world.”
To prove that AI can do more than just chat, Jensen Huang unveiled the world's first open-source VLA (Vision-Language-Action) autonomous driving reasoning model, Alpamayo. Unlike traditional autonomous driving systems, Alpamayo can "explain" its decisions. "This is not just a driving model, but a model that can explain its own thought process." Jensen Huang played a demonstration video showing that Alpamayo can not only drive a car but also explain its decision-making logic in natural language, for example: "The brake lights of the vehicle ahead are on, it may slow down, so I should keep my distance."

This “explainable AI” is crucial for solving the long-tail problems of autonomous driving. Huang admitted, “We cannot collect all possible driving scenarios in the world. But we can teach AI to ‘reason’ and break down unfamiliar scenarios into known combinations of elements.”
This technology is about to be commercialized. Huang announced, “The first Mercedes-Benz CLA model equipped with NVIDIA's full-stack DRIVE system will hit the roads in the United States in the first quarter of 2026.” This marks the first complete application of NVIDIA's AI technology in mass-produced vehicles.

Silicon Photonics (CPO) and Warm Water Cooling: Saving 6% of Power for Global Data Centers
In terms of connectivity and heat dissipation, NVIDIA also showcased its dominant technological reserves.
First is the revolution in optical communication. Huang officially launched the Spectrum-6 Ethernet switch (SN688/SN6810) using “Co-Packaged Optics (CPO)” technology.
He clearly stated, “Compared to hardware without silicon photonics technology, they perform better in energy efficiency, reliability, and uptime.” This means that CPO is no longer a concept in the laboratory but has entered NVIDIA's mass production list, marking a substantial moment for the optical module industry chain to transition from pluggable to CPO.
Secondly, there is a reconstruction of energy economics. AI's enormous energy consumption has long hung over the industry like a sword of Damocles. The Rubin NVL72 rack is 100% liquid-cooled and supports a 45°C inlet water temperature. This means data centers no longer need power-hungry chillers to produce cold water; they can rely on free cooling or warm-water circulation instead. Huang proudly announced that this will save 6% of electricity for global data centers, an irresistible proposition for a North American data center market struggling with power quotas.

Industry Alliance: How AI is Changing Trillion-Dollar Traditional Industries
If physical AI stayed only in the automotive field, its upside would be limited. But Jensen Huang showcased a broader picture: industrial manufacturing.
“We must design factories that manufacture these AI chips, and these factories themselves are huge robots.” With this logic, Huang shifted the topic to the industrial manufacturing giant—Siemens.

He announced a deep strategic cooperation with Siemens: “We will deeply integrate NVIDIA's physical AI, agent AI models, and Omniverse platform into Siemens' industrial software and digital twin toolchain.”
The scale of this cooperation far exceeds ordinary technology integration. Huang explained: “You will design your chips and systems on these platforms, simulate the entire manufacturing process in the computer, and even complete testing and evaluation before they come into contact with gravity.”
This cooperation marks the comprehensive penetration of NVIDIA's AI technology from data centers into the real economy. When AI can not only generate text and images but also design, simulate, and optimize complex systems in the physical world, its market potential will expand from the internet economy to the global industrial economy.

Open Ecosystem Strategy: How to Respond to the Impact of Open Source Models
In the face of increasingly powerful open-source models, Huang did not shy away but showcased NVIDIA's response strategy—to become a leader in open source rather than a passive responder.
“We are builders of cutting-edge AI models, and we build them in a very special way—completely open.” Huang announced the expansion of NVIDIA's “Open Model Universe,” covering six major fields from biomedicine to the physical world.
He particularly emphasized the industrial activation effect brought by open source: “When open innovation and global collaboration truly kick off, the diffusion speed of AI will be extremely fast.”
NVIDIA's open source is not just a release of code, but the opening of a complete toolchain, including training data, model architectures, and evaluation tools. The logic of the strategy: rather than be disrupted by the open-source community, it is better to actively shape the standards and direction of the open-source ecosystem.
As NVIDIA's technology extends from chips to systems, and from the cloud to the physical world, the company is building not just a computing platform, but a new world infrastructure driven by AI.
The full text of the speech is as follows (translation assisted by AI tools):
Jensen Huang:
Hello, Las Vegas! Happy New Year! Welcome, everyone. Well, we have prepared content equivalent to about 15 keynote speeches to pack into this launch event. It's great to see all of you. There are 3,000 attendees here, with another 2,000 watching in the courtyard outside, and reportedly 1,000 more on the fourth floor watching the live stream from NVIDIA's exhibition area. Of course, millions of viewers around the world are watching online with us as we kick off this new year.
Every 10 to 15 years, the computer industry undergoes a reboot. From mainframes to PCs, to the internet, to the cloud, and then to mobile, new platform shifts continuously occur. Each time, the application world targets a new platform. That’s why it’s called a platform shift. You write new applications for new computers, but this time, there are actually two platform shifts happening simultaneously. As we turn to AI, applications will now be built on top of AI. Initially, people thought AI was just applications. In fact, AI is indeed applications, but you will build applications on top of AI. Beyond that, the way software runs and the way software is developed has fundamentally changed. The entire core stack of the computer industry is being reshaped. You no longer program software; you train it. You don’t run it on CPUs; you run it on GPUs. Previous applications were pre-recorded, pre-compiled, and ran on devices, whereas now applications can understand context and generate every pixel, every token completely from scratch.
It’s always like this. Due to accelerated computing and artificial intelligence, computing has been fundamentally reshaped. Every layer of this five-layer cake is being reinvented. This means that about $10 trillion of computing infrastructure from the past decade is modernizing towards this new way of computing. This means that hundreds of billions of dollars in venture capital are being invested each year into modernizing and inventing this new world. This means that a $100 trillion industry, of which a few percentage points are R&D budgets, is shifting towards artificial intelligence. People ask, where does the money come from? This is where the money comes from. The modernization from traditional IT to AI, the shift of R&D budgets from classical methods to AI methods. Huge investments are pouring into this industry, which also explains why we are so busy. Last year was no exception.
Last year was an incredible year. There's a slide to show… this is what happens when you go on stage without rehearsing; this is the first keynote of the year. I hope this is also your first of the year. Otherwise, if you've been busy before coming here, then never mind. This is our first of the year, and we're going to clear the cobwebs. 2025 is going to be an incredible year. It seems like everything is happening at the same time. In fact, it may very well be. First, of course, are the scaling laws. In 2015, the first language model that I think really made an impact appeared, and it did have a huge impact; it's called BERT. In 2017, the Transformer came. Then, five years later, in 2022, the ChatGPT moment occurred. It awakened the world to the possibilities of artificial intelligence. A year later, something very important happened: o1, the first reasoning model from OpenAI, completely revolutionized things by introducing the concept called "test-time scaling," which is actually very common-sense. We not only pre-train models so they learn; we also let them learn skills through reinforcement learning after training. Now we also have test-time scaling, in other words "thinking": you are thinking in real time. Every stage of artificial intelligence requires a lot of computation, and the scaling laws of computation keep expanding. Large language models keep getting better.
At the same time, another breakthrough occurred, which happened in 2024. Agentic systems began to emerge. By 2025, it started to become widespread, almost everywhere. Agent models that can reason, search for information, conduct research, use tools, plan for the future, and simulate outcomes suddenly began to solve very important problems. One of my favorite agent models is called Cursor, which has completely changed the way we do software programming at NVIDIA. Agentic systems will really take off from here.
Of course, there are other types of AI. We know that large language models are not the only type of information. As long as there is information in the universe, as long as the universe has structure, we can teach a large language model, a form of language model, to understand this information, understand its representation, and transform it into AI. The largest and most important category is Physical AI, which understands the laws of nature. Of course, Physical AI is about AI interacting with the world. But the world itself has information, encoded information, which is called AI, Physical AI. In the case of Physical AI, you have AI that interacts with the physical world, and AI that understands physical laws, that is, AI physics.
Finally, one of the most important things that happened last year was the advancement of open models. We now know that when open source, open innovation, and the innovation of every company and every industry in the world are activated simultaneously, AI will be everywhere. Last year, open models really took off. In fact, last year we witnessed the arrival of DeepSeek R1, the first open reasoning model. It surprised the world and truly activated this entire movement. Very exciting work. We are very pleased with this. Now we have all kinds of open model systems around the world. We now know that open models have also reached the frontier, still consistently about six months behind the frontier models, but every six months a new model emerges, and these models are becoming smarter and smarter. You can see that download numbers have exploded. Downloads are growing so rapidly because startups want to participate in the AI revolution. Big companies want to participate, researchers want to participate, students want to participate, and almost every country wants to participate.
How could digital forms of intelligence possibly leave anyone behind? Therefore, open models truly transformed artificial intelligence last year. The entire industry will be reshaped as a result.
A few years ago, we had this intuition, and you may have heard that we started building and operating our own AI supercomputers, which we call DGX Cloud. Many people asked, are you entering the cloud business? The answer is no. We built these DGX supercomputers for our own use. It turns out we operate supercomputers worth billions of dollars so that we can develop our open models.
I am very excited about the work we are doing. It is starting to attract attention from all over the world and from many industries because we are working on cutting-edge AI models in so many different fields. Our work in proteins and digital biology: La Proteina, which can synthesize and generate proteins; OpenFold 3, to understand the structure of proteins; Evo 2, to understand and generate multiple proteins. This is also the beginning of cell characterization.
Earth-2, AI that understands the laws of physics. The work we are doing with FourCastNet and CorrDiff is truly revolutionizing the way people do weather forecasting. Nemotron: we are doing groundbreaking work there, the first hybrid Transformer-SSM model, which is extremely fast, allowing for long-form thinking or very quick thinking and generating very smart, intelligent answers without taking a long time. Nemotron 3 is groundbreaking work, and you can expect us to deliver other versions of Nemotron 3 in the near future.
Cosmos, a cutting-edge open-world foundational model, a model that understands how the world works. GR00T, a humanoid robotic system involving joints, mobility, and movement. These models and technologies are now being integrated, and in every case, they are open to the world, with cutting-edge humanoid robotic models open to the world. Today we want to talk a little about Alpamayo, the work we are doing in autonomous vehicles. We not only open-sourced the models but also the data we used to train these models. Because only in this way can you truly trust the source of the models. We open-source all models. We help you create derivatives from them.
We have a complete set of libraries. We call them the NeMo libraries: NeMo, the Physical NeMo library, the Clara NeMo library, the BioNeMo library. Each library is an AI lifecycle management system, so that you can handle data, generate data, train models, create models, evaluate models, set guardrails for models, and deploy models. Each library is extremely complex and all open-sourced. So now, on this platform, NVIDIA is a cutting-edge AI model builder, and we build in a very special way: we build entirely in the open, so that we can empower every company, every industry, and every country to be part of this AI revolution.
I am incredibly proud of the work we have done there. In fact, if you pay attention to the trends and charts, the charts show that our contributions to this industry are second to none. In reality, you will see that we will continue to do this, and even accelerate.
These models are also world-class. All systems are down. This has never happened in Santa Clara. Is it because of Las Vegas? Surely someone out there hit the jackpot. All systems are down. Well. I think my system hasn't recovered yet, but that's okay. I will improvise on the go. Not only do these models have cutting-edge capabilities, but they are also open and rank among the best.
This is an area we are very proud of. They rank among the best in intelligence rankings. We have important models capable of understanding multimodal documents, namely PDFs. The most valuable content in the world is captured in PDFs. But it requires artificial intelligence to figure out the content inside, interpret it, and help you read it. So our PDF retriever, our PDF parser is world-class, and our speech recognition models are absolutely world-class. Our retrieval models, essentially the semantic search AI of modern AI era search engines and database engines, are also world-class. So we often rank among the best.
This is an area we are very proud of, and all of this is to serve your ability to build AI agents. This is truly a groundbreaking development area. You know, when ChatGPT first came out, people said, oh my, it produces very interesting results, but the hallucinations are very severe. The reason for hallucinations, of course, is that it can remember everything from the past, but it cannot remember everything from the future or the present. Therefore, it needs to be grounded in research. Before answering questions, it must conduct foundational research. The ability to reason—do I need to do research? Do I need to use tools? How do I break a question down into steps? Each step is something the AI model knows how to do. And when they come together, it can execute things in order that it has never done before and has never been trained to do.
This is the wonderful ability of reasoning. We can encounter situations we have never seen before and break them down into contexts, knowledge, or rules that we know how to handle because we have experienced them in the past. Therefore, the ability of AI models to reason now is extremely powerful, and the reasoning ability of agents opens the door to all these different applications. We no longer need to train an AI model to know everything on day one, just as we do not need to know everything on day one; we should be able to reason how to solve that problem in each case.
Large language models have now achieved this fundamental leap, using reinforcement learning and Chain of Thought, search and planning, and all these different techniques and the capabilities of reinforcement learning, making it possible for us to have this fundamental ability, and it is now completely open source.
But the real breakthrough is another one. I first saw it on Aravind's Perplexity. That search company, the AI search company, is truly innovative and a real company. When I first realized they were using multiple models simultaneously, I thought it was a stroke of genius. Of course, we would do the same.
Of course, AI will also call upon all the great AIs in the world at any part of the reasoning chain to solve the problems it wants to solve. That’s why AI is actually multi-modal, meaning it understands voice, images, text, video, 3D graphics, and proteins. That is multi-modal. It is also multi-model, meaning it should be able to use any model that is best suited for the task. By definition, it is multi-cloud because these AI models are located in all these different places. It is also hybrid cloud because if you are an enterprise company or you built a robot or any device, sometimes it is at the edge, sometimes at a radio cell tower, sometimes on-premises, or maybe in a hospital where you need data to be right there in real-time.
Whatever those applications are, we now know this is what future AI applications will look like. Or think of it another way, because future applications are built on AI. This is the fundamental framework for future applications. This fundamental framework, this basic structure of agent AI that can do what I’m talking about, is multi-model and has now been turbocharged for various AI startups. Now you can also customize your AI, teach your AI skills that others haven’t taught. No one else has made their AI so smart, so intelligent. You can do this for yourself. This is the intent behind all the work we do in NeMo Tron, NeMo, and our open models. You put a smart router in front of it, and that router is essentially a manager that decides which model is best suited for that application, best suited to solve that problem based on the prompts you give it.
So, when you think about this architecture, what do you get? When you think about this architecture, suddenly you have an AI that is completely customized by you on one hand. You can teach it specific skills for your company, those domain-specific things, those things where you have deep domain expertise, and maybe you have all the data needed to train that AI model. On the other hand, your AI is always at the forefront. By definition, you are always at the forefront on one hand, and always customized on the other, and it should just work.
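A minimal sketch of the "smart router" pattern described here, assuming a trivial keyword-based intent check and placeholder callables standing in for a local open model and a frontier API (neither is a real endpoint):

```python
# A minimal sketch of the "smart router" pattern described above: a trivial
# keyword-based intent check decides whether a prompt stays on a local open
# model (e.g., for private email) or goes out to a frontier API.
PRIVATE_INTENTS = ("email", "calendar", "to-do")

def route(prompt: str, local_model, frontier_model) -> str:
    """Send privacy-sensitive prompts to the local model, everything else to the frontier model."""
    wants_privacy = any(keyword in prompt.lower() for keyword in PRIVATE_INTENTS)
    chosen = local_model if wants_privacy else frontier_model
    return chosen(prompt)

# Trivial stand-ins for a local open model and a frontier API:
local = lambda p: f"[local open model] {p}"
frontier = lambda p: f"[frontier API] {p}"

print(route("Summarize the email from Anna", local, frontier))
print(route("Draft a keynote outline about physical AI", local, frontier))
```

A real router would replace the keyword check with a small intent-classification model, but the control flow is the same: classify the prompt, then dispatch to whichever model best fits it.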
So we think we will create the simplest example to provide you with this complete framework. We call it Blueprint. We have blueprints integrated into global enterprise SaaS platforms, and we are very excited about the progress. But we want to show you a brief example that anyone can do.
Demo Video:
Let's build a personal assistant. I want it to help me manage my calendar, emails, to-do lists, and even take care of my home. I use Brev to turn my DGX Spark into a personal cloud. So I can use the same interface whether I'm using a cloud GPU or DGX Spark. I easily get started with cutting-edge model APIs. I want it to help me manage emails, so I created an email tool for my agent. I want my emails to remain private, so I added an open model that runs locally on Spark.
Now, for any task, I want the agent to use the right model to complete it, so I will use an intent-based model router. This way, prompts that require email will stay on my Spark, while everything else can call the cutting-edge model. I want my assistant to interact with my world, so I connected it to Hugging Face's Richie mini robot. My agent controls Richie’s head, ears, and camera through tool calls.
Jensen Huang:
Call.
Demo Video:
I want to give Richie a voice, and I really like ElevenLabs, so I connected their API.
Richie Robot:
Hi, I'm Richie, running on DGX.
Demo Video:
Hey, Richie, what's on my to-do list today?
Richie Robot:
Your to-do list for today: Buy groceries—eggs, milk, butter, and send the new script to Jensen.
Demo Video:
Okay, let's send that update to Jensen. Tell him we'll get it to him by the end of the day.
Richie Robot:
Okay.
Demo Video:
Richie, here's a sketch. Can you turn it into an architectural rendering?
Richie Robot:
Of course.
Demo Video:
Nice. Now make a video and give me a tour of the room.
Richie Robot:
Here you go.
Demo Video:
Awesome. With Brev, I can share access to my Spark and Richie. So I'm planning to share it with Anna.
Demo Video:
Hey, Richie, what’s Potato (the pet dog) doing?
Richie Robot:
It's on the couch. I remember you don't like that. I'll call it down. Potato, get off the couch.
Demo Video:
With all the advancements in open source, it's incredible to see what you can build. I can't wait to see what you've created.
Jensen Huang:
Isn't this unbelievable? What's surprising now is that it has become trivial. Yet just a few years ago, all of this was impossible, absolutely unimaginable. Well, this basic framework, this fundamental way of building applications—using pre-trained, proprietary cutting-edge language models, combined with customized language models, entering an Agentic Framework, a reasoning framework that allows you to access tools, documents, and even connect to other agents. This is essentially the architecture of AI applications or modern applications.
Our ability to create these applications is very rapid. Note that if you provide it with application information it has never seen before, or presented in a structure that doesn't fully align with what you had in mind, it can still reason and do its best to infer data and information, trying to understand how to solve problems. This is artificial intelligence.
This basic framework is now being integrated. Everything I just described, we are fortunate to collaborate with some of the world's leading enterprise platform companies. For example, Palantir, whose entire AI and data processing platform is being accelerated and integrated by NVIDIA today. ServiceNow, the world's leading customer service and employee service platform. Snowflake, the top cloud data platform, where incredible work is being done. Code Rabbit, which we are using everywhere at NVIDIA. CrowdStrike, creating AI to detect and define AI threats. NetApp, whose data platform now features NVIDIA's semantic AI, an agent system for customer service.
But importantly: this is not just how you develop applications now; it will also become the user interface of your platform. Whether you are Palantir, ServiceNow, Snowflake, or many other companies we collaborate with, the agent system is the interface. No longer filling in boxes with information in Excel, perhaps no longer just command lines. All this multimodal information is now possible, and the way you interact with the platform is more—if you will—simple, just like interacting with a person. This is the enterprise AI revolutionized by agent systems.
Next is Physical AI. This is a field I've been talking about for years. In fact, we've been working on this for eight years. The question is, how do you transform the intelligence inside computers, the intelligence that interacts with you through screens and speakers, into intelligence that can interact with the world, meaning intelligence that understands the common sense of how the world works.
Object permanence. If I look away and then look back, that object is still there. Causality. If I push it, it will fall. It understands friction and gravity. It understands inertia. A heavy truck takes longer to stop rolling down the road, while a ball will keep rolling.
These concepts are common sense for a small child, but completely unknown to AI. So we must create a system that allows AI to learn the common sense of the physical world, learn its laws, and of course, be able to learn from data. Data is very scarce, and to be able to assess whether that AI is working means it must simulate in the environment. If AI does not have the ability to simulate the physical world's response to its actions, how does it know whether the actions it is performing are what it should be doing? Simulating the response to its actions is crucial for assessment. Otherwise, there is no way to evaluate it. Every time is different. So this basic system requires three computers. One computer, of course, is the one we know NVIDIA manufactures for training AI models. The other computer is for reasoning models, which is essentially a robotic computer that runs in a car or a robot or a factory, running anywhere at the edge.
But there must be another computer designed for simulation. Simulation is at the core of almost everything NVIDIA does. This is where we are most comfortable; simulation is really the foundation of almost everything we do with physical AI. So we have three computers and multiple stacks running on these computers, and these libraries make them useful. Omniverse is our digital twin, a physics-based simulation world. Cosmos, as I mentioned earlier, is our foundational model, not a language foundational model, but a world foundational model, and it is also aligned with language. You can say something like "What happened to the ball?" and it will tell you the ball is rolling down the street. So it is a world foundational model. Then, of course, there are the robotic models. We have two. One is called GR00T, and the other is called Alpamayo, which I will tell you about now.
One of the most important things we must do for physical AI is to create data to train AI in the first place. Where does the data come from? Instead of creating a large amount of text as a "ground truth" for AI to learn from like we did with language, how do we teach AI the ground truth of physics? There are many, many videos, but it is difficult to capture the diversity and types of interactions we need. So this is where great minds come together to transform computation into data.
Now, using synthetic data generation based on physical laws and ground truth, we can selectively and cleverly generate data that we can use to train AI. For example, the output of a traffic simulator enters the left side of this Cosmos AI world model. On its own, the traffic simulator output is far from sufficient for AI learning, but we can feed it into the Cosmos foundation model to generate physically based, physically plausible surround-view video from which the AI can now learn. There are many examples in this area. Let me show you what Cosmos can do.
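The data flow just described (simulator output into a world foundation model, then into training) can be sketched roughly as below; every function is a stub standing in for the real component (traffic simulator, Cosmos-style world model, auto-labeler), so only the shape of the pipeline is meaningful.

```python
import random

# Every function here is a stub standing in for a real component; only the
# shape of the simulator -> world model -> labeled training data flow matters.

def simulate_traffic_scene(seed: int) -> dict:
    """Stand-in for a traffic simulator emitting a coarse scene description."""
    rng = random.Random(seed)
    return {"vehicles": rng.randint(2, 20), "weather": rng.choice(["clear", "rain", "fog"])}

def world_model_augment(scene: dict) -> dict:
    """Stand-in for a world-model call that turns the coarse scene into
    physically plausible surround-view video."""
    return {"video": f"synthetic_clip_{scene['vehicles']}_{scene['weather']}", **scene}

def auto_label(clip: dict) -> dict:
    """Stand-in for automatic labeling of the generated clip."""
    return {**clip, "labels": ["lane", "lead_vehicle", "traffic_light"]}

training_set = [auto_label(world_model_augment(simulate_traffic_scene(s))) for s in range(4)]
print(training_set[0])
```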
“The era of physical AI's ChatGPT is about to arrive.”
Cosmos is the world's leading foundational model, the world's foundational model. It has been downloaded millions of times and is used around the globe, preparing the world for this new era of physical AI. We also use it ourselves. We use it to create our autonomous vehicles for scenario generation and evaluation. We can have something that allows us to effectively drive billions and trillions of miles, but done inside a computer. We have made tremendous progress. Today, we announce Alpamayo, the world's first autonomous vehicle AI that can think and reason.
Alpamayo is end-to-end trained, literally from camera input to execution output. The camera input consists of a large amount of mileage driven by itself or by us humans, using human demonstrations. We also have a large amount of mileage generated by Cosmos. In addition, thousands of examples have been very carefully labeled so that we can teach the car how to drive.
Alpamayo does some very special things. It not only receives sensor inputs and activates the steering wheel, brakes, and acceleration, but it also reasons about the actions it is going to take. It tells you what action it is going to take, it reasons out the rationale for that action, and of course, the trajectory. All of this is directly coupled and trained very specifically by a large amount of human training and data generated by Cosmos. The results are truly incredible. Your car drives not only as you expect but does so so naturally because it learns directly from human demonstrators. But in every scenario, when it encounters a scene, it reasons, tells you what it is going to do, and reasons out what it will do.
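The decision format described here, an action plus a natural-language rationale plus a trajectory, can be sketched as a simple data structure; the model call below is a stub for illustration, not Alpamayo's actual interface.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Sketch of the kind of output a reasoning VLA driving model is described as
# producing. The model call is a placeholder, not Alpamayo's real API.

@dataclass
class DrivingDecision:
    action: str                             # e.g. "increase following distance"
    rationale: str                          # natural-language explanation of the decision
    trajectory: List[Tuple[float, float]]   # planned (x, y) waypoints in the vehicle frame

def vla_stub(camera_frames) -> DrivingDecision:
    """Placeholder standing in for an end-to-end camera-to-action model."""
    return DrivingDecision(
        action="increase following distance",
        rationale="Brake lights ahead suggest the lead vehicle may slow down.",
        trajectory=[(0.0, 0.0), (0.0, 8.0), (0.2, 16.0)],
    )

decision = vla_stub(camera_frames=None)
print(decision.action, "|", decision.rationale)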
Why is this so important? Because of the long tail effect of driving. We cannot simply collect every possible scenario that could happen for every country, every situation, and every possible thing that could occur for all populations. However, each scenario is likely to break down into a large number of other smaller scenarios that are quite normal for you to understand. Therefore, these long tails will be broken down into fairly normal situations that the car knows how to handle; it just needs to reason about them.
Let's take a look. Everything you are about to see is a single pass, no hands on the wheel.
Video demonstration (in-car navigation voice):
Navigating to destination. Please fasten your seatbelt.
(Video playback: autonomous driving process)
Video demonstration:
You have arrived.
Jensen Huang:
We started researching autonomous vehicles eight years ago. The reason is that we had long inferred that deep learning and artificial intelligence would fundamentally reshape the entire computing stack. If we want to understand how to navigate and guide the industry into this new future, we must be good at building the entire stack.
We envision a day when a billion cars on the road will be autonomous. You either own it as a Robotaxi, orchestrating and renting it out, or you own it and it drives itself, or you decide to drive it yourself. However, every car will have the capability of autonomous driving, and every car will be powered by AI. Therefore, the model layer in this case is Alpamayo, and the application on top is Mercedes-Benz.
Well, this entire stack is our first full-stack attempt at NVIDIA. We have been committed to this the whole time. I am excited that NVIDIA's first autonomous vehicle will be on the road in the first quarter (Q1) in the United States, and then in the second quarter in Europe, Q1 in the U.S., and Q2 in Europe, and I think Q3 and Q4 will be in Asia. The powerful thing is that we will continue to update it with the next version of Alpamayo and subsequent versions.
I have no doubt that this will be one of the largest robotics industries, and I am glad we are committed to it. It has taught us a lot about how to help other parts of the world build robotic systems, that deep understanding and knowing how to build it ourselves, to build the entire infrastructure, knowing what kind of chips robotic systems need.
In this particular case, the dual Orin chips, the next generation will be dual Thor chips. These processors are designed specifically for robotic systems and are designed for the highest level of safety capabilities. This car has just received its rating. Look, the newly launched Mercedes-Benz CLA has just been rated by NCAP as the safest car in the world.
This is the only system I know of where every line of code, chip, and system has been safety certified. The entire model system is based on us. The sensors are diverse and redundant, and so is the autonomous vehicle stack. The Alpamayo stack is end-to-end trained and has incredible skills. However, unless you drive it forever, no one knows if it is absolutely safe.
So we use another software stack, which is the entire underlying AV stack for barrier protection. The entire AV stack is built to be fully traceable. We spent about five years, actually six or seven years, building the second stack. These two software stacks mirror each other. Then we have a strategy and safety evaluator to determine: Am I confident and can I reason that this can be driven very safely? If so, I will let Alpamayo handle it. If this is a situation I am not very confident about, the safety policy evaluator decides we will revert to a simpler, safer barrier system.
Then it will revert to the classic AV stack, which is the only car in the world that has both AV stacks running simultaneously, and all safety systems should have diversity and redundancy. Our vision is that one day every car, every truck will be autonomous. We have been working towards that future. The entire stack is vertically integrated.
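The arbitration just described can be sketched as a simple policy selector; the confidence threshold and the stack handles below are hypothetical stand-ins for the real policy and safety evaluator.

```python
# Sketch of the policy-and-safety arbitration described above: if the
# evaluator is confident the learned (Alpamayo-style) policy can handle the
# scene, it drives; otherwise control falls back to the classical, fully
# traceable AV stack. Threshold and stack handles are hypothetical.
CONFIDENCE_THRESHOLD = 0.9

def select_stack(evaluator_confidence: float, learned_stack, classic_stack):
    """Return the stack that should control the vehicle for this scene."""
    if evaluator_confidence >= CONFIDENCE_THRESHOLD:
        return learned_stack   # end-to-end reasoning policy
    return classic_stack       # simpler, traceable guard-rail stack

print(select_stack(0.97, "alpamayo_stack", "classic_av_stack"))  # alpamayo_stack
print(select_stack(0.72, "alpamayo_stack", "classic_av_stack"))  # classic_av_stack
```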
Of course, in the case of Mercedes-Benz, we co-built the entire stack. We will deploy this car and operate and maintain this stack throughout. However, like everything else we do, we built the entire stack, but the entire stack is open to the ecosystem. The ecosystem we are building for L4 and Robotaxi is expanding and is spread all over. I fully expect this to be—this is already a huge business. This is a huge business for us because they use it to train, process data, and train their models. They use it for synthetic data generation. In some cases, some companies are almost exclusively building on the in-car computer chips. Some companies work with us in a full-stack collaboration, while others work with us in a partial collaboration. It doesn't matter how much you decide to use; my only request is to use a little more NVIDIA if possible.
This is all the open content right now. This will be the first large-scale mainstream AI, physical AI market. I think we all completely agree here that the turning point from non-autonomous vehicles to autonomous vehicles may happen around this time. Over the next 10 years, I am quite certain that a large proportion of cars in the world will be autonomous or highly autonomous.
But the basic technology I just described, using three computers, synthetic data generation, and simulation, applies to every form of robotic system. It could be a robot that is just joints and mechanical arms, perhaps a mobile robot, or maybe a fully humanoid robot. So the next journey, the next era of robotic systems will be robotics. These robots will come in various different sizes. I invited some friends. Did they come? Hey, guys, hurry up. I have a lot to talk about. Come on, hurry up. Did you tell R2D2 you would be here? C-3PO. Alright. Alright. Come over here.
(Robots come on stage)
Now, there is something very—you have Jetsons. They have small Jetson computers inside them. They are trained within the Omniverse. How about that? Let’s show everyone how you learn to be a robot simulator; do you want to see that? Okay, let’s take a look at that. Rana, please.
(Video demonstration: Robot training in Omniverse)
This is amazing. That’s how you learn to be a robot. You completed it within the Omniverse, and the robot simulator is called Isaac Sim, Isaac Lab. Anyone who wants to make robots—even if no one will be as cute as you guys—but now we have all these friends, and we are making robots.
We have big manufacturing. No, just like I said, no one is as cute as you guys. But we have Neuro Bot, we have Ag Bot. There's the AG Bot over there. We have LG over here; they just released a new robot. Caterpillar, they have the largest robot in history. That one delivers food to your home, connected to Uber Eats; that's the Serve robot, I love those guys. Agility, Boston Dynamics. Incredible. You have surgical robots, you have robotic arms from Franka, you have robots from Universal Robots. An astonishing variety of different types of robots.
So this is the next chapter. We will talk more about robotics in the future.
But ultimately, it's not just about robots. I know it's all about you guys. The key is to get there. One of the most important industries in the world will be completely revolutionized by physical AI and AI physics, which is also the origin of NVIDIA. Without the companies I’m going to talk about, NVIDIA would not exist. I’m excited that all these companies, starting with Cadence, are accelerating everything. Cadence is integrating CUDA-X into all their simulators and solvers.
They have NVIDIA physical AI, which they will use for different physical factories and factory simulations. You have AI physics integrated into these systems. So whether it's EDA or CAE, and future robotic systems, we will basically have the same technology that makes you possible, now completely transforming these design stacks. Synopsys, without Synopsys, you know, Synopsys and Cadence are absolutely indispensable in the chip design world. Synopsys leads in logic design and IP.
In the case of Cadence, they lead in physical design, layout, routing, simulation, and verification. Cadence is incredible in simulation and verification. They are all entering the world of system design and system simulation. So in the future, we will design your chips inside Cadence and Synopsys. We will design your systems within these tools and simulate the entire process, simulating everything.
That is your future. Yes, you will be born within these platforms. It's amazing, right? So we are excited that we are working with these industries, just like we are integrating NVIDIA into Palantir and ServiceNow, we are integrating NVIDIA into the most compute-intensive simulation industries—Synopsys and Cadence.
Today we announce that Siemens is doing the same thing.
We will integrate CUDA-X, physical AI, Agentic AI, NeMo, NeMo Tron, deeply into the world of Siemens. The reason is this. First, we design chips, and in the future, all chips will be accelerated by NVIDIA. You will be very happy about this. We will have agent chip designers and system designers working with us to help us with designs, just like we have agent software engineers helping our software engineers code today.
So we will have agent chip designers and system designers. We will create you internally. But then we have to build you, we have to build factories, and manufacture your factories. We have to design and assemble the production lines for all of you.
These manufacturing plants will essentially be huge robots. Incredible. Is that right? I know. So you will design in the computer. You will manufacture in the computer, you will test and evaluate in the computer, long before you have to spend any time dealing with gravity.
Do you know how to deal with gravity? (robot jumps) Okay, don't show off.
So, this is what makes NVIDIA possible in the industry. I am excited that the technology we are about to create has reached such a complex level and capability that we can go back and help them provide solutions for the industry. So, starting from what they have, we now have the opportunity to go back and help them completely transform their industry.
Let's see what we will do with Siemens. Come on.
Video Commentary:
Breakthroughs in physical AI are bringing AI from the screen into our physical world. This is timely, as the world is building a variety of factories for chips, computers, life-saving drugs, and AI. With the global labor shortage intensifying, we need automation driven by physical AI and robotics more than ever.
This is where AI meets the largest physical industries in the world, forming the basis for NVIDIA and Siemens' collaboration. For nearly two centuries, Siemens has built the world's industries, and now it is reshaping them for the AI era. Siemens is integrating NVIDIA's CUDA-X libraries, AI models, and Omniverse into its portfolio of EDA, CAE, and digital twin tools and platforms. We are bringing physical AI into the entire industrial lifecycle from design and simulation to production and operation. We are standing at the beginning of a new industrial revolution—the era of physical AI. Built for the next industrial age by NVIDIA and Siemens.
(Vera Rubin platform release section)
Jensen Huang:
Incredible, right, guys? What do you think? Okay, hold on tight. If you look at the model of the world, there is no doubt that OpenAI is the leading token generator today. OpenAI generates more tokens than anyone else. The second largest group, the second largest might be open models. My guess is that over time, because there are so many companies, so many researchers, and so many different types of fields and modalities, open-source models will be the largest by far.
Let's talk about a very special person. Do you want to do that?
Let's talk about Vera Rubin. She was an American astronomer. She was the first to observe and notice that stars at the outer edges of galaxies orbit nearly as fast as those at the centers. I know this doesn't make sense. Newtonian physics would say that, as in the solar system, planets farther from the sun orbit more slowly than those closer to it. So unless there is invisible matter out there, this doesn't make sense. She discovered dark matter, which we cannot see but which occupies space.
So Vera Rubin is the person we named our next computer after. It's a good idea, right? I know.
The design of Vera Rubin is aimed at addressing this fundamental challenge we face. The computational demands for AI are skyrocketing. The demand for NVIDIA GPUs is soaring. This surge is due to models increasing tenfold each year, which is an order of magnitude. Not to mention, as I mentioned, the introduction of o1 is a turning point for AI. Inference is no longer a one-time answer but a thinking process. To teach AI how to think, reinforcement learning and significant computation have been introduced into post-training. It is no longer just supervised fine-tuning (SFT) or imitation learning. Now with reinforcement learning, essentially the computer tries different iterations on its own, learning how to perform tasks. Therefore, the amount of computation during pre-training, post-training, and testing has exploded.
Now, every inference we make is no longer just a one-time event; the number of tokens— you can see AI thinking, and we appreciate that. The longer it thinks, the better the answers it usually produces. Thus, the computational expansion during testing leads to the number of generated tokens increasing fivefold each year. Meanwhile, the AI race is on. Everyone is trying to reach the next level. Everyone is trying to reach the next frontier. Whenever they reach the next frontier, the cost of the previous generation AI tokens starts to decrease by about tenfold each year. A tenfold decrease each year actually tells you something different; it indicates that the competition is so fierce, and everyone is trying to reach the next level, and someone is reaching the next level. Therefore, all of this is a computational problem. The faster you compute, the faster you can reach the next frontier level. All of these things are happening at the same time.
So we decided that we must advance the state of the art in computation every year, without falling behind even for a year. We started shipping GB200 a year and a half ago. Now, we are fully manufacturing GB300. If Vera Rubin is to catch up this year, it must have already gone into production. So today, I can tell you that Vera Rubin is in full production.
Do you want to see Vera Rubin? Okay, let's go. Please play.
Video Commentary:
Vera Rubin just happens to catch the next frontier of AI. This is the story of how we built it. The architecture, a six-chip system engineered as a whole. Born from Extreme Co-design. It started with Vera, a custom-designed CPU that is twice as powerful as the previous generation. And the Rubin GPU. Vera and Rubin were co-designed from the beginning to share data bidirectionally and consistently, faster and with lower latency.
AI requires fast data. ConnectX-9 provides 1.6 Tb/s of scale-out bandwidth for each GPU, while the BlueField-4 DPU offloads storage and security, allowing compute to focus entirely on AI. The Vera Rubin compute tray is completely redesigned, with no cables, hoses, or fans, and is equipped with one BlueField-4 DPU, eight ConnectX-9 network cards, two Vera CPUs, and four Rubin GPUs; these are the building blocks of the Vera Rubin AI supercomputer. Next, the sixth-generation NVLink Switch moves more data than the global internet, connecting 18 compute nodes and scaling to 72 Rubin GPUs operating as one. Then there is Spectrum-6 Ethernet Photonics, the world's first Ethernet switch with 512 channels of 200 Gbps "co-packaged optics."
Expanding thousands of racks into an AI factory. 15,000 engineer years since the design began. The first Vera Rubin NVL72 rack is online. Six groundbreaking chips, 18 compute trays, 9 NVLink switch trays, 220 trillion transistors, weighing nearly 2 tons. A huge leap towards the next frontier of AI. Rubin is here.
Jensen Huang:
What do you think? This is a Rubin Pod. 1152 GPUs and 16 racks. As you know, each rack has 72 Vera Rubin or 72 Rubin. Each Rubin is actually two GPU dies connected together.
"It's a giant ship."
We designed six different chips. First, we have a rule in the company. As a good rule, the next generation should not have more than one or two chip variations. But the problem is, as you can see, we are describing the total number of transistors in each described chip. We know that Moore's Law has significantly slowed down. Therefore, the number of transistors we can get each year cannot keep up with models that are 10 times larger. It cannot keep up with the fact that Token generation is increasing by 5 times each year. It cannot keep up with the fact that the cost of Tokens is decreasing so aggressively that if the industry is to continue to progress, we must deploy aggressive extreme collaborative design, innovating simultaneously across all chips in the entire system, otherwise it is impossible to keep up with this pace. That is why we decided that this generation, we had no choice but to redesign every chip.
Each of the chips described just now could headline a press conference on its own; in the past, each might have required an entire company dedicated to it. Every one of them is completely revolutionary and best in its class.
Vera CPU, and I am proud of this one. In a world limited by power consumption, the Vera CPU delivers twice the performance of the world's most advanced CPUs. It has 88 CPU cores, and with spatial multi-threading, each of its 176 threads runs at full performance.
This is the Rubin GPU. Its floating-point performance is five times that of Blackwell. But importantly, looking at the bottom line, it has only 1.6 times the number of transistors of Blackwell.
Let me give you a sense of where semiconductor physics is today. How could we possibly deliver this level of performance without co-design, without extreme collaborative design across basically every chip in the system? You only get 1.6 times the transistors. Even if each transistor improves a little, say by 25%, you cannot extract 100% more performance from those transistors alone. So 1.6x roughly sets the ceiling on how much performance can improve each year, unless you do something extreme, which we call extreme collaborative design.
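To make the arithmetic behind that ceiling explicit, here is a back-of-envelope sketch in Python. The 25% per-transistor figure is the hypothetical Huang uses on stage, and the 5x figure is the claimed peak inference gain; the calculation itself is trivial and only illustrates the gap that co-design has to close.

```python
# Back-of-envelope version of the argument above (illustrative numbers only).
transistor_ratio = 1.6            # Rubin vs. Blackwell transistor count, per the talk
per_transistor_gain = 1.25        # the hypothetical "say 25%" improvement per transistor

silicon_only_ceiling = transistor_ratio * per_transistor_gain
print(f"silicon-only ceiling: ~{silicon_only_ceiling:.1f}x")          # ~2.0x

claimed_inference_gain = 5.0      # Rubin's claimed peak inference gain over Blackwell
print(f"gap closed by extreme co-design: ~{claimed_inference_gain / silicon_only_ceiling:.1f}x")
```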
One of the things we did, and it is a genuine invention, is the NVFP4 Tensor Core. The Transformer engine inside our chip is not just a 4-bit floating-point format dropped into the data path. It is a complete processing unit that knows how to dynamically and adaptively adjust its precision and structure for different layers of the Transformer. That way you get higher throughput where precision can be sacrificed, and you return to the highest possible precision where it is needed. That kind of dynamic adaptability cannot be done in software, because it runs far too fast; it has to happen adaptively inside the processor.
This is NVFP4. When someone just says FP4 or FP8, that alone means very little; what matters is the structure of the Tensor Core and all the algorithms that make it work. We have published papers on NVFP4, and the throughput it delivers and the precision it retains are incredible. This is groundbreaking work. I would not be surprised if the industry asks us to turn this format and structure into an industry standard in the future. It is completely revolutionary, and it is why we can deliver such a huge performance boost with only 1.6 times the transistors.
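For readers who want a feel for what block-scaled low-precision formats do, here is a minimal Python sketch of generic 4-bit block quantization. It illustrates the general idea of spending precision where it matters, not NVIDIA's NVFP4 implementation; the block size of 16, the E2M1-style value grid, and the function names are assumptions made for this sketch.

```python
import numpy as np

# Representable magnitudes of an FP4 E2M1-style value (plus a sign bit).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4_blocked(x, block=16):
    """Quantize a 1-D tensor to 4-bit values with one scale per block (toy sketch)."""
    x = x.reshape(-1, block)
    # Per-block scale maps the largest magnitude onto the top of the FP4 grid.
    scales = np.abs(x).max(axis=1, keepdims=True) / FP4_GRID[-1]
    scales[scales == 0] = 1.0
    scaled = x / scales
    # Round each scaled value to the nearest representable FP4 magnitude.
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    q = np.sign(scaled) * FP4_GRID[idx]
    return q, scales

def dequantize(q, scales):
    return (q * scales).reshape(-1)

x = np.random.randn(64).astype(np.float32)
q, s = quantize_fp4_blocked(x)
err = np.abs(dequantize(q, s) - x).mean()
print(f"mean abs quantization error: {err:.4f}")
```

The design point the sketch makes is the one in the speech: the 4-bit values alone are crude, and it is the per-block scaling machinery around them that preserves useful precision.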
We have completely transformed the entire MGX chassis: from two hours of assembly time to five minutes, 100% liquid cooling. A real breakthrough.
Okay, so this is the new compute chassis. Connecting all of it to the top-of-rack switch, for east-west traffic, is the Spectrum-X NIC, which is without question the best network card in the world. NVIDIA's Mellanox team, which joined us years ago, has the best high-performance networking technology in the world, unmatched: the algorithms, the chip design, all the interconnects, and all the software stacks running on top. Their RDMA is absolutely the best in the world, and it is now programmable, with data-path accelerators, so our partners, like the AI labs, can create their own algorithms for how they want to move data within the system. It is truly world-class.
ConnectX-9 and the Vera CPU were co-designed; we only released it when CX9 came out, because we were co-designing it for a new type of processor. ConnectX-9, our CX8 before it, and Spectrum-X have completely transformed the way Ethernet is used for artificial intelligence.
AI Ethernet traffic is much denser and requires lower latency. Instantaneous traffic spikes are unlike anything Ethernet has seen before. So we created Spectrum-X, which is AI Ethernet. Two years ago, we launched Spectrum-X. NVIDIA is now the largest networking company in the world.
It has been so successful and is used in so many different installations; it is sweeping across the AI field. The performance is incredible, especially when you have a 200-megawatt or gigawatt-scale data center, which is a multi-billion-dollar investment. Say a gigawatt data center costs $50 billion. If better networking gives you 10% more output, and with Spectrum-X 25% higher throughput is not uncommon, then even at 10% that is worth $5 billion, and the network is effectively free. That is why everyone is using Spectrum-X. It really is an incredible thing.
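The "network is free" arithmetic is simple enough to write down; the sketch below only restates the figures cited on stage.

```python
# Restating the stage arithmetic: if a ~$50B, 1 GW AI factory gets even 10% more
# effective output from better networking, that uplift dwarfs what the network costs.
datacenter_cost = 50e9      # gigawatt-scale data center, figure cited in the talk
network_uplift = 0.10       # conservative case; Spectrum-X is claimed at up to ~25%

extra_value = datacenter_cost * network_uplift
print(f"value of the uplift: ${extra_value / 1e9:.0f}B")   # $5B -> "the network is free"
```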
Now we are also reinventing how data is processed. Spectrum-X handles east-west traffic. For the rest, we have a new processor called BlueField-4, which lets us take a very large data center and isolate its different parts so that different users can use different parts, and so that everything can be virtualized if the operator chooses to. It offloads a great deal of virtualization software, security software, and north-south networking software.
BlueField-4 is standard for every compute node. BlueField-4 also has a second application that I will talk about shortly. This is a revolutionary processor, and I am very excited about it.
This is the NVLink 6 Switch, right here. Each switch chip inside this switch has the fastest SerDes in history. The world has just reached 200 Gbps. This is a 400 Gbps switch. This is so important because it allows every GPU to communicate with every other GPU at exactly the same time.
The switches on these rack backplanes move data at more than twice the bandwidth of the entire global internet. The cross-sectional bandwidth of the planet's internet is estimated at roughly 100 TB per second; this is 240 TB per second. That gives you a sense of the scale, and it is what ensures that every GPU can work with every other GPU at exactly the same time.
Okay, so on top of that, this is a single rack. As you can see, this one rack has 1.7 times the transistors of the previous generation. Yes, can you help me with this? It usually weighs about 2 tons, but today it is 2.5 tons, because when they shipped it they forgot to drain the water inside. So we brought a lot of water from California.
Can you hear it scream? When you spin 2.5 tons, it definitely screams a bit. You can do it. Okay. We won't make you do it twice.
Okay, behind this is the NVLink Spine, basically two miles of copper cabling. Copper is the best conductor we know. These are shielded, structured copper cables, more of them than any computing system has ever used, and our SerDes drives them from the top of the rack all the way to the bottom at 400 Gbps. Incredible.
In total there are two miles of copper, 5,000 cables; that is what makes the NVLink Spine possible. This truly revolutionizes the MGX system. We decided to make it an industry-standard system so that the whole ecosystem, our entire supply chain, can standardize on these components; about 80,000 different components go into these MGX systems.
If we changed it every year, that would be a complete waste. Every major computer maker, from Foxconn to Quanta to Wistron, and the list goes on, to HP, Dell, and Lenovo, knows how to build these systems. So despite much higher performance, and despite drawing twice the power, Vera Rubin still fits in the same standard system. Vera Rubin's power is twice that of Grace Blackwell.
And here is the remarkable part: the incoming air and airflow are roughly the same, and, importantly, the incoming water is at the same temperature, 45 degrees Celsius. With 45-degree water, the data center needs no chillers. We are essentially cooling this supercomputer with hot water, which is extremely efficient.
So this is the new rack: 1.7 times the transistors, yet peak inference performance is up 5 times and peak training performance is up 3.5 times. They connect through Spectrum-X at the top of the rack. Oh, thank you.
This is the world's first chip built on TSMC's new process, a process we co-developed called COUPE, a silicon photonics integration technology. It lets us attach silicon photonics directly to the chip. This is 512 ports at 200 Gbps each: the new Ethernet AI switch, the Spectrum-X Ethernet switch.
Look at this huge chip. But what’s truly amazing is that it connects directly to silicon photonics, with lasers coming in. The laser enters from here. The optical devices are here, and they connect to the rest of the data center. I will show this later, but this is right at the top of the rack. This is the new Spectrum-X silicon photonics switch.
Okay, I have something new to tell you. As I mentioned a few years ago, we introduced Spectrum-X so that we could reshape the way networks operate. Ethernet is very easy to manage, everyone has an Ethernet stack, and every data center in the world knows how to handle Ethernet. At that time, the only thing we used was InfiniBand for supercomputers. InfiniBand has very low latency, but of course, its software stack and overall manageability are very unfamiliar to those using Ethernet. So we decided to enter the Ethernet switch market for the first time. That’s how Spectrum-X took off, making us the largest networking company in the world.
As I mentioned, the next generation of Spectrum will continue this tradition. As I said before, AI has reshaped the entire computing stack, every layer of the computing stack. Naturally, as AI begins to be deployed in global enterprises, it will also reshape the way we perform storage. AI does not use SQL. AI uses semantic information.
When you use an AI, it builds up temporary knowledge, a temporary memory called the KV cache, a collection of key-value pairs. The KV cache is essentially the AI's working memory, and that working memory lives in HBM.
For every token, the GPU reads the entire model and the entire working memory, produces one token, and stores that token's keys and values back into the KV cache. The next step does the same thing: it reads the whole memory again, streams it through the GPU, and generates another token. It repeats this, one token after another.
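A toy Python sketch of that decode loop is below. The "model" here is a random stand-in (the weights W_kv and W_out are invented for illustration) and the attention step is reduced to an average; the point is only the access pattern the speech describes: the whole cache is read at every step and grows by one entry per generated token.

```python
import numpy as np

# Toy sketch of token-by-token decoding: each step reads everything in the KV
# cache, emits one token, and appends that token's key/value back to the cache.
rng = np.random.default_rng(0)
d_model, vocab = 64, 1000
W_kv = rng.standard_normal((d_model, 2 * d_model))   # stand-in weights
W_out = rng.standard_normal((d_model, vocab))

kv_cache = []                                        # grows by one entry per token
hidden = rng.standard_normal(d_model)

for step in range(8):
    # "Read" the whole cache: here we just average the stored values.
    context = np.mean([v for _, v in kv_cache], axis=0) if kv_cache else np.zeros(d_model)
    token = int(((hidden + context) @ W_out).argmax())

    # Write this step's key/value back; it must be re-read at every later step.
    kv = (hidden + context) @ W_kv
    kv_cache.append((kv[:d_model], kv[d_model:]))

    hidden = rng.standard_normal(d_model)            # stand-in for embedding the new token
    print(f"step {step}: token {token}, cache entries = {len(kv_cache)}")
```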
Obviously, if you have a long conversation with that AI, that contextual memory grows significantly over time, not to mention that the models are growing and the number of dialogue turns keeps increasing. We want an AI that accompanies us through our lives and remembers every conversation we have with it, every piece of research we ask it to do. And the number of people sharing each supercomputer keeps growing too. So the context memory that fits in HBM is no longer large enough. Last year, we created a very fast memory for Grace Blackwell, which we call fast context memory; that is why we connected Grace directly to Hopper, and then Grace directly to Blackwell, to expand the context memory. But even that is not enough. The next step is, of course, to go out over the network, the north-south network, to the company's storage. But if you are running many AIs at once, that network is not fast enough either. So the answer, clearly, is to do something different.
So we introduced BlueField-4 so that we can have very fast KV-cache context memory storage right in the rack. I will show you in a moment that this is a whole new category of storage system. The industry is very excited, because this is a pain point for almost everyone doing large-scale token generation today: AI labs and cloud service providers are genuinely suffering from the massive network traffic caused by moving KV cache around.
So creating a new platform, a new processor to run the entire Dynamo KV cache context memory management system and placing it very close to the rest of the rack is a completely revolutionary idea.
Here it is, right here. These are the compute nodes, each one an NVLink-72 domain; this is Vera Rubin, NVLink 72, 144 Rubin GPUs. The context memory is stored here: each node has four BlueFields behind it, and each BlueField carries 150 TB of context memory. Once allocated, each GPU gets an additional 16 TB on top of the roughly 1 TB it has inside the node, and because this backing store sits on the same east-west fabric, at the same 200 Gbps data rate across the node's architecture, that extra 16 TB is effectively local. This is the management plane, these are the Spectrum-X switches connecting everything, and over here the end switches connect it all to the rest of the data center. Okay, this is Vera Rubin.
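Conceptually, this context memory behaves like a tiered cache: a small, hot tier in HBM backed by a much larger network-attached tier. The sketch below is a generic two-tier LRU cache in Python to illustrate the idea; the class name, capacities, and eviction policy are all invented for illustration and do not reflect Dynamo's or BlueField-4's actual design.

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy two-tier KV-cache store: small fast tier backed by a large cold tier."""

    def __init__(self, hbm_capacity=4, context_capacity=64):
        self.hbm = OrderedDict()        # hot tier: smallest, fastest (think HBM)
        self.context = OrderedDict()    # cold tier: large, network-attached store
        self.hbm_capacity = hbm_capacity
        self.context_capacity = context_capacity

    def put(self, session_id, kv_blob):
        self.hbm[session_id] = kv_blob
        self.hbm.move_to_end(session_id)
        while len(self.hbm) > self.hbm_capacity:
            # Evict the least-recently-used session's cache to the context tier
            # instead of throwing it away and recomputing the prefill later.
            old_id, old_blob = self.hbm.popitem(last=False)
            self.context[old_id] = old_blob
            while len(self.context) > self.context_capacity:
                self.context.popitem(last=False)

    def get(self, session_id):
        if session_id in self.hbm:
            self.hbm.move_to_end(session_id)
            return self.hbm[session_id]
        if session_id in self.context:
            # A returning conversation: promote its cache back into the hot tier.
            blob = self.context.pop(session_id)
            self.put(session_id, blob)
            return blob
        return None  # miss: the prefill would have to be recomputed

cache = TieredKVCache()
for i in range(6):
    cache.put(f"user-{i}", kv_blob=f"kv-for-user-{i}")
print("user-0 still in hot tier?", "user-0" in cache.hbm)           # False: evicted
print("user-0 retrievable?     ", cache.get("user-0") is not None)  # True: from cold tier
```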
There are a few things here that are really incredible. First, as I mentioned, the energy efficiency of this entire system is twice that of the previous generation, essentially twice the thermal performance: yes, the power and the energy used are doubled, but the amount of computation goes up many times more than that, and the liquid going in is still only 45 degrees Celsius. That allows us to save roughly 6% of global data center power, which is a very big deal.
The second very big thing is that the entire system is now confidential-computing secure, meaning everything is encrypted in transit, at rest, and during computation. Every bus is now encrypted: every PCI Express link, every NVLink, the eight NVLinks between CPU memory and GPU, and the links between GPU and GPU. All of it is encrypted, so the system is confidential-computing secure. That lets companies deploy their models on someone else's infrastructure with confidence that no one else will ever see them.
So this system is not only extremely energy-efficient; there is one more incredible thing. Because of the nature of AI workloads, power spikes instantly during collective operations like All-Reduce, when every GPU draws current at the same time, and those spikes are off the charts, typically around 25%. We now have power smoothing across the entire system, so you do not have to over-provision, or, if you already have, you do not have to waste that 25% of power or leave it sitting idle. You can fill the entire power budget without over-provisioning.
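As a toy illustration of why smoothing helps, the snippet below builds a synthetic power trace with 25% spikes and caps it near the steady-state draw. The mechanism NVIDIA actually uses is not described in the talk, so this is purely conceptual; the budget and trace values are arbitrary.

```python
import numpy as np

# Synthetic power trace with all-reduce style ~25% spikes, then clipped to a
# budget just above steady state, so the rack can be provisioned near its
# average draw instead of its peak. Conceptual only.
rng = np.random.default_rng(1)
base = 1.0                                       # steady-state draw (arbitrary units)
trace = np.full(1000, base)
spike_steps = rng.choice(1000, size=100, replace=False)
trace[spike_steps] *= 1.25                       # momentary 25% spikes

budget = base * 1.05                             # provision just above steady state
smoothed = np.minimum(trace, budget)             # spikes limited/spread by smoothing

print(f"peak without smoothing : {trace.max():.2f}")
print(f"peak with smoothing    : {smoothed.max():.2f}")
print(f"over-provision avoided : {(trace.max() - budget) / base:.0%}")
```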
The last thing, of course, is performance. Let’s take a look at the performance of this. These charts are only appreciated by those who build AI supercomputers. We put a lot of effort into redesigning every chip, every system, and rewriting the entire stack to make this possible. Basically, this is training AI models.
The first column is training. The faster you can train a model, the faster you can push the next frontier out to the world: that is your time to market, your technological leadership, your pricing power. The green bars are Blackwell. With Rubin, throughput is much higher, so only about a quarter as many systems are needed to train the model within the given time of one month. The scenario is training a 100-trillion-parameter model on 100 trillion tokens, our simulated projection of what building the next frontier model will require. Elon has already said the next version of Grok could be 7 trillion parameters, and this is 100 trillion.
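To convey the order of magnitude behind that chart, a common rule of thumb estimates training compute as roughly 6 x parameters x tokens. Applied to the 100-trillion-by-100-trillion scenario it gives about 6 x 10^28 FLOPs; this is only a back-of-envelope scale check, not NVIDIA's simulation.

```python
# Order-of-magnitude check using the common ~6 * params * tokens estimate of
# training FLOPs. Purely a scale illustration, not NVIDIA's internal model.
params = 100e12     # 100 trillion parameters
tokens = 100e12     # 100 trillion training tokens
train_flops = 6 * params * tokens
print(f"~{train_flops:.0e} FLOPs of training compute")   # ~6e+28
```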
The second column is factory throughput, and Blackwell is again green. Factory throughput matters because a gigawatt-scale factory costs about $50 billion, and a $50 billion data center can only draw 1 gigawatt of power. So throughput per watt, good versus bad, translates directly into data center revenue, which is what this column shows. Blackwell is roughly 10 times Hopper, and Rubin will again be roughly 10 times higher.
And in terms of token cost, Rubin is about one-tenth of Blackwell.
This is how we get everyone to the next frontier, pushing AI to the next level, and of course, building these data centers in an energy-efficient and cost-effective way.
This is NVIDIA now. You know we make chips, but as you know, NVIDIA now makes entire systems, and AI is full-stack. We are reshaping everything about AI, from chips to infrastructure, to models, to applications. Our job is to create the entire stack so that all of you can create incredible applications for the rest of the world.
Thank you all for coming. Have a great CES.

