RTX 5090 makes a stunning debut, the world's smallest AI supercomputer to be launched in May, the era of "physical AI" begins... Jensen Huang's full speech at the 2025 CES conference

Wallstreetcn
2025.01.07 11:51

The "ChatGPT moment" for general-purpose robotics is just around the corner, AI supercomputers are moving to the desktop, and physical AI will completely transform the $50 trillion manufacturing and logistics industries. Everything that moves—from cars and trucks to factories and warehouses—will be realized through robots and AI!

On January 7th, Beijing time, Jensen Huang, founder and CEO of NVIDIA, appeared at CES in Las Vegas wearing a new Tom Ford jacket reportedly worth $8,990, delivering the opening keynote speech and launching a series of new products and technologies.

The main highlights of the press conference are as follows:

Launched the next-generation GeForce RTX 50 series GPUs based on the Blackwell architecture, with the flagship RTX 5090 featuring 92 billion transistors and providing 3,400 TOPS of compute power and 4,000 AI TOPS (trillions of operations per second), priced at $1,999.

The RTX 5070, RTX 5070 Ti, RTX 5080, and RTX 5090 are priced at $549 (approximately 4,023 yuan), $749 (approximately 5,489 yuan), $999 (approximately 7,321 yuan), and $1,999 (approximately 14,651 yuan), respectively. Notably, the RTX 5070 offers the same performance as the RTX 4090, previously priced at $1,599, at roughly one-third of the price.

Introduced the latest key interconnect technology, NVLink 72, based on the Blackwell architecture. The system contains 130 trillion transistors, with 72 Blackwell GPUs providing 1.4 exaFLOPS of FP4 computing power and 2,592 Grace CPU cores.

"The scaling law continues": the first scaling law is pre-training; the second scaling law is post-training; the third scaling law is computation during testing.

Demonstrated agentic AI with test-time scaling capabilities, supporting tools such as calculators, web search, semantic search, SQL search, and even podcast generation.

Launched the Nemotron model, including the Llama Nemotron large language model, which is divided into three tiers: Nano, Super, and Ultra.

AI agents could be the next robotics industry, representing a multi-trillion dollar opportunity.

Introduced the physical AI foundational model Cosmos, which is open-source and commercially usable. This model can convert images and text into actionable tasks for robots, seamlessly integrating visual and language understanding to perform complex actions.

Announced generative AI models and blueprints, further expanding the integration of NVIDIA Omniverse into physical AI applications such as robotics, autonomous vehicles, and visual AI.

Physical AI will fundamentally change the $50 trillion manufacturing and logistics industries, with everything that moves—from cars and trucks to factories and warehouses—being realized by robots and AI.

Released the world's smallest personal AI supercomputer—Project Digits. This supercomputer is equipped with the new Grace Blackwell superchip, allowing individuals to run large models with up to 200 billion parameters locally, and two linked Project Digits units can run a large model with 405 billion parameters.

The following is the full text of Jensen Huang's speech:

It All Started in 1993

Welcome to CES! Are you all having fun in Las Vegas? Do you like my jacket? (Editor’s note: $8,990!)

I think my speaking style should be different from Gary Shapiro _ (CEO of CTA, President of CES) _, after all, I am in Las Vegas. If this doesn’t work, and if you all oppose it, then… you’ll just have to get used to it. In about an hour, you’ll find it’s not so bad.

Welcome to NVIDIA—actually, you are now in NVIDIA's digital twin—ladies and gentlemen, welcome to NVIDIA. You are in our digital twin, and everything here is generated by AI.

This has been an extraordinary journey, an extraordinary year, and it all started in 1993.

With the NV1 _ (NVIDIA's first GPU) _, we wanted to create computers that could do things that ordinary computers could not. The NV1 successfully made it possible to play console games on a computer, and our programming architecture was called UDA (Unified Device Architecture).

The first application developed on UDA was Virtua Fighter. Six years later, in 1999, we invented the programmable GPU, and since then, this incredible processor has made astonishing progress for over 20 years. It has made modern computer graphics possible.

Thirty years later, Virtua Fighter has become fully cinematic. This is also our upcoming new Virtua Fighter project, and I can't wait to tell you about it; it's stunning.

Another six years later, we invented CUDA. With it, we could express the programmability of the GPU in a way that allowed a rich set of algorithms to benefit from it. Initially, it was difficult to explain, and it took several years—actually about six years. Somehow, six years later, in 2012, Alex Krizhevsky, Ilya Sutskever, and Geoff Hinton discovered CUDA and used it to train AlexNet, and the rest is history.

Today, AI is advancing at an incredible pace. We started with perceptual AI, which can understand images, words, and sounds, then generative AI, which can generate images, text, and sounds, and now we have AI agents that can perceive, reason, plan, and act, followed by the next phase, physical AI, part of which we will discuss tonight.

In 2018, something very magical happened. Google released the Bidirectional Encoder Representations from Transformers (BERT), and the world of artificial intelligence truly took off.

As you know, transformers completely changed the landscape of artificial intelligence. In fact, they fundamentally altered the landscape of computing. We rightly recognize that artificial intelligence is not just a new application and business opportunity; more importantly, machine learning driven by transformers will fundamentally change the way computing works.

Today, computing has undergone a revolution at every level, from manually writing instructions that run on CPUs to creating software tools used by humans. We now have machine learning, which creates and optimizes neural networks, processes and creates artificial intelligence on GPUs, and every layer of the technology stack has undergone a radical change, with incredible transformations occurring in just 12 years.

Now, we can understand information in almost any modality. Of course, you have seen things like text, images, and sounds, but we can understand not only these but also amino acids, physics, and more. We not only understand them but can also translate and generate them. The applications are virtually endless.

In fact, for almost every AI application you see, you can ask these three basic questions: What modality of information does it learn from? What modality does it translate into? And what modality does it generate? Almost every application can provide an answer. Therefore, when you see one AI-driven application after another, the core of it all is this basic concept.

Machine learning has changed the way every application is built, changed the way computing is done, and expanded the possibilities.

Now, everything related to AI is built on the GeForce _ (the brand of graphics processors for personal computers developed by NVIDIA) _ architecture, which has made artificial intelligence accessible to the masses. Now, AI is returning to the embrace of GeForce, and there are many things that cannot be done without AI. Let me show you.

_ (demonstration video) _

That is real-time computer graphics. Not long ago, no computer graphics researcher or scientist would have told you that ray tracing could be done for every single pixel. Ray tracing is a technique that simulates light, and the amount of geometry you see is absolutely insane. Without AI, this would be nearly impossible.

We did two fundamental things. Of course, we used programmable shading and ray tracing acceleration to generate incredibly beautiful pixels.

But then we let AI condition and control based on these pixels to generate a vast number of other pixels because it knows what the colors should be and has been trained on NVIDIA's supercomputers. Therefore, the neural networks running on the GPU can infer and predict our unrendered pixels.

Not only can we do this; the technique is called DLSS (Deep Learning Super Sampling). The latest generation of DLSS can even go beyond individual frames and predict the future, generating three frames for every frame computed.

For example, if what you are seeing now is a four-frame scene, it consists of one frame rendered by us and three additional generated frames.

Four frames at 4K is about 33 million pixels. Out of those 33 million pixels, we calculate only about 2 million using programmable shaders and our ray tracing engine, and let AI predict all the others—this is truly a miracle.
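As a rough, back-of-the-envelope check of those figures (assuming "4K" means 3840 × 2160; the 2-million-pixel figure is the one quoted in the speech):

```python
# Back-of-the-envelope check of the DLSS 4 numbers quoted above.
# Assumes "4K" means 3840 x 2160; the 2-million-pixel figure is the
# rendered portion quoted in the speech, the rest is AI-generated.

pixels_per_4k_frame = 3840 * 2160           # ~8.3 million pixels
frames = 4                                  # 1 rendered + 3 generated
total_pixels = pixels_per_4k_frame * frames # ~33 million pixels

rendered_pixels = 2_000_000                 # computed via shaders + ray tracing
generated_pixels = total_pixels - rendered_pixels

print(f"total pixels:   {total_pixels:,}")       # ~33,177,600
print(f"rendered:       {rendered_pixels:,}")    # 2,000,000
print(f"AI-generated:   {generated_pixels:,}")   # ~31,177,600
print(f"rendered share: {rendered_pixels / total_pixels:.1%}")  # ~6%
```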

Therefore, we can render with extremely high performance because AI reduces a significant amount of computation. Of course, training it requires immense computing power, but once the training is complete, the generation process is extremely efficient.

This is an incredible capability of AI, which is why so many amazing things are happening. We utilize GeForce to realize AI, and now AI is revolutionizing GeForce.

The latest GPUs of the Blackwell family: the RTX 50 series chips make a stunning debut

Everyone, today we are here to announce the next generation of the RTX Blackwell family. Let's take a look.

(Demo video)

Look, this is our brand new GeForce RTX 50 series chip based on the Blackwell architecture.

This GPU is truly a "beast," with 92 billion transistors and 4000 TOPS _ (trillions of operations per second) _ of AI performance, which is three times that of the previous generation Ada architecture.

To generate the pixels I just showcased, we also need these:

  • 380 RT TFLOPS (trillions of floating-point operations per second) of ray tracing performance, so we can compute the most beautiful images;
  • 125 Shader TFLOPS of shader performance, with a concurrent integer unit of equal throughput, giving it dual shaders, one for floating-point operations and one for integer operations;
  • And GDDR7 (G7) memory from Micron, with a bandwidth of 1.8 TB per second, double that of our previous generation, allowing us to mix AI workloads with computer graphics workloads.

One astonishing aspect of this generation is that programmable shaders can now also handle neural networks. Therefore, shaders can carry these neural networks, resulting in our invention of neural texture compression and neural material shading.

With all of this, you will get these stunningly beautiful images, which can only be achieved by using AI to learn textures and compression algorithms, resulting in extraordinary outcomes.

This is the brand new RTX Blackwell 50 series, and even the mechanical design is a marvel. Look, it has two fans, and the entire graphics card is practically one giant fan. So the question arises: is the graphics card really that big? In fact, the voltage regulator design is state-of-the-art, and this GPU has an incredible design; the engineering team did a great job, thank you.

Next up is speed and cost. How does it compare? This is the RTX 4090. I know many of you have this graphics card. Its price is $1,599, definitely one of the best investments you can make. For just $1,599, you can bring it back to your $10,000 "PC entertainment center."

Right? Don't tell me I'm wrong. This graphics card features a liquid cooling design with gorgeous lighting all around. You lock it up when you leave; this is a modern home theater, completely reasonable.

And now, with the RTX 5070 from the Blackwell family, you only need to spend $549 to achieve this and enhance your configuration and performance.

None of this would be possible without artificial intelligence, without the Tensor Cores delivering 4,000 AI TOPS, and without the G7 memory.

Well, this is the entire RTX 50 family, from RTX 5070 all the way to RTX 5090, the latter's performance is twice that of the 4090. We will begin mass production in January.

This is indeed incredible, but we successfully installed these GPUs into laptops.

This is an RTX 5070 laptop priced at $1,299, and its performance is equivalent to the 4090's.

Can you imagine? Shrinking this incredible graphics card and putting it inside, does that make sense? There's nothing AI can't do.

The reason is that we generate most of the pixels using our Tensor Cores. We only ray-trace the pixels we need, while the rest are generated by AI. The result is that energy efficiency is simply incredible. The future of computer graphics is neural rendering, the combination of artificial intelligence and computer graphics.

What is truly surprising is that we are putting this GPU family into laptops: the RTX 5090 fits in a thin laptop just 14.9 mm thick.

So, ladies and gentlemen, this is the RTX Blackwell family.

New Scaling laws have emerged, allowing models to train themselves and allocate compute differently at test time

GeForce has brought artificial intelligence (AI) to the world, popularizing it. Now, AI has come back to fundamentally change GeForce, so let's talk about AI.

The entire industry is racing to catch up and expand AI, and the Scaling law is a powerful model, an empirical rule proven through generations of researchers and industry observers.

The Scaling law indicates that the larger the amount of training data, the larger the model, and the more computational power invested, the more effective or powerful the model becomes. Thus, the Scaling law continues.

Surprisingly, the amount of data generated by the internet each year is about twice that of the previous year. I believe that in the coming years, the amount of data generated by humanity will exceed the total amount of data generated by all of humanity throughout history.

We are still continuously generating vast amounts of data, which exhibit multimodal characteristics, including video, images, and sound. All this data can be used to train the fundamentals of AI.

However, in reality, two new Scaling laws have emerged, which are somewhat intuitive.

The second Scaling law is the "post-training Scaling law."

The post-training Scaling law utilizes techniques such as reinforcement learning and human feedback. Essentially, AI generates answers based on human queries, and then humans provide feedback. It is much more complex than this, but this reinforcement learning system continuously enhances AI skills through a large number of high-quality prompts.

It can be fine-tuned for specific domains, such as becoming better at solving math problems and reasoning.

So, this is essentially like having a mentor or coach giving you feedback after school. You take exams, receive feedback, and then self-improve. We also use reinforcement learning, AI feedback, and synthetic data generation, which are similar to self-practice, where you know the answer to a question and keep trying until you get it right.

Thus, AI can face a complex and difficult problem that is functionally verifiable and has answers we understand, such as proving a theorem or solving a geometric problem. These problems prompt AI to generate answers and learn how to improve itself through reinforcement learning, which is known as post-training. Post-training requires a significant amount of computational power, but the end result produces incredible models.
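A minimal sketch of this post-training idea: generate candidate answers to a functionally verifiable problem, score them with an automatic checker, and keep only the verified attempts as new training data. The function names and the toy arithmetic task below are illustrative stand-ins, not NVIDIA's actual pipeline.

```python
import random

# Minimal sketch of post-training with a verifiable reward (a toy
# illustration, not NVIDIA's actual pipeline): the "model" proposes
# answers, an automatic checker scores them, and correct attempts are
# kept as fine-tuning data.

def generate_answer(problem: str, temperature: float) -> int:
    """Stand-in for sampling an answer from a model."""
    a, b = map(int, problem.split("+"))
    noise = random.choice([0, 0, 0, 1, -1]) if temperature > 0 else 0
    return a + b + noise

def verify(problem: str, answer: int) -> bool:
    """Functionally verifiable reward: check the arithmetic exactly."""
    a, b = map(int, problem.split("+"))
    return answer == a + b

problems = ["12+7", "33+9", "105+44"]
training_data = []

for problem in problems:
    for _ in range(8):                      # several attempts per problem
        answer = generate_answer(problem, temperature=1.0)
        if verify(problem, answer):         # reward = 1 only if correct
            training_data.append((problem, answer))
            break

print(f"kept {len(training_data)} verified examples for fine-tuning")
```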

The third Scaling law relates to what is known as test-time scaling. Test-time scaling means that when you use AI, it can apply different resource allocations rather than simply improving its parameters; it focuses on deciding how much computational power to use to generate the desired answer.

Reasoning is one way of thinking, and prolonged thinking is another, as opposed to direct, one-shot answers. You might reason about a problem, break it down into multiple steps, generate multiple ideas and have the AI system evaluate which of the ideas it generated is best, or perhaps solve the problem step by step, and so on.

Therefore, test-time scaling has proven to be very effective. You are witnessing the development of this series of technologies, as well as the emergence of all these Scaling laws, because we have seen the incredible achievements from ChatGPT to o1, then to o3, and now to Gemini Pro, all of which have gone through the journey from pre-training to post-training to test-time scaling.
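The simplest form of test-time scaling can be pictured as best-of-N sampling: spend more inference compute by drawing several candidate solutions and letting a scorer pick the best one. The sketch below is a hypothetical toy; all function names are placeholders rather than any vendor's implementation.

```python
import random

# Minimal best-of-N sketch of test-time scaling: more samples means more
# inference compute, and usually a better final answer.

def sample_solution(question: str) -> str:
    """Stand-in for one reasoning trace sampled from a model."""
    return f"candidate-{random.randint(0, 9)}"

def score(question: str, solution: str) -> float:
    """Stand-in for a reward model / verifier that rates a candidate."""
    return random.random()

def answer(question: str, n_samples: int) -> str:
    # The compute budget is explicit: n_samples controls how much
    # "thinking" happens before a single answer is returned.
    candidates = [sample_solution(question) for _ in range(n_samples)]
    return max(candidates, key=lambda c: score(question, c))

print(answer("prove the triangle inequality", n_samples=1))   # cheap
print(answer("prove the triangle inequality", n_samples=32))  # more compute
```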

Of course, the computational power we need is astonishing, and in fact, we hope society can scale computation to produce more and better intelligence. Intelligence is certainly the most valuable asset we have, which can be applied to solve many very challenging problems. Therefore, Scaling laws are driving huge demand for NVIDIA's computing, as well as the incredible demand for chips like Blackwell.

Blackwell's performance per watt has improved fourfold over the previous generation

Let's take a look at Blackwell. Blackwell is currently in full production, and it looks incredible.

First, every cloud service provider now has systems running. We have systems from about 15 computer manufacturers, producing around 200 different stock-keeping units (SKUs), 200 different configurations.

They include liquid cooling, air cooling, x86 architecture, as well as various types of systems with NVIDIA Grace CPU versions, NVLink 36 x 2, 72 x 1, etc., so that we can meet the needs of almost all data centers globally. These systems are currently being produced in 45 factories. This tells us how ubiquitous artificial intelligence is and how quickly the entire industry is investing in this new computing model.

The reason we are pushing so hard is that we need more computational power, which is very clear. The GB200 NVLink72, weighing 1.5 tons, contains 600,000 components. It has a backbone that connects all these GPUs together, with two miles of copper cables and 5,000 wires.

This system is produced in 45 factories around the world. We build them, liquid-cool them, test them, disassemble them, and ship them in parts to data centers, because each unit weighs 1.5 tons; we then reassemble them outside the data center and install them.

The manufacturing process is crazy, but all of this is happening because the Scaling laws are driving demand for computing power to the level of Blackwell.

The performance per watt of Blackwell has improved four times compared to our previous generation products, and the performance per dollar has increased three times. This basically means that in one generation of products, we have reduced the cost of training these models by three times, or if you want to scale the models up by three times, the cost is roughly the same. But importantly, these tokens being generated are used by all of us, applied to ChatGPT or Gemini and our phones.

In the future, almost all of these applications will consume these AI tokens, which are generated by these systems. Every data center is limited by power.

Therefore, if the performance per watt of Blackwell is four times that of our previous generation, then the revenue that can be generated, that is, the business volume that can be generated in the data center, has increased fourfold. Thus, these AI factory systems are essentially factories today.
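The arithmetic behind that claim is straightforward: with a fixed power budget, throughput (and therefore revenue) scales with performance per watt. Only the 4x figure comes from the speech; the power budget and baseline efficiency below are illustrative assumptions.

```python
# Illustrative arithmetic for a power-limited AI factory. The 4x
# performance-per-watt figure is from the speech; the power budget and
# baseline efficiency below are made-up round numbers.

datacenter_power_mw = 100                 # fixed power budget (assumed)
tokens_per_joule_prev_gen = 1.0           # baseline efficiency (arbitrary unit)
tokens_per_joule_blackwell = 4.0          # 4x perf/watt vs. previous generation

def tokens_per_second(power_mw: float, tokens_per_joule: float) -> float:
    return power_mw * 1e6 * tokens_per_joule   # watts * tokens/joule

prev = tokens_per_second(datacenter_power_mw, tokens_per_joule_prev_gen)
new = tokens_per_second(datacenter_power_mw, tokens_per_joule_blackwell)

# Same building, same power bill: 4x the tokens, hence roughly 4x the
# revenue if revenue scales with tokens served.
print(f"throughput ratio: {new / prev:.0f}x")
```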

Now, all of this is aimed at creating a huge chip. The computing power we need is quite astonishing, and this is basically a huge chip. If we had to build it as a chip, it would obviously be the size of a wafer, but this does not include the impact of yield, which may require three to four times the size.

But we basically have 72 Blackwell GPUs, or 144 chips, here. The AI floating-point performance of this one giant chip reaches 1.4 exaFLOPS, while the world's largest and fastest supercomputer has only recently exceeded 1 exaFLOPS. It has 14 TB of memory, and the memory bandwidth is 1.2 PB per second, roughly equivalent to all the internet traffic happening right now; the world's internet traffic could be processed through these chips.

We have a total of 130 trillion transistors, 2,592 CPU cores, and a large amount of networking. So, I wish I could show you all of it, but I don't think I can. These are the Blackwell chips, these are our ConnectX network chips, and this is NVLink; we are pretending this is the NVLink spine, because the real thing would be impossible to hold.

These are all HBM (High Bandwidth Memory), 14TB of HBM memory, and this is what we are trying to do. This is the miracle of the Blackwell system. The Blackwell chip is here, it is the largest single chip in the world.

We need a lot of computing resources because we want to train increasingly larger models.

In the past, inference produced a single, one-shot answer, but in the future, AI will engage in self-dialogue; it will think and process internally. Currently, tokens are generated at a rate of 20 or 30 per second, which is already the limit of human reading speed. However, future models, like o1, o3, and Gemini Pro, will engage in self-dialogue and reflection.

Therefore, you can imagine that the token generation rate will need to be extremely high. To ensure excellent service quality, keep costs low for customers, and drive the continuous expansion of AI, we need to significantly increase the token generation rate while reducing costs. This is one of the fundamental purposes of creating NVLink.
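A rough illustration of why this matters: the 20 to 30 tokens-per-second figure is from the speech, while the internal "thinking" token count and the latency target below are assumptions.

```python
# Rough latency arithmetic for "thinking" models. The 30 tokens/s reading
# speed comes from the speech; the internal token count and latency target
# are illustrative assumptions.

human_reading_rate = 30          # tokens per second (from the speech)
internal_tokens = 10_000         # hypothetical self-dialogue before answering
target_latency_s = 10            # how long a user will happily wait (assumed)

latency_at_reading_speed = internal_tokens / human_reading_rate
required_rate = internal_tokens / target_latency_s

print(f"at 30 tok/s the model would 'think' for {latency_at_reading_speed:.0f} s "
      f"(~{latency_at_reading_speed / 60:.1f} min)")
print(f"to answer within {target_latency_s} s it must generate ~{required_rate:.0f} tok/s")
```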

NVIDIA creates three tools to help the ecosystem build AI agents: Nvidia NIMS, Nvidia NeMo, and open-source blueprints

One of the significant transformations happening in the business world is the "AI agent."

AI agents are a perfect example of test-time scaling. They are a system of AI models, where some are responsible for understanding and interacting with customers and users, while others are responsible for retrieving information from storage, such as semantic AI systems.

They may access the internet or open a PDF file, use tools like calculators, or even leverage generative AI to create charts, etc. Moreover, it is iterative; it will gradually break down the questions you pose and process them through different models.

In the future, AI will respond to customers on our behalf. In the past, you posed a question and the answer would pour out. In the future, when you pose a question, a multitude of models will run in the background, so test-time scaling and the computational load required for inference will surge, and we will get higher-quality answers.

To help the industry build AI agents, our market strategy is not to directly target enterprise customers, but to collaborate with software developers in the IT ecosystem to integrate our technology to achieve new capabilities, just as we did with the CUDA library. Just as past computing models had APIs for computer graphics, linear algebra, or fluid dynamics, in the future, AI libraries will be introduced on these CUDA-accelerated libraries.

The three tools we provide to help the ecosystem build AI agents are: NVIDIA NIMS, which are essentially packaged AI microservices. They package and optimize all the complex CUDA software, such as cuDNN, CUTLASS, TensorRT-LLM, or Triton, along with the models themselves into a container that you can use freely.

Thus, we have models for vision, language understanding, speech, animation, and digital biology, and we are about to launch some new and exciting physical AI models. These AI models can run on every cloud platform, as NVIDIA GPUs are now available from every cloud provider and original equipment manufacturer (OEM).

Therefore, you can integrate these models into your software packages to create AI agents that run on Cadence, ServiceNow, or SAP, which can be deployed to customers and operate anywhere the customer wishes to run the software.
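Since NIMs are packaged as containers exposing standard inference endpoints, calling one might look roughly like the sketch below, assuming an OpenAI-compatible chat endpoint on a local host; the URL, port, and model name are placeholders, not guaranteed NVIDIA interfaces.

```python
import json
import urllib.request

# Hypothetical sketch of calling a locally deployed NIM-style microservice.
# Assumes an OpenAI-compatible /v1/chat/completions endpoint on localhost;
# the URL, port, and model name are placeholders, not guaranteed by NVIDIA.

url = "http://localhost:8000/v1/chat/completions"
payload = {
    "model": "llama-nemotron-super",   # placeholder model identifier
    "messages": [
        {"role": "user", "content": "Summarize yesterday's support tickets."}
    ],
    "max_tokens": 256,
}

request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    reply = json.load(response)
    print(reply["choices"][0]["message"]["content"])
```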

The next tool is a system we call Nvidia NeMo, essentially a digital employee onboarding and assessment system.

In the future, these AI agents will become a digital workforce working alongside your employees to complete various tasks. Thus, introducing these specialized agents into the company is akin to onboarding employees. We have different libraries to help these AI agents train on the specific language of the company, perhaps with vocabulary that is unique to the company, as business processes and ways of working vary.

Therefore, you need to provide them with examples to illustrate the standards of work outcomes, and they will attempt to generate results that meet those standards, while you provide feedback and evaluation, repeating this process.

At the same time, you will set some boundaries, clarifying what they are not allowed to do and what they cannot say. We will even grant them access to certain information. Thus, the entire digital employee pipeline is referred to as NeMo.
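That onboarding loop (provide examples of acceptable output, evaluate, give feedback, enforce boundaries) can be pictured as a simple evaluate-and-gate cycle. The sketch below is a generic illustration, not the NeMo API; every function and constant is a hypothetical stand-in.

```python
# Generic sketch of the "digital employee onboarding" loop described above:
# generate, check against guardrails, evaluate against example standards,
# feed the score back. This is an illustration, not the NeMo API.

FORBIDDEN_TOPICS = {"pricing of unreleased products", "personal data"}

def violates_guardrails(text: str) -> bool:
    return any(topic in text.lower() for topic in FORBIDDEN_TOPICS)

def evaluate(output: str, reference_examples: list[str]) -> float:
    """Toy scorer: reward outputs whose length is close to the examples'."""
    target = sum(len(e) for e in reference_examples) / len(reference_examples)
    return 1.0 - min(1.0, abs(len(output) - target) / target)

def agent_respond(task: str, feedback: str) -> str:
    """Stand-in for the agent; a real one would call a fine-tuned model."""
    return f"Draft report for: {task}. (adjusted per feedback: {feedback})"

examples = ["Quarterly defect summary: 3 issues, all resolved within SLA."]
feedback = "none yet"

for _ in range(3):
    output = agent_respond("summarize weekly defects", feedback)
    if violates_guardrails(output):          # boundary check first
        feedback = "remove restricted content"
        continue
    score = evaluate(output, examples)       # compare to example standards
    feedback = f"score {score:.2f}, match the example style more closely"

print(feedback)
```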

In the future, the IT departments of every company will transform into human resource management departments for AI agents. Today, they manage and maintain a range of software from the IT industry, while in the future, they will be responsible for maintaining, nurturing, guiding, and improving a complete set of digital agents for company use. Your IT department will gradually evolve into a human resource management department for AI agents.

Additionally, we also provide a plethora of blueprints for our ecosystem to utilize, all of which are completely open source, and you can freely modify these blueprints. We have blueprints for various types of agents.

Today, we also announced a very cool and clever initiative: the launch of the LLAMA-based model family, namely the NVIDIA LLAMA Nemotron language foundation model, with LLAMA 3.1 being a significant achievement. The number of downloads for LLAMA 3.1 from Meta has reached 650,000, and it has been derived and transformed into about 60,000 different models, which is almost the main reason why enterprises in every industry are starting to pay attention to artificial intelligence.

We realized that the LLAMA models could be fine-tuned even better to meet the needs of enterprises, so we leveraged our expertise and capabilities to fine-tune them, forming the LLAMA Nemotron open-source model suite. Some of these models are very small and compact, with extremely fast response times; these are the Nemotron Nano models. The LLAMA Nemotron Super models are essentially the mainstream models.

The Ultra models can serve as teacher models for other models, acting as reward-model evaluators and judges that assess the quality of other models' answers and provide feedback. They can be distilled in various ways, serving both as teacher models and as knowledge-distillation models; they are powerful, widely usable, and now available online. They rank among the top of the chat, instruction, and retrieval leaderboards, covering the various functions required for AI agents.

We are also collaborating with the ecosystem, and all of NVIDIA's AI technologies have been deeply integrated with the IT industry. We have excellent partners, including ServiceNow, SAP, Siemens, etc., who are making outstanding contributions to industrial AI. Cadence and Synopsys are also doing excellent work. I am proud of our collaboration with Perplexity, which has completely transformed the search experience and achieved remarkable results.

Codium will become the next huge AI application for every software engineer in the world, with software coding being the next major service. There are 30 million software engineers globally, and each will have a software assistant to help them code; otherwise, their work efficiency will be greatly reduced, and the quality of the code produced will decline.

Thirty million is already a huge number, and the total number of knowledge workers worldwide reaches one billion. Clearly, AI agents are likely to be the next robotics industry, with the potential to become a multi-trillion-dollar business opportunity in the future.

Next, I will showcase some blueprints we have created in collaboration with our partners and our work results. These AI agents are the new digital workforce, working for us and collaborating with us. AI is a model system capable of reasoning around specific tasks, breaking down tasks, and retrieving data or using tools to generate high-quality responses.

(Demo video)

Transforming AI into a comprehensive AI assistant

Alright, let's continue talking about AI.

AI was born in the cloud, and the cloud AI experience is wonderful; using AI on mobile devices is also a lot of fun. Soon, we will have continuous AI that is always by our side. Imagine when you wear Meta glasses, you can simply point to or look at something and casually ask for related information; isn't that super cool?

While the cloud AI experience is great, our ambition goes beyond that; we want AI to be ubiquitous. As mentioned earlier, NVIDIA AI can be easily deployed to any cloud and cleverly integrated into internal company systems, but what we truly desire is to have it securely installed on personal computers.

Everyone knows that Windows 95 once sparked a revolutionary wave in the computer industry, bringing a series of innovative multimedia services that forever changed the way applications are developed. However, the computing model of Windows 95 has many limitations for AI and is not quite perfect.

We eagerly anticipate that AI in personal computers in the future can become a powerful assistant for everyone. In addition to the existing 3D, audio, and video APIs, there will also be new generative APIs for creating stunning 3D content, dynamic language, pleasant sounds, and more. We need to meticulously craft a brand new system that fully utilizes the massive upfront investment in the cloud while making all these beautiful visions a reality.

It is not practical to create yet another way of programming AI for the world, so if we can turn Windows PCs into world-class AI PCs, that would be fantastic. And the answer is Windows WSL 2.

Windows WSL 2 is essentially a system that cleverly nests two operating systems, tailored specifically for developers, allowing them to directly and smoothly access hardware.

It has been deeply optimized for cloud-native applications, with a focus on comprehensive optimization for CUDA, truly achieving plug-and-play functionality. As long as the computer's performance is up to par, whether it's visual models, language models, or speech models, or creative animations, lifelike digital human models, etc., all kinds of models can run perfectly on personal computers, and with one click after downloading, you can embark on a wonderful journey.

Our goal is to make Windows WSL 2 on the Windows PC a top-tier platform, and we will support and maintain it for the long term.

Next, let me show you a blueprint example that we have just developed:

(Demo video)

NVIDIA AI is about to be installed in hundreds of millions of Windows computers worldwide. We have closely collaborated with top PC OEM manufacturers globally to ensure that these computers are fully prepared for the AI era. AI PCs will soon enter thousands of households and become helpful assistants in daily life.

NVIDIA Cosmos, the world's first foundational model designed to understand the physical world

Next, let's focus on the cutting-edge field of physical AI.

Speaking of Linux, let's also talk about physical AI. Imagine a large language model receiving context and prompt information on the left, then generating tokens one by one to ultimately output results. The model in between is extremely large, with billions of parameters, and the context length is quite considerable, as users may load several PDF files at once, which are cleverly transformed into tokens.

The attention mechanism of the Transformer allows each token to establish connections with other tokens; if there are hundreds of thousands of tokens, the computational load will grow quadratically.

The model processes all parameters and input sequences, generating a token through each layer of the Transformer, which is why we need computing power like Blackwell's to generate the next token. This is why the Transformer model is so efficient yet resource-intensive.
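The quadratic growth can be made concrete: the attention-score matrix for a sequence of n tokens has n² entries, so ten times the context means roughly a hundred times that part of the work. A small estimate using the standard QK-transpose FLOP count; the model width and layer count below are illustrative assumptions.

```python
# Concrete look at the quadratic attention cost mentioned above.
# The FLOP formula is the standard QK^T estimate (2 * n^2 * d multiply-adds
# per layer); the width and layer count are illustrative assumptions.

def attention_flops(n_tokens: int, d_model: int = 8192, n_layers: int = 80) -> float:
    """Rough FLOPs spent on attention-score computation alone."""
    per_layer = 2 * n_tokens ** 2 * d_model    # QK^T multiply-adds
    return per_layer * n_layers

for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9,} tokens -> {attention_flops(n):.2e} FLOPs")
# 10x more context tokens => ~100x more attention compute.
```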

If we replace the PDF with the surrounding environment and the questions with requests, such as "go over there and bring that box," the output would no longer be tokens but action instructions, which makes a lot of sense for future robotics technology, and the relevant technology is just around the corner. However, we need to create an effective world model, distinct from language models like GPT.

This world model must understand the rules of the real world, such as gravity, friction, and inertia, as well as comprehend geometric and spatial relationships and cause and effect: what happens when something falls to the ground, or that if you poke something, it will topple over. It must also understand object permanence: a ball rolling off the kitchen counter does not disappear into another quantum universe; it is still there.

Currently, most models struggle to understand this kind of intuitive knowledge, so we need to build a foundational world model.

Today, we are announcing a significant development — NVIDIA Cosmos, the world's first foundational model designed to understand the physical world. Seeing is believing, let's take a look.

(Show video)


NVIDIA Cosmos, the world's first world foundation model, is trained on 20 million hours of video data, focusing on physically dynamic content such as nature, humans walking, hand movements, object manipulation, and rapid camera movements, with the aim of teaching AI to understand the physical world rather than generating creative content. With physical AI, many downstream applications can be developed.

We can use it for synthetic data generation to train models, refine models, initially create robotic models, and generate multiple future scenarios based on physical logic, just like Doctor Strange manipulating time, because this model understands the physical world.

As you can see, besides generating images, it can also caption videos: it can take video and generate captions, which can in turn be used to train multimodal large language models. Therefore, this foundational model can be used to train both robots and large language models.

This platform has autoregressive models for real-time applications, diffusion models for generating high-quality images, super-powerful tokenizers that learn the "vocabulary" of the real world, and a data pipeline. If you want to use this data to train your own models, the pipeline has been accelerated from start to finish, because the amount of data involved is massive.

The data processing pipeline of the Cosmos platform leverages CUDA and AI acceleration.

Today, we announce the open-source license for Cosmos, which has been placed on GitHub, featuring small, medium, and large models corresponding to fast models, mainstream models, and teacher models, which are knowledge transfer models. We hope that Cosmos can bring a similar driving effect to the field of robotics and industrial AI as Llama 3 does for enterprise AI.

Physical AI will fundamentally change the $50 trillion manufacturing and logistics industries

When Cosmos is connected to Omniverse, magic happens.

The fundamental reason is that Omniverse is a system built on algorithmic, principled, physically based simulation; it is a simulator. Connecting it with Cosmos provides ground truth for the content Cosmos generates, controlling and conditioning the output.

In this way, the content Cosmos outputs is grounded in reality, just as connecting a large language model to a retrieval-augmented generation system ensures that the AI generates from real facts. The combination of the two becomes a physically based simulation and multiverse generator, with application scenarios for robotics and industrial use that are extremely exciting and clear.

Cosmos plus Omniverse, along with the computers used to train AI, represent the three essential types of computers needed to build robotic systems.

Every robotics company ultimately needs three computers: one for training AI, the DGX computer; one for deploying AI, the AGX computer, which is deployed in various edge devices such as cars, robots, and autonomous mobile robots (AMR) to achieve autonomous operation.

Connecting the two requires a digital twin, which is the foundation of all simulations.

The digital twin is the place where trained AI performs operations such as practice, improvement, synthetic data generation, reinforcement learning, and AI feedback, thus it is the digital twin of AI.

These three computers will work interactively, and this three-computer system is NVIDIA's strategy for the industrial world, which we have discussed for some time. Rather than a "three-body problem," it is more accurately described as a "three-body computer solution," which is NVIDIA in the field of robotics.

Here are three examples.

The first example is industrial digitalization. Millions of factories and hundreds of thousands of warehouses worldwide form the backbone of the $50 trillion manufacturing industry, which will all need to be software-defined, automated, and integrated with robotics technology in the future.

We are collaborating with Kion, a global leader in warehouse automation solutions, and Accenture, the world's largest professional services provider, focusing on digital manufacturing to create special solutions together. Let's take a look.

Our marketing strategy, like other software and technology platforms, leverages developers and ecosystem partners. More and more ecosystem partners are connecting to Omniverse because everyone wants to digitalize future industries; the $50 trillion in global GDP contains too much waste and automation opportunities.

(Show video)

In the future, everything can be simulated. Every factory will have a digital twin, generating a multitude of future scenarios with Omniverse and Cosmos, while AI selects the optimal scenario, becoming the programming constraints for deployment in real factories.

Next Generation Automotive Processor — Thor

The second example is autonomous vehicles.

After years of development, Waymo and Tesla have achieved success, and the autonomous driving revolution has arrived.

We provide three types of computers for this industry: systems for training AI, simulation and synthetic data generation systems Omniverse and Cosmos, and in-vehicle computers. Each automotive company may collaborate with us in different ways, potentially using one, two, or all three types of computers.

Almost every major automotive company in the world collaborates with us in various ways, utilizing one, two, or all three of these types of computers, such as Waymo, Zoox, Tesla, and BYD — the largest new energy vehicle company in the world. Jaguar Land Rover has super cool new cars, and Mercedes-Benz will start mass production of a batch of vehicles equipped with NVIDIA technology this year.

We are particularly pleased to announce today that Toyota and NVIDIA have reached a partnership to create the next generation of autonomous vehicles. Many other companies, including Lucid, Rivian, Xiaomi, and Volvo, are also involved.

TuSimple is building self-driving trucks, and this week it was announced that Aurora will use NVIDIA technology to create autonomous trucks.

Globally, 100 million vehicles are produced each year, with billions of vehicles on the road, traveling trillions of miles annually, and in the future, they will be highly automated or fully autonomous, which will be a massive industry. Just looking at the few vehicles already on the road, our business revenue has already reached $4 billion, and it is expected to reach $5 billion this year, with enormous potential.

Today, we are launching the next generation automotive processor — Thor.

This is Thor, a robotic computer that processes vast amounts of sensor information, with countless cameras, high-resolution radar, and lidar data flooding in. It needs to convert this into tokens and send them into the Transformer to predict the next driving path.

Thor is now fully in production, with processing power 20 times that of the previous generation Orin, which is the current standard for autonomous vehicles.

Thor is not only used in cars but can also be used in complete robots, such as AMRs (autonomous mobile robots) or humanoid robots, serving as their brains and controllers; it is a universal robotic computer.

I am also particularly proud to announce that our DriveOS is now the first software-defined, programmable AI computer to receive ASIL-D certification, the highest standard for automotive functional safety. This remarkable achievement brings functional safety to CUDA. If you build robots with NVIDIA CUDA, that is perfect.

Next, I will show you how to use Omniverse and Cosmos to work in autonomous driving scenarios. Today, we will not only show you videos of cars running on the road but also demonstrate how to automatically reconstruct digital twins of vehicles using AI, leveraging this capability to train future AI models.

(Show video)

Isn't it incredible?

Thousands of drives can turn into billions of miles of data. While actual vehicles are still needed on the road to continuously collect data, using this physics-based, physically grounded multiverse capability to generate synthetic data provides vast amounts of accurate, plausible data for training autonomous driving AI.

The autonomous driving industry is gaining momentum, and in the coming years, just like the rapid transformation of computer graphics technology, the development speed of autonomous driving will also significantly increase, which is incredibly exciting.

The "ChatGPT Moment" for General Robotics is Within Reach

Let's talk about humanoid robots.

The "ChatGPT moment" in the field of general robotics is just around the corner. The empowering technologies I have discussed will lead to rapid and astonishing breakthroughs in the field of general robotics in the coming years.

General-purpose robotics matters because robots with tracks or wheels require specially adapted environments, while three types of robots need no special venues and can integrate perfectly into the world we have already built, making them the ideal choices.

The first type is agentic AI robots: digital information workers. As long as the computing power in our office computers is sufficient, these information-worker robots can showcase their capabilities.

The second type is autonomous vehicles, after all, we have spent over a hundred years building roads and cities.

The third type is humanoid robots. If we can conquer the technologies for these three types of robots, this will become the largest technology industry in history, so the era of robots is about to arrive.

The key lies in how to train these robots. For humanoid robots, collecting imitation data is challenging. While driving continuously generates driving data, collecting human demonstration motions for humanoid robots is laborious and time-consuming.

Therefore, we need to come up with a clever way to utilize artificial intelligence and Omniverse to synthesize hundreds of thousands of human demonstration actions into millions of simulated actions, allowing AI to learn how to perform tasks from them. Below, we will show you how to do this specifically.

Global developers are working on the next generation of physical AI, which includes embodied robots and humanoid robots. Developing a general-purpose robot model requires massive amounts of real-world data, and the costs of collecting and organizing this data are high. NVIDIA's Isaac Groot platform has emerged to provide developers with four powerful tools: robot foundational models, data pipelines, simulation frameworks, and the Thor robot computer.

The synthetic motion generation blueprint of NVIDIA's Isaac Groot is a set of imitation learning simulation workflows that enable developers to generate exponentially large datasets from a small number of human demonstrations.

First, with Groot Teleop, skilled workers can enter the robot's digital twin space using Apple Vision Pro.

This means that even without a physical robot, operators can collect data and control the robot in a risk-free environment, avoiding physical damage or wear. To teach a robot a task, operators capture motion trajectories through a few teleoperated demonstrations and then use Groot Mimic to expand these trajectories into a much larger dataset.

Next, the Groot Gen tool, built on Omniverse and Cosmos, performs domain randomization and 3D-to-real scene scaling to generate datasets that grow exponentially in scale. The multiverse simulation engines of Omniverse and Cosmos provide vast datasets for training robot policies. Once the policies are trained, developers conduct software-in-the-loop testing and validation in Isaac Sim before deploying them to real robots.
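The core idea here, multiplying a handful of human demonstrations into an exponentially larger synthetic set by randomizing conditions, can be sketched as follows. This is a toy illustration of domain randomization, not the actual Isaac Groot blueprint API; all names and multipliers are illustrative.

```python
import itertools
import random

# Toy sketch of the "few demos -> huge synthetic dataset" idea described
# above (domain randomization). Not the actual Isaac/Groot blueprint API.

human_demos = ["pick_up_box_demo_1", "pick_up_box_demo_2"]   # teleop captures

lighting = ["dim", "bright", "backlit"]
textures = ["wood", "metal", "cardboard"]
object_poses = [f"pose_{i}" for i in range(50)]
camera_jitter = [f"cam_{i}" for i in range(10)]

synthetic_dataset = [
    {
        "source_demo": demo,
        "lighting": l,
        "texture": t,
        "object_pose": p,
        "camera": c,
        "noise_seed": random.randint(0, 2**31),
    }
    for demo, l, t, p, c in itertools.product(
        human_demos, lighting, textures, object_poses, camera_jitter
    )
]

# 2 demos * 3 * 3 * 50 * 10 = 9,000 physically varied training episodes.
print(f"{len(human_demos)} demonstrations -> {len(synthetic_dataset):,} synthetic episodes")
```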

Driven by NVIDIA's Isaac Groot, the era of general-purpose robots is about to arrive.

We will have vast amounts of data for robot training. The NVIDIA Isaac Groot platform provides key technological elements for the robotics industry, accelerating the development of general-purpose robots.

AI Supercomputers Move to the Desktop

There's another project I need to introduce. Without this incredible project launched ten years ago, none of this would be possible. Internally, it is called Project Digits—the deep learning GPU intelligent training system.

Before its launch, I shortened the name to DGX so it would harmonize with RTX, AGX, OVX, and our other product names. The birth of the DGX-1 completely revolutionized the field of artificial intelligence.

In the past, building a supercomputer required constructing facilities and setting up infrastructure, which was a massive engineering task. The DGX-1 we created allows researchers and startups to have an AI supercomputer right out of the box.

In 2016, I delivered the first DGX-1 to a startup called OpenAI, where many engineers, including Elon Musk and Ilya Sutskever, were present to celebrate its arrival.

Clearly, it revolutionized the fields of artificial intelligence and computing. But today, artificial intelligence is everywhere, not just in research institutions and startup labs. As mentioned at the beginning, artificial intelligence has become a new way of computing and building software; every software engineer and creative artist, anyone who uses a computer as a tool, needs an AI supercomputer.

I have always hoped that the DGX 1 could be smaller; just imagine, ladies and gentlemen.

This is NVIDIA's latest AI supercomputer, currently called Project Digits. If you have a better name, feel free to let us know.

The amazing thing is that this is an AI supercomputer running the entire NVIDIA AI stack; all NVIDIA software can run on it, and DGX Cloud can also be deployed. It can be placed anywhere, wirelessly connected, and can also be used as a workstation, accessed remotely like a cloud supercomputer. NVIDIA AI can run on it.

It is based on a super mysterious chip, the GB110, our smallest Grace Blackwell chip. Let me show you what’s inside.

Isn't it super cute?

This chip has already gone into production. This highly confidential chip was developed in collaboration with MediaTek, a global leader in system-on-chip (SoC) technology, connecting the CPU and NVIDIA's GPU through NVLink chip-to-chip. It is expected to launch around May, which is very exciting.

It looks something like this; whether you use a PC or a Mac doesn't matter. It is a cloud platform that can sit on your desk and can also be used as a Linux workstation. If you want more units, you can connect them using ConnectX, giving you multiple GPUs ready to use, with a complete supercomputing stack. This is NVIDIA Project Digits.
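A rough way to sanity-check the earlier claims (one unit running a 200-billion-parameter model, two linked units running a 405-billion-parameter one) is to estimate memory from parameter count and precision; the 4-bit weight assumption and the overhead factor below are ours, not figures from the keynote.

```python
# Rough memory estimate behind the "200B on one unit, 405B on two" claim
# quoted earlier. The 4-bit weight assumption and the overhead factor are
# illustrative assumptions, not figures from the keynote.

def model_memory_gb(params_billion: float, bits_per_weight: int = 4,
                    overhead: float = 1.2) -> float:
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total * overhead / 1e9     # extra room for KV cache, activations

for params in (200, 405):
    print(f"{params}B params @ 4-bit ~ {model_memory_gb(params):.0f} GB")

# ~120 GB for 200B plausibly fits in one desktop-class unit with enough
# unified memory; ~243 GB for 405B suggests why two linked units are needed.
```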

As I just mentioned, we have new Blackwell products in production: not only is the Grace Blackwell supercomputer, the NVLink 72 system, in mass production around the world, but three brand-new Blackwell systems are in production as well.

An amazing AI foundational world model, the world's first open-source physical AI foundational model, has been released, energizing the global robotics industry; and the three types of robots (agentic AI robots, autonomous vehicles, and humanoid robots) are all making strides. This year has been fruitful. Thank you all for your partnership, and thank you for being here. I made a short video to look back on last year and ahead to the coming year; let's play it.

Wishing everyone a fruitful CES, Happy New Year, thank you!