Jensen Huang: For the first time, the key technologies for general-purpose humanoid robots are "within reach"
In a recent interview, Jensen Huang said that general-purpose humanoid robot technology is "difficult to achieve," but that breakthroughs in large language models and foundation models have put the necessary technology "within reach." He added that the application science of AI will become particularly important, and that the challenge for the new generation is how to use AI to solve practical problems and create value.
Recently, NVIDIA CEO Jensen Huang was interviewed by the independent tech channel Tiff In Tech. He shared NVIDIA's latest breakthroughs in the field of physical AI.
Jensen Huang stated that general-purpose humanoid robot technology is "difficult to achieve," but that with breakthroughs in Transformer models, large language models, and foundation models, the necessary technology is "within reach," and NVIDIA can make a real contribution in this area.
Physical AI refers to AI systems that can understand and interact with the physical world, and this technology will fundamentally change the way robots are trained. NVIDIA's Cosmos and Omniverse platforms can simulate the real world in a virtual environment, significantly shortening the time required to train robots.
Jensen Huang mentioned that NVIDIA's technology can reduce the training time for robots from years to just a few hours, greatly improving efficiency and feasibility.
Jensen Huang also discussed NVIDIA's "three-computer" approach to autonomous driving: one computer for training AI, one for simulating AI, and one for running AI in the vehicle. For the in-vehicle computer he emphasized safety, which relies on diversity of algorithms and redundancy of computing.
Jensen Huang stated that in the next decade, the application science of artificial intelligence will become particularly important. Unlike the previous generation, which mainly focused on how to apply computers to solve chip design and software engineering problems, the new generation faces the challenge of how to apply AI technology across various industries and fields to solve real problems and create value.
The following are highlights from the conversation:
- General-purpose humanoid robots will be the most practical because our world is built around human needs. This technology is extremely difficult to achieve. However, with breakthroughs in Transformer models, large language models, and foundation models, we believe we have the technological foundation to make substantial contributions in this field.
- Teaching general-purpose robots is similar to teaching a person. Using Isaac Sim, we can capture a few human demonstrations and then use AI, Cosmos, and Omniverse to generate a large number of different scenarios.
- The virtual world created by Omniverse is essentially a digital twin of the real world, which is exactly what it was designed for.
- Omniverse is like a virtual playground. For robots, it feels just like the real world because this virtual environment follows the laws of physics, and everything looks very real. Robots cannot distinguish between the virtual world and the real world, which is the key point.
- We serve the autonomous vehicle industry through three computer systems: one for training AI, one for simulating AI, called Omniverse, and one for integrating AI into vehicles.
- Our generation mainly focused on how to apply computers to solve chip design and software engineering problems, while the new generation needs to think about how to apply AI to solve all these fundamental issues. For example, how can AI be applied in forestry? How can AI be applied in oceanography? Every industry and every scientific field will be affected in this way.
The following is the full interview:
Building General-Purpose Robots: The Fusion of World Models and Artificial Intelligence
Tiff In Tech:
Hi, Jensen. Thank you for taking the time to chat with me today. You made some groundbreaking announcements at CES, particularly in an area I'm very curious about, which is robotics. When it comes to robotics, what excites you the most about using tools like Cosmos or world foundation models?
Jensen Huang:
We are in an incredible era of robotics. The key technologies needed to build general-purpose humanoid robots are on the horizon. One of them is an artificial intelligence model that understands the world, just as we have AI models that understand language. We have language models such as ChatGPT and Llama; we now need a world model, the equivalent of a language model for understanding the world. The world needs robots. One reason is that we do not have enough labor. The population is aging, people's preferences for types of work are changing, birth rates are declining, and the world needs more labor. So the need for robotic systems is fairly urgent.
General-purpose humanoid robots will be the most practical because our world is built around human needs. This technology is extremely difficult to achieve. However, with breakthroughs in Transformer models, large language models, and foundation models, we believe we have the technological foundation to make substantial contributions in this field.
We need to combine several aspects: first, robots must understand us. For example, the breakthroughs with ChatGPT have indeed made this possible. But what is missing is that we now need an AI that understands the physical world. It must understand the dynamics of the physical world, including gravity, inertia, and friction, and it must also understand spatial relationships and geometric relationships, as well as some common-sense things like object permanence.
Therefore, we are working on creating what is essentially a world-model version of ChatGPT or Llama. It is called a world foundation model (an AI model for understanding and simulating the physical world). Just like a language foundation model (an AI model for understanding and generating human language), this is a foundation model for understanding the world. That model is Cosmos, and we will make it publicly available for everyone to use, hoping that it will truly ignite and accelerate the development of robotics.
Isaac Sim: Virtual Reality Empowering Efficient Robot Training
Tiff In Tech:
These technologies look very promising for teaching robots. I understand there have been some recent announcements regarding Isaac Sim, particularly in the area of virtual reality training. How do you see the future development and potential of this technology?
Jensen Huang:
The first step in training AI is to give it foundational knowledge, which is common-sense knowledge. The second step is to cultivate the necessary skills. Teaching general-purpose robots is similar to teaching a person. We teach through demonstration. We use human demonstrations to show robots how to pick up a glass. Each time, the position, height, and shape of the glass may vary slightly, but essentially it is still the action of picking up a glass of water. Using Isaac Sim, we can capture several human demonstrations and then use AI, Cosmos, and Omniverse to generate a large number of different scenarios. We generate versions with different sizes, positions, and placements, and provide this training data, including imitation data, for the robots to learn from. This way, the robots can learn a large number of generalized versions of the action.
Tiff In Tech:
It seems there could be an infinite number of versions. This is precisely the problem this technology solves by providing these diverse training versions to the robots.
Jensen Huang:
That's right. We don't just give the robots one example; we provide millions of different examples.
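As a rough illustration of turning one demonstration into many varied examples, here is a minimal Python sketch. It is not the Isaac Sim, Cosmos, or Omniverse API; the `GraspDemo` class, the `randomize` function, and the jitter ranges are hypothetical names and values chosen only to show the idea of varying position and size around a recorded demonstration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class GraspDemo:
    """Hypothetical record of one 'pick up a glass' demonstration."""
    x_m: float        # glass position on the table, metres
    y_m: float
    height_m: float   # glass height, metres
    radius_m: float   # glass radius, metres


def randomize(demo, n, seed=0, pos_jitter_m=0.05, size_scale=0.2):
    """Generate n variants of one demonstration.

    Position is shifted by up to +/- pos_jitter_m; height and radius are
    scaled by up to +/- size_scale (20% by default). These ranges are
    assumptions for illustration only.
    """
    rng = random.Random(seed)  # seeded so the variants are reproducible
    variants = []
    for _ in range(n):
        variants.append(GraspDemo(
            x_m=demo.x_m + rng.uniform(-pos_jitter_m, pos_jitter_m),
            y_m=demo.y_m + rng.uniform(-pos_jitter_m, pos_jitter_m),
            height_m=demo.height_m * (1 + rng.uniform(-size_scale, size_scale)),
            radius_m=demo.radius_m * (1 + rng.uniform(-size_scale, size_scale)),
        ))
    return variants


base = GraspDemo(x_m=0.40, y_m=0.10, height_m=0.12, radius_m=0.035)
variants = randomize(base, 1000)
```

Each variant keeps the same task (picking up a glass) while changing where the glass is and how large it is, which is the sense in which a few demonstrations can become a much larger training set.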
Omniverse: Creating Infinite Possibilities for Robot Training
Tiff In Tech:
You also mentioned Omniverse, which interests me greatly, especially regarding virtual reality training in industries like manufacturing. How do you see the future development of these industries using Omniverse for training?
Jensen Huang:
The robotics industry is developing slowly, mainly because training robots is very difficult. You need to create a large number of experiential scenarios for the robots. Moreover, training robots in the real world also poses safety risks. Therefore, we created a virtual world, essentially a playground for robots.
Omniverse is a virtual playground. For robots, it feels just like the real world because this virtual environment follows the laws of physics, and everything looks very realistic. Robots cannot distinguish between the virtual world and the real world, which is the key point. We train robots in this virtual world of Omniverse, creating a large number of learning scenarios for them. Once the robots learn how to complete tasks in Omniverse, we transfer this robotic brain to the real robot. If the gap from virtual to reality (the sim-to-real gap, the difference between the simulation environment and the real environment) is small enough, the robots cannot perceive the difference. That's what's amazing. The virtual world created by Omniverse is essentially a digital twin of the real world, which is exactly what it was designed for.
Tiff In Tech:
That's incredible. If trained using traditional methods, it would certainly consume a lot of resources and time.
Jensen Huang:
Yes, indeed. If we were to train a robot to walk in the real world, it would learn linearly, along the human timeline. But in Omniverse, we can create many different universes, allowing robots to learn in parallel, possibly learning in 100,000 different ways at the same time. This way, we can shorten the time required to train a robot to complete a task from what would originally take 10 years to just a few hours. Imagine how smart we could become if we had a multiverse. Different versions of Tiffany could learn math here, science there, English somewhere else, geography, and so on, learning all these subjects simultaneously. This is essentially what Omniverse can achieve.
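The arithmetic behind "10 years to a few hours" can be checked with a short Python sketch. It assumes idealized, perfectly parallel learning with no overhead, which real training will not achieve; the function name `wall_clock_hours` is hypothetical, and the figures of 10 years and 100,000 parallel environments are the ones mentioned in the interview.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def wall_clock_hours(sequential_years, parallel_envs):
    """Wall-clock hours needed if experience is split evenly across
    parallel_envs simulated environments (idealized assumption)."""
    return sequential_years * HOURS_PER_YEAR / parallel_envs


hours = wall_clock_hours(10, 100_000)
print(f"{hours:.3f} hours")  # prints "0.876 hours"
```

Under this idealized assumption, ten years of sequential experience fits in under one hour of wall-clock time, so even with substantial overhead the result stays on the order of hours.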
NVIDIA Driving AI: Multi-layered Safety Assurance for the Future of Autonomous Driving
Tiff In Tech:
This technology is truly compelling. Yesterday, you also announced progress in another area, NVIDIA Driving AI, which significantly improves the safety of autonomous vehicles. I know you also announced a partnership with Toyota, which is very exciting.
Jensen Huang:
Yes, that was major news, really important. Toyota is the largest car company in the world.
Tiff In Tech:
That's right, it is indeed very exciting. What do you think about the prospects of NVIDIA Driving AI?
Jensen Huang:
We have been deeply involved in the autonomous driving field for many years, and it has now developed into a business of about $5 billion. We serve the autonomous vehicle industry through three computing systems: one for training AI, one for simulating AI, called Omniverse, and one for integrating AI into vehicles.
For in-vehicle AI, safety is everything. To address safety issues, the algorithms must be safe first. They must be able to intelligently avoid dangers and know how to drive safely, etc.
But these are all algorithm-level issues; at a deeper level, the operating system must be designed to be safe. The in-vehicle computer must be designed to be safe, meaning it cannot fail, and even if it does fail, it must fail safely. This involves a series of very complex technologies, including diversity of algorithms and redundancy of computing. All these complex technologies make safety possible.
Tiff In Tech:
This perspective is very interesting. Because from a consumer's point of view, people usually think of safety as more about detecting objects. But as you said, it involves many layers, extending all the way to the algorithm level, which is the key.
Jensen Huang:
That's right. The more diversity and redundancy you have, the safer the system is.
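To make the idea of diversity and redundancy with fail-safe behavior concrete, here is a minimal Python sketch of majority voting across several independent decision channels. It is a generic illustration, not NVIDIA's design; the `vote` function, the `SAFE_STOP` fallback, and the action names are all hypothetical.

```python
from collections import Counter

SAFE_STOP = "safe_stop"  # hypothetical fallback action when channels disagree


def vote(outputs):
    """Return the action chosen by a strict majority of channels.

    outputs holds one entry per redundant channel; None marks a channel
    that failed to answer. Without a strict majority across all channels,
    the system falls back to SAFE_STOP, i.e. it fails safely.
    """
    valid = [o for o in outputs if o is not None]
    if not valid:
        return SAFE_STOP
    decision, count = Counter(valid).most_common(1)[0]
    return decision if count > len(outputs) / 2 else SAFE_STOP


print(vote(["brake", "brake", "steer_left"]))  # prints "brake"
print(vote(["brake", None, "steer_left"]))     # prints "safe_stop"
```

With three channels, a single faulty or missing output is outvoted by the other two, and any situation without a clear majority degrades to the safe fallback rather than to an arbitrary action.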
Tiff In Tech:
You have led NVIDIA to achieve many breakthroughs in gaming and artificial intelligence. In the next decade, which emerging technology do you think will have the greatest impact on us?
Jensen Huang:
Without a doubt, artificial intelligence is the most important technology of our time. Take a step back and ask yourself: what would happen if we could expand intelligence and apply it to channel capabilities, healthcare interactions, drug development, addressing climate change, or developing robots? We are researching these technologies to address aging and declining populations, and to prevent and mitigate inflation around the world by increasing productivity in every industry. Artificial intelligence will impact so many areas, which is why our company is fully committed to it.
Now, artificial intelligence is influencing all of our other businesses, including electronics. GeForce has been an important force driving the development of artificial intelligence, and AI is now in turn making GeForce even better at computer graphics. The effects we can achieve by combining artificial intelligence and computer graphics are incredible. We are integrating artificial intelligence with the physical sciences, revolutionizing the way scientific computing is done. We are also applying it to chip design and software development to create better chips and develop higher-quality software. Artificial intelligence influences everything we do, and it will also impact every aspect of every industry. There is no doubt that this is the most important technology today.
Future AI Trends and Career Development
Tiff In Tech:
This brings to mind a question. I have many fans and followers on my channel who are either studying computer science or working in the tech field. A common question is that there are so many directions to choose from in the tech field. From a business and technical perspective, artificial intelligence does seem to be a field worth their continued exploration.
Jensen Huang:
Yes, contributing to the foundational science of artificial intelligence is certainly great. However, the application science of artificial intelligence will become particularly important in the next decade.
I use ChatGPT as a work partner every day. I keep ChatGPT open, asking it questions and collaborating with it to solve problems. You have to learn how to interact with AI. As you know, prompt engineering is part art and part science. Just as you need to learn how to interact with people, you also need to learn how to interact with AI. We need to think about how to apply AI in various fields such as content creation, engineering, software development, marketing, finance, or law.
How to apply AI in these fields deserves a great deal of research and development. Our generation mainly focused on how to use computers to solve chip design and software engineering problems, while the new generation needs to think about how to apply AI to solve all these fundamental issues. For example, how can AI be applied in forestry? How can AI be applied in oceanography? Every industry and every scientific field will be affected.
Tiff In Tech:
Thank you very much for taking the time to talk with me today. This conversation has made me very excited about the future and the upcoming technological changes.