Jensen Huang: AI data centers can scale to millions of chips, performance will double or triple annually, and energy demand will fall by a factor of 2-3 per year (full text attached)

Wallstreetcn
2024.11.08 22:01

Jensen Huang stated that there are no physical laws preventing AI data centers from scaling to millions of chips, and that AI software can now be scaled to run across multiple data centers. "We are prepared to scale computing at an unprecedented level, and we are just getting started. In the next decade, computing performance will double or triple each year, while energy demand will fall by a factor of 2-3 annually. I call this the hyper Moore's Law curve."

This week, NVIDIA CEO Jensen Huang was interviewed by the host of "No Priors," engaging in an in-depth dialogue on AI topics such as NVIDIA's ten-year bets, the rapid build-out of the xAI supercluster, and innovations in NVLink technology.

Jensen Huang stated that no physical law prevents AI data centers from scaling to a million chips. Although this is a challenge, many large companies, including OpenAI, Anthropic, Google, Meta, and Microsoft, are competing for leadership in the AI field, striving to reach the pinnacle of the technology. The potential reward of recreating intelligence is so significant that it cannot be ignored.

Moore's Law was once the core principle of the semiconductor industry, predicting that the number of transistors on a chip would double every two years, leading to continuous performance improvements. However, as physical limits are approached, the pace of Moore's Law has begun to slow, and the bottlenecks in chip performance enhancement are gradually becoming apparent.

To address this issue, NVIDIA combines different types of processors (such as CPUs and GPUs) to break through the traditional limits of Moore's Law through parallel processing. Jensen Huang stated, "In the next 10 years, computing performance will double or triple every year, while energy demand will fall by a factor of 2-3 each year. I call this the hyper Moore's Law curve."
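To make the compounding concrete, here is the arithmetic behind that claim (a back-of-the-envelope calculation based on the figures quoted above, not numbers from the interview itself): sustained annual gains of 2x to 3x over ten years give

$$2^{10} = 1{,}024 \qquad\text{to}\qquad 3^{10} \approx 59{,}049,$$

whereas classic Moore's Law, doubling every two years, yields only $2^{5} = 32$ over the same decade.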

Jensen Huang also mentioned that we can now scale AI software across multiple data centers: "We are ready to scale computing to unprecedented levels, and we are at the starting point in this field."

Here are the highlights of Jensen Huang's speech:

  1. We have made significant investments for the next 10 years. We are investing in infrastructure to build the next generation of AI computing platforms. We have invested in software, architecture, GPUs, and all the components needed for AI development.

  2. Moore's Law, the prediction that the number of transistors will double every two years, was once the growth guide for the semiconductor industry. However, as physical limits are approached, Moore's Law can no longer drive chip performance improvements on its own. To address this, NVIDIA has adopted a "heterogeneous computing" approach, combining different types of processors (such as CPUs and GPUs) to break through the traditional limits of Moore's Law through parallel processing. NVIDIA's technological innovations, such as the CUDA architecture and deep learning optimizations, enable AI applications to keep accelerating at rates that exceed Moore's Law.

  3. We launched NVLink as an interconnect technology that allows multiple GPUs to work together, with each GPU handling different parts of the workload. Through NVLink, the bandwidth and communication capabilities between GPUs are significantly enhanced, allowing data centers to scale and support AI workloads.

  4. Future AI applications require dynamic and resilient infrastructure that can adapt to various scales and types of AI tasks. Therefore, NVIDIA is committed to building infrastructure that can be flexibly configured and efficiently operated to meet the needs of AI projects ranging from small to medium-sized to ultra-large supercomputing clusters.

  5. The key to building AI data centers is to optimize both performance and efficiency simultaneously. AI workloads demand massive power, and heat dissipation becomes a significant issue. So we spent a lot of time optimizing the design and operation of data centers, including cooling systems and power efficiency.

  6. In the context of rapid hardware development, maintaining compatibility between software and hardware architecture is particularly important. Jensen Huang mentioned that we must ensure our software platform, such as CUDA, can be used across generations of hardware. Developers should not be forced to rewrite code every time we launch a new chip. Therefore, we ensure backward compatibility and allow software to run efficiently on any new hardware we develop.

  7. We are building a supercluster for xAI, which will become one of the largest AI supercomputing platforms in the world. This supercluster will provide the computing power needed to support some of the most ambitious AI projects. This is a significant step forward in driving AI.

  8. One major challenge in scaling AI data centers is managing the enormous energy they consume. The issue is not just about building larger and faster systems. We also have to deal with the heat and power challenges that come with running these massive systems. Innovative engineering techniques are needed to ensure the infrastructure can cope with all of this.

  9. The role of AI in chip design is becoming increasingly important. Jensen Huang pointed out that AI is already playing a significant role in chip design. We use machine learning to help design more efficient chips that are faster. This is a key part of designing the next generation of NVIDIA chips and helps us build chips optimized for AI workloads.

  10. The surge in NVIDIA's market value is due to our ability to transform the company into an AI company. We started as a GPU company, but we have transformed into an AI computing company, and this transformation is a key part of our market value growth. The demand for AI technology is rapidly increasing, and we are in a favorable position to meet this demand.

  11. Embodied AI refers to the integration of AI with the physical world. In this way, AI can not only perform tasks in virtual environments but also make decisions and execute tasks in the real world. Embodied AI will drive rapid advancements in technologies such as smart hardware and autonomous driving.

  12. AI is not just a tool; it can also become a 'virtual employee' that helps improve work efficiency. AI can replace or assist human work in areas such as data processing, programming, and decision-making, thereby changing the entire labor market and work methods.

  13. AI will have a huge impact on the fields of science and engineering, especially in drug development, climate research, and physical experiments. AI will help scientists process vast amounts of data, reveal new scientific laws, and accelerate innovation. It will also optimize design in engineering, improve efficiency, and drive the development of more innovative technologies.

  14. I also use AI tools in my daily work to enhance efficiency and creativity. I believe that AI can help us handle complex data and decision-making tasks, as well as enhance our creative thinking and work efficiency, becoming an indispensable part of everyone's work.

The following is the full transcript of the interview, translated by AI:

Host: Welcome back, Jensen. 30 years into NVIDIA and looking 10 years out, what are the big bets you think are still to make? Is it all about scale-up from here? Are we running into limitations in terms of how we can squeeze more compute and memory out of the architectures we have? What are you focused on?

Hi, Jensen, welcome back! You have worked at NVIDIA for 30 years; looking ahead to the next 10 years, what big bets do you think are still to be made? Is it just about scaling up? Are we running into limits on how much more compute and memory we can squeeze out of existing architectures? What are your current focus areas?

Jensen Huang: If we take a step back and think about what we've done, we went from coding to machine learning, from writing software tools to creating AIs, and all of that went from running on CPUs designed for human coding to running on GPUs designed for AI coding, basically machine learning. And so the world has changed; the way we do computing, the whole stack, has changed. And as a result, the scale of the problems we could address has changed a lot, because if you could parallelize your software on one GPU, you've set the foundations to parallelize across a whole cluster, or maybe across multiple clusters or multiple data centers. And so I think we've set ourselves up to be able to scale computing at a level, and develop software at a level, that nobody's ever imagined before. And so we're at the beginning. Over the next 10 years, our hope is that we could double or triple performance every year at scale, not at chip, at scale, and to be able to therefore drive the cost down by a factor of 2 or 3, drive the energy down by a factor of 2-3, every single year. When you double or triple every year, in just a few years it adds up; it compounds really aggressively. And so I wouldn't be surprised if, you know, the way people think about Moore's Law, which is 2x every couple of years, we're gonna be on some kind of a hyper Moore's Law curve. And I fully hope that we continue to do that.

In the past, we relied on writing code ourselves, but now we are starting to let machines learn and write code by themselves. The chips (CPUs) we used to use were designed for humans writing code, but the chips (GPUs) we use now are designed for machine learning. Because of these changes, the way we solve problems is completely different from before. For example, if you can run a machine learning program on one GPU, then you can run it across an entire cluster of computers, or even across many clusters or data centers. This means we can now handle problems far larger than before. Therefore, we believe we have established a foundation that can scale computing power and software development at a level nobody has ever imagined before.

We hope to double or triple computing power every year over the next 10 years, not just the capability of a single chip, but the overall capability. If that happens, we can reduce computing costs by two or three times each year and also cut energy consumption by two or three times. If this kind of growth can be achieved every year, then after a few years, the growth will be astonishing. Therefore, I believe that future computing will surpass the traditional "Moore's Law" (which states that computing power doubles every two years) and may follow a faster growth curve. I also hope to continue moving in this direction.

Host: What do you think is the driver of making that happen even faster than Moore's Law? Because I know Moore's Law was sort of self-reflexive, right? It was something that he said, and then people kind of implemented it to make it happen.

What do you think are the factors driving the speed of computing power growth to exceed Moore's Law? Because I know that Moore's Law itself is a kind of "self-fulfilling" principle, right? In other words, Moore's Law was proposed by Moore, and then everyone followed this principle, resulting in its realization.

Jensen Huang: Yep. Two fundamental technical pillars. One of them was Dennard scaling, and the other one was Carver Mead's VLSI scaling. Both of those were rigorous techniques, but those techniques have really run out of steam. And so now we need a new way of doing scaling. You know, obviously the new way of doing scaling involves all kinds of things associated with co-design. Unless you can modify or change the algorithm to reflect the architecture of the system, then change the system to reflect the architecture of the new software, and go back and forth, unless you can control both sides of it, you have no hope. But if you can control both sides of it, you can do things like

move from FP64 to FP32 to BF16 to FP8 to, you know, FP4 to who knows what, right? And so I think that co-design is a very big part of that. The second part of it, we call it full stack. The second part of it is data center scale. You know, unless you can treat the network as a compute fabric, push a lot of the work into the network, push a lot of the work into the fabric, and as a result do compression at very large scales. And so that's the reason why we bought Mellanox and started fusing InfiniBand and NVLink in such an aggressive way.

The two key technological pillars that drove progress in the past are Dennard scaling and Carver Mead's VLSI scaling. However, both approaches have now run out of steam, and we need new ways to keep getting faster.

The new way is "co-design," which means that software and hardware must be considered and optimized together. Specifically, if you cannot modify or adjust the algorithm to match the architecture of the system, or change the system architecture to meet the needs of new software, then there is no hope. But if you can control both software and hardware at the same time, you can do many new things, such as moving from high-precision FP64 to FP32, then to BF16, FP8, and even lower-precision formats like FP4.
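As a rough illustration of this precision ladder (a minimal PyTorch sketch under assumptions of my own: a CUDA device and arbitrary tensor sizes; this is not NVIDIA's actual co-design stack), the same matrix multiply can be dropped from FP32 to BF16 with a single context manager:

```python
import torch

# Arbitrary matmul-heavy workload; sizes are illustrative only.
a = torch.randn(4096, 4096, device="cuda")  # FP32 by default
b = torch.randn(4096, 4096, device="cuda")

# Full-precision baseline.
c_fp32 = a @ b

# Same computation in BF16: roughly half the memory traffic and much
# higher tensor-core throughput, at slightly lower precision.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    c_bf16 = a @ b

print(c_fp32.dtype, c_bf16.dtype)  # torch.float32 torch.bfloat16

# FP8 and FP4 follow the same idea but need newer hardware (Hopper,
# Blackwell) and specialized libraries such as Transformer Engine.
```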

This is why "co-design" is so important. The other important part, which we call full stack, is design at data center scale. This means you not only need to consider the hardware but also the scale of the entire data center. For example, we must treat the network as a compute fabric, pushing a large number of computing tasks into the network and performing compression at very large scales.

Therefore, we acquired Mellanox and began to actively promote high-speed connection technologies such as InfiniBand and NVLink to support this new large-scale computing architecture.

And now look where NVLink is gonna go. You know, the compute fabric is going to scale out what appears to be one incredible processor called a GPU, and now we get hundreds of GPUs that are gonna be working together. You know, among the computing challenges that we're dealing with now, one of the most exciting ones, of course, is inference-time scaling, which has to do with essentially generating tokens at incredibly low latency, because you're self-reflecting, as you just mentioned. I mean, you're gonna be doing tree search, you're gonna be doing chain of thought, you're gonna be doing probably some amount of simulation in your head. You're gonna be reflecting on your own answers. You're gonna be prompting yourself and generating tokens internally.

Now let's see where NVLink (NVIDIA's high-speed interconnect technology) is headed: the future compute fabric will scale out into what appears to be one extremely powerful processor, the GPU. NVIDIA's goal is to integrate hundreds of GPUs to work together, forming a massive computing platform.

One of the most exciting challenges we face in computing is inference-time scaling, which requires generating tokens at extremely low latency. Because, as you just mentioned, thinking is really a process of self-reflection: you might be conducting a "tree search" in your mind, reasoning through a chain of thought, perhaps even running some kind of simulation and reviewing your own answers. You ask yourself questions and generate answers, thinking "silently" in your mind, and then hope to respond within a few seconds.

To achieve this, the computational delay must be very low, as you cannot wait too long for results.

At the same time, the task of the data center is to generate a large number of high-throughput "tokens" (symbols). You need to control costs, maintain high throughput, and ensure returns. Therefore, low latency and high throughput are two conflicting goals: low latency requires quick responses, while high throughput requires processing more data. There is a conflict between the two.

To achieve both simultaneously, new technologies must be created, and NVLink is one of the ways we address this issue. Through NVLink, NVIDIA hopes to provide low latency while ensuring high throughput, thus resolving this computational contradiction and enhancing overall performance.
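To see why the two goals pull against each other, here is a toy serving model (invented numbers, purely illustrative, not NVIDIA data): each decoding step pays a fixed overhead plus a per-request cost, so batching more users raises total token throughput while also raising the latency each user sees.

```python
# Toy model of the batching trade-off in LLM inference serving.
# FIXED_MS and PER_REQ_MS are invented numbers for illustration.
FIXED_MS = 10.0    # per-step overhead (weights, kernel launches), ms
PER_REQ_MS = 0.5   # incremental cost of each request in the batch, ms

for batch in (1, 8, 64, 256):
    step_ms = FIXED_MS + PER_REQ_MS * batch  # one decoding step
    latency_ms = step_ms                     # each user waits this per token
    tokens_per_s = batch / step_ms * 1000    # aggregate throughput
    print(f"batch={batch:4d}  latency={latency_ms:6.1f} ms/token  "
          f"throughput={tokens_per_s:7.0f} tok/s")
```

With these numbers, batch 1 yields about 10.5 ms per token but only about 95 tokens/s in aggregate, while batch 256 yields about 1,855 tokens/s but makes each user wait 138 ms per token. Faster interconnects such as NVLink attack the fixed per-step overhead, shifting the whole trade-off curve.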

Now we have virtual GPUs with very powerful computing capabilities, because we need such strong computing power to handle context: when processing these tasks, we need very large memory (especially working memory), along with extremely high bandwidth, to generate tokens (i.e., text or data symbols).

Host: Building the models, people are actually also optimizing things pretty dramatically. Like, David and my team pulled data where, over the last 18 months or so, the cost of 1 million tokens going into a GPT-4-equivalent model has basically dropped 240 times. Yeah, and so there's just massive optimization and compression happening on that side as well.

The process of building models actually includes a lot of optimization work. For example, David and the team pulled data showing that over the past 18 months, the cost of 1 million tokens (for GPT-4-equivalent models) has dropped by a factor of 240.
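For scale, here is the straightforward arithmetic on that figure (the derivation is mine; the 240x over 18 months is the host's): a 240x drop over 18 months corresponds to

$$240^{1/18} \approx 1.36, \qquad \text{cost}_{t+1} \approx \text{cost}_t / 1.36 \approx 0.74 \, \text{cost}_t,$$

that is, a decline of roughly 26% per month sustained for a year and a half.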

Jensen Huang: Well, just in our layer, just on the layer that we work on. You know, one of the things that we care a lot about, of course, is the ecosystem of our stack and the productivity of our software. You know, people forget that because you have the CUDA foundation, and that's a solid foundation, everything above it can change. If the foundation is changing underneath you, it's hard to build a building on top; it's hard to create anything interesting on top. And so CUDA has made it possible for us to iterate so quickly, just in the last year. And then we just went back and benchmarked: since Llama first came out, we've improved the performance of Hopper by a factor of five, without the algorithm, without the layer on top, ever changing. Now, a factor of five in one year is impossible using traditional computing approaches. But in accelerated computing, using this way of co-design, we're able to achieve all kinds of new things.

In our field of work, something very important is the ecosystem of the technology stack and the productivity of software. What we particularly value is CUDA as a foundational platform, which is very stable and solid. If the foundational platform keeps changing, it becomes very difficult to build a system or application on top of it; it is fundamentally impossible to create anything interesting on an unstable foundation. The stability of the CUDA foundation allows us to iterate and innovate very quickly, especially in the past year.

Then we also ran a benchmark: since Llama first came out, we have improved the performance of Hopper (our GPU computing platform) by a factor of five, without the algorithm or the layers above ever changing. A fivefold improvement in one year is almost impossible with traditional computing methods. But through this new method of co-design, we are able to continuously innovate and explore more new technological possibilities on the existing foundation.

Host: How much are, you know, your biggest customers thinking about the interchangeability of their infrastructure between large scale training and inference?

How concerned are your biggest customers about the interchangeability of their infrastructure between large scale training and inference?

Jensen Huang: Well, you know, infrastructure is disaggregated these days. Sam was just telling me that he had decommissioned Volta just recently. They have Pascals, they have Amperes, and all different configurations of Blackwell coming. Some of it is optimized for air cooling, some of it's optimized for liquid cooling. Your services are gonna have to take advantage of all of this. The advantage that NVIDIA has, of course, is that the infrastructure that you built today for training will just be wonderful for inference tomorrow. And most of ChatGPT, I believe, is inferred on the same type of systems that it was trained on just recently. And so you can train on it, you can infer on it. And so you're leaving behind a trail of infrastructure that you know is going to be incredibly good at inference, and you have complete confidence that you can then take the return on it.

Infrastructure today is no longer static like it used to be. For example, Sam just told me that they recently decommissioned their Volta equipment. They have Pascal systems, Ampere systems, and many different configurations of Blackwell coming soon. Some machines are optimized for air cooling, while others are optimized for liquid cooling. Your services need to be able to utilize all of these different machines.

NVIDIA's advantage is that the infrastructure you build today for training will be very suitable for inference in the future. I believe most of ChatGPT's inference runs on the same type of systems it was recently trained on. So you can train on these systems and also run inference on them. This way, you leave behind a trail of infrastructure, knowing that it will be very suitable for inference. You are fully confident that you can reinvest the returns from previous investments into new infrastructure to scale up. You know you will leave behind something useful, and you know that NVIDIA and the entire ecosystem are working to improve the algorithms, so that your existing infrastructure can improve efficiency by a factor of five in roughly a year. So this trend will not change.

And so the way that people will think about the infrastructure is: yeah, even though I built it for training today, it's gotta be great for training, and we know it's gonna be great for inference. Inference is gonna be multi-scale. I mean, first of all, smaller models can be distilled from a larger, frontier model, so you're still gonna create those frontier models.

People's perception of infrastructure is changing. The facility I'm building now is built for training, and it must be excellent at training, but we know it will also be very well suited to inference in the future. And inference will come in many different scales.

I mean, you will have models of many different sizes. Small models can learn from large models, so you will still create some cutting-edge large models. These large models will be used for groundbreaking work, for generating synthetic data, for teaching small models, and for distilling knowledge into small models. So there are many things you can do, but in the end you will have very large models and very small models. The small models will be very effective; although they may not be universal, they will perform exceptionally well on specific tasks. We will see that, within a narrow domain, small models can accomplish tasks beyond human capability. Maybe it's not a small language model but something even smaller, a tiny language model, a TLM, or something similar. So I think we will see models of various sizes, just like the software we have now.
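The distillation pattern described here can be sketched in a few lines (a generic textbook recipe with invented shapes, not NVIDIA's or anyone's production pipeline): the student model is trained to match the teacher's softened output distribution.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label KL divergence: the student mimics the teacher's distribution."""
    t = temperature
    soft_targets = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # The t^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, soft_targets,
                    reduction="batchmean") * (t * t)

# Illustrative shapes: a batch of 4 positions over a 32-token vocabulary.
teacher_logits = torch.randn(4, 32)               # from the frontier model
student_logits = torch.randn(4, 32, requires_grad=True)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()                                    # gradients flow to the student
```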

Yeah, I think in a lot of ways artificial intelligence allows us to break new ground in how easy it is to create new applications. But everything about computing has largely remained the same. For example, the cost of maintaining software is extremely expensive, and once you build it, you would like it to run on as large an install base as possible. You would like not to write the same software twice. I mean, you know, a lot of people still feel the same way: you like to take your engineering and move it forward. And so to the extent that the architecture allows you, on one hand, to create software today that runs even better tomorrow on new hardware, that's great; or the software you create tomorrow, the AI you create tomorrow, runs on a large install base, you think that's great. That way of thinking about software is not gonna change.

I think in many ways, artificial intelligence allows us to create new applications more easily. However, in terms of computing, most things remain the same. For example, the cost of maintaining software is very high. Once you build the software, you want it to run on as many devices as possible. You don't want to rewrite the same software. What I mean is, many people still think this way. You like to push your engineering forward. So, if the architecture allows you, on one hand, it would be great if the software created today can run better on new hardware tomorrow; or if the software you create tomorrow can run on many devices the day after tomorrow. You think that's fantastic. This way of considering software will not change.

Host: And NVIDIA has moved into larger and larger, let's say, units of support for customers. I think about it going from single chip to, you know, server to rack, and NVL72. How do you think about that progression? Like, what's next? Should NVIDIA do full data centers?

With the development of technology, NVIDIA's products have expanded beyond individual chips to support entire data centers. What do you think about this development? What's next? For example, should NVIDIA build entire data centers?

Jensen Huang: In fact, we build full data centers, the way that we build everything. If you're developing software, you need the computer in its full manifestation. We don't build PowerPoint slides and ship the chips; we build a whole data center. Until we get the whole data center built up, how do you know the software works? Until you get the whole data center built up, how do you know your fabric works, and that all the efficiencies you expected are actually there?

In fact, we build complete data centers just as we build everything else. If you are developing software, you need the computer in its complete form to test it. We don't just make PowerPoint slides and then ship chips; we build the entire data center. Only when we have built up the entire data center can you know whether the software works properly, whether your network fabric works, and whether all the efficiencies you expected can be achieved. Only then do you know whether it can really work at scale. This is why actual performance is often far below the peak performance shown on slides: computing is no longer what it used to be. The unit of computing now is the data center, and that is how it is for us. That is what you have to deliver, and that is what we are building.

We are now building the entire system this way. We build every possible combination: air cooling, x86, liquid cooling, Grace, Ethernet, InfiniBand, NVLink, no NVLink, you know what I mean? We build every configuration. Our company currently has five supercomputers, and next year we can easily build five more. So if you are serious about software, you build your own computers; if you are serious about software, you build the entire computer. And we build at scale.

This is the part that is really interesting: we build it at scale, and we build it very vertically integrated. We optimize it full stack, and then we disaggregate everything and we sell it in parts. That's the part that is completely, utterly remarkable about what we do. The complexity of that is just insane. And the reason for that is we want to be able to graft our infrastructure into GCP, AWS, Azure, OCI. All of their control planes and security planes are different, and the way they think about their cluster sizing is all different. And yet we make it possible for them all to accommodate NVIDIA's architecture, so that it can be everywhere. That's really, in the end, the singular thought: we would like to have a computing platform that developers can use that's largely consistent.

This part is really interesting. We not only build at scale but also build with vertical integration. We optimize the full stack, and then we disaggregate all of the parts and sell them individually. What we are doing is incredibly complex. Why do we do this? Because we want to graft our infrastructure into different cloud service providers such as GCP, AWS, Azure, and OCI. Each of their control planes and security planes is different, and they all think about cluster sizing in different ways. Yet we make it possible for all of them to accommodate NVIDIA's architecture, so that our architecture can be everywhere.

Ultimately, we hope to have a computing platform that developers can use to build software that is consistent in most cases, with perhaps 10% differences here and there because everyone's infrastructure is optimized slightly differently, but whatever we build runs everywhere. This is a principle of software that we hold very dear: it enables our software engineers to create software once and have it run everywhere. We recognize that software is the most expensive investment, and that's easy to see.

Look at the size of the whole hardware industry, and then look at the size of the world's industries. It's $100 trillion on top of this one-trillion-dollar industry. And that tells you something. The software that you build, you basically maintain for as long as you shall live. We've never given up on a piece of software. The reason why CUDA is used is because, you know, I've told everybody: we will maintain this for as long as we shall live. And we're serious; we still maintain it. I just saw a review the other day of NVIDIA Shield, our Android TV. It's the best Android TV in the world. We shipped it seven years ago, and it is still the number one Android TV for anybody who enjoys TV. We just updated the software this last week, and people wrote a news story about it. GeForce: we have 300 million gamers around the world. We've never stranded a single one of them. The fact that our architecture is compatible across all of these different areas makes it possible for us to do this. If not for this architectural compatibility, we would have software teams a hundred times the size of our company today. So we're very serious about that, and that translates to benefits for the developers.

Look at the scale of the entire hardware industry, and then compare it to the scale of all the world's industries: the hardware industry is about one trillion dollars, while the industries that sit on top of it total one hundred trillion dollars. That comparison tells you how much larger the value built on top of the hardware is than the hardware itself.

The software you create basically needs to be maintained for as long as you exist. We have never given up on any software. The reason CUDA is used by everyone is that I promised everyone we would maintain it as long as we are around. We are still very serious about it; we are still maintaining it. A few days ago, I saw a review saying that NVIDIA Shield, our Android TV, is the best Android TV in the world. We launched it seven years ago, and it is still the number one Android TV; anyone who loves watching TV loves it. We updated its software just last week, and people wrote new articles reviewing it. Our GeForce platform has 300 million gamers worldwide, and we have never abandoned any of them. Our architecture is compatible across all these different fields, which is what allows us to do this. If it weren't for this architectural compatibility, our software team would need to be a hundred times larger than it is today. So we take this very seriously, and it also benefits developers.

Host: One impressive substantiation of that recently was how quickly you brought up a cluster for xAI. We'd love to chat about that, because it was striking in terms of both the scale and the speed with which you did it.

An impressive recent demonstration of this was how quickly you built a cluster for xAI. We'd like to hear about it, because both the scale and the speed were striking. You completed it very quickly.

Jensen Huang: You know, a lot of that credit you gotta give to Elon. I think, first of all, deciding to do something, selecting the site, bringing cooling to it, bringing power to it, and then deciding to build this hundred-thousand-GPU supercluster, which is, you know, the largest of its kind in one unit.

Here we have to give a lot of credit to Elon Musk. First, he decided to do this, chose the location, solved the cooling and power-supply issues, and then decided to build this supercomputer cluster with 100,000 GPUs, the largest of its kind to date. Then we started working backwards: months in advance, we planned together the date when he was going to get everything up and running. All the components, all the original equipment manufacturers, all the systems, and all the software integration we did with their team, and we simulated all the network configurations. We prepped everything like a digital twin; we prepped the entire supply chain and all the network wiring. We even built a small version first, a baseline before everything was in place, a "reference zero" system. So when everything arrived, everything had already been arranged, all the rehearsals were done, and all the simulations were completed.

And then, you know, the massive integration. Even then, the massive integration was a monument of gargantuan teams of humanity crawling over each other, wiring everything up 24/7. And within a few weeks, the cluster was up. I mean, it's really a testament to his willpower and how he's able to think through mechanical things, electrical things, and overcome what are apparently extraordinary obstacles. I mean, what was done there is the first time that a computer of that scale has ever been built at that speed. It took our teams, from the networking team to the compute team to the software team to the training team to the infrastructure team, you know, the electrical engineers to the software engineers, all working together. Yeah, it's really quite a feat to watch.

And then, you know, the massive integration work. Even this integration work itself is a huge project that required a lot of team members working diligently like ants, almost around the clock, connecting wires and setting things up. Within a few weeks, these computer clusters were built. This is truly a testament to his willpower, and it demonstrates how he thinks in terms of mechanics and electronics, overcoming what are clearly very significant obstacles. I mean, this is the first time such a large-scale computer system has been built in such a short period of time. It required our networking team, computing team, software team, training team, and infrastructure team, that is, all the electrical engineers and software engineers, working together. It really is spectacular: a large team collaboration, with everyone working hard to ensure everything ran smoothly.

Host: Was there a challenge that felt most likely to be blocking, from an engineering perspective?

From an engineering perspective, was there any challenge that seemed most likely to become a stumbling block, that is, was there any technical problem that could potentially stall the entire project?

Jensen Huang: A tonnage of electronics that had to come together. I mean, it's probably worth just measuring it: tons and tons of equipment. It's just abnormal. You know, usually for a supercomputer system like that, you plan it for a couple of years, and from the moment the first systems come delivered to the time that you've probably submitted everything for some serious work, don't be surprised if it's a year. I mean, I think that happens all the time; it's not abnormal. Now, we couldn't afford to do that. So a few years ago we created an initiative in our company called "Data Center as a Product." We don't sell it as a product, but we have to treat it like it's a product.

We needed to integrate an enormous quantity of electronic equipment, enough that it is worth weighing: tons and tons of it, which is quite abnormal. Typically, for supercomputer systems like this, you plan for several years, and from the delivery of the first system to having everything ready for serious work, a year is common and not surprising.

But now we don't have the time to do that. So a few years ago, there was a plan in our company called "Data Center as a Product." We don't sell it as a product, but we must treat it like a product. From planning to building, to optimizing, tuning, and keeping it operational, everything is aimed at ensuring that it can work seamlessly, just like opening a brand new iPhone. That is our goal.

Now, of course, it's a miracle of technology making it that way, but we now have the skills to do that. And so if you're interested in a data center and just have to give me a space and some power, some cooling, you know, and we'll help you set it up within, call it, 30 days. I mean, it's pretty extraordinary.

Of course, being able to build a data center so quickly is truly a miracle of technology. But now we have the technical capability to do this. So if you want to build a data center, just give me a space, provide some power and cooling, and we can help you set everything up in about 30 days. I mean, this is really quite remarkable.

Host: That's wild. If you look ahead to 200,000, 500,000, a million chips in a supercluster, whatever you call it, at that point what do you think is the biggest blocker? Capital? Energy? Supply in one area?

That's really impressive. Thinking ahead, if in the future there is a super-large cluster with 200,000, 500,000, or even a million chips, whatever you call it, what do you think the biggest challenge will be at that point? Funding? Energy supply? Or something else?

Jensen Huang: Everything. At the scales that you're talking about, nothing is normal.

Those things you mentioned, no matter which aspect, as long as it involves the huge scale you talked about, nothing is normal.

Host: But nothing is impossible.

But, nothing is completely impossible. Anything is possible.

Jensen Huang: Nothing. Yeah, no laws of physics limit it, but everything is gonna be hard. And of course, you know, is it worth it? Like you can't believe. To get to something that we would recognize as a computer that so easily and so ably does what we ask it to do, you know, general intelligence of some kind, and even if we could argue about whether it's really general intelligence, just getting close to it is going to be a miracle. We know that. And so I think there are five or six endeavors trying to get there.

Indeed, there are no laws of physics that say we can't do it, but everything will be very difficult. You know, is it worth it? Beyond belief. The kind of computer we aim for, one that can easily do what we ask of it, some form of general intelligence, even if we can argue about whether it truly is general intelligence, just getting close to it would be a miracle. We know it's hard. So I think there are five or six teams trying to reach this goal, right? For example, OpenAI, Anthropic, xAI, as well as Google, Meta, and Microsoft: they are all striving to climb this frontier of technology. Who doesn't want to be the first to reach the top? The reward for reinventing intelligence is so great, and its impact so significant, that we cannot fail to attempt it. So while the laws of physics impose no limit, everything will be difficult.

Host: A year ago when we spoke together, we asked what applications you were most excited about that NVIDIA would serve next, in AI and otherwise, and you talked about how your most extreme customers sort of lead you there. Yeah, and about some of the scientific applications. I think that's become much more mainstream for you over the last year. Is it still science, and AI's application to science, that most excites you?

A year ago when we chatted, I asked you which applications of NVIDIA in AI and other fields you were most excited about, and you mentioned that some of your most extreme clients somewhat guided you. Yes, there was also a discussion about some scientific applications. So I feel that over the past year, these scientific and AI applications have become more mainstream. Now, is it still the application of science and AI in the field of science that excites you the most?

Jensen Huang: I love the fact that we have digital, we have AI chip designers here at NVIDIA. Yeah, I love that. We have AI software engineers.

Let me say it directly: we now have digital, that is, AI chip designers, right here at NVIDIA. Yes, I love this. We also have AI software engineers.

Host: How effective are your AI chip designers today?

How effective are your AI chip designers today?

Jensen Huang: Super good. We couldn't build Hopper without them. And the reason for that is because they can explore a much larger space than we can, and because they have infinite time; they're running on a supercomputer. We have so little time using human engineers that we don't explore as much of the space as we should, and we also can't explore it combinatorially: I can't explore my space while including your exploration and your exploration. And so, you know, our chips are so large, it's not like it's designed as one chip; it's designed almost like 1,000 chips, and we have to optimize each one of them.

Our AI chip designers are really impressive. Without them, we wouldn't have been able to create Hopper at all. They can explore a space much broader than we humans can, and they effectively have endless time, since they run on supercomputers, while our human engineers have limited time and can't explore such a vast space. Moreover, we can't explore all possibilities simultaneously: when I explore my domain, I can't simultaneously explore yours.

Our chips are very large; it's not like designing a single chip, but more like designing 1,000 chips, each of which needs optimization. It's like a series of independent small islands. But we really want to optimize them together, collaborate on cross-module design, and optimize in a much larger space. Clearly, we can find better solutions, those best choices hidden in some corner. We can't do this without AI. Engineers simply don't have enough time to achieve it.

Host: One other thing has changed since we last spoke. I looked it up: at the time, NVIDIA's market cap was about 500 billion. It's now over 3 trillion. So in the last 18 months you've added two and a half trillion plus of market cap, which effectively is $100 billion plus a month, or two and a half Snowflakes, or, you know, a Stripe plus a little bit, however you want to think about it. A country or two. Obviously, a lot of things have stayed consistent in terms of focus on what you're building, etc. And you know, walking through here earlier today, I felt the buzz, like when I was at Google 15 years ago; you could feel the energy of the company and the vibe of excitement. What has changed during that period, if anything? What is different in terms of either how NVIDIA functions, or how you think about the world, or the size of bets you can take?

Since we last chatted, one thing has changed: I checked, and at that time, NVIDIA's market capitalization was about $500 billion. Now it has exceeded $3 trillion. So in the past 18 months, you have added more than $2.5 trillion in market value, which is equivalent to adding $100 billion each month, or the market value of two and a half Snowflake companies or just a bit more than one Stripe company, however you look at it.

This is equivalent to adding the market value of one or two countries. Clearly, despite the significant increase in market value, you have maintained consistency in what you are building and the areas you are focusing on. You know, today as I walked around here, I felt a kind of vitality, just like I felt 15 years ago at Google; you can feel the energy and excitement of the company. Has anything changed during this time? Or is there a difference in how NVIDIA operates, your view of the world, or the size of risks you can take?

Jensen Huang: Well, our company can't change as fast as a stock price. Let's just be clear about that. So in a lot of ways, we haven't changed that much. I think the thing to do is to take a step back and ask ourselves, what are we doing? I think that's really the big observation, realization, awakening for companies and countries: what's actually happening. I think, as we were talking about earlier, from our industry's perspective, we reinvented computing, and it hadn't been reinvented for 60 years. That's how big of a deal it is. We've driven down the marginal cost of computing, probably by a million x in the last 10 years, to the point that we just say, hey, let's just let the computer go exhaustively write the software. That's the big realization. And in a lot of ways, we were kind of saying the same thing about chip design: we would love for the computer to go discover something about our chips that we otherwise couldn't have found ourselves, to explore our chips and optimize them in a way that we couldn't do ourselves, right, in the way that we would love for digital biology or, you know, any other field of science.

The speed of change in our company cannot match the speed of change in the stock price; let's be clear about that. So in many ways, we haven't changed much. I think the important thing is to take a step back and ask ourselves: what exactly are we doing? That, I think, is the big observation, realization, and awakening for companies and countries: understanding what is actually happening.

Just as we discussed earlier, from our industry's perspective, we have reinvented computing, something that hadn't happened in 60 years. We have reduced the marginal cost of computing, probably by a factor of a million over the last 10 years, to the point where we can simply let the computer exhaustively write software. This is a major realization.

In many ways, we say the same about chip design. We would love for computers to discover things about our chips that we could not find ourselves, to explore our chips and optimize them in ways we cannot, just as we would love for that to happen in digital biology or any other field of science.

And so I think people are starting to realize, when we reinvented computing, what does that even mean? All of a sudden we created this thing called intelligence, and what happened to computing? Well, we went from data centers being multi-tenant stores of files. These new data centers we're creating are not data centers: they're not multi-tenant, they tend to be single-tenant, and they're not storing any of our files. They're producing something; they're producing tokens. And these tokens are reconstituted into what appears to be intelligence. Isn't that right? And intelligence of all different kinds: it could be articulation of robotic motion, it could be sequences of amino acids, it could be chemical chains, it could be all kinds of interesting things, right? So what are we really doing? We've created a new instrument, a new machinery, that in a lot of ways is the noun of the adjective "generative AI."

So I think people are starting to realize what it really means that we reinvented computing. Suddenly we have created something called intelligence, and computing itself has changed. We used to see data centers as places for multi-tenant file storage. The new data centers we are creating are not data centers in the conventional sense: they are often single-tenant, and they do not store our files; they produce something. They produce tokens, and these tokens are recombined into something that looks like intelligence. Right? And intelligence comes in many forms: it could be expressions of robotic movement, amino acid sequences, chemical chains, or all sorts of other interesting things. So what exactly are we doing? We have created a new instrument, a new machinery, which in many ways is the noun behind the adjective "generative AI": not generative AI itself, but an AI factory. It is a factory that produces AI, and we are doing this at a very large scale. People are starting to realize that this could be a new industry: it generates tokens, it generates numbers, but these numbers are composed in ways that are quite valuable, and many industries will benefit from it.

Then you take a step back and you ask yourself again, you know, what's going on? NVIDIA, on the one hand, we reinvented computing as we know it, and so there's $1 trillion of infrastructure that needs to be modernized. That's just one layer of it. The big layer of it is that this instrument we're building is not just for the data centers we were modernizing; you're using it for producing some new commodity. And how big can this new commodity industry be? Hard to say, but it's probably worth trillions. And so that, I think, is why you have to take a step back. You know, we don't build computers anymore; we build factories. And every country is gonna need it, every company is gonna need it. Give me an example of a company or industry that says, you know what, we don't need to produce intelligence; we've got plenty of it. And so that's the big idea. That's kind of an abstracted industrial view. And someday people will realize that, in a lot of ways, the semiconductor industry wasn't about building chips; it was about building the foundational fabric for society. And then all of a sudden: there we go, I get it. This is a big deal. It's not just about chips.

Then you take a step back and ask yourself again, what exactly is happening? On one hand, with Nvidia, we are reinventing the computing that we know. So there is a trillion-dollar infrastructure that needs modernization. That's just one layer. The bigger layer is that the tool we are building is not just for data centers; we are modernizing data centers, but you use it to produce some new goods. How big can this new goods industry be? It's hard to say, but it could be worth trillions of dollars.

So I think this is where you need to take a step back. You know, we are no longer making computers; we are making factories. Every country will need one, every company will need one. Try to name a company or industry that says "we don't need to produce intelligence; we have plenty of it." So that's the big idea. It's an abstracted industrial perspective. One day, people will realize that in many ways the semiconductor industry was not about making chips; it was about building the foundational fabric for society. And then suddenly we understand: this is a big deal, and not just about chips.

Host: How do you think about embodiment now?

How do you view the concept of "embodiment" or "concretization" now? That is to say, how do you consider applying intelligence or artificial intelligence to the actual physical world, such as robots or other physical devices?

Jensen Huang: Well, the thing I'm super excited about is that, in a lot of ways, we're close to artificial general intelligence, but we're also close to artificial general robotics. Tokens are tokens. I mean, the question is, can you tokenize it? You know, of course, tokenizing things is not easy, as you guys know. But if you're able to tokenize things, align it with large language models and other modalities, if I can generate a video that has Jensen reaching out to pick up the coffee cup, why can't I prompt a robot to generate the tokens to pick up the real one, you know?

I am very excited about the fact that in many ways we are close to achieving artificial general intelligence, and we are also close to achieving general robotics. Tokens are tokens. The question is, can you tokenize it? Of course, tokenizing things is not easy, as you know. But if you can tokenize things and align them with large language models and other modalities, then if I can generate a video in which Jensen reaches out to pick up a coffee cup, why can't I prompt a robot to generate the tokens to actually pick up that cup? Intuitively, the problem statements seem very similar to the computer. So I think we are very close, and that's incredibly exciting.

Now, the two brownfield robotic systems. Brownfield means that you don't have to change the environment for them: self-driving cars and humanoid robots. With digital chauffeurs and humanoid robots, between the cars and the human robot, we could literally bring robotics to the world without changing the world, because we built the world for those two things. It's probably not a coincidence that Elon is focused on those two forms, because they are likely to have the largest potential scale. And so I think that's exciting. But the digital version of it is equally exciting. You know, we're talking about digital or AI employees. There's no question we're going to have AI employees of all kinds.

There are currently two types of "brownfield" robotic systems, where "brownfield" means you don't need to change the environment for them: autonomous vehicles and humanoid robots. With digital chauffeurs and humanoid robots, we can bring robotics into the world without changing the world, because we built the world for cars and for humans. It is probably no coincidence that Elon Musk focuses on these two forms; they likely have the greatest potential scale. And the digital version is equally exciting. You know, we are talking about digital or AI employees. There is no doubt that we will have AI employees of all kinds. Our workforce will be partly biological and partly artificial intelligence, and we will prompt them in the same way. Isn't that right? Mostly, I prompt my employees: provide them context, ask them to perform a mission. They go recruit other team members, they come back and work, going back and forth. How will that be any different with digital and AI employees of all kinds? So we are going to have AI marketing people, AI chip designers, AI supply chain people, AIs of all kinds. I hope that NVIDIA will someday be biologically bigger, but also, from an artificial intelligence perspective, much bigger. That's our future company.

Host: If we came back and talked to you a year from now, what part of the company do you think would be most artificially intelligent?

If we come back to talk to you a year from now, which part of the company do you think will be the most artificially intelligent?

Jensen Huang: I'm hoping it's chip design.

I hope it will be chip design.

Host: Okay. And the most important part?

Alright, and also the most important part?

Jensen Huang: The most important part. That's right. Because it should start where it moves the needle most, and where we can make the biggest impact. You know, it's such an insanely hard problem. I work with Sassine at Synopsys and Anirudh at Cadence. I totally imagine them having Synopsys chip designers that I can rent: they know something about a particular module of their tool, and they train an AI to be incredibly good at it. And we'll just hire a whole bunch of them whenever we're in that phase of chip design. You know, I might rent a million Synopsys engineers to come and help me out, and then go rent a million Cadence engineers to help me out. What an exciting future for them, that they have all these agents that sit on top of their tools platform, use the tools platform, and collaborate with others.

I think the most important part should be the place in the company where intelligence can have the most impact. This problem is very difficult, but he hopes to begin the AI transformation in the areas that most move the needle for the company. He works with Sassine at Synopsys and Anirudh at Cadence, and he imagines being able to rent Synopsys AI chip designers. These AIs know specific modules and tools very well and have been trained to be extremely good at that kind of work. When NVIDIA is in a particular phase of chip design, it would rent a large number of these AI designers: for example, a million Synopsys engineer AIs to help, and then another million Cadence engineer AIs. It is an exciting future for them, with all these AI agents sitting on top of their tool platforms, using those platforms and collaborating with one another. Christian will do this at SAP, and Bill will do this at ServiceNow.

Now, you know, people say that these SaaS platforms are going to be disrupted. I actually think the opposite: they're sitting on a gold mine. There's going to be a flourishing of agents specialized in Salesforce, specialized in SAP; I think Salesforce calls theirs Lightning, and everybody's got their own language. We've got CUDA, and we've got OpenUSD for Omniverse. And who's going to create an awesome AI agent for OpenUSD? We are, because nobody cares about it more than we do, right? So I think in a lot of ways these platforms are going to be flourishing with agents, and we're going to introduce them to each other, and they're going to collaborate and solve problems.
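As a hedged sketch of what an agent "sitting on top of" a tools platform could look like, here is a toy dispatcher in Python; every platform, tool, and agent name is hypothetical, not an actual Synopsys, Salesforce, or NVIDIA API:

```python
# Illustrative sketch of agents "sitting on top of" tool platforms.
# Every platform, tool, and agent name here is hypothetical.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PlatformTool:
    """One capability a tools platform exposes (a timing check in an
    EDA suite, a pipeline query in a CRM, and so on)."""
    name: str
    run: Callable[..., str]


@dataclass
class PlatformAgent:
    """An agent specialized in a single platform: it knows that
    platform's tools and dispatches missions to them."""
    platform: str
    tools: dict = field(default_factory=dict)

    def register(self, tool: PlatformTool) -> None:
        self.tools[tool.name] = tool

    def perform(self, mission: str, tool_name: str, **kwargs) -> str:
        # In a real system an LLM would choose the tool from the
        # mission text; here we dispatch directly to keep it short.
        result = self.tools[tool_name].run(**kwargs)
        return f"[{self.platform}] {mission}: {result}"


# Two specialized agents, one per platform, "introduced to each other"
# by passing results between them.
eda = PlatformAgent("HypotheticalEDA")
eda.register(PlatformTool("timing_check", lambda block: f"{block} meets timing"))

crm = PlatformAgent("HypotheticalCRM")
crm.register(PlatformTool("pipeline_report", lambda region: f"{region} pipeline is healthy"))

print(eda.perform("verify module", "timing_check", block="nvlink_phy"))
print(crm.perform("quarterly review", "pipeline_report", region="EMEA"))
```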

Host: You see a wealth of different people working in every domain of AI. What do you think is under-noticed, or what do you wish more entrepreneurs, engineers, or business people would work on?

Jensen Huang: Well, first of all, I think what is misunderstood, and maybe underestimated, is the under-the-surface activity: the groundbreaking science, computer science, and engineering being affected by AI and machine learning. You can't walk into a science department anywhere, or a theoretical math department, without finding that AI and machine learning are about to change everything being done there. If you consider all the engineers and all the scientists in the world as an early indication of the future, and obviously they are, then you will see a tidal wave of generative AI, a tidal wave of machine learning, changing everything we do in a short period of time.

I worked with Alex, Ilya, and Hinton in Toronto, with Yann LeCun, and of course with Andrew Ng here at Stanford. I saw the early indications of it, and we were fortunate to extrapolate from what was observed, detecting cats, into a profound change in computer science and computing altogether. That extrapolation was fortunate for us, and we were so inspired by it that we changed everything about how we did things. But how long did that take? It took literally six years to go from AlexNet, which by today's standards would be considered a toy, to superhuman levels of capability in object recognition. That was only a few years.

What is happening right now is a groundswell in all the fields of science, with not one field left behind. Just to be very clear: everything from quantum computing to quantum chemistry, every field of science is involved in the approaches we're talking about. They've been at it for two or three years; give it another two or three years and the world is going to change. There won't be one paper, one breakthrough in science, one breakthrough in engineering, where generative AI isn't at the foundation of it. I'm fairly certain of that. So every so often I hear questions about whether this is a fad. You just have to go back to first principles and observe what is actually happening.

The computing stack, the way we do computing, has changed. And if the way you write software has changed, that is a profound thing: software is how humans encode knowledge, how we encode our algorithms, and we now encode it in a very different way. That's going to affect everything; nothing will be the same. I think I'm talking to the converted here, and we all see the same thing. All the startups you work with, the scientists I work with, the engineers I work with: nothing will be left behind. We're going to take everybody with us.
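To illustrate the shift in how knowledge gets encoded, here is a small sketch of our own (not from the interview): the same rule written by hand as classic software, and recovered from examples as learned parameters:

```python
# Two ways to encode the same knowledge (Celsius -> Fahrenheit).
# Illustrative only; the example is ours, not from the interview.

# 1) Classic software: a human writes the rule explicitly.
def c_to_f(celsius: float) -> float:
    return celsius * 9 / 5 + 32

# 2) Learned software: the rule is recovered from examples as fitted
#    parameters, via a closed-form least-squares fit of f = w*c + b.
samples = [(c, c_to_f(c)) for c in range(-40, 101, 10)]
n = len(samples)
mean_c = sum(c for c, _ in samples) / n
mean_f = sum(f for _, f in samples) / n
w = sum((c - mean_c) * (f - mean_f) for c, f in samples) / sum(
    (c - mean_c) ** 2 for c, _ in samples
)
b = mean_f - w * mean_c

print(c_to_f(100))    # 212.0, knowledge written by hand
print(w * 100 + b)    # ~212.0, the same knowledge learned from data
```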

Host: One of the most exciting things, coming from the computer science world and looking at all these other fields of science, is that I can go to a robotics conference now, or a materials science conference, or a biotech conference, and I understand it. Not at every level of the science, but in the driving of discovery, it's all the same algorithms.

Jensen Huang: It's general, and there are some universal unifying concepts.

Host: And I think that's incredibly exciting, when you see how effective it is in every domain.

Jensen Huang: Yep, absolutely. And I'm so excited that I'm using it myself every day. I don't know about you guys, but it's my tutor now. I don't learn anything without first going to an AI. Why learn the hard way? Just go directly to an AI. I go directly to ChatGPT, or sometimes Perplexity, depending on the formulation of my question, and I start learning from there. You can always fork off and go deeper if you like. But holy cow, it's just incredible.

And almost everything I know, I check, I double-check. Even when I know it to be a fact, what I consider ground truth, where I'm the expert, I'll still go to an AI and double-check. It's so great. Almost everything I do, I involve it in.

Host: I think that's a great note to stop on. Thanks so much for your time today.

Jensen Huang: Really enjoyed it. Nice to see you guys.
