Wallstreetcn
2023.09.13 06:22

When the focus of AI shifts from training large models to running applications, Nvidia's GPU dominance may not look so solid.

Warden points out that although CPUs are "laughably slow" at training large models, the hardware requirements and workloads look very different once inference dominates the AI budget; the primary task then becomes reducing the cost of inference.

Nvidia is riding the crest of the generative AI wave, but how long can it sit on the iron throne?

Pete Warden, a well-known tech blogger, recently wrote a post titled "Why Nvidia's AI Supremacy is Only Temporary." In it he argues that Nvidia's GPU dominance rests on the industry's current, training-dominated demand for large language models. If inference becomes the mainstream workload, the winners will shift from Nvidia to CPU manufacturers.

Why can Nvidia win?

Firstly, what factors contribute to Nvidia's current leading position?

Warden summarized four points:

1) The explosion of large-scale model applications has not yet arrived, and the focus of machine learning is currently on training.

Warden pointed out that only a few tech giants currently have the capability to deploy large language models in real application scenarios. Most companies are still at an early stage and need large amounts of data, hardware, and talent just to complete LLM training.

2) Nvidia's competitors don't have a fighting chance.

For developers of large models, Nvidia GPUs are the most straightforward and efficient choice. They are far easier to work with than alternatives such as AMD's OpenCL stack, Google's TPUs, or Cerebras hardware, and Nvidia's software stack is more mature, with more examples, documentation, and other resources.

Moreover, engineers familiar with Nvidia GPUs are much easier to find, and the GPUs integrate better with all the major frameworks. Add the CUDA platform on top, and Nvidia's dominance is essentially complete.

3) AI researchers prefer to use Nvidia GPUs.

Scarce AI talent arguably has the most leverage in today's job market.

Warden believes that for these researchers, the Nvidia platform is the most familiar and productive tool; switching to another platform would cost ramp-up time and limit their output. Because hiring and retaining researchers is extremely expensive, their preferences weigh heavily in hardware purchasing decisions.

4) The need to iterate large models rapidly means developers usually stick with one vendor.

If a company built the previous generation of its large model on Nvidia GPUs, it will very likely build the next generation on Nvidia as well. The reason is simple: it is the shortest, most efficient path to the next iteration. Migrating to newer hardware from the same vendor means most existing code keeps working, only faster.

While competitors may have a chance to surpass Nvidia GPUs in terms of performance, Nvidia's CUDA, developed over more than a decade, has already established a complete ecosystem.

According to global GPU market data from Jon Peddie Research, Nvidia ranks first with an 84% market share, while second-place AMD holds only 12%. This entrenched first-mover advantage forms an almost insurmountable moat.

If the Main Theme of AI Shifts from Training to Inference...

However, Warden argues that all of the factors above sustain Nvidia's leadership only under one important premise: that training dominates the generative AI wave.

He points out that, at some point, the compute spent running models in response to user requests will surpass the compute spent on training them. Even if a single training run is expensive and a single inference is cheap, there are so many potential users in the world and so many different applications that cumulative inference demand will eventually exceed the total demand from training runs.
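A rough back-of-envelope calculation makes the scaling argument concrete. The figures below are not from Warden's post; the training budget, per-query compute, and daily traffic are purely illustrative assumptions.

```python
# Illustrative only: every number here is an assumption, not data from the article.
TRAIN_FLOPS = 1e24        # assumed one-time compute budget for a large training run
FLOPS_PER_QUERY = 1e14    # assumed compute per request (~100B params, a few hundred tokens)
QUERIES_PER_DAY = 50e6    # assumed daily request volume for one popular application

daily_inference_flops = FLOPS_PER_QUERY * QUERIES_PER_DAY
days_to_overtake = TRAIN_FLOPS / daily_inference_flops

print(f"Inference compute per day: {daily_inference_flops:.1e} FLOPs")
print(f"Days until cumulative inference exceeds the training run: {days_to_overtake:.0f}")
```

Under these assumed numbers, cumulative inference overtakes the one-time training cost in about 200 days, and adding more users or more applications only pulls the crossover earlier.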

Given this trend, the focus of hardware will shift toward reducing the cost of inference. For user-facing applications, the most important task is to reduce latency.

This is not a strong suit for GPUs, but rather the domain of CPUs.

Warden points out that although CPUs are "laughably slow" at training large models, the hardware requirements and workloads look very different once inference dominates the AI budget. CPU development tools and communities are more mature than Nvidia's, and the cost per unit of computation is far lower than on GPUs. More importantly, model weights are fixed, so they can easily be replicated across a large number of machines at initialization.
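A minimal sketch of that last point (not code from Warden's post): assuming a hypothetical pre-exported weight file and a toy one-matmul "model", frozen, read-only weights are mapped once per CPU worker at startup and then reused for every request.

```python
import numpy as np
from multiprocessing import Pool

WEIGHTS_PATH = "weights.npy"   # hypothetical pre-exported, frozen weight matrix
_weights = None

def init_worker():
    """Runs once per worker at startup: memory-map the fixed, read-only weights."""
    global _weights
    _weights = np.load(WEIGHTS_PATH, mmap_mode="r")

def infer(x: np.ndarray) -> np.ndarray:
    """Stand-in for a real forward pass: a single matrix multiply on CPU."""
    return x @ _weights

if __name__ == "__main__":
    # One-time export of the frozen weights; in a real fleet this file would
    # simply be copied to every serving machine.
    np.save(WEIGHTS_PATH, np.random.rand(256, 64).astype(np.float32))

    requests = [np.random.rand(256).astype(np.float32) for _ in range(8)]
    with Pool(processes=4, initializer=init_worker) as pool:
        results = pool.map(infer, requests)
    print(f"Served {len(results)} requests across CPU workers")
```

Because the weights never change after training, distributing them across a fleet of commodity CPU machines is a one-time copy at startup rather than an ongoing synchronization problem.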

Enterprises focus on increasing revenue and reducing costs. More users mean more demand for inference, and since CPUs offer lower inference costs than GPUs, demand for them is bound to surpass demand for GPUs.

Warden predicts that the winners of this shift will be traditional CPU platforms such as Intel x86 and Arm.