Wallstreetcn
2024.07.16 06:55

GPU, the ultimate winner?

TSMC has received additional wafer orders on the back of surging demand for NVIDIA's Blackwell platform GPUs. Companies such as Amazon, Dell, Google, Meta, and Microsoft are expected to use GPUs built on this architecture for their AI servers, pushing demand beyond expectations. According to AMD CEO Lisa Su's forecast, the data center AI accelerator market reached roughly $45 billion in 2023 and is expected to grow at an annual rate of more than 70% to over $400 billion by 2027.

According to Taiwanese media reports, TSMC is preparing to start production of NVIDIA's latest Blackwell platform architecture GPU. Due to strong demand from NVIDIA's customers, TSMC has increased its wafer orders by 25%, which may lead to TSMC raising its profit forecast for this year.

The report cited industry sources indicating that companies such as Amazon, Dell, Google, Meta, and Microsoft will use the Blackwell architecture GPU to build AI servers, leading to demand exceeding expectations.

The positive news for NVIDIA has prompted fresh reflection on artificial intelligence, GPUs, and AI chips. But can this momentum continue?

How Are GPUs Selling?

Recently, The Next Platform also published forecasts for AI chip sales.

Citing AMD CEO Lisa Su, the outlet noted that AMD originally sized the total addressable market for data center AI accelerators at about $30 billion in 2023, growing at a compound annual rate of roughly 50% to more than $150 billion by the end of 2027. However, with the rise of the GenAI wave and the December launch of the "Antares" Instinct MI300 series GPUs, Lisa Su revised the outlook: AMD now expects the data center AI accelerator market to have reached $45 billion in 2023 and to grow at a compound annual rate of over 70% to more than $400 billion by 2027.
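As a quick sanity check on those growth rates, here is a minimal sketch using only the dollar figures quoted above (the helper function is ours):

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values over a number of years."""
    return (end / start) ** (1 / years) - 1

# AMD's revised forecast: $45B in 2023 growing to roughly $400B by 2027 (4 years).
print(f"Revised TAM CAGR:  {cagr(45, 400, 4):.1%}")    # ~72.7%, i.e. "over 70%"

# AMD's original forecast: $30B in 2023 growing to roughly $150B by 2027.
print(f"Original TAM CAGR: {cagr(30, 150, 4):.1%}")    # ~49.5%, i.e. "about 50%"
```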

These figures only apply to accelerators, not servers, switches, storage, and software.

Pierre Ferragu's team at New Street Research, which has done a great deal of notable work across the technology sector, attempted to break down what a $400 billion data center accelerator market would actually consist of and posted the prediction on Twitter:

We still believe this is a very large number, and we expect sales of AI servers, storage, and switches to reach around $1 trillion by the end of the TAM forecast period.

At the beginning of 2024, we obtained data center GPU sales forecasts from Aaron Rakers, Managing Director and Technology Analyst at Wells Fargo Securities, and did some spreadsheet work of our own. The model covers data center GPU sales from 2015 through 2022, estimates sales for 2023 (which had not yet closed), and extends out to 2027. Wells Fargo's model also predates AMD's recent revision, in which AMD said its data center GPU revenue would reach $4 billion in 2024 (we believe it will be closer to $5 billion). Regardless, the Wells Fargo model shows GPU sales of $37.3 billion in 2023 on total shipments of 5.49 million units for the year. Shipments nearly doubled - including all types of GPUs, not just high-end ones - while GPU revenue grew 3.7 times. The model predicts that data center GPU shipments will reach 6.85 million units in 2024, up 24.9%, with revenue of $48.7 billion, up 28%. By 2027, GPU shipments are forecast to reach 13.51 million units, driving data center GPU sales to $95.3 billion. In this model, Nvidia's revenue market share is 98% in 2023, declining to 87% by 2027.

Both Gartner and IDC have recently released some data and forecasts on AI semiconductor sales.

About a year ago, Gartner released a market study of AI semiconductor sales in 2022, with forecasts for 2023 and 2027. A few weeks ago, it released a revision that restates 2023 and forecasts 2024 and 2028. The second report also includes some additional statistics, which we have added to the table below:

We assume that consumer electronics includes personal computers and smartphones, and even Alan Priestley, the Gartner Vice President and Analyst who built these models, reckons that by 2026 every chip sold for a personal computer will be an AI PC chip, because all laptop and desktop CPUs will contain some kind of neural network processor.

AI chips for accelerating servers are the focus here at The Next Platform. Revenue from these chips (excluding the value of the accompanying HBM, GDDR, or DDR memory) was $14 billion in 2023 and is expected to grow 50% to reach $21 billion in 2024. However, the compound annual growth rate for server AI accelerators from 2024 to 2028 is only around 12%, with sales reaching $32.8 billion. Priestley stated that custom AI accelerators (such as TPUs and Amazon Web Services' Trainium and Inferentia chips, to name just two examples) brought in only $400 million in revenue in 2023 and will bring in only $4.2 billion by 2028. If the AI chips account for half the value of a computing engine, and the computing engine accounts for half the cost of a system, then these relatively modest figures still add up to significant data center AI system revenue. As always, this depends on where Gartner draws the lines and where you think the lines should be drawn.
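As a rough illustration of that "half of half" reasoning, take Gartner's 2028 server AI accelerator figure and assume both ratios hold exactly:

$$
\$32.8\text{B (AI chips)} \times 2 = \$65.6\text{B (computing engines)}, \qquad \$65.6\text{B (computing engines)} \times 2 = \$131.2\text{B (AI systems)}
$$

The doubling factors are assumptions, so the roughly $131 billion figure is indicative rather than a forecast.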

Now, let's take a look at how IDC views the AI semiconductor and AI server market. The company released this interesting chart a few weeks ago:

In this chart, IDC aggregates all revenue from CPUs, GPUs, FPGAs, custom ASICs, analog devices, memory, and other chips used in data center and edge environments. It then carves out the portion attributable to the compute, storage, switching, and other devices that go into AI training and inference systems. This is not the value of the complete systems, but the value of the chips within them; it therefore excludes chassis, power supplies, cooling, motherboards, adapter cards, racks, system software, and so on. As you can see, the chart contains actual data for 2022 and estimates for 2023 through 2027.

In IDC's analysis, the AI portion of the semiconductor market grew from $42.1 billion in 2022 to an estimated $69.1 billion in 2023, an increase of 64.1%. This year, IDC believes that AI chip revenue - which means not just XPU sales but all of the chip content in data center and edge AI systems - will grow 70% to reach $117.5 billion. Running the numbers from 2022 out to 2027, IDC estimates that total AI chip content revenue in data center and edge AI systems will grow at a compound annual rate of 28.9% to reach $193.3 billion by 2027.

It seems that GPUs remain the consistent winners, but Raja Koduri, who formerly worked at Intel, recently published an article analyzing the GPU's dominance.

GPU Has No Rivals?

First, Raja Koduri shared a series of formulas.

Next, he analyzed these formulas step by step:

![](https://wpimg-wscn.awtmt.com/75ea81e5-d06a-4b83-8c97-a7aaf627f1cf.png)

Regarding the formula above, Raja Koduri emphasizes that this equation applies to CPU architectures as well, which is why they have been so successful in devices, PCs, and the cloud. For AI and other floating-point- and bandwidth-intensive workloads, GPUs - especially CUDA GPUs - score highest on this equation. NVIDIA's astronomical valuation today is a good illustration of the equation at work.
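The equation itself appears only as an image. A hedged sketch of its structure, inferred purely from the components Koduri walks through below (a sum over N workloads, a Flops/Bandwidth/Capacity numerator, power and cost terms, and the Compatibility, Extensibility, and Accessibility factors discussed later), might look roughly like this; the grouping and weighting are our assumption, and only the list of terms comes from the article:

$$
\text{Value} \;\propto\; \left(\sum_{i=1}^{N} \frac{\mathrm{Flops}_i + \mathrm{Bandwidth}_i + \mathrm{Capacity}_i}{\mathrm{Power}_i \times \mathrm{Cost}_i}\right) \times \mathrm{Compatibility} \times \mathrm{Extensibility} \times \mathrm{Accessibility}
$$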

According to Raja Koduri, ambitious challengers should study this equation and work out how their approach measures up against the incumbents' value in the workload areas they are targeting.

The sigma at the front, Raja Koduri notes, sums over workloads. Different models and workloads can demand very different ratios of floating-point operations, bandwidth, and capacity; training and inference are two examples that produce different ratios.

Raja Koduri also emphasizes not to forget the need for accelerated computing outside the training and inference loop, such as image and voice processing, as well as well-known parallel data analysis and simulation algorithms. The generality of your approach will affect the size of "N," which is significant for CUDA GPUs. For CPUs, N is even larger, but the rest of the equation then comes into play, where their performance weaknesses dominate.

The numerator has 3 parameters: Flops, Bandwidth, and Capacity.

Raja Koduri points out that Flops must be qualified by width (64, 32, 19, 16, 8, 4...) and type (float, int...), and workloads can mix these. Similarly, bandwidth and capacity come in many levels of hierarchy - registers, L1, L2, HBM, NVLink, Ethernet, NVMe...

Raja Koduri also provides a brief introduction to modern GPU performance optimization strategies in the article.

He notes that when floating-point programmable shaders were first introduced in GPUs, the ratio of floating-point operations to DRAM bandwidth was 1:1. On the latest GPUs, for 16-bit and even lower precision formats, this ratio exceeds 300:1. For memory closer to the compute (registers, L1, L2, and so on), the ratio is much more favorable. If you look at most of the recent excellent GPU optimization work on transformers, it boils down to managing this ratio: the more reuse you can extract from the closer memory tiers, the better.
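A minimal sketch of that ratio, using illustrative round numbers for a modern accelerator (the peak flops and bandwidth values are our assumptions, not figures from the article), shows why reuse in nearer memory matters:

```python
# Roofline-style back-of-the-envelope: how many flops must each byte fetched
# from DRAM support before a chip is compute-bound rather than memory-bound?
# The peak figures below are illustrative assumptions, not vendor specs.
peak_fp16_flops = 1.0e15   # ~1 Pflop/s of dense FP16 throughput (assumed)
dram_bandwidth = 3.3e12    # ~3.3 TB/s of HBM bandwidth (assumed)

machine_balance = peak_fp16_flops / dram_bandwidth
print(f"Flops available per byte of DRAM traffic: {machine_balance:.0f}")  # ~303

# A kernel that performs fewer flops per byte than this is bandwidth-bound,
# which is why transformer optimizations (kernel fusion, tiling into
# registers, L1, and L2) work so hard to reuse each operand before it
# goes back out to HBM.
```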

In Raja Koduri's view, other strategies include exploiting under-utilized flops by running the next ALU-bound phase asynchronously. Luck and skill both play a role in finding those elegant "overlappable" blocks of code that do not thrash each other's caches.

However, Raja Koduri cautions that asynchronous execution is not for the faint-hearted. Every percentage point of improvement in flop utilization in your code could potentially save tens of billions of dollars.

Another common question: why don't the CPU folks just throw more FLOPS and BANDWIDTH at the problem and win the AI war? Are there fundamental architectural limitations?

Raja Koduri states that the simple physics answer to this question is "no". However, bringing more bandwidth into a CPU requires multiple upgrades (and compromises) to the CPU's architectural infrastructure, and in general the compromise is latency. If someone shows you that they can offer higher bandwidth with lower latency, lower power consumption, and lower cost, then stand up and join their religion.

CPU designers tend to prioritize latency over bandwidth, as they typically judge their workload set based on latency. Products like Intel Sapphire Rapids+HBM provide a good boost in bandwidth but are not enough to challenge GPUs.

Next, let's look at the formulas related to power consumption and cost:

First, power. The chart shows that pJ/flop has not improved meaningfully across mainstream semiconductor processes. The only game left to play is with the definition of a flop, whose width we have cut from 64 bits down to 4... and which may now go to 1.5. Today, FP16 sits in the range of 0.5-0.7 pJ/flop.

As a quick back-of-the-envelope for a petaflop GPU: 10^15 flop/s x (0.7 x 10^-12 J/flop) = 700 watts. Working out the power attributable to bandwidth is trickier and may touch on vendors' proprietary information, so we won't go into it here.
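The same estimate as a tiny sketch; the energy-per-flop values are the ones quoted above, and everything else is plain arithmetic:

```python
def compute_power_watts(flops_per_s: float, pj_per_flop: float) -> float:
    """Power drawn by the arithmetic alone: flop rate times energy per flop."""
    return flops_per_s * pj_per_flop * 1e-12   # picojoules -> joules

# A 1 Pflop/s FP16 machine at the quoted 0.5-0.7 pJ/flop range:
for pj in (0.5, 0.7):
    print(f"{pj} pJ/flop -> {compute_power_watts(1e15, pj):.0f} W")
# 0.5 pJ/flop -> 500 W
# 0.7 pJ/flop -> 700 W
```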

pJ/bit is a significant (10x) opportunity that architects can exploit. I believe the second half of this decade will see many interesting initiatives here, including near-memory compute, in-memory compute, and co-packaged photonics.

On the cost side, wafer costs keep climbing with each new node, and the memory vendors are also capitalizing on AI demand.

Raja Koduri says that 10 years ago he would not have counted "packaging" as a major cost factor, but now it is a big deal. Beyond advanced packaging, costs related to heat and power delivery have also risen significantly. Some of these costs are physically justified - but a large part is ecosystem-driven, with suppliers propping up their own margins in the shadow of Nvidia's greedy profit margins.

In Raja Koduri's view, alternative packaging approaches can cut costs significantly (2-3x) by avoiding expensive active interposers and 2.5D/3D stacking. However, it is unclear whether those savings will reach customers any time soon, at least until AI supply and demand come back into a more reasonable balance.

Finally, look at other parts of the formula.

First, look at Compatibility, which involves an interesting GPU history.

Raja Koduri recounts that in 2002 the GPU industry introduced 24-bit floating-point programmable shaders (alongside the remarkable ATI R300) and high-level shading languages (HLSL, GLSL, Cg), which were essentially C-based languages with key constraints and extensions. This was a boon for game engine developers, and real-time rendering advanced exponentially from 2002 to 2012. However, these languages were awkward for general programmers used to native C/C++, so GPUs remained largely the preserve of game developers.

By 2005, the arrival of high-performance IEEE FP32 sparked the GPGPU wave - thanks to Stanford alumni such as Mike Houston and Ian Buck, who drove early GPGPU languages like Brook, while ATI proposed an assembly-level abstraction called CTM (Close To the Metal). Although these efforts made great demos, they never crossed the "compatibility" threshold needed to attract serious interest beyond academic research.

CUDA (and the excellent Nvidia G80 architecture) was the first to bring "pointers" into a GPU language and to give C programmers a comfortable abstraction for using GPUs. As they say, the rest is history. Pointer and virtual memory support were also key to integrating GPUs into all major operating systems as first-class coprocessors. This aspect is often overlooked in hardware accelerator design, which is why writing drivers for such accelerators is a nightmare for software engineers.

Raja Koduri believes another aspect of CUDA is not widely appreciated. As he puts it, the CUDA programming model is a true abstraction of the NVIDIA GPU hardware execution model: hardware and software are co-designed and joined at the hip. Although the SPMD style at the heart of CUDA is portable to other languages (OpenCL, SYCL, OpenMP, HIP/ROCm...), achieving performance portability is nearly impossible (unless your architecture precisely replicates the CUDA GPU execution model). Since programmers reach for GPUs primarily for acceleration, languages and tools that cannot efficiently deliver good performance simply lack CUDA's appeal.

"There is an interesting comparison between CUDA programmers and Python/Pytorch programmers - but that's a discussion for another time," said Raja Koduri.

Raja Koduri acknowledges that CUDA broadened the GPU's generality enough to attract C/C++ programmers.

"For the next generation of hardware architects born in the Python era, what will be the next successful software-hardware co-design?" Raja Koduri continued.

Moving on to Extensibility.

Raja Koduri stated that GPU architectures have been incrementally extended many times. I find it surprising that we can still run game binaries built over 20 years ago on modern GPUs. While there have been many advancements at the microarchitecture level, the macro level seems to remain the same. We have added many new data types, formats, instruction extensions, while maintaining compatibility. We even added tensor cores while retaining the SPMD model. This scalability allows GPUs to quickly adapt to new workload trends.

Some experts criticize GPUs as being very "inefficient" for pure tensor mathematics - proposing and building alternative architectures that are incompatible with GPU architectures. However, we are still waiting for one of these architectures to have a meaningful impact.

Looking at Accessibility.

In Raja Koduri's view, this is the most underestimated advantage of GPUs. Your architecture needs to be accessible to a wide range of developers worldwide. In this regard, gaming GPUs are a huge boon for Nvidia. We often see young university students from around the world starting their first GPU acceleration experience with mid-range gaming GPUs like the 3060 in laptops or desktops. Nvidia has done a great job in making its developer SDK accessible on PCs running Windows and Linux.

But Raja Koduri believes that demand for compute and bandwidth is growing 3-4x every year, and that by the first-principles reasoning laid out here, CUDA GPU hardware will eventually be disrupted. The only questions are "who" and "when".

When answering readers' questions, Raja Koduri stated that Python and memory are what he believes will overturn CUDA GPUs.

Software Becomes the New Focus

Following AMD's recent acquisition of Silo AI, analysts argue that software has become the new focus and is reshaping the battlefield for AI chips. In their view, this strategic shift is redefining the AI race, with software expertise becoming as important as hardware strength.

Analysts stated that AMD recently acquired Silo AI, the largest private AI lab in Europe, reflecting this trend. Silo AI has rich experience in developing and deploying AI models, especially large language models (LLMs), which is a key area of focus for AMD.

This acquisition not only enhances AMD's AI software capabilities but also strengthens its position in the European market, where Silo AI is renowned for developing culture-related AI solutions.

Neil Shah, Partner and Co-Founder of Counterpoint Research, said, "Silo AI fills the critical capability gap for AMD from software tools (Silo OS) to services (MLOps), helping to customize sovereign and open-source LLMs while expanding its influence in the crucial European market."

AMD had previously acquired Mipsology and Nod.ai, further solidifying its commitment to building a robust AI software ecosystem. With Mipsology's expertise in AI model optimization and compiler technology, along with Nod.ai's contributions to open-source AI software development, AMD now has a comprehensive set of tools and expertise to accelerate its AI strategy.

Prabhu Ram, Vice President of Industry Research at Cybermedia Research, stated, "These strategic initiatives enhance AMD's ability to provide customized open-source solutions for enterprises seeking cross-platform flexibility and interoperability. By integrating Silo AI's capabilities, AMD aims to offer a comprehensive suite for developing, deploying, and managing AI systems, catering to a wide range of customer needs. This aligns with AMD's evolving market position as an accessible and open AI solution provider, leveraging industry trends towards openness and interoperability."

This strategic shift towards software is not limited to AMD. Other chip giants such as Nvidia and Intel are also actively investing in software companies and developing their own software stacks.

Shah mentioned, "If you look at Nvidia's success, you'll find that it's not driven by silicon but by the software (CUDA) and services (NGC with MLOps, TAO, etc.) it provides on top of its computing platform. AMD has realized this and has been investing in building software (ROCm, Ryzen AI, etc.) and service (Vitis) capabilities to offer end-to-end solutions that help customers accelerate the development and deployment of AI solutions." Nvidia's recent acquisitions of Run:ai and Shoreline.io, both focused on AI workload management and infrastructure optimization, likewise highlight the importance of software in maximizing AI system performance and efficiency.

However, this does not mean that chip manufacturers will all follow a similar trajectory to get there. TechInsights semiconductor analyst Manish Rawat pointed out that Nvidia's AI ecosystem has largely been built on proprietary technology and a strong developer community, which has cemented its position in AI-driven industries.

Rawat added, "AMD's collaboration with Silo AI indicates that AMD is focusing on expanding its capabilities in AI software, competing with Nvidia in the ever-evolving AI field."

Another relevant example is Intel's acquisition of real-time continuous optimization software provider Granulate Cloud Solutions. Granulate helps cloud and data center customers optimize computing workload performance while reducing infrastructure and cloud costs.

The integration of chip and software expertise is not only to catch up with competitors but also to drive innovation and differentiation in the field of artificial intelligence.

Software plays a crucial role in optimizing AI models for specific hardware architectures, improving performance, and reducing costs. Ultimately, software can determine who dominates the AI chip market.

Hyoun Park, CEO and Chief Analyst of Amalgam Insights, stated, "From a broader perspective, AMD is clearly competing with NVIDIA for dominance in the AI field. Ultimately, it's not just about who produces better hardware, but who can truly support the deployment of high-performance, well-managed, and long-term supportable enterprise solutions. Although Lisa Su and Jensen Huang are among the smartest executives in the tech industry, only one of them can ultimately win this war and become the leader in the AI hardware market."

The integration of software expertise with chip company products is driving the emergence of full-stack AI solutions. These solutions cover everything from hardware accelerators and software frameworks to development tools and services.

By providing comprehensive AI capabilities, chip manufacturers can cater to a wider range of customers and use cases, from cloud-based AI services to edge AI applications.

For example, Shah noted that Silo AI brings a wealth of experienced talent, particularly in optimizing AI models such as tailored LLMs. Silo AI's SiloOS is a powerful complement to AMD's products, allowing customers to build AI solutions tailored to their needs using advanced tools and modular software components. This is a significant advantage for AMD. Shah added: "Thirdly, Silo AI also brings MLOps capability, a key feature for platform players, helping their enterprise clients deploy, improve, and operate AI models in a scalable manner. This will help AMD develop a service layer on top of its software and silicon infrastructure."

The shift from chip manufacturers merely providing hardware to offering software toolkits and services has had a significant impact on enterprise tech companies.

Shah emphasized that these developments are crucial for enterprises and AI developers to fine-tune their AI models to enhance performance on specific chips, applicable to both training and inference stages.

This advancement not only accelerates time to market for products but also helps partners (whether large-scale enterprises or managing internal deployment infrastructure) improve operational efficiency by enhancing energy usage and optimizing code, thereby reducing Total Cost of Ownership (TCO).

"Furthermore, for chip manufacturers, this is a great way to lock these developers into their platforms and ecosystems, profiting from them through software toolkits and services. This can also generate recurring revenue, enabling chip manufacturers to reinvest and increase profits, a model investors appreciate," Shah said.

With the continuous evolution of the AI competition, the focus on software will inevitably intensify. Chip manufacturers will continue to invest in software companies, develop their own software stacks, and collaborate with a broader AI community to build a vibrant and innovative AI ecosystem.

The future of AI lies not only in faster chips but also in smarter software, which can unleash the full potential of AI and transform our lives and work.

In conclusion, do you think the market dominated by GPUs will be disrupted?

Author: Semiconductor Industry Observer, Source: Semiconductor Industry Observer, Original Title: "GPU, Invincible?"