The "Root Cause" of Nvidia's Decline: The More Powerful a Cutting-Edge Chip, the Harder It Is to Manufacture

Wallstreetcn
2024.09.02 02:19

The "Ultimate Challenge" of Chip Manufacturing

Author: Gao Zhimou

Editor: Hard AI

The root cause of Nvidia's "decline" can be summed up in one sentence: the more powerful a cutting-edge chip is, the harder it is to manufacture.

On Wednesday, Nvidia reported strong quarterly sales and profits, but also pointed out that the manufacturing challenges of new chips led to a decline in profit margins, with the company setting aside $908 million in provisions in the most recent quarter. As a result, its stock price fell by 6.4% on Thursday.

The company acknowledged in a statement that the Blackwell architecture GPU has yield issues and that the B200 processor needs a partial redesign to improve yields, which delays mass production of Blackwell to the fourth quarter of 2024:

"We have adjusted the design of the Blackwell GPU to improve production yields. The production plan for Blackwell will start in the fourth quarter and continue through fiscal year 2026.

We expect Blackwell products to generate billions of dollars in revenue in the fourth quarter."

Nvidia did not provide specific details on the exact reasons for the issue. However, analysts and industry executives believe that the engineering challenges mainly stem from the complex manufacturing process brought about by the design of the Blackwell chip.

Analysts say that Blackwell's giant size and complex design have brought unprecedented manufacturing complexity: a defect in any one component can cause the whole chip to be scrapped, hurting yield and profit. In addition, differences in the thermal expansion coefficients of the chip's various parts can cause the package to warp, affecting performance and reliability.
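The point that a defect in any one component can scrap the whole chip can be illustrated with a simple compound-yield calculation. The sketch below assumes that each part or assembly step succeeds independently, so the overall yield is the product of the individual yields. The per-part yields and the count of ten parts are hypothetical values chosen for illustration, not Nvidia or TSMC figures.

```python
from math import prod


def compound_yield(step_yields):
    """Overall yield when every part must be good, assuming independent parts."""
    return prod(step_yields)


# Hypothetical per-part yields and part count, chosen only for illustration.
PART_COUNT = 10
for per_part in (0.99, 0.95, 0.90):
    overall = compound_yield([per_part] * PART_COUNT)
    print(f"{per_part:.0%} per part, {PART_COUNT} parts: {overall:.1%} overall")
```

Under these assumptions, a modest drop in the yield of each part compounds into a much larger drop in the number of good finished chips.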

To improve yield, Nvidia has made adjustments to the Blackwell design and plans to increase production as scheduled. However, analysts believe that the complexity of using TSMC's new chip interconnect technology, as well as the inherent challenges brought about by the chip size, will still be the main obstacles to mass production of Blackwell.

G. Dan Hutcheson, Vice President of industry analysis firm TechInsights, stated:

"The issue lies in how to make the chips work together and improve yield. When the yield of each part of the chip is not high enough, everything could quickly deteriorate."

1. Complexity of the Blackwell Chip

In order to maintain its leading position in the field of artificial intelligence chips, Nvidia (NVDA) relies on the idea of "bigger is better." However, larger size, while bringing stronger performance, also brings greater manufacturing difficulty.
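Why larger dies are harder to manufacture can be sketched with the classic Poisson yield model, in which the share of defect-free dies falls exponentially with die area. This is a textbook simplification rather than the model Nvidia or TSMC uses, and the defect density and die areas below are assumed values for illustration only.

```python
import math


def poisson_yield(die_area_cm2, defects_per_cm2):
    """Poisson yield model: fraction of dies with zero random defects."""
    return math.exp(-die_area_cm2 * defects_per_cm2)


# Assumed defect density and die areas; not figures from the article.
DEFECTS_PER_CM2 = 0.1
for area_cm2 in (1.0, 4.0, 8.0):
    y = poisson_yield(area_cm2, DEFECTS_PER_CM2)
    print(f"die area {area_cm2:.0f} cm^2: estimated yield {y:.1%}")
```

With the defect density held fixed, every increase in die area lowers the estimated yield, which is the trade-off behind the "bigger is better" approach.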

Nvidia's latest AI chip, Blackwell, which Jensen Huang has described as a "very, very large GPU," is indeed the largest GPU in terms of physical size. It consists of two Blackwell dies joined together, is built on TSMC's 4nm process, and has 208 billion transistors - 2.6 times as many as its predecessor.

According to a report by UBS analysts earlier this month, the main problem Nvidia has encountered with Blackwell is that TSMC's new CoWoS-L packaging method is too complex.

SemiAnalysis, a media outlet specializing in the semiconductor industry, reported that the packaging technology uses an RDL interposer with local silicon interconnect (LSI) bridges to connect the chiplets, achieving a transfer rate of around 10 TB/s. These bridges must be placed with extremely high precision - a defect in any one component could lead to the scrapping of a chip worth $40,000, hurting yield and profit.
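To give a rough sense of how scrapped packages translate into cost, the sketch below multiplies the number of failed units by the $40,000 chip value cited above. The production volume and the assembly yields are hypothetical assumptions, and the calculation ignores rework, partial salvage, and binning.

```python
UNIT_VALUE_USD = 40_000  # chip value cited in the article


def expected_scrap_cost(units_started, assembly_yield, unit_value_usd=UNIT_VALUE_USD):
    """Expected value of scrapped units, assuming each failure is a total loss."""
    return units_started * (1.0 - assembly_yield) * unit_value_usd


# Hypothetical volume and yields, for illustration only.
UNITS_STARTED = 1_000
for assembly_yield in (0.95, 0.90, 0.80):
    cost = expected_scrap_cost(UNITS_STARTED, assembly_yield)
    print(f"yield {assembly_yield:.0%}: about ${cost:,.0f} scrapped per {UNITS_STARTED:,} units")
```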

Furthermore, the coefficients of thermal expansion (CTE) of the GPU chiplets, LSI bridges, RDL interposer, and motherboard substrate are mismatched, which has reportedly caused chip warpage and system failures. To improve yield, Nvidia reportedly had to redesign the top metal layer and the bumps of the GPU chip.
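The effect of a CTE mismatch can be sketched with the linear expansion formula dL = alpha * L * dT. The figures below are assumptions for illustration: roughly 2.6 ppm/K is a commonly quoted value for silicon, while the substrate CTE, the span, and the temperature swing are hypothetical and are not Blackwell measurements.

```python
def thermal_growth_um(cte_ppm_per_k, length_mm, delta_t_k):
    """Linear expansion dL = alpha * L * dT, returned in micrometres."""
    return cte_ppm_per_k * 1e-6 * length_mm * delta_t_k * 1000.0


# Illustrative assumptions, not measured package data.
SILICON_CTE = 2.6     # ppm/K, commonly quoted value for silicon
SUBSTRATE_CTE = 15.0  # ppm/K, assumed value for an organic substrate
SPAN_MM = 50.0        # assumed span across the package
DELTA_T_K = 100.0     # assumed temperature swing

silicon = thermal_growth_um(SILICON_CTE, SPAN_MM, DELTA_T_K)
substrate = thermal_growth_um(SUBSTRATE_CTE, SPAN_MM, DELTA_T_K)
print(f"silicon grows {silicon:.1f} um, substrate grows {substrate:.1f} um")
print(f"mismatch across the span: {substrate - silicon:.1f} um")
```

Because the bonded layers cannot slide freely against each other, a difference of this kind is taken up as stress and bending, which is how a CTE mismatch leads to warpage.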

Jensen Huang emphasized in a conference call with analysts that no "functional changes" are needed for the Blackwell chip, and all adjustments are made to improve yield.

Chief Financial Officer Colette Kress stated that Nvidia is increasing Blackwell's production as planned, expecting the chip to bring in billions of dollars in revenue for the company in the fourth quarter.

2. "Giant Chip" Strategy

This issue is not unique to Nvidia. Industry insiders say that as chip manufacturers seek to increase processing power by enlarging chip sizes, such problems will become more common. Chip design changes made to eliminate defects or improve yield are also common in the industry.

Lisa Su, CEO of chip giant AMD, pointed out that as chip sizes continue to increase, manufacturing complexity will inevitably rise. The next generation of chips needs to make breakthroughs in energy efficiency and power consumption to meet the huge demand for computing power in AI data centers.

"To make these technologies work, a lot of technical investment is needed," she said. "Will they become more complex and larger? Undoubtedly. This is our reality."

To break through the size limit of a single chip, Nvidia combines two of the largest possible dies to create Blackwell, a move that has drawn skepticism from competitors.

Andrew Feldman, founder of competitor Cerebras Systems, believes that the difficulty of developing multi-chip combination technology will grow exponentially. Cerebras Systems chose to develop a giant single chip and launched AI cloud computing services based on it, attempting to challenge Nvidia's market position.

Andrew Feldman stated:

"To do meaningful work in the field of artificial intelligence, a large amount of computing power is needed, which requires a large number of transistors, even more than a single chip can accommodate...

Developing dual-chip technology is already difficult, developing quad-chip technology is even more difficult, and developing octa-chip technology is even more challenging."

Whether Nvidia's giant-chip strategy will ultimately succeed remains for the market to decide. What is certain is that the extreme challenges of chip manufacturing are only beginning.