Report: NVIDIA AI chip faults lead Microsoft and other customers to cut orders; stock falls nearly 5% at one point
NVIDIA's latest-generation AI chip, Blackwell, encountered technical issues during deployment to data centers, including server rack overheating and chip connection anomalies, leading multiple clients (such as Microsoft, AWS, Google, and Meta) to postpone data center plans and reduce orders. Following the report, NVIDIA's stock fell more than 4.7% in early trading on Monday.
On Monday, January 13 (Eastern Time), The Information reported that NVIDIA's latest-generation AI chip, Blackwell, had encountered technical issues during deployment to data centers, chiefly overheating server racks and abnormal connections between chips.
These problems have hindered the deployment process in data centers, leading several of NVIDIA's clients (including Microsoft, Amazon's AWS, Google, and Meta) to recently cut some orders for the Blackwell GB200 racks.
Because of delayed deliveries, Microsoft's Phoenix data center, which was originally slated to house a large number of GB200s, has instead been filled with H200 chips. Sources said that if NVIDIA cannot resolve these issues, the racks' performance may fall below the levels the company promised.
Following the news, NVIDIA's stock fell more than 4.7% in early trading.
Major Clients Cut Orders and Seek Alternatives
The Blackwell chip has been highly anticipated for its outstanding performance and high energy efficiency. Compared with the previous-generation Hopper, Blackwell is four times as energy efficient, which attracted tech giants such as Microsoft, Amazon, Google, and Meta. Each of these companies placed orders worth more than $10 billion.
However, integrating multiple high-power chips into a single server rack has proven more challenging than expected. Each Blackwell rack is taller than a household refrigerator and weighs nearly as much as a Honda Civic. Due to the extremely high computing density, the racks must use a water cooling system instead of a traditional air cooling system. For most AI developers and data center operators, deploying such specialized racks is a new and complex task. Additionally, not all data centers can meet the environmental requirements for these racks, forcing clients to replan their deployment strategies.
Due to overheating and connection issues, some clients have reduced their orders for the Blackwell GB200 racks. For example, some clients have chosen to wait for an improved version that may be released in the second half of this year, while others plan to procure NVIDIA's older AI chips as alternatives. Although NVIDIA recommends a complete rack solution, some clients may opt to purchase Blackwell chips individually for self-assembly.
Despite the challenges, NVIDIA still has an opportunity to turn things around. If it can resolve these technical issues in a timely manner, clients may reconsider increasing their orders. Furthermore, despite the issues with the racks, the performance of the Blackwell chips still surpasses that of the previous generation, and NVIDIA may find other buyers for the problematic racks.
In November of last year, NVIDIA predicted that the new-generation AI chip Blackwell would bring the company billions of dollars in revenue in the first quarter of this year and boost its annual data center chip revenue from $47.5 billion to $150 billion. The high energy efficiency of the Blackwell chips was a key factor in attracting cloud service providers, who hope to get more computing power out of a fixed energy supply.
Chip Delays Affect Data Center Deployment Plans
According to informed sources, Microsoft, as the server provider for OpenAI, originally planned to install GB200 racks containing at least 50,000 Blackwell chips at a facility in Phoenix. However, because Blackwell deliveries have been delayed since last year, OpenAI asked Microsoft to provide the previous-generation NVIDIA H200 chips as soon as possible. As a result, the Phoenix data center, which was originally slated to house a large number of GB200s, has instead been filled with H200 chips.
According to informed sources, Microsoft now plans to install GB200 racks containing 12,000 Blackwell chips at a facility in Phoenix in March this year, about a quarter of the amount initially planned. Another person working with Microsoft said the company also plans to buy GB300 Blackwell racks when they launch later this year.
NVIDIA originally planned to start delivering Blackwell racks to customers at the end of last year, but design flaws in the chips caused an initial delay of three months. Although NVIDIA resolved that issue, by November customers had begun to worry about the racks overheating. As a result, NVIDIA has repeatedly asked suppliers to change the design.
However, the issues have not been fully resolved. According to three individuals involved in the rack testing, customers have also found inconsistencies in data transmission (i.e., networking) between the chips. These issues could make the Blackwell racks take longer to set up than expected, and if NVIDIA cannot resolve them, the racks' performance may fall below the levels the company promised.