Wallstreetcn
2023.08.19 06:18

GPU supply and AI data center capacity can't keep up, and even NVIDIA's "favorite child" can't run at full speed.

Emerging cloud service providers are facing growing pains.

For a company with strong momentum, growing too fast can bring unexpected troubles.

According to a Friday report from The Information, CoreWeave, an NVIDIA-backed cloud computing provider, has lowered its projected revenue and capital expenditure for this year because it cannot obtain key equipment such as GPUs as expected.

In April of this year, the company had told investors it expected revenue of 630 million US dollars for the year, but in June, CoreWeave revised that figure down to slightly over 500 million US dollars.

At the same time, CoreWeave's capital expenditure has also been reduced from 3.3 billion US dollars to 2.3 billion US dollars. The Information pointed out that this may mean that CoreWeave is unable to obtain as many chips as expected, or it is unable to secure enough space in data centers to fulfill its commitments to customers.

CoreWeave originally leased GPUs primarily to cryptocurrency miners and bought large numbers of NVIDIA GPUs after its founding. More recently, the company has pivoted to leasing GPUs to artificial intelligence and machine learning developers, and it received a $100 million investment from NVIDIA earlier this year.

Since the beginning of this year, CoreWeave has raised over 2.7 billion US dollars through debt and equity financing to invest in more chips and data centers. NVIDIA has also included CoreWeave in the first batch of shipments for its advanced artificial intelligence chip H100.

However, demand for the NVIDIA H100 far exceeds supply, making production capacity an urgent bottleneck.

Wallstreetcn previously reported that NVIDIA is expected to ship approximately 550,000 H100 graphics cards worldwide in 2023, but its foundry partner TSMC currently cannot meet production demand for compute GPUs. TSMC packages these chips with its CoWoS technology and is working to expand capacity for this packaging step.

Even with priority access to NVIDIA's shipments, capacity constraints still keep CoreWeave from achieving its expected growth.

Stacy Rasgon, Senior Semiconductor Analyst at Bernstein Research, commented on this:

Even if NVIDIA prioritizes these chips, many other companies also want to purchase their components.

Whenever your growth reaches this level, there will always be growing pains.

Competitive Challenges

CoreWeave faces challenges beyond this.

NVIDIA's H100 is currently mainly sold to US tech companies, but players from the Middle East have also entered the arena. According to the Financial Times, Saudi Arabia has purchased at least 3,000 NVIDIA H100 graphics cards through the public research institution King Abdullah University of Science and Technology (KAUST). The United Arab Emirates will also receive thousands of NVIDIA graphics cards and has developed its own open-source large-scale language model, Falcon, at the state-owned Technology Innovation Institute in Masdar City, Abu Dhabi.

In addition, according to an insider familiar with the company's financial situation, the majority of CoreWeave's revenue currently comes from a few major clients. Another emerging cloud provider, Lambda Labs, which also obtained H100 from NVIDIA, has a longer client list and is expected to reach an equity financing agreement with NVIDIA, although its revenue is lower than that of CoreWeave.

CoreWeave's growth remains impressive even after the forecast was cut to just over $500 million: that is roughly a twentyfold increase over its 2022 revenue of about $25 million. CoreWeave expects revenue to reach $2.3 billion next year.

CoreWeave stated that by 2026, the company's cloud service contracts will exceed $7 billion, up from the $5 billion figure earlier this year.

Energy Consumption Challenge

For GPUs, the bottleneck is NVIDIA's production capacity; for the data centers that host them, it is high energy consumption.

Cloud service providers typically purchase or lease data centers to host servers with GPUs. However, NVIDIA's H100 consumes more energy than its earlier chips, with power consumption of up to 700 watts per H100 chip.

Earlier this week, at an industry conference in San Jose, California, data center operators said they are struggling to keep up with the pace of artificial intelligence: it is difficult to build and operate enough data centers to house NVIDIA's latest chips and meet cloud providers' demand.

On the energy question, CoreWeave said it has secured approximately 15 megawatts of power across three data centers. Three more data centers are planned to open in 2023, potentially raising capacity to around 40 megawatts. The company expects its data center capacity to reach 70 megawatts next year, enough to accommodate 100,000 H100 chips.
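The 70-megawatt figure lines up with the H100's per-chip draw quoted below. A quick back-of-the-envelope check (assuming 700 W per chip only; real deployments add overhead for CPUs, networking, and cooling, so the usable chip count would be lower):

```python
# Back-of-the-envelope check of the capacity figures quoted above.
# Assumes 700 W per H100 chip and nothing else drawing power.
H100_WATTS = 700
PLANNED_MEGAWATTS = 70

chips_supported = PLANNED_MEGAWATTS * 1_000_000 // H100_WATTS
print(f"{PLANNED_MEGAWATTS} MW / {H100_WATTS} W per chip = {chips_supported:,} chips")
# 70 MW / 700 W per chip = 100,000 chips
```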

As previously reported by Wallstreetcn, training AI models in data centers consumes three times as much energy as regular cloud workloads. By 2030, the power demand of US data centers is expected to grow at an annual rate of approximately 10%.

AI servers consume 6-8 times more power than regular servers, which also raises demand on power supplies. A general-purpose server needed only two 800W power supplies, but an AI server now requires four 1800W high-power supplies, pushing the power-supply cost per server from 3,100 yuan to 12,400 yuan, a fourfold increase.
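The per-server figures above can be sanity-checked directly; the wattages and yuan costs are the numbers quoted, and the ratios are derived from them:

```python
# Power-supply comparison using the figures quoted above
# (2 x 800 W vs. 4 x 1800 W; 3,100 vs. 12,400 yuan per server).
general_watts = 2 * 800   # 1,600 W total for a general-purpose server
ai_watts = 4 * 1800       # 7,200 W total for an AI server
general_cost = 3_100      # yuan
ai_cost = 12_400          # yuan

print(f"power-supply wattage ratio: {ai_watts / general_watts:.1f}x")  # 4.5x
print(f"power-supply cost ratio:    {ai_cost / general_cost:.1f}x")    # 4.0x
```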

According to modeling and forecasts by consulting firm Tirias Research, data center power consumption is expected to reach nearly 4,250 megawatts by 2028, a 212-fold increase over 2023. Total data center infrastructure and operating costs may exceed $76 billion.
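Taken at face value, the 212-fold multiple implies a small 2023 baseline for the workloads Tirias models. The derivation below uses only the two figures quoted above; the baseline itself is inferred here, not stated in the report:

```python
# Infer the implied 2023 baseline from the quoted 2028 forecast.
# Both inputs come from the text; the baseline is derived, not reported.
forecast_2028_mw = 4_250
growth_multiple = 212

implied_2023_mw = forecast_2028_mw / growth_multiple
print(f"implied 2023 baseline: {implied_2023_mw:.1f} MW")  # roughly 20 MW
```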

The firm noted that the innovations of generative AI come at a steep cost in processing performance and power consumption.

So while the potential of artificial intelligence may be limitless, physical constraints and costs may ultimately set the boundaries.