The Hidden World of China's Computing Power: The A100, which used to cost nearly 100,000 yuan, was once in high demand, but now many cards remain unopened
China's computing power market has seen explosive growth driven by large AI models, with tech giants such as ByteDance, Alibaba, and Baidu engaged in a computing power "arms race," making large-scale purchases of graphics cards and building computing clusters. At the same time, idle computing resources have appeared across the market: some graphics cards remain unopened, and demand was lukewarm in 2024. Graphics cards aimed at the gaming and consumer markets, however, are still in strong demand.
To get rich, one must first build the road.
In order for AI large models to continuously iterate and upgrade, the construction of underlying computing power infrastructure is essential. Since the explosion of ChatGPT in 2022, the computing power market has also experienced explosive growth.
On one hand, China's tech giants are engaged in a computing power "arms race" to seize tickets for the future AGI era, frantically hoarding graphics card resources while also building computing power clusters from thousands to tens of thousands of cards.
According to a report by research firm Omdia, ByteDance ordered approximately 230,000 Nvidia chips in 2024, becoming the second-largest customer in terms of Nvidia procurement volume.
Reports indicate that ByteDance's capital expenditure in 2025 will reach 160 billion yuan, with 90 billion allocated for purchasing AI computing power. Other large companies of similar scale, including Alibaba, Baidu, and China Telecom, are also advancing the construction of computing power clusters at the tens of thousands of cards level.
The frantic infrastructure actions of tech giants are undoubtedly pushing China's AI computing power market to new heights.
However, on the flip side of the giants' aggressive expansion of computing power, a large amount of computing resources in the Chinese market are being left idle, and there are even voices suggesting that "the overall computing power resources in China are in excess supply."
"The computing power market was very hot in 2023, and those dealing with relatively low-performance A100s made money, but the market is much cooler in 2024, with many cards still unopened. However, due to various factors, the demand for the 4090 aimed at the gaming and consumer market remains high," said Wang Wei, CTO of YunZhou Technology ZStack, to GuangCone Intelligence.
In the past two years, the computing power business has been the first track to strike gold in the wave of large models. In addition to Nvidia, countless cloud vendors, PaaS-level computing optimization service providers, and even chip brokers have been rushing in. This surge in computing power demand is primarily driven by the rapid development of AI large models.
The demand for AI is like a pump, activating the computing power market that had been stable for many years and stirring up turbulent waves.
However, this driving force is now changing. The development of large AI models is gradually shifting from pre-training to inference applications, and more and more players are choosing to abandon the pursuit of ultra-large-model pre-training. Recently, for example, Li Kaifu, founder and CEO of Zero One Technology, publicly stated that the company will not stop pre-training but will no longer chase ultra-large models.
In Li Kaifu's view, pursuing AGI by continuously training ultra-large models also means needing to invest more GPUs and resources. "It's still my previous judgment—when the results of pre-training are no longer better than open-source models, no company should be obsessed with pre-training."
As a result, as one of the "Six Little Tigers" of China's large model startups, Zero One Technology has begun to pivot, betting on the AI large model inference application market in the future.
In this rapidly changing phase of both demand and supply, the market balance is constantly shifting. In 2024, the computing power market is experiencing a structural imbalance between supply and demand. Whether computing infrastructure should continue to be built out, where computing resources should be allocated, and how new entrants should compete with the giants have become key questions.
A hidden world surrounding the intelligent computing power market is slowly unfolding.
Supply and Demand Mismatch: Low-Quality Expansion Meets High-Quality Demand
In 1997, a young Liu Miao joined IBM, which was then thriving, marking his entry into the computing industry.
In the mid-20th century, IBM, dubbed the "Blue Giant," nearly monopolized the global enterprise computing market with its mainframes.
"At that time, a few IBM mainframes could support a bank's core business systems nationwide, which made me realize the value of computing in accelerating business systems," Liu Miao told Guangcone Intelligence.
It was also this experience at IBM that laid the groundwork for Liu Miao's subsequent involvement in the new generation of intelligent computing.
After the mainframe era represented by CPUs and the cloud computing era, computing power has now entered the intelligent computing era dominated by GPUs, fundamentally changing the computing paradigm. After all, under the old architectural approach, large volumes of data must be routed through the CPU before reaching the GPU, wasting the GPU's massive compute and bandwidth. Moreover, GPU training and inference scenarios impose higher requirements on high-speed interconnects, online storage, and privacy security.
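As a concrete miniature of that paradigm shift, the PyTorch sketch below (illustrative only; it assumes a CUDA-capable machine) shows one standard way to keep the CPU out of the data path: staging batches in pinned host memory so the GPU can fetch them asynchronously via DMA. It is a generic technique, not any particular vendor's architecture.

```python
import torch

# Staging data in page-locked (pinned) host memory lets the GPU pull it
# directly via DMA, overlapping the copy with computation instead of
# paying for an extra CPU-mediated pageable-memory copy.
if torch.cuda.is_available():
    batch = torch.randn(4096, 4096).pin_memory()

    copy_stream = torch.cuda.Stream()
    with torch.cuda.stream(copy_stream):
        # non_blocking=True is only truly asynchronous for pinned tensors
        gpu_batch = batch.to("cuda", non_blocking=True)

    # Make the compute stream wait for the copy, then compute on the GPU;
    # in a real loop the next batch's copy overlaps this matmul.
    torch.cuda.current_stream().wait_stream(copy_stream)
    result = gpu_batch @ gpu_batch.T
```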
This has also spurred the development of the upstream and downstream of China's intelligent computing power industry chain, especially in infrastructure construction centered around intelligent computing centers.
At the end of 2022, the release of ChatGPT officially ushered in the era of large AI models, and China entered the "Hundred Model War" phase.
At that time, everyone hoped to provide computing power for pre-training large models, but the industry was uncertain where the final computing power demand lay and who would use it. "During this phase, everyone prioritized buying cards and hoarding resources," said Hong Rui, co-founder and president of Turing New Intelligent Computing. This phase marked the intelligent computing 1.0 era.
As the training parameters of large models grew larger, it was ultimately discovered that the true consumers of computing power resources were concentrated among players engaged in pre-training.
“The early stage of this round of AI industry explosion aimed to continuously expand computing power consumption in basic model pre-training, exploring the path to AGI (Artificial General Intelligence),” said Hong Rui.
Public data shows that ChatGPT was trained with 175 billion parameters on 45 TB of data, generates 4.5 billion words of content daily, and requires at least tens of thousands of NVIDIA A100 GPUs to power it, with a single training run costing over 12 million USD.
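Where does "tens of thousands of A100s" come from? The back-of-envelope sketch below uses the widely cited C ≈ 6·N·D approximation for transformer training FLOPs; the token count, peak throughput, and utilization are assumptions for illustration, not figures from the article.

```python
# Back-of-envelope training-compute estimate for a GPT-3-scale model,
# using the commonly cited C ≈ 6 * N * D FLOPs approximation.
params = 175e9        # 175B parameters (from the article)
tokens = 300e9        # ~300B training tokens (assumed)
flops = 6 * params * tokens              # ≈ 3.15e23 FLOPs total

a100_peak = 312e12    # A100 BF16 peak, FLOP/s (spec sheet)
utilization = 0.35    # assumed effective utilization (see later section)

gpu_seconds = flops / (a100_peak * utilization)
print(f"~{gpu_seconds / 86400:,.0f} A100-days")
# ≈ 33,000 A100-days, i.e. about a month on a 1,000-card cluster
```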
Additionally, in 2024, multimodal large models are competing fiercely, and training on video, image, and voice data demands even more computing power.
Public data indicates that the training and inference computing requirements of OpenAI's Sora video-generation model are 4.5 times and nearly 400 times those of GPT-4, respectively. A report from China Galaxy Securities Research Institute likewise shows Sora's demand for computing power growing exponentially.
Therefore, starting from 2023, in addition to various forces hoarding graphics card resources, the Chinese computing power market has experienced explosive growth to meet the increasing demand for computing power, especially in intelligent computing centers.
Bai Runxuan, a senior analyst at the AI and Big Data Research Center of CCID Consulting, previously stated: "Starting from 2023, local governments have increased their investment in intelligent computing centers, promoting the development of infrastructure."
Under the dual influence of the market and policy, China's intelligent computing centers have sprung up like bamboo shoots after rain in just one or two years.
This includes government-led construction projects, as well as intelligent computing centers primarily invested in and constructed by companies such as Alibaba Cloud, Baidu Intelligent Cloud, and SenseTime, along with some cross-industry companies that have seen opportunities and entered this field.
At the same time, startups like Turing New Intelligent Computing, Qujing Technology, and Silicon-based Flow have also entered the computing power industry.
Relevant data shows that by the first half of 2024, more than 250 intelligent computing centers had been built or were under construction in China, with 791 related bidding events in the first half of the year, a year-on-year increase of 407.1%.
However, building an intelligent computing center is not simply a matter of building bridges and roads: it demands considerable technology and expertise, construction is often mismatched with demand, and follow-up planning is frequently lacking.
In Liu Miao's view, intelligent computing centers are a uniquely Chinese product, one that to some extent carries a social mission of supporting local industrial development. The problem with construction that is not purely market-driven is that, after a build cycle of 12-24 months, "once built, they sit idle, because two years later they can no longer meet the industry's computing power demands."
And idle resources are indeed observable in parts of China's computing power market today. "The root cause of the current issues in China's computing power market is that it is far too crude and extensive," Liu Miao said.
However, the market cannot simply be described as oversupplied or under-demanded; what exists is a mismatch between computing power supply and demand. That is, demand for high-quality computing power is far from being met, while low-quality computing power can find little demand. After all, players in large-model pre-training often require computing resource pools of more than ten thousand cards.
Some early intelligent computing centers in China, however, "may only have dozens to a couple of hundred machines, which is far from enough for pre-training today's foundational models, nor does the equipment selection match pre-training needs," Hong Rui stated. From the pre-training perspective, computing power is genuinely scarce, yet computing power that cannot be used because its scale is too small sits idle.
Differentiation in the Large Model Track: A Quiet Shift in Computing Power Demand
The development of the large model market is changing rapidly.
Originally, during the pre-training phase of large models, industry players hoped to improve the performance of large models through continuous training. If one generation did not succeed, they would invest more computing power and funds to train the next generation of large models.
"The previous logic of development in the large model track was like this, but by around June 2024, it became evident in the industry that the pre-training of large models had reached a critical point of input-output. Investing massive resources in pre-training may not yield the expected returns," said Hong Rui.
A significant reason behind this is the issue of 'OpenAI's technological evolution. The capabilities of GPT-3.5 were impressive, and while GPT-4 showed improvements, from mid-2023 to 2024, the overall upgrade of foundational model capabilities did not reach the effectiveness of 2023, with further enhancements mainly on the CoT and Agent sides,' Wang Wei stated.
While the upgrade of foundational model capabilities is slowing down, the cost of pre-training remains extremely high.
Previously, Li Kaifu, founder and CEO of Zero One Technology, mentioned that the cost of a single pre-training session is about three to four million dollars. For most small and medium-sized enterprises, this is undoubtedly a significant cost investment. "The survival strategy for startups is to consider how to make the best use of every dollar, rather than burning more GPUs."
As a result, with the parameters of large models becoming increasingly large, more and more companies cannot afford the training costs of large models and can only apply or fine-tune based on already trained models. "It can even be said that when the parameters of large models reach a certain level, most companies do not even have the capability for fine-tuning," Hong Rui said.
Relevant data indicates that in the second half of 2024, nearly 50% of large models that have gone through filing have shifted towards AI applications.
The transition of large models from pre-training to inference applications has undoubtedly brought about a differentiation in the market's demand for computing power. Hong Rui believes: "The computing centers and computing power demands for large-model pre-training and for inference applications are actually two separate tracks."
From the perspective of large model pre-training, the required computing power is proportional to the model parameter size and training data volume. The overall requirements for computing clusters are: 100 billion parameters require 100 cards, 1 trillion parameters require 1,000 cards, and 10 trillion parameters require 10,000 cards.
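Taken literally, the rule of thumb above is about one card per billion parameters. The toy snippet below encodes only that heuristic; real cluster sizing also depends on token budget, parallelism strategy, and per-card memory.

```python
def cards_needed(params: float) -> int:
    """Rule of thumb from the text: ~1 card per billion parameters."""
    return max(1, round(params / 1e9))

for p in (100e9, 1e12, 10e12):
    print(f"{p / 1e9:>6,.0f}B parameters -> ~{cards_needed(p):,} cards")
# 100B -> ~100 cards, 1T -> ~1,000 cards, 10T -> ~10,000 cards
```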
Additionally, an important characteristic of large-model pre-training is that it cannot tolerate interruption: once interrupted, training must be rolled back to the most recent checkpoint and restarted from there.
"Since last year, a large number of intelligent computing devices have been introduced domestically, but the average failure rate is around 10%-20%. Such a high failure rate leads to interruptions in large model training every three hours," Liu Miao said. "A 1,000-card cluster basically needs to be interrupted every 20 days."
At the same time, to push artificial intelligence toward the Agent era and, eventually, general artificial intelligence, computing clusters must keep expanding, from 1,000-card clusters to 10,000-card and even 100,000-card clusters. "Elon Musk is a remarkable person; he planned a 100,000-card cluster in Memphis, and the first 19,000 cards took only 19 days from installation to activation, a complexity far exceeding existing projects," Liu Miao said.
(Musk previously announced the activation of a 100,000-card Memphis supercluster on X)
Currently, to meet the training needs of larger-parameter models, domestic companies are actively investing in 10,000-card computing pools, but "everyone will find that computing power suppliers' customers are actually concentrated in a few leading enterprises, and suppliers will require these enterprises to sign long-term computing power leases, regardless of whether they actually need the capacity," said Liu Jingqian, chief expert on large models at China Telecom and head of its large model team.
However, Hong Rui believes: "In the future, no more than 50 players globally will truly have the strength to do pre-training, and as intelligent computing clusters scale to 10,000 and 100,000 cards, the number of players capable of cluster operations, fault diagnosis, and performance tuning will shrink further."
At this stage, a large number of small and medium-sized enterprises have shifted from pre-training large models to AI inference applications, and "many AI inference applications are short-term, tidal applications," Liu Jingqian said. When deployed in real terminal scenarios, however, they require large numbers of servers computing in parallel across the network, and inference costs surge.
"The reason is that the latency is relatively high; a large model requires deep reasoning to answer a question, and during this time, the large model is continuously computing, which also means that the computing resources of this machine are monopolized for dozens of seconds. If expanded to hundreds of servers, it will be difficult to cover the inference costs." said Ai Zhiyuan, CEO of Qujing Technology, to Guangkui Intelligence.
Therefore, compared with AI (large model) training scenarios that demand large-scale computing power, AI inference has less stringent performance requirements, mainly needing to meet low power consumption and real-time processing demands. "Training is concentrated where electricity is abundant, while inference needs to be close to users," said Yue Kun, Vice President of Huawei and President of its ISP and Internet Systems Department, noting that inference latency should fall within 5-10 milliseconds and that high-redundancy, "two locations, three centers" designs are needed.
Taking China Telecom as an example, it has currently established 10,000-card resource pools in Beijing, Shanghai, Guangzhou, Ningxia, and other places, and to support the development of industry models, it has also established 1,000-card resource pools in seven locations including Zhejiang and Jiangsu. At the same time, to ensure that AI inference applications have low latency within the 10-millisecond range, China Telecom is also building edge inference computing power in multiple regions, gradually forming a national "2+3+7" computing power layout.
2024 has been called the year of AI application landing, but in reality the AI inference application market has not exploded as expected. The main reason is that "there is currently no application in the industry that can be widely deployed in enterprises; after all, large models' technical capabilities still have defects, the foundational models are not strong enough, and problems such as hallucination and randomness remain," said Hong Rui.
Because AI applications have broadly failed to take off, growth in inference computing power has also stagnated. Still, many practitioners remain optimistic: they believe intelligent computing power will remain in short supply over the long term, and that as AI applications gradually penetrate, growing demand for inference computing power is a certainty. A chip industry insider told Guangcone Intelligence that AI inference is in effect a continuous search for the optimal solution. Agents consume more tokens than ordinary large language models (LLMs) because they are constantly observing, planning, and executing. "o1 is the model attempting things internally, while the Agent is the model attempting things externally."
Therefore, "we estimate that there will be a massive demand for AI inference computing power next year," said Liu Jingqian. "We have also established a large number of lightweight intelligent computing cluster solutions and comprehensive edge inference solutions."
Wang Wei also noted: "If a computing pool does not hold many cards, it is hard to rent out as pre-training cluster capacity. The inference market does not require many cards, and the overall market is still growing steadily, with demand from small and medium-sized internet companies continuing to increase."
However, at this stage, training computing power still occupies the mainstream. According to the "2023-2024 China Artificial Intelligence Computing Power Development Assessment Report" jointly released by IDC and Inspur Information, in 2023, the workload of domestic AI servers was approximately 60% training and 40% inference.
In August 2024, NVIDIA's management stated during the Q2 2024 earnings call that over the past four quarters, inference computing power accounted for about 40% of NVIDIA's data center revenue. In the future, revenue from inference computing power will continue to rise. On December 25, NVIDIA announced the launch of two GPUs, GB300 and B300, to meet the performance needs of large model inference.
Undoubtedly, the shift of large models from pre-training to inference applications has driven the differentiation of demand in the computing power market. Across the market as a whole, intelligent computing centers are still in the early stages of development and infrastructure is incomplete, so large pre-training players and big enterprises are more inclined to hoard graphics cards. On the AI inference application track, when intelligent computing centers offer equipment leasing, most small and medium-sized clients prefer to rent small amounts on demand and care more about cost-effectiveness.
In the future, as the penetration rate of AI applications continues to rise, the consumption of inference computing power will also continue to increase. According to IDC's forecast, by 2027, the share of inference computing power in the intelligent computing power market will even exceed 70%.
How to reduce the cost of inference deployment by improving computing efficiency has become the key to the development of the AI inference application computing power market.
Not Blindly Stacking Cards: How to Improve Computing Power Utilization?
Overall, since the official launch of the "East Data West Computing" initiative in 2021, the Chinese market has not lacked underlying computing power resources. In fact, with the development of large model technology and the growth of computing power demand, the trend of large-scale infrastructure purchases in the computing power market will continue for another year or two.
However, these underlying computing power resources share a common characteristic: they are scattered everywhere and have small scales. Liu Jingqian stated, "Each location may only have about 100 or 200 computing units, which is far from meeting the demand for large model computing power."
Moreover, and more importantly, computing efficiency today is not high. Reports indicate that even OpenAI achieved a computing power utilization rate of only 32%-36% when training GPT-4, and the effective utilization rate for large-model training is generally below 50%. "The utilization rate of computing power in our country is only 30%," admitted Wu Hequan, an academician of the Chinese Academy of Engineering.
The reason is that GPU cards cannot sustain high resource utilization throughout a large model's training cycle, and smaller training tasks leave resources idle for stretches. In the deployment phase, business fluctuations and inaccurate demand forecasting often leave many servers in standby or low-load states.
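One common way to put a number on such utilization figures is Model FLOPs Utilization (MFU): the useful model FLOPs delivered per second divided by the cluster's theoretical peak. A toy calculation follows, with every figure assumed purely for illustration:

```python
# Model FLOPs Utilization (MFU) = useful FLOP/s achieved / hardware peak.
# All figures below are illustrative assumptions, not measured values.
params = 70e9                 # model parameters
tokens_per_sec = 2.3e5        # observed cluster training throughput
achieved = 6 * params * tokens_per_sec   # ~6*N FLOPs per trained token

n_gpus = 1024
peak_per_gpu = 312e12         # A100 BF16 peak FLOP/s (spec sheet)
mfu = achieved / (n_gpus * peak_per_gpu)
print(f"MFU = {mfu:.1%}")     # ~30%, in line with the figures above
```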
"The overall development of CPU servers in the cloud computing era has become very mature, with the availability requirements for general computing cloud services being 99.5%-99.9%, but it is very difficult for large-scale GPU clusters to achieve this," said Hong Rui.
Behind this lies the immaturity of GPU hardware overall and of the surrounding software ecosystem. Software-defined hardware is gradually becoming a key factor in the development of intelligent computing power.
Therefore, in the realm of intelligent computing power, various players are entering the market based on their core advantages, focusing on the construction of intelligent computing power infrastructure, integrating idle social computing resources, and improving computing efficiency through software algorithms.
These players can be roughly divided into three categories:
One category consists of large state-owned central enterprises, such as China Telecom, which can better meet the computing power needs of state-owned enterprises due to their central enterprise status.
On one hand, China Telecom has built computing resource pools of thousands, tens of thousands, and hundreds of thousands of cards. On the other hand, through the Xirang·Intelligent Computing Integrated Platform, China Telecom is actively integrating idle social computing resources, enabling unified management and scheduling across service providers, regions, and architectures, thereby improving the overall utilization of computing resources.
"What we are doing first is the intelligent computing scheduling platform for state-owned central enterprises, integrating over 400 different idle computing resources from society onto the same platform, and then connecting to the computing power needs of state-owned central enterprises to solve the imbalance between supply and demand of computing power," said Liu Jingqian.
Another category consists of cloud vendors primarily from internet companies, including Alibaba Cloud, Baidu Intelligent Cloud, and Volcano Engine. These cloud vendors are actively transitioning from CPU cloud to GPU cloud in their underlying infrastructure architecture, forming a full-stack technical capability centered around GPU cloud.
"In the next decade, the computing paradigm will shift from cloud-native to a new era of AI cloud-native," said Tan Dai, president of Volcano Engine. AI cloud-native will optimize computing, storage, and network architecture with GPU at its core, allowing GPUs to directly access storage and databases, significantly reducing IO latency.
From the perspective of underlying infrastructure, an intelligent computing center is usually not built around a single brand of GPU; more often it combines NVIDIA cards with domestic GPUs. There may even be heterogeneous computing scenarios in which various types of computing units, such as CPUs, GPUs, FPGAs (programmable chips), and ASICs (chips designed for specific scenarios), work together to meet the demands of different scenarios and maximize the effectiveness of the computing power.
Therefore, cloud vendors have also focused on upgrading their capabilities for "multi-core mixed training." For example, in September this year, Baidu Intelligent Cloud fully upgraded its Baige AI heterogeneous computing platform to version 4.0, achieving 95% efficiency in multi-core mixed training on a scale of tens of thousands of cards.
In addition to the performance of GPU graphics cards, the deployment of large model training and inference applications is also closely related to the underlying infrastructure, including networks, storage products, databases, and other software toolchain platforms. The improvement in processing speed often requires multiple products to work together to accelerate completion.
Of course, besides the major cloud providers, there are also a number of small and medium-sized cloud vendors that have entered the computing power industry from their differentiated perspectives, such as YunZhou Technology—focused on scheduling and managing computing power resources based on platform capabilities.
Wang Wei admitted, "Previously, GPUs were just accessories in business system architecture, and only gradually became a separate category later on."
In August this year, YunZhou Technology launched the new generation AI Infra infrastructure ZStack AIOS platform, which mainly focuses on AI enterprise applications, helping enterprise customers deploy new large model applications from three directions: "computing power scheduling, AI large model training and inference, and AI application service development."
"We will use the platform to aggregate the specific usage of computing power, perform operation and maintenance on computing power, and in scenarios where GPU graphics cards are limited, we will also segment computing power for customers to improve utilization," Wang Wei said.
In addition, in operator scenarios, there are many resource pools for computing power. "We will also cooperate with customers to help them operate resource pools, computing, and unified operation management," Wang Wei stated.
Another type of player consists of startups that enhance computing efficiency through algorithms, such as Turing New Intelligent Computing, Qujing Technology, and Silicon-based Flow. These new players are far weaker in overall strength compared to major cloud providers, but they are gradually carving out a place in the industry through breakthrough single-point technologies.
"Initially, we were a service provider for intelligent computing cluster production, and in the connection phase, we became a computing power operation service provider, evolving into an intelligent data and application service provider in the future. These three roles are continuously evolving," Liu Miao said. "So our positioning is as a new generation computing power operation service provider."
Turing New Intelligent Computing hopes to build an independent platform to integrate idle computing resources, enabling scheduling, renting, and services for computing power. "We are creating a resource platform that connects idle computing power, similar to the early Taobao platform," Liu Miao said, noting that idle computing power mainly connects with intelligent computing centers in various regions.
In comparison, companies like Qujing Technology and Silicon-based Flow focus more on the AI inference application market and emphasize using algorithmic capabilities to enhance computing efficiency and reduce the costs of large model inference applications, although each company's approach varies.
For instance, to address the "impossible triangle" of large models, balancing effectiveness, efficiency, and cost, Qujing Technology has proposed two major technical innovations for AI inference applications: full-system heterogeneous collaborative inference, and "exchanging storage for computing" in RAG (Retrieval-Augmented Generation) scenarios, which releases storage capacity as a supplement to computing power, cutting inference costs tenfold and response latency twentyfold.
Looking ahead, beyond continuously optimizing the underlying computing resources and upper-layer applications of the AI Infra stack, "what we hope for is a model where we build the frame and everyone builds the applications on the roof, which lets us reduce costs further," said Ai Zhiyuan, founder and CEO of Qujing Technology.
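The "exchanging storage for computing" idea can be shown in miniature: persist expensive results keyed by their inputs so that repeated prefixes or queries hit storage instead of recomputing on a GPU. The sketch below is a generic memoization pattern under that premise, not Qujing's system, which operates on KV caches across the memory hierarchy.

```python
import hashlib, os, pickle

CACHE_DIR = "kv_cache"          # illustrative on-disk cache
os.makedirs(CACHE_DIR, exist_ok=True)

def cached(fn):
    """Trade storage for compute: persist results keyed by input hash,
    so a repeated prefix/query costs one disk read, not a GPU pass."""
    def wrapper(prompt: str):
        key = hashlib.sha256(prompt.encode()).hexdigest()
        path = os.path.join(CACHE_DIR, key)
        if os.path.exists(path):         # storage hit, skip compute
            with open(path, "rb") as f:
                return pickle.load(f)
        result = fn(prompt)              # expensive compute path
        with open(path, "wb") as f:
            pickle.dump(result, f)
        return result
    return wrapper

@cached
def encode_context(prompt: str):
    # Stand-in for an expensive prefill / KV-cache computation.
    return sum(ord(c) for c in prompt)

encode_context("shared system prompt")   # computed once
encode_context("shared system prompt")   # served from storage
```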
It is clear that Qujing Technology does not just want to be a supplier of algorithm optimization solutions, but also aims to be a service provider for the implementation of large AI models.
Additionally, in the current industry, optimization solutions for large model computing power often prioritize improving GPU utilization. Ai Zhiyuan stated that the current GPU utilization has already reached over 50%, and increasing GPU utilization further is very challenging.
"There is still a lot of room for improvement in GPU utilization, but it is very difficult, involving technologies such as chips, video memory, inter-card communication, multi-machine communication, and software scheduling. This cannot be solved by one company or one technology; it requires the entire upstream and downstream of the industry chain to work together," Hong Rui also said to Guangkui Intelligent.
Hong Rui believes that the industry currently lacks the true capability to network and operate ultra-large-scale intelligent computing clusters from a technical perspective, and the software layer has not matured. "The computing power is there, but if the software optimization is not done well, or if the inference engine and load balancing are not done well, it will have a significant impact on computing power performance."
Looking at these three types of players, whether they are operators like China Telecom, cloud vendors, or new entrants, each has a different approach to entering the computing power market, but all hope to share in the global computing power feast.
In fact, compared to large model services, this is indeed a business with stronger certainty at this stage.
Homogenized Computing Power Leasing: Refined, Specialized Operational Services Are Key
When it comes to making money steadily, gold miners can hardly compete with water sellers.
AI large models have been racing for two years, but in the entire industry chain, only computing power service providers led by Nvidia have truly made money, gaining both fame and fortune in revenue and the stock market.
In 2024, the dividends of computing power are gradually extending from Nvidia to the broader computing power track, with server manufacturers, cloud vendors, and even players reselling and leasing various cards also receiving certain profit returns. Of course, the profits are far less than those of Nvidia.
"In 2024, overall, we didn't lose money, but we didn't make a lot either," Wang Wei admitted. "AI (applications) have not yet scaled up at this stage; the largest volume related to AI is still at the computing power level, and computing power application revenue is relatively good."
As for expectations for 2025, Wang Wei was frank that he is not yet ready to make predictions: "Next year is really hard to call, but over the longer term, the next three years, AI applications will see significant incremental progress." Looking at computing centers across the regions, however, few have managed to turn a profit; the basic goal is simply to cover operating costs.
According to Yue Yuanhang, CEO of Zhibole Technology, calculations show that even if a computing center's equipment rental rate rises to 60%, it would still take at least seven years to break even.
Currently, computing centers mainly generate revenue from computing power leasing, but "equipment leasing is highly homogeneous; what is truly lacking is end-to-end service capability," Hong Rui told Guangcone Intelligence.
The so-called end-to-end service capability means that in addition to hardware, computing centers must also be able to support enterprises from large model application development to large model iteration and upgrade, and then to subsequent large model deployment with full-stack services. Currently, there are relatively few vendors that can truly achieve this end-to-end service.
However, from the overall data, the development prospects of China's computing service market are becoming increasingly optimistic. According to the latest report released by IDC titled "China Computing Service Market (First Half of 2024) Tracking," the overall market for computing services in China is expected to grow by 79.6% year-on-year in the first half of 2024, with a market size reaching 14.61 billion yuan. “The computing service market is growing at a rate far exceeding expectations. From the growth trend of computing services, the computing service market will continue to maintain rapid growth in the next five years,” said Yang Yang, research manager of IDC's China Enterprise Research Department.
Hong Rui also stated that after experiencing the crazy hoarding of card resources in the computing 1.0 era, and the extensive expansion of computing centers leading to supply-demand imbalance in the computing 2.0 era, the endgame of the computing 3.0 era will definitely be specialized and refined operational computing services.
After all, when pre-training and inference are divided into two tracks, the AI inference application market will gradually develop, the technology stack will gradually mature, service capabilities will gradually improve, and the market will further integrate scattered idle computing resources to maximize computing power utilization.
However, China's computing power market still faces significant challenges. Beyond the shortage of high-end GPU chips, "the domestic GPU market is currently too fragmented, and each GPU has its own independent ecosystem, leaving the overall ecosystem splintered," Wang Wei said. This also makes adaptation costs across the domestic GPU ecosystem very high.
But as Liu Miao said, the 20-year long cycle of computing has just begun, and we may only be in the first year. The path to achieving AGI is also filled with uncertainties, which undoubtedly presents more opportunities and challenges for many players.
Author of this article: Bai Ge, Source: Tencent Technology, Original Title: "The Hidden World of China's Computing Power: The A100 that used to cost nearly 100,000 yuan was crazily sought after, and now many cards are still unopened."
Risk Warning and Disclaimer
The market has risks, and investment requires caution. This article does not constitute personal investment advice and does not take into account the specific investment goals, financial conditions, or needs of individual users. Users should consider whether any opinions, views, or conclusions in this article suit their specific circumstances. Invest accordingly at your own risk.