MiniMax Talks About the Survival Battle of Large Models
Excellent venture companies also have opportunities
Author | Huang Yu
Editor | Liu Baodan
After two years of rapid development, AI large models have entered a phase of elimination, with pessimistic sentiments spreading continuously. When "only a few players can obtain tickets for the second half of the competition" becomes a consensus in the industry, the future of AI large model entrepreneurs is filled with uncertainties.
Recently, Liu Hua, Vice President of MiniMax, and Tian Feng, General Manager of Tencent Cloud's North Region Cloud Native, appeared at a media exchange meeting to discuss the future development of large models.
Regarding the future industry landscape, Liu Hua told Wall Street Insights that considering the strengths in computing power, R&D teams, and funding, there won't be many companies capable of developing the next generation of models. Ultimately, the track for foundational large models will be limited to a handful of companies, while more companies will shift to AI applications. However, he believes that outstanding large enterprises and excellent startups will have opportunities to remain.
At the same time, Liu Hua emphasized that MiniMax will continue to be a technology-driven company, focusing on developing foundational multimodal large models while also pursuing B2B and B2C businesses, "because we still believe that user feedback is the best way to help us improve our models."
There are indeed many voices in the market pessimistic about large model startups. Zhu Xiaohu, Managing Partner of GSR Ventures, has stated that the best outcome for the "Six Little Tigers" would be acquisition by large companies.
However, in Liu Hua's view, while domestic large companies have many advantages in developing large models, MiniMax, as a startup, has also received support from some large companies, such as Tencent. Additionally, MiniMax has achieved some successful commercialization.
With support from large companies and successful commercialization paths, Liu Hua believes that startups like MiniMax have a significant chance to remain in the race and continue iterating and developing models.
As one of the lower-profile members of the "AI Six Little Tigers," MiniMax appears to be progressing relatively smoothly on commercialization.
In late August this year, when releasing the company's latest video model, Sheng Jingyuan, General Manager of International Business at MiniMax, stated that MiniMax is one of the few Chinese large model companies that can genuinely speak of being commercialization-, product-, and model-driven, and that it is very likely to achieve self-sufficiency and profitability in a relatively short time.
She further pointed out, "The core is still technological breakthroughs; products are a manifestation of technological breakthroughs. This product can ultimately achieve commercialization, which feeds back into subsequent technological investments. This is a true sign of the company turning around. We may currently be halfway up the mountain, but if we succeed, we can quickly reach a positive cycle."
It is worth mentioning that discussions about the slowdown in large model technology iterations and the Scaling Law hitting a wall are becoming increasingly common.
However, Liu Hua said he has not felt the Scaling Law slowing down; on the contrary, the industry has found this year that a Scaling Law exists not only in training but also in the inference phase. For MiniMax, 2024 remains a year of rapid development for large models.
A large model company's judgment about technology and direction is particularly important, as it determines the ceiling of its future development. MiniMax has three clear directions in large model R&D: reducing model error rates, achieving effectively unlimited input and output lengths, and pursuing multimodality.

On error rates, Liu Hua pointed out that the previous generation of GPT-series models erred roughly 30% of the time, which made them unsuitable for serious production scenarios. If large models are to enter serious production, R&D, scientific research, and design work, their error rates must come down, ideally to 2%-3%.
Secondly, the tasks of large models are gradually expanding from text to speech and video, and the required amount of tokens is also increasing rapidly. Therefore, MiniMax must ensure that large models handle larger-scale inputs and outputs in an efficient manner.
It is reported that MiniMax's newly developed Abab 7 series models are based on a new architecture combining MoE with a Linear Attention mechanism, which can significantly reduce the computational complexity of long texts, greatly enhance practicality and response speed, and substantially cut the training and inference costs of large models.
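MiniMax has not published the exact design of the Abab 7 architecture, but the general idea behind linear attention can be sketched. A minimal NumPy illustration, assuming a simple positive feature map in the style of kernelized attention (the function names and feature-map choice here are illustrative, not MiniMax's implementation): by reassociating the matrix product, the n x n attention matrix is never materialized, so cost over sequence length n drops from O(n^2 * d) to O(n * d^2).

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: forms an n x n score matrix, so cost is O(n^2 * d).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, eps=1e-6):
    # Kernelized ("linear") attention sketch: replace softmax with a
    # positive feature map phi, then reassociate (phi(Q) phi(K)^T) V as
    # phi(Q) (phi(K)^T V). The n x n matrix is never formed; only a
    # d x d summary is, giving O(n * d^2) cost -- linear in sequence length.
    phi = lambda x: np.maximum(x, 0) + 1.0          # illustrative feature map
    Qf, Kf = phi(Q), phi(K)
    KV = Kf.T @ V                                   # d x d summary, built once
    Z = Qf @ Kf.sum(axis=0, keepdims=True).T + eps  # per-row normalizer, shape (n, 1)
    return (Qf @ KV) / Z

rng = np.random.default_rng(0)
n, d = 512, 32
Q, K, V = rng.normal(size=(3, n, d))
out = linear_attention(Q, K, V)
print(out.shape)  # (512, 32)
```

Because the d x d summary `KV` can be accumulated incrementally, this family of methods also supports constant-memory streaming over very long inputs, which is presumably why such architectures are attractive for the long-context goals described above.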
The rapid iteration of large models also depends on cloud service providers, which supply computing power, storage, big data, and other infrastructure. MiniMax has maintained a partnership with Tencent Cloud since its founding.
After several years of development, the demand of large model companies for cloud service providers has also changed.
Liu Hua told Wall Street Insights that MiniMax's demand for cloud providers was initially focused on model training; as the models' capabilities continue to improve, it will also need to run its business on public clouds.
From the cloud provider's perspective, Tian Feng, General Manager of Cloud Native for Tencent Cloud's North Region, also shared observations on how customer demands have changed over the past two to three years.
Tian Feng pointed out that the early needs of large model company clients like MiniMax were focused on computing power and big data processing, but now there are new demands for data storage, databases, big data, and security. As business develops, the scale of training clusters and inference clusters is rapidly expanding.
"This scale poses a very large and completely new challenge to our networking capabilities and cluster operation and maintenance, which is completely different from before." Tian Feng introduced that Tencent Cloud provides MiniMax with a series of high-performance intelligent computing products that integrate computing, storage, and networking, allowing MiniMax to focus more energy on model training and engineering.
It is reported that, through a systematic operations and maintenance mechanism, Tencent Cloud's high-performance computing clusters can detect network faults within one minute, locate problems within three minutes, and restore service in as little as five minutes. The daily failure rate of its thousand-GPU clusters has been brought down to 0.16 incidents per day, one-third of the industry average.
At the same time, Tencent Cloud's COS object storage metadata acceleration solution ensures performance, and various refined management measures are implemented for data governance. The DLC data lake product has also been specifically optimized for corpus data preprocessing to enhance task processing performance, helping MiniMax save over 30% of computing power and improve performance by over 35.5%.
"AI large models are a long-distance race; investors and entrepreneurs need to have confidence and patience. Cool technology ultimately needs to be realized in commercialization itself, to make profits and earn money," said Tian Feng The challenges faced in this large model competition are becoming increasingly significant. To become one of the final winners, all participants must find the right direction and then go all out