Wallstreetcn
2024.09.11 07:00

Baidu CEO Robin Li's internal speech on three major misconceptions about large models: the gap between models will widen, and we are still far from the ideal

Robin Li argued in an internal speech that the gap between large models will widen and that open-source models have no advantage in efficiency or cost. He stressed that intelligent agents are an important direction for large-model development, and that a model's capabilities and costs are multidimensional, requiring continuous iteration to meet user needs. He also said that the exact size of a lead over competitors matters less than sustaining it and converting it into market share.

On the afternoon of September 11, Sina Technology exclusively obtained an internal speech by Baidu founder, chairman, and CEO Robin Li. In a recent exchange with employees, Robin Li again addressed common misconceptions about large models, covering competition among models, the efficiency of open-source models, and trends in artificial intelligence.

Robin Li said the gap between large models is likely to widen. The ceiling for large models is very high, and current models are still far from the ideal, so they must keep iterating, updating, and upgrading rapidly; this requires sustained investment over years or even decades to keep meeting user needs while cutting costs and improving efficiency.

"The gap between models is multidimensional. One dimension is in terms of capabilities, whether it's understanding, generation, logical reasoning, or memory capabilities, there are differences in these basic abilities. Another dimension is in terms of costs. What is the cost you need to bear to have these capabilities or to answer these questions? Some models may have a slow reasoning speed, and although they achieve the same effect, their actual experience is not as good as the most advanced models," Robin Li said.

Robin Li said the question of being 12 months ahead or 18 months behind is not, in itself, that important. Every company operates in a fully competitive market: whatever direction you take, there are many competitors. If you can always stay 12 to 18 months ahead of your competitors, you are invincible; even a sustained six-month lead means you win, with your market share perhaps at 70% while a competitor holds only 20% or even 10%. (Wen Meng)

The following is the text of the internal speech:

Q: Some people believe that there are no longer any barriers between large models. What is your view on this?

Robin Li: I disagree with that view. I think there are quite a few misconceptions about large models outside the industry. Every time a new model is released, its makers naturally want to say how good it is: each time they benchmark against GPT-4o, use test sets or put together leaderboards, and claim their scores are nearly the same as GPT-4o's, or even better in some respects. But that does not prove these newly released models have narrowed the gap with OpenAI's most advanced models.

The gap between models is multidimensional. One dimension is capability: whether in understanding, generation, logical reasoning, or memory, there are differences in these basic abilities. Another dimension is cost: what cost do you have to bear to obtain these capabilities or answer these questions? Some models may be slow at inference; even if they achieve the same result, the actual experience is not as good as the most advanced models.

There is also the issue of overfitting to the test set. Every model that wants to prove its capabilities takes part in leaderboards, and to do well on a leaderboard you end up guessing what is being tested and what tricks will produce the right answers. So from leaderboards or test sets the capabilities may look very close, yet a significant gap remains in real applications.
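To make the test-set overfitting point concrete (this illustration is mine, not part of the speech), a common sanity check is to measure word n-gram overlap between a model's training data and a benchmark's test items: heavy overlap suggests a high score may reflect memorization rather than genuine capability. A minimal sketch, assuming simple whitespace tokenization and toy data:

```python
def ngrams(text: str, n: int = 8) -> set:
    """Return the set of lowercased word n-grams in a piece of text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def contamination_rate(train_docs: list[str], test_items: list[str], n: int = 8) -> float:
    """Fraction of test items sharing at least one n-gram with the training corpus."""
    train_grams = set()
    for doc in train_docs:
        train_grams |= ngrams(doc, n)
    flagged = sum(1 for item in test_items if ngrams(item, n) & train_grams)
    return flagged / len(test_items) if test_items else 0.0

# Toy usage: a high rate hints that a benchmark score may be inflated by memorization.
train = ["the quick brown fox jumps over the lazy dog near the river bank today"]
test = ["the quick brown fox jumps over the lazy dog near the river bank today",
        "an entirely different question about celestial navigation and tides"]
print(contamination_rate(train, test, n=8))  # 0.5 in this toy example
```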

Hype from some self-published media outlets, combined with the promotional push around each new model release, creates the impression that the capability differences between models are quite small, but that is not the case. In actual practice, I do not allow our technical staff to chase leaderboard rankings. What truly measures the Wenxin large model's capability is whether it meets user needs in specific application scenarios and whether it generates incremental value; that is what we truly care about.

We need to see that, on the one hand, there are still significant differences between models' capabilities, and on the other hand, the ceiling is very high: what you have achieved today is still very far from what you ultimately want to achieve, from the ideal state. Therefore, models need to keep iterating, updating, and upgrading rapidly.

Even if the gap does not look that big today, will it widen in a year? Who can keep investing in this direction, day after day, for years or even more than a decade, making the model better and better at meeting user needs and scenario requirements while improving efficiency and cutting costs? The differences between models will not shrink; they will grow. Those who do not know the real needs and only work through test-set questions may feel they are almost there.

The so-called 12 months ahead or 18 months behind, I do not think, is that important. Every company operates in a fully competitive market: whatever direction you take, there are many competitors. If you can always stay 12 to 18 months ahead of your competitors, you are invincible. Do not think 12 to 18 months is a short time. Even if you can always stay six months ahead of your competitors, you win: your market share might be 70% while a competitor's is only 20% or even 10%.

Q: Some people say that open-source models are narrowing the gap with closed-source models. Will this destroy the business model of closed-source large model companies?

Robin Li: This question is closely related to the previous one. As I just said, a model must be judged not only on its capabilities or effects but also on its efficiency, and in terms of efficiency open-source models fall short. Strictly speaking, closed-source models should be called commercial models: with a commercial model, vast numbers of users or customers share the same resources, sharing the R&D costs and the machines and GPUs used for inference. With an open-source model, you have to deploy it yourself, and once it is deployed, what is the GPU utilization rate?

Our Wenxin large models, whether 3.5, 4.0, or any other version, run at over 90% utilization. How many people are using an open-source model that you deploy yourself? We have publicly said that the Wenxin large model is called more than 600 million times a day and generates more than a trillion tokens a day. How many calls, how many tokens a day, can an open-source model claim? If no one is using it, how is the cost shared? How can its inference cost compare with a commercial model's?
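To illustrate the cost-sharing argument with a back-of-the-envelope calculation (my own illustrative numbers, not figures from the speech): idle GPUs cost the same as busy ones, so the amortized cost per token scales inversely with utilization. A minimal sketch, assuming a flat hourly GPU price and a fixed peak throughput per GPU:

```python
def cost_per_million_tokens(gpu_hourly_cost: float,
                            num_gpus: int,
                            peak_tokens_per_gpu_second: float,
                            utilization: float) -> float:
    """Amortized serving cost per million tokens for a deployment.

    utilization is the fraction of peak throughput actually used (0..1);
    idle GPUs still cost money, so low utilization inflates the per-token cost.
    """
    hourly_cost = gpu_hourly_cost * num_gpus
    tokens_per_hour = peak_tokens_per_gpu_second * num_gpus * 3600 * utilization
    return hourly_cost / tokens_per_hour * 1_000_000

# Illustrative comparison: the same hardware at 90% vs 10% utilization.
shared = cost_per_million_tokens(gpu_hourly_cost=2.0, num_gpus=8,
                                 peak_tokens_per_gpu_second=500, utilization=0.9)
self_hosted = cost_per_million_tokens(gpu_hourly_cost=2.0, num_gpus=8,
                                      peak_tokens_per_gpu_second=500, utilization=0.1)
print(f"~${shared:.2f} vs ~${self_hosted:.2f} per million tokens")
```

In this toy setup the same hardware costs nine times more per token at 10% utilization than at 90%, which is the gap being pointed at between a heavily shared commercial deployment and a lightly used self-hosted one.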

Before the era of large models, people were used to thinking that open source meant free and meant low cost. Back then, commercial software on the market required payment per copy: if you bought a computer with Windows installed, Microsoft might charge something for it, whereas if you ran Linux you did not pay that money, because Linux is open source and every programmer can read the code. If something is not done well, I can fix it and check the change in; everyone contributes, and you keep improving on the shoulders of giants. But none of this applies in the era of large models. In the era of large models, people constantly talk about how expensive GPUs are and how computing power is a key factor in the success or failure of large models. Does an open-source model provide computing power for you? It does not, so how is computing power to be used efficiently? Open-source models cannot solve this problem.

In the past, when you bought a computer you had already paid for the computing power, but large-model inference is not like that: inference is genuinely expensive. The value of open-source large models therefore lies in education and research. If you want to understand how large models work, not being able to see the source code definitely puts you at a disadvantage. In the commercial field, however, where you pursue efficiency, effectiveness, and the lowest cost, open-source models have no advantage.

Q: What is the evolution path of AI applications? Why emphasize intelligent agents?

Robin Li: The development of large-model applications has to go through several stages. At the start, AI assists humans, and a human still has to check the output to make sure it is acceptable in every respect before it goes out; this is the Copilot stage. Next comes the Agent, the intelligent agent. There are many definitions of "agent" out there, but the most important point is a certain degree of autonomy: the ability to use tools on its own, to reflect, to self-evolve, and so on. With a higher degree of automation, it becomes what we call an AI Worker, able to perform all kinds of mental and physical work like a human and to complete tasks independently. There has to be such a progression.
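To ground the Copilot-versus-Agent distinction in something concrete (an illustrative sketch of a generic agent loop, not Baidu's implementation; `call_model` and `web_search` are hypothetical stand-ins), the defining feature of an agent is that the model itself decides, step by step, whether to call a tool and continue, or to stop and answer:

```python
import json

def call_model(messages: list[dict]) -> dict:
    """Stand-in for an LLM call (purely illustrative). A real implementation would
    send `messages` to a model API and parse its tool-use or final-answer output."""
    if any(m["role"] == "tool" for m in messages):
        return {"answer": "Summary based on the tool results: " + messages[-1]["content"]}
    return {"tool": "web_search", "args": {"query": messages[0]["content"]}}

def web_search(query: str) -> str:
    """Stand-in tool (purely illustrative); a real one would query a search API."""
    return f"[stub results for: {query}]"

TOOLS = {"web_search": web_search}

def run_agent(task: str, max_steps: int = 5) -> str:
    """Minimal agent loop: the model decides on its own whether to call a tool,
    observes the result, and stops when it produces a final answer."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        decision = call_model(messages)
        if "answer" in decision:                 # the model judges the task done
            return decision["answer"]
        observation = TOOLS[decision["tool"]](**decision["args"])
        messages.append({"role": "tool",
                         "content": json.dumps({"tool": decision["tool"],
                                                "result": observation})})
    return "Stopped after reaching the step limit."

print(run_agent("Summarize recent coverage of intelligent agents."))
```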

The judgment that "intelligent agents are the most important development direction of large models" is actually not a consensus. At the Baidu Create conference, we launched three products: AgentBuilder, AppBuilder, and ModelBuilder. Among them, AgentBuilder and AppBuilder are about intelligent agents, one with a lower threshold and the other with more powerful functions.

After we explained it, some people finally began to see that this is genuinely interesting, that it can create value, and that usable things can already be built at a relatively low barrier to entry. From that point on, interest in intelligent agents gradually grew, and many people began to view agents favorably as a development direction. Even so, intelligent agents are still not a consensus today, and there are not many companies that, like Baidu, treat intelligent agents as the most important strategy and development direction for large models.

Why do we emphasize intelligent agents so much? Because the barrier to building an intelligent agent is genuinely low. Last year, when we said we wanted to roll out applications and that everyone should build them, many people still did not know how to do it, whether a given direction was feasible, or what capabilities were needed to create value in a given scenario. There were countless uncertainties, and people did not know how to turn models into applications.

But intelligent agents offer a very direct, efficient, and simple path: it is quite easy to build an intelligent agent on top of a model, which is why tens of thousands of new intelligent agents are now created on the Wenxin platform every week. In the field of artificial intelligence, we saw the trend early and have good foundations. Beyond the strength of the models themselves, we also have a strong distribution channel.

Baidu's apps, especially Baidu Search, are used by hundreds of millions of people every day. Users actively express their needs to us, which tells us which AI can best answer their questions and meet their needs. This is a natural matching process, so we are in the best position to help developers distribute their AI.

Sina Technology, original title: "Exclusive | Li Yanhong's internal speech on three major misconceptions about large models: the gap between models will widen, still far from the ideal."