
The true LLM Agent

This article explores the direction of future AI agents, arguing that the value lies in the model itself rather than in the workflow around it. The author, Alexander Doria, believes that workflow agents built on pre-arranged prompts will hit bottlenecks in the long term, and that future agents should combine reinforcement learning with reasoning so they can autonomously manage task execution. The article argues that AI models themselves will become the products of the future, and that current research and market trends are already pushing in this direction.
Last week's market drop left me with little appetite for study, but I strongly recommend this article; it is well worth reading over a spring outing. It presents Alexander Doria's thoughts on agents, translated by Baoyu AI and formatted by Founder Park.
Alexander's viewpoint is clear: the future of AI agents must center on the model itself, not the workflow. He uses the currently popular Manus as a case study: workflow agents like Manus, built on "pre-arranged prompts and tool paths," may perform well in the short term, but will inevitably hit bottlenecks in the long run. This prompt-driven approach cannot scale and cannot truly handle complex tasks that require long-term planning and multi-step reasoning.
The next generation of true LLM agents will be realized through the combination of reinforcement learning (RL) and reasoning. The article cites OpenAI's DeepResearch and Anthropic's Claude Sonnet 3.7 as examples, indicating that future agents will autonomously control the entire process of task execution, including dynamically planning search strategies and actively adjusting tool use, rather than relying on external prompts or workflow orchestration. This shift means the core complexity of agent design moves into the model training phase, fundamentally enhancing the model's autonomous reasoning capabilities and ultimately upending the current application-layer ecosystem.
The Model is the Product
In recent years, people have been speculating about the next round of AI development directions: will it be agents? Reasoning models? Or true multimodality?
But now, it's time to draw a conclusion:
AI models themselves are the future products.
Currently, both research and market development trends are pushing in this direction.
Why do I say this?
- The expansion of general models has hit a bottleneck. The biggest message from the GPT-4.5 release is that model capability improves only linearly while the required compute grows exponentially. Despite OpenAI's significant optimizations in training and infrastructure over the past two years, it still cannot launch such a super-giant model at an acceptable cost.
- The effects of opinionated training far exceed expectations. The combination of reinforcement learning and reasoning is letting models rapidly master specific tasks. This capability is neither traditional machine learning nor the familiar base large model, but rather a magical third form. For example, some extremely small models have suddenly become astonishingly good at mathematics; coding models no longer simply generate code but can autonomously manage entire codebases; and Claude, with almost no specialized training and a very sparse information environment, can surprisingly play Pokémon.
- The cost of inference is falling rapidly. DeepSeek's latest optimizations suggest that the world's existing GPUs are already sufficient for every person on Earth to query a top model for ten thousand tokens a day. In reality, there is simply not that much demand. The simple business of making money by selling tokens is no longer viable; model providers must move up the value chain.
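As a rough back-of-envelope check on that claim (a minimal sketch; the per-GPU throughput figure below is an assumed order of magnitude, not a number from the article):

```python
# Back-of-envelope check: could today's GPUs serve 10,000 frontier-model tokens
# per person per day? All figures are rough assumptions for illustration only.

people = 8e9                       # approximate world population
tokens_per_person_per_day = 1e4    # the hypothetical demand in the claim
daily_demand = people * tokens_per_person_per_day             # 8e13 tokens/day

# Assumed sustained decoding throughput of one modern accelerator running a
# heavily optimized model (an order-of-magnitude guess, not a benchmark):
tokens_per_gpu_per_second = 2_000
daily_supply_per_gpu = tokens_per_gpu_per_second * 86_400     # ~1.7e8 tokens/day

gpus_needed = daily_demand / daily_supply_per_gpu
print(f"GPUs needed: {gpus_needed:,.0f}")   # roughly 5e5, well within the installed base
```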
But this trend also brings some awkwardness, as all investors have pinned their hopes on the "application layer." However, in the next phase of the AI revolution, the application layer is very likely to be the first to be automated and disrupted.
The Form of Next-Generation AI Models
In the past few weeks, we have seen two typical cases of "model as product": DeepResearch launched by OpenAI and Claude Sonnet 3.7 launched by Anthropic.
Regarding DeepResearch, many people have misunderstandings, which have become more severe with the emergence of numerous imitation versions (both open-source and closed-source). In fact, OpenAI did not simply wrap a layer around the O3 model but trained an entirely new model from scratch *.
*OpenAI's official documentation: https://cdn.openai.com/deep-research-system-card.pdf
This model can complete search tasks internally without the need for external calls, prompts, or human process intervention:
"The model has autonomously mastered core web browsing capabilities (such as searching, clicking, scrolling, and understanding documents) through reinforcement learning... It can also reason autonomously, synthesizing information from numerous websites to directly find specific content or generate detailed reports."
DeepResearch is not a standard large language model (LLM), nor is it an ordinary chatbot. It is a completely new research language model, specifically designed to complete search-related tasks end-to-end. Anyone who has seriously used this model will find that the reports it generates are longer, well-structured, and the information analysis process behind the content is extremely clear.
In contrast, as Hanchung Lee pointed out *, other DeepSearch products, including Perplexity and Google's version, are merely ordinary models with a bit of extra tricks added:
*https://leehanchung.github.io/blogs/2025/02/26/deep-research/
"Although Google's Gemini and Perplexity's chat assistants also claim to provide 'deep search' functionality, they have neither disclosed detailed optimization processes nor provided truly substantial quantitative evaluations... Therefore, we can only speculate that their fine-tuning work is not significant Anthropic's vision is becoming increasingly clear. Last December, they provided a rather controversial, but I believe quite accurate, definition of an "agent"*. Similar to DeepSearch, a true agent must independently complete tasks internally: "An agent can dynamically decide its execution process and how to use tools, autonomously controlling the completion of tasks."
*Anthropic's definition: https://www.anthropic.com/research/building-effective-agents
However, most so-called agent companies on the market are not actually creating agents, but rather "workflows":
That is, they use predefined code paths to link LLMs with other tools. These workflows still have some value, especially in specific vertical applications. But for those truly engaged in cutting-edge research, it is clear: the real breakthroughs in the future must come from directly redesigning AI systems at the model level.
The release of Claude 3.7 is a concrete proof of this: Anthropic specifically trained on complex programming tasks as the core objective, significantly improving the performance of many products that originally used workflow models (like Devin) in software development (SWE) related evaluations.
Another smaller-scale example from our company Pleias:
We are currently exploring how to fully automate RAG (Retrieval-Augmented Generation) systems.
Current RAG systems are composed of many complex but fragile steps: query routing, document chunking, reranking, query interpretation, query expansion, source contextualization, search engineering, and so on. With recent advances in model training, however, we find it entirely possible to fold these steps into two interrelated models:
One dedicated to data preparation, and the other focused on search, retrieval, and report generation. This solution requires designing a very complex synthetic data pipeline and a completely new reinforcement learning reward function.
This is true model training, true research.
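To make the reward-function part concrete, here is a toy sketch of what a verifier for the search-and-report model might score. This is purely illustrative and not Pleias's actual design; every heuristic and name below is an assumption.

```python
# A toy reward function in the spirit of what is described above: scoring an
# end-to-end "search -> report" draft. Illustrative sketch only; all heuristics
# and the [doc:...] citation convention are assumptions.
import re

def report_reward(draft: str, retrieved_ids: set[str], reference_facts: list[str]) -> float:
    """Reward a generated report for (a) citing only documents that were actually
    retrieved and (b) covering the facts the verifier expects."""
    score = 0.0

    # (a) Citation grounding: every [doc:...] citation must point to a retrieved source.
    citations = set(re.findall(r"\[doc:(\w+)\]", draft))
    if citations:
        score += len(citations & retrieved_ids) / len(citations)     # up to +1.0

    # (b) Fact coverage: fraction of expected facts mentioned in the draft.
    if reference_facts:
        covered = sum(1 for fact in reference_facts if fact.lower() in draft.lower())
        score += covered / len(reference_facts)                      # up to +1.0

    # (c) Mild length shaping to discourage degenerate one-line reports.
    if len(draft.split()) < 50:
        score -= 0.5
    return score
```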
What does all this mean for us?
It means a shift in complexity.
By handling the large space of possible actions and edge cases up front, during training, deployment becomes exceptionally simple. But in this process, the vast majority of value will be created by the model trainer, and ultimately captured by them.
In simple terms, Anthropic aims to disrupt and replace the current so-called "agent" workflows, such as typical LlamaIndex-style systems:
Transitioning to this fully modeled solution:

The Honeymoon Period Between Model Providers and Application Developers is Over
The current trend of AI has become clear:
Within the next 2-3 years, all closed-source AI large model providers will stop offering API services to the outside world and will instead provide the models themselves as products.
This trend is not a guess; multiple signals in reality point to this. Naveen Rao, Vice President of Generative AI at Databricks, has also made a clear prediction:
In the next two to three years, all closed-source AI model providers will stop selling API services.
In simple terms, the API economy is about to come to an end. The original honeymoon period between model providers and the application layer (Wrapper) has completely ended.
Possible changes in market direction:
- Claude Code and DeepSearch are early technical and product explorations of this trend. You may have noticed that DeepSearch does not provide an API and appears only as a value-added feature of OpenAI's premium subscriptions; Claude Code is merely a very simple terminal integration. This clearly shows that model vendors have begun bypassing third-party application layers to deliver user value directly.
- Application-layer companies are quietly building model-training capabilities. Today's successful application companies have recognized the threat and are quietly trying to transform. Cursor, for example, ships its own small code-completion model; WindSurf has internally developed a low-cost code model called Codium; Perplexity used to rely on internal classifiers for request routing and has recently pivoted to training its own DeepSeek variant for search.
- Today's successful "application wrappers" are caught in a dilemma: either train models themselves or wait to be completely replaced by the upstream models. What they are doing now is essentially free market research, data design, and data generation for the upstream model vendors.
What happens next is still uncertain. Successful application wrappers now face a dilemma: "train models themselves" or "be trained by others." As far as I know, investors are currently extremely averse to "training models," even pushing some companies to keep their most valuable training work quiet; Cursor's small model and Codium's documentation, for instance, remain very low-profile to this day.
The Market Has Completely Failed to Account for the Potential of Reinforcement Learning (RL)
Currently, there is a common problem in the AI investment field: almost all investments are highly correlated.
At this stage, almost all AI investment institutions share the following consistent thoughts:
- Closed-source AI vendors will keep providing APIs for the long term;
- The application layer is the best way to monetize AI;
- Training any form of model (whether pre-training or reinforcement learning) is a waste of resources;
- All industries (including heavily regulated ones) will continue to rely on external AI providers for the long term.
However, these judgments increasingly look overly risky, even like a clear market failure.
In particular, given the recent breakthroughs in reinforcement learning (RL), the market has failed to price its enormous potential correctly: the power of RL has simply not been assessed and reflected by the capital markets.
From an economic perspective, as the global economy slides toward recession, companies capable of training models have enormous disruptive potential. Strangely, though, model-training companies struggle to raise money. Take Prime Intellect, an emerging Western AI training company: it has clear technical strength and the potential to grow into a top-tier AI lab, yet even so its financing remains difficult.
Across Europe and the United States, there are very few emerging AI companies that truly possess training capabilities:
Prime Intellect, EleutherAI, Jina, Nous, the HuggingFace training team (which is very small), Allen AI, and a few academic institutions, along with some contributors to open-source infrastructure, basically cover the entire construction and support work of Western training infrastructure.
In Europe, to my knowledge, there are at least 7-8 LLM projects currently using Common Corpus for model training.
However, capital remains indifferent to these teams that can truly train models.
"Training" has become an overlooked value pit.
Recently, even within OpenAI, there has been clear dissatisfaction with the current lack of "Vertical Reinforcement Learning" in the Silicon Valley startup ecosystem.

I believe this information comes directly from Sam Altman himself, and it may soon be reflected in the new batch of incubated projects at YC.
The signal behind this is very clear: large companies will tend to collaborate directly with startups that possess vertical reinforcement learning capabilities, rather than merely relying on application layer wrappers.
This trend also hints at another larger change:
Many of the most profitable AI applications of the future (for example, traditional industries still run on rule-based systems) have yet to be developed. Whoever can train specialized models for these fields will gain a significant advantage. Small, cross-domain, highly focused teams may be best placed to tackle these problems first and ultimately become acquisition targets for the large labs.

What is concerning, however, is that most Western AI companies are still stuck in a "pure application layer" mode of competition. Many are not even aware that:
The era of winning a war solely on the application layer has ended.
In contrast, China's DeepSeek has gone further: it no longer views models merely as products but as a form of universal infrastructure. As DeepSeek founder Liang Wenfeng clearly pointed out in a public interview:
"Like OpenAI and Anthropic, we plan to make it clear: DeepSeek's mission is not just to create a single product, but to provide an infrastructure-level capability... We will first invest in research and training, making it our core competitiveness."
Unfortunately, in Europe and the United States, the vast majority of AI startups still focus solely on building simple application layer products, akin to "using generals from past wars to fight a new war," without even realizing that the last war has actually ended.
The "Bitter Lessons" of Simple LLM Agents
The recently hyped Manus AI is a typical "workflow." My testing over the entire weekend has continuously validated the fundamental limitations of such systems, which had already emerged during the AutoGPT era. This limitation is particularly evident in search tasks:
- They lack true planning capabilities and often "get stuck" halfway through a task, unable to proceed;
- They cannot effectively remember long-term context, usually struggling to maintain coherence in tasks lasting over 5 to 10 minutes;
- They perform poorly on long-horizon tasks, where small errors across many steps compound and eventually lead to failure.
Today, we attempt to redefine the concept of LLM agents from this new, stricter perspective. The following content is a summary made as clearly as possible after integrating limited information from large companies, recent achievements in open research, and some of my personal speculations.
The concept of an agent is almost fundamentally at odds with a base large language model.
In traditional agent research, an agent is always in a constrained environment: imagine being trapped in a maze where you can move left or right, but you cannot fly freely, suddenly burrow underground, or disappear into thin air—you are strictly limited by physical rules or even game rules. A true agent, even in such a constrained environment, will have some degree of freedom because there are multiple ways to complete the game. However, no matter how you act, each decision must be backed by a clear goal: to win the ultimate reward. An effective agent will gradually remember the paths it has taken, forming effective patterns or experiences.
This exploratory process is called "search." The term is quite fitting: an agent's exploration of a maze is almost a perfect analogy to a human user clicking links during a web search to find the information they want. Research on "search" has a history of several decades in academia. A recent example is the Q-star algorithm (rumored to be the algorithm behind OpenAI's next-generation models, though never fully confirmed), which actually traces back to the A* search algorithm developed in 1968. The recent Pokémon training experiment completed with PufferLib vividly demonstrates how such agents "search": we watch them try paths, retry after failures, and keep exploring until they find the optimal route.
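To make the "search" analogy concrete, here is a minimal, self-contained sketch of the classic setting: an agent exploring a constrained maze toward a terminal reward. Plain breadth-first search is used purely as a stand-in for the A*-family methods mentioned above.

```python
# Minimal maze search: the agent explores states until it reaches the reward.
# Plain BFS stands in here for the A*-style methods discussed in the text.
from collections import deque

MAZE = [
    "S.#.",
    ".#..",
    "...G",
]

def solve(maze: list[str]):
    rows, cols = len(maze), len(maze[0])
    start = next((r, c) for r in range(rows) for c in range(cols) if maze[r][c] == "S")
    frontier = deque([(start, [start])])
    visited = {start}
    while frontier:
        (r, c), path = frontier.popleft()
        if maze[r][c] == "G":                                 # terminal reward reached
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):     # constrained action space
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and maze[nr][nc] != "#" and (nr, nc) not in visited:
                visited.add((nr, nc))
                frontier.append(((nr, nc), path + [(nr, nc)]))
    return None

print(solve(MAZE))   # the sequence of states that leads to the reward
```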

The operational methods of foundational language models and agents are almost entirely opposite:
- Agents remember their environment, while base language models do not; a language model responds only to the information in its current context window.
- Agents operate under hard constraints imposed by reality, while base language models merely generate the most probable text. They can sometimes exhibit consistent logic, but they can never guarantee it and may drift at any moment for purely "aesthetic" reasons.
- Agents can form long-term strategies: they can plan ahead or backtrack. Base language models, by contrast, excel only at single-shot reasoning and quickly saturate on problems that require complex multi-hop reasoning. Overall, they are constrained by the rules of text rather than by the physical or game rules of the real world.
The simplest way to combine language models with agents is to constrain outputs through predefined prompts and rules. Currently, the vast majority of language model agent systems operate this way; however, this approach is destined to encounter Richard Sutton's "Bitter Lesson."
People often misunderstand the "Bitter Lesson," thinking it serves as a guide for pre-training language models. In essence, it discusses the design of agents, highlighting our tendency to directly "hard code" human knowledge into agents—such as "if you hit a wall, change direction; if you hit a wall multiple times, try going back." This method appears effective in the short term, yielding quick progress without requiring long training times. However, in the long run, this approach often leads to suboptimal solutions and can even get stuck in unexpected scenarios.
Sutton summarizes it this way:
"We must learn the bitter lesson: artificially pre-setting the way we think does not work in the long run. The history of AI research has repeatedly verified:
Researchers often try to write knowledge into agents in advance;
This approach shows obvious short-term effects and gives researchers a sense of accomplishment;
But in the long run, performance quickly reaches its limits and even hinders subsequent development."
The ultimate breakthrough actually comes from a completely opposite approach, which is to search and learn through a large amount of computational resources. The eventual success is somewhat bittersweet because it negates the human-centered approach that people prefer."
Let's transfer this reasoning to today's LLM applications in production. Tools like Manus, and LLM wrappers in general, are doing exactly this "hand-coded human knowledge" work, steering the model with pre-designed prompts. It may be the easiest short-term path, since you don't even need to retrain the model, but it is by no means the optimal one. What you end up building is a hybrid: partly generative AI, partly a rule system, where the rules are precisely simplified abstractions of how humans think about space, objects, multi-agent systems, or symmetry.
To put it more bluntly, if Manus AI still cannot book a flight properly or give useful advice on fighting a tiger, it is not because it was badly designed; it has run into the backlash of the bitter lesson. Prompts cannot be scaled up indefinitely, and neither can hard-coded rules. What you really need is to design, from the ground up, a real LLM agent that can search, plan, and act.
Reinforcement Learning (RL) + Reasoning: The True Path to Success
This is a difficult problem. There is currently very little publicly available information, and only a few laboratories like Anthropic, OpenAI, and DeepMind understand the details. So far, we can only glean some basic information from limited official announcements, informal rumors, and a small amount of public research:
- Like traditional agents, LLM agents are trained with reinforcement learning. You can think of the language model's learning as a "maze": the paths through the maze are all the possible combinations of text that could be written about a topic, and the exit is the "reward" you ultimately want. The process of checking whether the reward has been reached is called the "verifier." William Brown's new open-source library, verifiers, is built exactly for this purpose. Current verifiers tend to validate explicit results such as mathematical answers or code. However, as Kalomaze has shown, effective verifiers can also be built for results that cannot be strictly checked, by training specialized classifiers. This works because of an important property of language models: their ability to evaluate answers far exceeds their ability to generate them. Even a smaller language model used as a "judge" can significantly improve overall performance and the design of the reward mechanism (a schematic sketch of a verifier and reward follows this list).
- LLM agents are trained on "drafts": the entire text is generated and then evaluated. This was not the obvious choice from the start; early research leaned toward searching over each individual token. But given limited compute and the recent breakthroughs in reasoning models, draft-level reasoning has become the mainstream training method. A typical reasoning-model training run lets the model generate multiple chains of logical steps and then selects the drafts that produced the best answers. This can lead to unexpected behavior, such as DeepSeek's R1-Zero model occasionally switching between English and Chinese. Reinforcement learning does not care whether something looks strange; it only cares whether the result is optimal. Just like an agent lost in a maze, the language model has to find its own way out through pure reasoning: no pre-defined prompts, no predetermined routes, only rewards and the ways of obtaining them. This is precisely the bitter solution the bitter lesson prescribes.
- LLM drafts are usually pre-divided into structured segments to make reward verification easier and, to some extent, to support the model's overall reasoning. This practice is called "rubric engineering," and it can be implemented directly in the reward function or, more commonly in large labs, through an initial post-training phase.
- LLM agents typically require large amounts of draft data and multi-stage training. For example, when training for search, we do not evaluate the search result in one shot; we assess the model's ability to acquire resources, produce intermediate results, acquire new resources, keep going, change plans, or backtrack. The currently favored method for training LLM agents is therefore GRPO, proposed by DeepSeek, especially in combination with the vllm text-generation library. A few weeks ago I also released a popular code notebook based on William Brown's work that runs the GRPO algorithm on a single A100 GPU from Google Colab. This dramatic drop in compute requirements will undoubtedly accelerate the spread of reinforcement learning and agent design over the coming years.
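Pulling these pieces together, here is a schematic, self-contained sketch of the scoring side of such a training loop: sample a group of drafts for one prompt, score each with a rubric-style verifier, and normalize rewards within the group into advantages, in the spirit of GRPO. The policy-gradient update and KL penalty that a full implementation would apply are omitted, and the tag format is just an illustrative rubric, not any lab's actual recipe.

```python
# Schematic GRPO-style scoring: one prompt, a group of sampled drafts, a verifier,
# and group-normalized advantages. Update step and KL penalty are omitted.
import re
import statistics

def verifier(draft: str, ground_truth: str) -> float:
    """Rubric-style reward: the draft must wrap its result in an <answer> tag
    (format reward) and the extracted answer must match the ground truth."""
    match = re.search(r"<answer>(.*?)</answer>", draft, re.DOTALL)
    if match is None:
        return 0.0                                   # no parsable answer
    formatted = 0.1                                  # small reward for following the rubric
    correct = 1.0 if match.group(1).strip() == ground_truth else 0.0
    return formatted + correct

def group_advantages(drafts: list[str], ground_truth: str) -> list[float]:
    """GRPO replaces a learned value baseline with the group mean: each draft's
    advantage is its reward standardized against the other drafts for the same prompt."""
    rewards = [verifier(d, ground_truth) for d in drafts]
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0          # avoid division by zero
    return [(r - mean) / std for r in rewards]

# Toy usage: four sampled drafts for one math prompt, ground truth "42".
drafts = [
    "Let me think... <answer>42</answer>",
    "<answer>41</answer>",
    "The result is 42.",                             # right value, wrong format
    "<answer>42</answer>",
]
print(group_advantages(drafts, "42"))
```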
Wait a minute, how can this be scaled?
The content mentioned above consists of basic modules. From here, there is still a distance to cover to reach OpenAI's DeepResearch and the various emerging agents capable of handling a series of complex tasks. Allow me to elaborate a bit on this.
Currently, the open-source community's research on reinforcement learning (RL) and reasoning mainly focuses on the mathematical domain, as we find a lot of mathematical problem data online, such as some packaged into Common Crawl and then extracted by HuggingFace's classifiers (like FineMath). However, in many other areas, especially "search," we do not have readily available data. This is because what search requires is not static text, but real action sequences, such as clicks, query logs, and behavior patterns when users browse web pages.
I previously did some log analysis, and at that time models (still using older methods like Markov chains, although the field has moved fast in recent years) were often trained on the AOL search data leaked in the late 1990s! Recently this field finally got a key open-source dataset: the Wikipedia clickstream, which records how anonymous users jump from one Wikipedia article to another.

But let me ask a simple question: is this dataset on HuggingFace? No. In fact, there is almost no truly "agentic" data on HuggingFace, that is, data that could teach models to plan actions. The field still defaults to hand-designed rule systems to "command" large language models (LLMs). I even suspect that large labs such as OpenAI or Anthropic may not have this kind of data in sufficient quantity. This is where traditional tech companies, Google above all, still hold a significant advantage: you simply cannot buy the massive user search data Google has accumulated (unless some fragments have leaked on the dark web).
However, there is a workaround: simulated data. Traditional reinforcement-learning models do not need historical data; they learn rules and strategies by trying things in an environment over and over. Applied to search, this looks like RL training for games: let the model explore freely and reward it when it finds the right answer. But in the search domain this exploration can be very long. Say you want to find a particularly obscure chemistry result buried in an old Soviet paper from the 1960s; the model can only rely on brute-force searching plus some linguistic adjustments, stumbling on the answer after many attempts, and then try to understand and generalize the patterns that would make finding similar answers more likely next time.
Let's calculate the cost of this method: taking a typical reinforcement learning method as an example, such as GRPO, you might have 16 concurrent exploration paths at once (I even suspect that the actual training concurrency in large labs is far more than 16). Each exploration path may continuously browse at least 100 web pages, which means that in a small training step, about 2,000 search requests need to be made. More complex reinforcement learning training often requires hundreds of thousands or even millions of steps, especially if you want the model to have general search capabilities. This means that a complete training session may require hundreds of millions of network requests, which might inadvertently DDoS some academic websites... In this case, your real bottleneck is no longer computational resources, but rather network bandwidth.
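A quick calculation with those figures makes the bottleneck obvious (the step count takes the low end of the article's own rough estimate):

```python
# Rough cost of letting the model explore the real web during RL training,
# using the figures from the paragraph above.
concurrent_rollouts = 16        # parallel exploration paths per step (the GRPO group)
pages_per_rollout = 100         # web pages browsed per path, at minimum
steps = 100_000                 # "hundreds of thousands" of steps, low end

requests_per_step = concurrent_rollouts * pages_per_rollout    # 1,600 (~2,000)
total_requests = requests_per_step * steps                     # 160,000,000
print(f"{total_requests:,} live web requests for a single training run")
```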
Reinforcement learning for games has run into similar issues, which is why the most advanced approaches today (such as PufferLib) repackage the environment so that, to the model, it "looks like an Atari game." The essence doesn't change; the data the model sees is simply highly standardized and optimized. Applied to search, we can directly reuse the existing large-scale Common Crawl web data and disguise it as live web pages returned to the model, complete with URLs, API calls, and assorted HTTP requests, so that the model believes it is really on the internet while in fact everything has been prepared ahead of time and is served from a fast local database.

So I expect that, in the future, training a search-capable LLM reinforcement-learning agent might look like this:
- First, build a large simulated search environment: the dataset is fixed, but during training it is continuously "translated" into a web-page format the model can understand and is fed back to the model.
- Before the formal reinforcement-learning phase, "warm up" the model with some lightweight supervised fine-tuning (SFT), similar to DeepSeek's SFT-RL-SFT-RL training route, possibly using existing search-pattern data. The goal is to familiarize the model with the logic of search-style thinking and the output format in advance, which speeds up the subsequent RL training; think of it as a pre-set training "template."
- Next, prepare a set of complex queries of varying difficulty together with clear verification criteria (verifiers). In practice this may mean building a complex synthetic-data pipeline to reverse-engineer those criteria from existing resources, or simply hiring PhD-level experts to label them by hand (which is very expensive).
- Then comes the real multi-step reinforcement-learning training. Given a query, the model actively launches a search; after getting results it can browse further pages or adjust its search keywords, across multiple consecutive steps. From the model's perspective it feels like browsing the live internet, while in reality all the data exchange behind the scenes is handled by the pre-built search simulator.
- Once the model is good enough at searching, there may be another round of RL and SFT, this time focused on writing high-quality final summaries. This step will likely also use a complex synthetic-data pipeline, having the model break its earlier long outputs into smaller segments and reassemble them with some reasoning, to improve the quality and logical coherence of what it generates.
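As an illustration of the first step above, here is a minimal sketch of a simulated "web" that serves pre-crawled pages from a local store, so that from the policy's point of view it is browsing live. The class name, the keyword index, and the page store are hypothetical, for illustration only.

```python
# A minimal simulated "web": serves cached pages (e.g. from a Common Crawl dump)
# as if they were live, so RL rollouts never touch the real internet.
from dataclasses import dataclass, field

@dataclass
class SimulatedWeb:
    pages: dict[str, str]                      # url -> page text, prepared offline
    index: dict[str, list[str]]                # keyword -> urls, built offline
    history: list[str] = field(default_factory=list)

    def search(self, query: str, k: int = 10) -> list[str]:
        """Return up to k URLs, formatted like a live search-results page."""
        hits = []
        for word in query.lower().split():
            hits.extend(self.index.get(word, []))
        results = list(dict.fromkeys(hits))[:k]   # dedupe while keeping order
        self.history.append(f"SEARCH {query}")
        return results

    def open(self, url: str) -> str:
        """Return the cached page body as if it had just been fetched."""
        self.history.append(f"OPEN {url}")
        return self.pages.get(url, "404 Not Found")

# During RL training, the rollout loop would call env.search() / env.open() in
# place of real HTTP requests, and a verifier would score the final report.
```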
A true agent does not rely on "prompts" to work
Finally, we truly have an "agent" model. So compared to the original workflow or model orchestration, what changes does it bring? Does it simply improve quality, or does it signify a completely new paradigm?
Let's first review Anthropic's definition of an agent: "A large language model (LLM) agent can dynamically and autonomously direct its actions and tool usage, while always maintaining control over how to complete tasks." To understand this more intuitively, let me give an example from a scenario I'm familiar with: search.
There was widespread speculation in the industry that as large language models gained longer context windows, traditional "retrieval-augmented generation" (RAG) methods would gradually fade away. However, the reality is not so. There are several reasons: the computational cost of ultra-long contexts is too high, accuracy is insufficient beyond simple information queries, and it is difficult to trace the source of inputs. Therefore, true "agent search" will not completely replace RAG. What is more likely to happen is that it will be highly automated, helping us automatically integrate complex vector databases, routing choices, sorting optimizations, and other processes. A typical search process in the future might look like this:
- After the user asks a question, the agent analyzes and decomposes it, inferring the user's true intent.
- If the question is vague, the agent proactively asks the user follow-up questions to confirm (OpenAI's DeepResearch can already do this).
- The model may then run a general search or, depending on the situation, go straight to specific professional data sources. Since it has memorized common API calling conventions, it can call the relevant interfaces directly. To save compute, the agent will prefer to use the APIs, sitemaps, and structured-data ecosystems that already exist online.
- The search process itself is something the model has learned and keeps optimizing. The agent can recognize and abandon bad search directions on its own and, like an experienced professional, switch to more promising paths. Some of OpenAI's DeepResearch's impressive results already demonstrate this: even when certain resources are poorly indexed, it can find them through sustained internal reasoning.
- Throughout the search, every decision and reasoning step the agent takes leaves a clear internal record, providing a degree of interpretability.
In short, the search process is "engineered" directly by the agent. The agent needs no additional data preprocessing; it adapts flexibly to the existing search infrastructure to find the best path. Meanwhile, users can interact with generative AI efficiently without special training. As Tim Berners-Lee emphasized more than a decade ago: "A true intelligent agent is one that, in each specific scenario, automatically does what the user has in mind but has not explicitly stated."
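For concreteness, here is a minimal sketch of the thin execution harness such an agent still needs: it only carries out the actions the trained model chooses and keeps the trace; the routing decisions live in the model, not in prompt-engineered rules. Every name below is hypothetical.

```python
# Minimal execution harness for an agentic search model. The harness executes
# the model's chosen actions and records them; it contains no hard-coded routing.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str                  # "clarify" | "search" | "open" | "answer"
    payload: str

def run_agentic_search(policy, web, ask_user, question: str, max_steps: int = 50) -> str:
    trace = [("user", question)]
    for _ in range(max_steps):
        action = policy(trace)                          # the trained model picks the next move
        trace.append(("agent", f"{action.kind}: {action.payload}"))
        if action.kind == "clarify":                    # confirm intent with the user
            trace.append(("user", ask_user(action.payload)))
        elif action.kind == "search":
            trace.append(("tool", str(web.search(action.payload))))
        elif action.kind == "open":
            trace.append(("tool", web.open(action.payload)))
        elif action.kind == "answer":                   # final report; `trace` doubles as
            return action.payload                       # the interpretable decision record
    return "stopped after max_steps"
```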
Let's apply this concept of a practical agent to other fields to see what it looks like: a network-engineering agent, for example, could interact directly with existing infrastructure, automatically generate configurations for routers, switches, and firewalls, analyze network topology and suggest optimizations based on requirements, or automatically parse error logs to locate the root cause of network issues.
In the financial sector, future agents could automatically and accurately convert between different financial data standards, for example translating ISO 20022 messages into the MT103 format. These capabilities are currently unattainable with a simple system prompt.
For now, however, only a few major labs can actually build such agents. They hold all the key resources: proprietary techniques, some of the critical data (or the synthetic techniques to generate it), and the strategic vision of turning models into products. This concentration of capability is probably not a good thing, but to some extent it is the result of capital markets underestimating the long-term value of model training, which has constrained innovation in the field.
I usually do not like to over-hype new concepts, but the disruptive potential and commercial value behind agents make me firmly believe that we urgently need to democratize the training and deployment of practical agents: openly validated models, sample training data for GRPO (Group Relative Policy Optimization), and, in the near future, openly shared complex synthetic-data pipelines, simulators, and other infrastructure.

Will 2025 be the year agents take off? Perhaps there is still a chance; we shall wait and see.
Author of this article: Alexander Doria, Source: Information Equality, Original title: "The Real LLM Agent".
