
A week of tariff news flooded the screens, but the AI world was stirring too: Llama 4 is here, O3 and O4-mini are on the way, and are DeepSeek R2 and GPT-5 not far behind?

Meta released the Llama 4 series, emphasizing multimodal capabilities and ultra-long context windows, with some models to be open-sourced. OpenAI confirmed that O3 and O4-mini are about to launch, while GPT-5's release has been postponed by a few months but will be free to use. DeepSeek and Tsinghua University published a new paper proposing the SPCT method and a meta reward model, significantly improving inference-time scaling performance.
Author: Bao Yilong
Source: Hard AI
This week, global headlines were dominated by tariffs, but the tech industry's attention was on a flurry of moves in AI.
Over the weekend, Meta unexpectedly released the Llama 4 series late at night, claiming "native multimodality and a 10-million-token context window," and for the first time disclosed a lightweight version that can run on a single H100. Earlier, OpenAI announced that the O3 and O4-mini models would launch within a few weeks, while confirming that GPT-5 has been delayed by several months due to technical integration and compute deployment issues.
DeepSeek, in collaboration with a research team from Tsinghua University, released a new paper this week on inference-time scaling, proposing a learning method called Self-Principled Critique Tuning (SPCT) and building the DeepSeek-GRM series of models. By combining a meta reward model with inference-time scaling, performance approaches that of a 671B model, which the article takes as a hint that DeepSeek R2 is nearing completion.
Meta Launches Llama 4, with Multimodality and Ultra-Long Context as Highlights
On Saturday, Meta officially released the Llama 4 series models. The entire Llama 4 series adopts a mixture-of-experts (MoE) architecture (see the routing sketch after the list below) and is natively multimodal, marking a clean break from Llama 3's text-only era. The released models include:
- Llama 4 Scout (17B active parameters, 109B total parameters, a 10 million+ token context window, runs on a single H100 GPU);
- Llama 4 Maverick (17B active parameters, 400B total parameters, a 1 million+ token context window, performance surpassing GPT-4o and Gemini 2.0 Flash);
- Llama 4 Behemoth preview (288B active parameters, 2 trillion total parameters, trained on 32,000 GPUs with 30 trillion multimodal tokens).
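To make the "active vs. total parameters" distinction concrete, below is a minimal sketch of top-k expert routing in an MoE layer. It is illustrative only: the expert count, dimensions, and top_k value are toy numbers, not Meta's actual configuration.

```python
import numpy as np

def moe_layer(x, gate_w, experts, top_k=1):
    """Toy mixture-of-experts layer: only the routed experts run for a given
    token, so the 'active' parameters are a small fraction of the total."""
    logits = x @ gate_w                                         # router scores, one per expert
    top = np.argsort(logits)[-top_k:]                           # indices of the top-k experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()   # softmax over chosen experts
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy setup: 8 experts, but each token activates only top_k of them.
rng = np.random.default_rng(0)
d = 16
experts = [lambda x, W=rng.normal(size=(d, d)) / d: x @ W for _ in range(8)]
gate_w = rng.normal(size=(d, 8))
print(moe_layer(rng.normal(size=d), gate_w, experts).shape)  # (16,)
```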
The newly announced Llama 4 Maverick and Llama 4 Scout are being released as open-source software. However, the Llama 4 license imposes restrictions on usage: companies with more than 700 million monthly active users must apply for special permission, and users must comply with several branding and attribution requirements.
Jeremy Howard, former president of Kaggle and founder of fast.ai, stated that while he appreciates the open-source release, both Llama 4 Scout and Maverick are large MoE models that cannot run on consumer-grade GPUs even after quantization, which is a significant loss for the accessibility of the open-source community.

Meta emphasizes that Llama 4 Scout and Llama 4 Maverick are its "most advanced models to date" and the "best versions in terms of multimodality in their class."
- Scout Highlights: Extremely fast, natively multimodal, with an industry-leading 10 million+ token multimodal context window (equivalent to processing over 20 hours of video), and able to run on a single H100 GPU after Int4 quantization (see the rough memory arithmetic below the list).
- Maverick Performance: Outperforms GPT-4o and Gemini 2.0 Flash on multiple mainstream benchmarks, with reasoning and coding capabilities comparable to the newly released DeepSeek V3 while using fewer than half as many active parameters.
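A rough back-of-the-envelope calculation shows why Int4 quantization is what makes the single-H100 claim plausible; the figures below count only weight memory and ignore the KV cache and activations.

```python
# Approximate weight-only memory for Llama 4 Scout's 109B total parameters.
params = 109e9
for name, bytes_per_param in [("bf16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{name}: ~{params * bytes_per_param / 2**30:.0f} GiB")
# bf16: ~203 GiB, int8: ~102 GiB, int4: ~51 GiB
# Only the Int4 weights fit within a single 80 GB H100 (before KV cache and activations).
```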
Users on X are also amazed by the performance of the Scout model, especially its ability to run on a single GPU and support ultra-long context windows.

The most eye-catching is Llama 4 Behemoth. Behemoth is still in training, but Meta positions it as "one of the smartest LLMs in the world." This "behemoth," with 288 billion active parameters and 2 trillion total parameters, is being trained on 30 trillion multimodal tokens across 32,000 GPUs, showcasing Meta's strength in the AI field.
Some users on X highlighted the potential of Behemoth's training run, emphasizing that even at this stage it already surpasses several top models, such as Claude 3.7 and Gemini 2.0 Pro.

Other users on X mocked Meta's "burning money" strategy while expressing surprise at the parameter scale of Llama 4.

The Information reported on Friday that, under investor pressure to demonstrate returns on its investment, Meta plans to spend up to $65 billion this year expanding its AI infrastructure.
OpenAI Confirms O3 and O4-mini Launching Soon, GPT-5 Free Strategy Creates a Stir
Alongside the release of Llama 4, OpenAI CEO Sam Altman confirmed on social media that O3 and O4-mini will be launched in the coming weeks, while GPT-5 will be available to the public in the next few months.

Although there are no further details about O3 and O4-mini, Altman stated that OpenAI has significantly improved the O3 model in many respects, improvements he believes users will be happy with.

In fact, the capabilities and release timing of GPT-5 are the main focus of the market. According to Altman, GPT-5 will integrate multiple features such as voice, Canvas, search, and Deep Research, becoming the core of OpenAI's unified model strategy.
This means that GPT-5 will no longer be a single model, but a comprehensive system that integrates various tools and functions. Through this integration, GPT-5 will be able to use tools autonomously and decide when to think deeply and when to respond quickly, allowing it to handle a range of complex tasks. The move aims to simplify OpenAI's model and product lineup so the AI is genuinely convenient to use on demand.
Even more notable is that GPT-5 will offer unlimited usage to free users, while paid users will get a higher-intelligence version. Previously, in a long conversation with Silicon Valley analyst Ben Thompson, Altman mentioned that, influenced by DeepSeek, OpenAI was considering making GPT-5 free to use.
However, due to the repeated delays in the release of GPT-5, some netizens have created the following timeline to poke fun.
DeepSeek Partners with Tsinghua to Release New Paper
DeepSeek and a research team from Tsinghua University jointly released a new paper on inference-time scaling this week, proposing a learning method called Self-Principled Critique Tuning (SPCT) and building the DeepSeek-GRM series of models. The method dynamically generates evaluation principles and critiques through online reinforcement learning (RL), significantly improving the scalability of general reward modeling (RM) at inference time, and introduces a meta reward model (meta RM) to further improve scaling performance.

The core of SPCT is to turn "principles" from a passive understanding step into part of reward generation itself, enabling the model to dynamically generate high-quality principles and critiques based on the input question and its candidate answers. The method has two stages:
- Rejective fine-tuning as a cold-start phase, helping the model adapt to different input types;
- Rule-based online reinforcement learning, which further refines the generated principles and critiques, improving reward quality and inference-time scalability (an illustrative sketch of such a rule-based reward follows this list).
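As a minimal sketch of the rule-based reward idea, assume the reward is +1 when the generated pointwise scores single out the labeled best response and -1 otherwise; the function name, values, and tie handling below are illustrative assumptions, not DeepSeek's implementation.

```python
def rule_based_reward(pred_scores, best_idx):
    """Illustrative rule-based RL reward for a generative reward model:
    +1 if the critique's pointwise scores single out the labeled best
    response, -1 otherwise (values and tie handling are assumptions)."""
    top = max(range(len(pred_scores)), key=lambda i: pred_scores[i])
    unique_max = pred_scores.count(pred_scores[top]) == 1   # no credit for ties
    return 1.0 if (top == best_idx and unique_max) else -1.0

# Example: the GRM scored three candidate answers 6, 9 and 4, and answer 1 is labeled best.
print(rule_based_reward([6, 9, 4], best_idx=1))  # 1.0
```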
To optimize the voting process, the research team introduced a meta reward model. It filters out low-quality samples by judging the correctness of the generated principles and critiques, improving the accuracy and reliability of the final output.
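Putting the pieces together, the following is a minimal, hypothetical sketch of inference-time scaling with meta RM filtering: grm_sample and meta_rm stand in for calls to DeepSeek-GRM and the meta RM, the sample count k and the keep count are arbitrary, and "voting" is implemented as summing pointwise scores across the retained samples.

```python
import random
from collections import defaultdict

def scaled_reward(query, responses, grm_sample, meta_rm, k=8, keep=4):
    """Sketch of inference-time scaling for a generative reward model:
    draw k independent critiques with pointwise scores, let the meta RM keep
    the most trustworthy ones, then sum the surviving scores per response."""
    samples = [grm_sample(query, responses) for _ in range(k)]       # k score vectors
    kept = sorted(samples, key=lambda s: meta_rm(query, responses, s), reverse=True)[:keep]
    totals = defaultdict(float)
    for scores in kept:
        for idx, score in enumerate(scores):
            totals[idx] += score                                     # "voting" = summing scores
    return max(totals, key=totals.get)                               # index of the winning response

# Toy stand-ins for the GRM and meta RM calls (purely hypothetical):
random.seed(0)
toy_grm = lambda q, rs: [random.randint(1, 10) for _ in rs]          # noisy 1-10 pointwise scores
toy_meta_rm = lambda q, rs, scores: -abs(max(scores) - min(scores))  # arbitrary quality proxy
print(scaled_reward("query", ["answer a", "answer b", "answer c"], toy_grm, toy_meta_rm))
```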

Experimental results show that DeepSeek-GRM-27B significantly outperforms existing methods and models on multiple RM benchmarks, and is particularly strong in inference-time scalability. As more inference compute is allocated, DeepSeek-GRM-27B shows strong potential for further gains, demonstrating the advantages of inference-time scaling strategies.
This achievement not only advances the development of general reward modeling but also provides a new technical path for the application of AI models in complex tasks, and it may even be showcased in DeepSeek R2.
Some users on overseas forums joked that DeepSeek has consistently followed a "paper first, model later" rhythm, which may put pressure on competitor Llama 4.


