Latest Global Model Rankings: Alibaba Qwen2.5-Max surpasses DeepSeek V3

In the latest global large model ranking, Alibaba's newest large language model, Qwen2.5-Max, ranks 7th overall, ahead of leading large language models such as DeepSeek V3, o1-mini, and Claude-3.5-Sonnet. It ranks first in mathematics and programming, and second on hard prompts, the category covering complex tasks.

The competition is heating up! The latest global large model ranking has been released, and Alibaba's new model surpasses DeepSeek V3.

On February 4 local time, the well-known AI model evaluation platform Chatbot Arena published its latest rankings. Qwen2.5-Max, the newest large language model from Alibaba's Tongyi Qianwen team, placed 7th overall, ahead of leading large language models such as DeepSeek V3, o1-mini, and Claude-3.5-Sonnet.

By category, Qwen2.5-Max performed particularly well in technical areas, ranking first in mathematics and programming, and second on hard prompts, the category covering complex tasks.

In the past year, Alibaba has continuously expanded the Qwen model family, launching various models covering text, audio, and visual formats to meet the growing AI demands of global developers and customers.

In the early hours of January 29, Alibaba's Tongyi Qianwen team quietly launched Qwen2.5-Max. Soon after release, the model posted leading results on major benchmarks such as MMLU-Pro, LiveCodeBench, LiveBench, and Arena-Hard, performing on par with the world's top models.

According to reports, Qwen2.5-Max adopts a mixture-of-experts (MoE) architecture and was pre-trained on more than 20 trillion tokens. It was then optimized with supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), and it performs strongly in knowledge, programming, general capabilities, and human alignment.

Currently, global developers and enterprises can access Qwen2.5-Max through Alibaba Cloud's generative AI development platform Model Studio.

Market analysts previously said that the attention on DeepSeek has been excessive and has overlooked the broader catch-up of Chinese AI, including Alibaba's Tongyi. The industry media outlet "Information Equality" said that if Alibaba's Qwen2.5-Max does outperform V3 this time, greater expectations can be placed on its RL reasoning model.

After its release, Qwen2.5-Max quickly attracted significant attention from users and developers both in China and abroad. Some netizens summed it up vividly: Qwen2.5-Max is the "Chinese version of ChatGPT," but its level is "much higher" than the latter.

Some users stated that Qwen2.5-Max has "redefined" the video generation function, already surpassing OpenAI's Sora.

Some users even believe that Qwen2.5-Max has left ChatGPT and DeepSeek "washed up on the beach," an idiom for a newcomer overtaking its predecessors.

Other netizens created meme images, speculating that after DeepSeek-R1, this powerful AI model from China will further intensify OpenAI's concerns.