Tencent's new-generation fast-thinking model is here: "instant response" and significantly lower deployment costs

Tencent has released Turbo S, its new-generation fast-thinking model, which responds instantly, significantly improving response speed and reducing deployment costs. The model combines fast and slow thinking, improving performance in areas such as knowledge, mathematics, and reasoning, and is competitive with industry-leading models. Turbo S adopts a Hybrid-Mamba-Transformer architecture that reduces computational complexity and addresses the high cost of training and inference on long texts. It is the first successful application of the Mamba architecture to an ultra-large MoE model.

Tencent's Hunyuan new-generation fast-thinking model, Turbo S, has been officially released.

Unlike slow-thinking models such as DeepSeek R1 and Hunyuan T1, which need to "think before answering," Hunyuan Turbo S can respond instantly and deliver answers faster: its output speed has doubled and its initial response latency has dropped by 44%.
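As a rough illustration of what these two figures mean for end-to-end response time, the sketch below applies them to a hypothetical baseline. Only the 44% latency reduction and the doubled output speed come from the announcement; the baseline delay, baseline speed, and answer length are assumptions chosen for the arithmetic.

```python
def response_time_s(initial_delay_s: float, tokens_per_s: float, n_tokens: int) -> float:
    """Total time to stream an answer: initial delay plus generation time."""
    return initial_delay_s + n_tokens / tokens_per_s

# Hypothetical baseline values (NOT from the announcement).
BASE_DELAY_S = 1.0        # assumed initial response delay of the older model
BASE_TOKENS_PER_S = 50.0  # assumed output speed of the older model
ANSWER_TOKENS = 500       # assumed answer length

# Figures stated in the announcement.
new_delay_s = BASE_DELAY_S * (1 - 0.44)   # 44% lower initial delay
new_tokens_per_s = BASE_TOKENS_PER_S * 2  # doubled output speed

baseline_s = response_time_s(BASE_DELAY_S, BASE_TOKENS_PER_S, ANSWER_TOKENS)
turbo_s = response_time_s(new_delay_s, new_tokens_per_s, ANSWER_TOKENS)

print(f"baseline: {baseline_s:.2f} s, with stated improvements: {turbo_s:.2f} s")
```

Under these assumed numbers, most of the saving on a long answer comes from the doubled output speed, while the lower initial delay matters most for short replies.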

Hunyuan Turbo S also performs well in areas such as knowledge, mathematics, and creative writing.

Research shows that approximately 90%-95% of everyday human decisions rely on intuition. Slow thinking resembles rational deliberation, working toward a solution through step-by-step logical breakdown; fast thinking, like human intuition, gives large models the ability to respond rapidly in general scenarios.

Combining fast and slow thinking, so that each complements the other, lets large models solve problems more intelligently and efficiently.

By fusing long and short chains of thought, Hunyuan Turbo S keeps the fast-thinking experience for humanities questions while significantly improving its scientific reasoning capabilities, using long chain-of-thought data synthesized by Tencent's self-developed Hunyuan T1 slow-thinking model. The result is a noticeable improvement in overall model performance.

On multiple public benchmarks widely used in the industry, Hunyuan Turbo S performs on par with industry-leading models such as DeepSeek V3, GPT-4o, and Claude in knowledge, mathematics, reasoning, and other fields.

In terms of architecture, the innovative Hybrid-Mamba-Transformer fusion model effectively reduces the computational complexity of traditional Transformer structures, decreases KV-Cache memory usage, and lowers training and inference costs.

This new fusion model breaks through the high training and inference costs faced by traditional pure Transformer large models. On one hand, it leverages Mamba's efficient handling of long sequences; on the other hand, it retains the advantages of Transformers in capturing complex contexts, ultimately constructing a hybrid architecture that excels in both memory and computational efficiency.
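To make the memory argument concrete, here is a back-of-the-envelope sketch comparing the inference cache of a pure Transformer stack with that of a hybrid stack. Every dimension and layer count below is hypothetical and is not Turbo S's actual configuration, which the announcement does not disclose. The sketch relies only on two general properties: an attention layer's KV cache grows linearly with sequence length, while a Mamba-style state-space layer keeps a fixed-size recurrent state.

```python
def attention_kv_bytes(seq_len: int, n_kv_heads: int = 8, head_dim: int = 128,
                       bytes_per_elem: int = 2) -> int:
    """KV cache of one attention layer (batch size 1): keys and values for every token."""
    return 2 * seq_len * n_kv_heads * head_dim * bytes_per_elem

def mamba_state_bytes(d_inner: int = 8192, d_state: int = 16,
                      bytes_per_elem: int = 2) -> int:
    """Recurrent state of one Mamba-style layer: independent of sequence length.
    (The small convolution buffer is ignored for simplicity.)"""
    return d_inner * d_state * bytes_per_elem

def cache_bytes(seq_len: int, n_attn_layers: int, n_mamba_layers: int) -> int:
    """Total inference cache for a stack mixing both layer types."""
    return (n_attn_layers * attention_kv_bytes(seq_len)
            + n_mamba_layers * mamba_state_bytes())

MIB = 1024 * 1024
SEQ_LEN = 32_768  # hypothetical long-text context

# Hypothetical layer mixes: 64 attention layers vs. 8 attention + 56 Mamba layers.
pure_transformer = cache_bytes(SEQ_LEN, n_attn_layers=64, n_mamba_layers=0)
hybrid = cache_bytes(SEQ_LEN, n_attn_layers=8, n_mamba_layers=56)

print(f"pure Transformer cache: {pure_transformer / MIB:.0f} MiB")
print(f"hybrid cache:           {hybrid / MIB:.0f} MiB")
```

With these assumed settings the hybrid stack's cache is dominated by its few attention layers, which is the sense in which replacing attention layers with Mamba layers cuts KV-cache memory on long inputs.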

This is also the industry's first successful lossless application of the Mamba architecture to an ultra-large MoE model.

Thanks to these innovations in model architecture, the deployment cost of Hunyuan Turbo S has fallen significantly, further lowering the barrier to applying large models.

As the flagship model, Hunyuan Turbo S will serve as the core foundation for Tencent's Hunyuan family of derivative models, providing base capabilities for reasoning, long-text, code, and other derivative models.

Based on Turbo S, Hunyuan has also launched the reasoning model T1, which incorporates technologies such as long chains of thought, retrieval augmentation, and reinforcement learning.

T1 has already been fully launched on Tencent Yuanbao (the Tencent Hunyuan T1 model is open to all users), where users can choose either DeepSeek R1 or Tencent Hunyuan T1 to generate responses.

The official version of the Tencent Hunyuan T1 model will also be launched soon, providing external API access and other services.

Currently, developers and enterprise users can call Tencent Hunyuan Turbo S via API on Tencent Cloud, with a free trial available for one week starting today.

In terms of pricing, Turbo S costs 0.8 yuan per million input tokens and 2 yuan per million output tokens, several times cheaper than the previous-generation Hunyuan Turbo model.
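As a quick worked example of these rates, the sketch below estimates the bill for a given token volume. The two prices are the ones quoted above; the function name and the sample workload are illustrative assumptions.

```python
# Prices quoted in the announcement, in yuan per million tokens.
INPUT_PRICE_PER_M = 0.8
OUTPUT_PRICE_PER_M = 2.0

def turbo_s_cost_yuan(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in yuan for a given number of input and output tokens."""
    return (input_tokens / 1_000_000 * INPUT_PRICE_PER_M
            + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M)

# Hypothetical workload: 3 million input tokens and 1 million output tokens.
cost = turbo_s_cost_yuan(3_000_000, 1_000_000)
print(f"estimated cost: {cost:.2f} yuan")  # prints "estimated cost: 4.40 yuan"
```

For this assumed workload the input side costs 2.4 yuan and the output side 2 yuan.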

Tencent Yuanbao will gradually roll out Hunyuan Turbo S; users can try it by selecting the "Hunyuan" model and turning off deep thinking.

Tencent Hunyuan, original title: "Tencent Hunyuan New Generation Fast Thinking Model Turbo S Released"
