Tencent adds fuel to the 3D generative large model race

Hunyuan can create mini-games with 3D animation

Author | Huang Yu, Chen Yingyi

Editor | Zhou Zhiyu

Over the past year, foundational large language models have broadly improved and text-to-video large models have emerged, both steps toward AGI. In 2025, more mature multimodal large models are expected to enter the market.

The competition in multimodal technology is intensifying. Tencent, which firmly holds the top position in the global gaming industry, is targeting the high demand for 3D generation in game development.

On January 21, Tencent officially launched and open-sourced version 2.0 of the Hunyuan 3D generation large model. At the same time, it launched the Hunyuan AI 3D Creation Engine, billed as the industry's first one-stop, low-threshold AI content creation platform for 3D.

At the press briefing, Guo Chunchao, head of Tencent Hunyuan 3D, said the value of the Hunyuan AI 3D Creation Engine lies in resolving a mismatch: demand for 3D creation is high, yet ordinary people cannot do it and professionals do it very slowly. "This also leads to the high cost of traditional 3D model creation, with the cheapest 3D model costing nearly a hundred yuan, and the expensive ones reaching 100,000."

As early as November last year, Tencent released and open-sourced version 1.0 of the Hunyuan 3D generation large model, which enterprises and developers could fine-tune and deploy.

Just two months later, Tencent has made another significant move, a sign that it is focusing on the AI 3D field and intends to keep investing to become an industry pioneer.

Like version 1.0, version 2.0 of the Hunyuan 3D generation large model supports both text-to-3D and image-to-3D generation. The difference is a significant improvement in output quality, achieved by decoupling geometry generation from texture generation, which yields more refined geometric structures and richer texture colors.

The 3D generation model reportedly consists of two main parts: geometry generation and texture generation. The geometry model focuses on capturing the shape, structure, and spatial relationships of objects, while the texture model focuses on color, detail, and surface features. This specialization lets each model learn and be optimized more deeply in its own domain, and decoupling the two raises the ceiling on overall generation quality, enabling more refined and realistic 3D results.
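The two-stage split described above can be pictured with a minimal sketch. This is not Tencent's code or the Hunyuan API: every name here (`Mesh`, `TexturedMesh`, `generate_geometry`, `generate_texture`, `generate_3d`) is hypothetical, and both stages are trivial placeholders standing in for real models. The only point it illustrates is the ordering: the shape is produced first, and the texture stage is conditioned on that finished shape.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[int, int, int]
Color = Tuple[int, int, int]


@dataclass
class Mesh:
    """Untextured geometry: vertex positions plus triangle indices."""
    vertices: List[Vertex]
    faces: List[Face]


@dataclass
class TexturedMesh:
    """Geometry paired with one RGB color per vertex."""
    mesh: Mesh
    vertex_colors: List[Color]


def generate_geometry(prompt: str) -> Mesh:
    """Stage 1 (hypothetical placeholder): produce shape only.

    A real geometry model would depend on the prompt; this stand-in
    ignores it and always returns a fixed tetrahedron.
    """
    vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
    return Mesh(vertices, faces)


def generate_texture(mesh: Mesh, prompt: str) -> TexturedMesh:
    """Stage 2 (hypothetical placeholder): color the finished geometry.

    A real texture model would use the prompt; this stand-in simply
    maps each vertex position (clamped to [0, 1]) to an RGB value.
    """
    colors = [
        tuple(int(255 * min(max(c, 0.0), 1.0)) for c in vertex)
        for vertex in mesh.vertices
    ]
    return TexturedMesh(mesh, colors)


def generate_3d(prompt: str) -> TexturedMesh:
    """Decoupled pipeline: geometry first, then texture on top of it."""
    mesh = generate_geometry(prompt)
    return generate_texture(mesh, prompt)


asset = generate_3d("a wooden chair")
print(len(asset.mesh.faces), "faces,", len(asset.vertex_colors), "vertex colors")
```

Because the two stages only share the `Mesh` handed from one to the other, either stage could in principle be retrained or swapped without touching the other, which is the kind of specialization the paragraph above describes.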

With the 3D AI creation engine, which is built on the 2.0 model, users can generate 3D models directly from a sentence, prompt, or image. Beyond basic model generation, the engine also offers a 3D function matrix, 3D editing, 3D generation workflows, and a creative material library.

In other words, this 3D content AI creation platform features "low threshold and high efficiency," supporting 3D production pipelines in professional fields such as game development and design modeling, while also enabling ordinary enthusiasts to generate UGC 3D content.

Unlike many large model vendors, Tencent has always treated "industrial practicality" as the core of its large model strategy, and it has a wealth of business scenarios of its own in which to put the models into practice.

The Tencent Hunyuan large model has already been deployed in over 700 business scenarios within Tencent, and the Hunyuan 3D generation large model has been tested internally in business scenarios such as gaming, social networking, Tencent Maps, Tencent Cloud, and robotics.

The quality of Hunyuan 3D generation has reportedly met the standards for some game 3D assets, including geometric wiring rationality, texture accuracy, and skeletal skinning rationality. According to statistics, with the help of the Hunyuan 3D creation platform, the production time for 3D assets in Tencent's game business can be cut from 5-10 days to a matter of minutes.

Game development is undoubtedly an important scenario for the application of 3D models, and the Hunyuan 3D generation large model version 2.0 allows 3D generation to be truly applied in game development.

Wang Zhigang, the producer of Tencent Games' research and development projects, pointed out at the briefing that the main challenges in applying AI-generated 3D models to game development are polygon count control, wiring rationality, skeleton binding capability, and skinning rationality. As a result, the vast majority of AI-generated 3D models cannot be used in games.

The Hunyuan 3D generation large model has shown significant improvements in these areas. Wang Zhigang stressed that polygon count control is crucial for game development. Some game projects may require 3D models with only a few thousand polygons, while some large models can only generate models with tens of thousands of polygons, which can prevent the game from running.

"People feel that if a model can't create higher-precision assets, its capability is not strong enough. But in game development, if it can't create assets with lower polygon counts, that is what actually shows its capability is insufficient."

Wang Zhigang said the Tencent Hunyuan team understands the entire game production process in depth, so it can focus on what game development needs and on its real pain points. Other teams may lack that understanding or focus, leaving their work disconnected from business applications.

In Wang Zhigang's view, the Hunyuan 3D generation large model can basically meet the 3D generation needs of mini-games on WeChat.

Multimodality is the focus of the next phase of the arms race in the large model field, and there is still significant room for improvement.

Guo Chunchao pointed out that, from a technical perspective, 3D and video generation have not yet reached a turning point in maturity, since they have had relatively little time to develop. The visual qualification rate has risen rapidly, from 20% to 60% in just one year. But compared with the 95% qualification rate for text generation and over 90% for image generation, maturity and usability are still at an early stage.

On the future direction of Tencent Hunyuan 3D, Guo Chunchao said that version 2.0 has not yet reached its upper limit, so "the direction of technology must be to dig deep vertically and expand horizontally." What the technology will look like a year from now is hard to predict, however, just as no one expected the sudden emergence of Sora and GPT-4o. Accumulated quantitative change may therefore trigger a technological turning point.

The development of 3D generation large models still faces considerable challenges.

Guo Chunchao pointed out two obstacles. First, data is scarce: only tens of millions of data points are available, and they have not been fully utilized. Second, 3D has fewer constraints than other modalities. Video, for example, also extends along the time axis, but it rarely changes abruptly; 3D lacks that kind of constraint, so the technical challenge for the models is considerable.

Multimodality is an inevitable trend. KaiYuan Securities pointed out that continued breakthroughs in AI multimodal large models in China and abroad, and their subsequent commercialization, may significantly reduce the production costs of advertisements, courseware, short dramas, animations, series, and movies, improve the efficiency of IP development, advertising and marketing, and teaching, and expand the room for commercialization.

The battlefield for 3D generative large models is already very hot. Recently, Fei-Fei Li's startup World Labs showcased an AI system that generates 3D worlds from a single image. ByteDance and Meituan have recently joined forces to invest in the 3D generative large model company Yingmou Technology.

It is foreseeable that 3D generative large models will become one of the hotspots of 2025.