Caption: Lu Qinglin, head of text-to-image at Tencent

Multimodal large AI models remain in the spotlight, and Tencent is making its own move.

On May 14, Tencent announced a comprehensive upgrade of its Hunyuan text-to-image large model. The model adopts the DiT (Diffusion with Transformer) architecture, the same architecture used by Sora, so it not only supports text-to-image generation but can also serve as the foundation for other multimodal visual generation such as video.

In Tencent's view, DiT is likely to become the mainstream architecture for the next generation of visual generation, and in the future it may become a unified architecture for multimodal visual generation, including text-to-image, video generation, and 3D generation.

Tencent has also open-sourced the Hunyuan text-to-image large model, making it available to enterprises and individual developers for free commercial use.

This is the industry's first Chinese-native open-source text-to-image model built on the DiT architecture. It aims to fill the DiT gap in the open-source text-to-image community, allowing more developers to participate and helping them catch up faster with advanced closed-source multimodal large models from abroad.

In the process, Tencent can also use large models to rebuild and strengthen its existing businesses. According to its latest financial report, large models have already had a positive impact on Tencent's business.

The upgraded Hunyuan text-to-image large model has 1.5 billion parameters. It supports bilingual input in Chinese and English, image generation prompts of up to 256 characters (the industry mainstream is 77), rewriting of user prompts, and multi-round drawing.

Over the past few years, mainstream text-to-image models have mostly been diffusion models built on the U-Net architecture. However, U-Net models are prone to performance bottlenecks and are hard to scale. The DiT architecture mainly replaces the U-Net portion of the model with a Transformer. Given sufficient computing power and data, the Transformer architecture can keep scaling up.
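To make the architectural change above concrete, here is a minimal sketch of the "patchify" step commonly used in DiT-style models: the image (or latent) grid is cut into small tiles, and each tile becomes one token in the sequence a Transformer processes. This is a generic illustration, not Tencent's implementation; the function name `patchify`, the 4x4 toy grid, and the patch size of 2 are all assumptions chosen for the example.

```python
def patchify(latent, patch):
    """Split a 2D grid (list of rows) into non-overlapping patch x patch tiles.

    Each tile is flattened into one token (a flat list), in row-major order.
    Hypothetical helper for illustration only.
    """
    height, width = len(latent), len(latent[0])
    if height % patch or width % patch:
        raise ValueError("grid size must be divisible by patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Collect the values inside one tile, row by row.
            token = [latent[r][c]
                     for r in range(top, top + patch)
                     for c in range(left, left + patch)]
            tokens.append(token)
    return tokens


# Toy 4x4 "latent" holding the values 0..15.
latent = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = patchify(latent, 2)
print(len(tokens))   # number of tokens the Transformer would see
print(tokens[0])     # values from the top-left 2x2 tile
```

The sequence length is (height / patch) x (width / patch), so a larger image or a smaller patch simply yields more tokens for the same Transformer blocks to handle.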

Transformer-based models appear to have more potential to make text-to-image models smarter. For that reason, the Hunyuan text-to-image team began R&D in July 2023, developing the entire pipeline in-house and training from scratch. Earlier this year, the Hunyuan text-to-image large model was fully upgraded to the DiT architecture.

According to Wall Street News, the architecture upgrade, combined with optimization on Tencent's internal advertising and other real-world scenarios, has improved the overall visual generation quality of the latest Hunyuan text-to-image large model by 20% compared with models based on the U-Net architecture. The model also shows significant improvements in multi-round dialogue, fine-grained semantic understanding, Chinese elements, and realistic portrait generation in specific scenarios.

Lu Qinglin, the head of text-to-image at Tencent, pointed out that compared with the three well-known closed-source text-to-image models DALL-E 3, SD3, and Midjourney, the Hunyuan text-to-image large model ranks behind DALL-E 3 and Midjourney but performs the best among all open-source text-to-image models. Lu further noted that before the Hunyuan text-to-image large model was open-sourced, the technology gap between open-source and closed-source text-to-image models had been gradually widening, and Tencent hopes to narrow this gap through this open-source initiative.

Lu Qinglin also said that OpenAI was able to launch Sora earlier this year because it had a strong DiT model. The original intention behind open-sourcing the Hunyuan text-to-image large model is to make a DiT model available, so that industry peers who want to build text-to-video can quickly extend the technology to video, saving everyone a great deal of time.

Tencent has long supported open-source technology. It has open-sourced more than 170 projects to date, all drawn from real Tencent business scenarios and covering core business areas such as WeChat, Tencent Cloud, Tencent Games, Tencent AI, and Tencent Security.

Lu Qinglin said, "Tencent's R&D approach to Hunyuan text-to-image is practical: it comes from practice and goes back to practice. By fully open-sourcing the latest generation of the model, we hope to share Tencent's practical experience and research results in text-to-image with the industry, enrich the Chinese text-to-image open-source ecosystem, jointly build the next-generation open-source ecosystem for visual generation, and accelerate the development of large models across the industry."

Multimodality is the trend. Kaiyuan Securities pointed out that continued breakthroughs in multimodal large AI models at home and abroad, and their subsequent commercialization, may significantly reduce production costs for advertisements, courseware, short films, animation, TV series, and movies, improve the efficiency of IP development, advertising and marketing, and teaching, and expand the room for commercialization.

When the Hunyuan large model was released in September last year, Tencent emphasized practicality, calling it a practical-grade large model that "comes from practice and goes back to practice." Now that the Hunyuan text-to-image large model has been open-sourced, Tencent does not rule out open-sourcing its large language models as well.

Of course, open source versus closed source is only a choice between technical paths; the ultimate goal is commercial application. Under Tencent's plan, the Hunyuan large model will first serve Tencent itself and then serve industry customers through Tencent Cloud, while consumer applications are still at the exploration stage.

Lu Qinglin told Wall Street News that there is no rush to explore the commercialization of Hunyuan text-to-video, because Tencent's internal business scenarios are very rich and the technology can already serve its own businesses well. As for commercializing consumer applications, there is no clear plan yet, but it has not been ruled out.

On Tencent's first-quarter earnings call on the evening of May 14, management also said that the company is actively building and testing different AI products to see which ones add value to its existing products. Over time, these products will be launched on platforms with large user bases, such as WeChat.

According to Tencent, more than 400 of its businesses and application scenarios have connected to the Hunyuan large model for internal testing. Hunyuan's text-to-image capabilities are widely used in material creation, product synthesis, game art, and many other businesses and scenarios.

The AI "arms race" at home and abroad is in full swing, but this is a marathon. While waiting for the true "iPhone moment of AI" to arrive, Tencent has chosen a steadier path, making AI a "multiplier" for its own business. That should also leave it more composed in the face of this technological revolution.