
Guosen Securities: ByteDance Releases New Doubao AI Video Models; AI Multimodality Expected to Enter an Explosive Period

Guosen Securities released a research report stating that ByteDance's Volcano Engine launched two new AI video models, PixelDance and Seaweed, on September 24th in Shenzhen, opening beta testing for the enterprise market. The models make significant breakthroughs in semantic understanding, complex interactions, and content consistency, addressing the long-standing lack of coherence in AI videos. Tan Dai, President of Volcano Engine, stated that the price of large models is no longer an obstacle to innovation, and that supporting larger concurrent traffic will be key to industry development.
According to the Wise Finance APP, Guosen Securities released a research report stating that on September 24th, ByteDance's Volcano Engine held an AI innovation roadshow in Shenzhen, unveiling two large video generation models and opening invitation-based testing for the enterprise market. The new models achieve significant breakthroughs in semantic understanding, complex interactions between multiple moving subjects, and consistency across multi-camera switching, greatly improving on the lack of coherence and realism that plagued earlier AI videos. Previously, the Doubao large model was priced lower than 99% of the industry, prompting a wave of price cuts among domestic large models. Tan Dai, President of Volcano Engine, believes that the price of large models is no longer a barrier to innovation; as enterprises adopt them at scale, a model's ability to support larger concurrent traffic is becoming a key factor in industry development.
Doubao AI Video Models Newly Released
On September 24th, ByteDance's Volcano Engine held an AI innovation roadshow in Shenzhen, unveiling two video generation large models, PixelDance and Seaweed, aimed at the enterprise market and opening invitation-based testing.
The new models achieve significant breakthroughs in semantic understanding, complex interactions between multiple moving subjects, and consistency across multi-camera switching, greatly improving on the lack of coherence and realism in earlier AI videos. Tan Dai, President of Volcano Engine, stated, "Video generation still faces many challenges to overcome. The two Doubao models will continue to evolve, exploring more possibilities in solving key problems and accelerating the expansion of AI video creation space and applications."
Three Major New Features
Continuous Action Performances: Solve the long-standing difficulty AI videos have had in depicting complex character actions
In the past, because continuity was hard to maintain, AI videos often looked more like PowerPoint slideshows. Even leading vendors such as Sora and Runway have demonstrated only sweeping camera moves and have been unable to show people performing complex actions. The new Doubao models bring a marked improvement in generating AI videos of character performances.
Multi-camera Combination Videos: Generate single videos with multiple camera angles using one image + prompt
According to Volcano Engine, the Doubao video generation models are based on the DiT architecture and use efficient DiT fusion computing units to let videos switch freely between large-scale motion and camera movement, with multi-shot language capabilities such as zooming, panning, tilting, and target tracking. A newly designed diffusion model training method overcomes the consistency challenge of multi-camera switching, keeping subjects, style, and atmosphere consistent across camera transitions.
Ultimate Camera Control: Achieve various complex camera movements such as zooming, tilting, target tracking, and elevation control
Currently, AI video tools mainly offer camera-movement and motion-brush functions, with limited capability for large camera moves and zooming.
The release of Doubao PixelDance achieves a variety of complex camera movements, including 360-degree subject rotation, zooming, panning, tilting, target tracking, and elevation control, marking a significant step forward in AI video camera control.
User Usage Is Growing Rapidly as Product Capabilities Improve
As product capabilities continue to improve, usage of the Doubao large model is also growing rapidly. According to Volcano Engine, as of September, daily token usage of the Doubao language model had exceeded 1.3 trillion, a tenfold increase from its initial release in May. Daily multimodal processing has also reached 50 million images and 850,000 hours of speech.
Previously, the Doubao large model was priced lower than 99% of the industry, prompting a wave of price cuts in the domestic large model market. Tan Dai, President of Volcano Engine, believes that the price of large models is no longer a barrier to innovation; with large-scale enterprise adoption, a model's ability to support greater concurrent traffic is becoming a key factor in industry development.
According to Tan Dai, President of Volcano Engine, many large models in the industry currently support at most 300K, or even only 100K, TPM (tokens per minute), which is insufficient for enterprise production traffic. For example, peak TPM reaches 360K in a research institution's document translation scenario, 420K for an automotive intelligent cockpit, and as high as 630K for an AI education company. The Doubao large model therefore supports a default initial quota of 800K TPM, far above the industry norm, and customers can flexibly expand it as needed.
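As a back-of-the-envelope illustration (not from the report; the thresholds and scenario names simply restate the figures cited above), the following minimal Python sketch checks which of those peak workloads would fit under a 300K TPM industry ceiling versus Doubao's stated 800K default quota.

# Illustrative sketch only: compares the peak TPM figures cited in the report
# against an assumed 300K TPM industry ceiling and Doubao's stated 800K default quota.
INDUSTRY_CAP_TPM = 300_000    # upper bound many models reportedly offer
DOUBAO_DEFAULT_TPM = 800_000  # Doubao's stated default initial quota

peak_tpm = {
    "research institution document translation": 360_000,
    "automotive intelligent cockpit": 420_000,
    "AI education company": 630_000,
}

for scenario, tpm in peak_tpm.items():
    print(f"{scenario}: peak {tpm:,} TPM | "
          f"fits 300K industry cap: {tpm <= INDUSTRY_CAP_TPM} | "
          f"fits 800K Doubao default: {tpm <= DOUBAO_DEFAULT_TPM}")

All three cited peaks exceed the 300K ceiling but sit comfortably within the 800K default, which is the point the report uses to argue that concurrency support, rather than price, is now the differentiator.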
Risk Warning
R&D progress falls short of expectations; market demand falls short of expectations.
