Research: AI Video Enters the "Production Line"

With the launch of Seedance 2.0, AI video is gradually being integrated into film and television production, with applications emerging in areas such as short dramas and advertising. Creators are now focused on model stability and workflow reusability. Internet giants such as ByteDance, Kuaishou, and Alibaba, along with numerous vertical players, have entered the fray, intensifying competition in the AI video sector and shifting the standards by which models are judged.

At the beginning of the year, the debut of Seedance 2.0 opened up the possibility of AI video taking part in industrialized film and television production.

As scenarios such as short dramas, advertising, and e-commerce begin to incorporate AI video into actual production workflows, AI video models are transitioning from merely chasing benchmark scores to performing practical tasks. Creators are no longer concerned solely with model parameters and leaderboard performance; instead, they prioritize whether a model can consistently produce high-quality output, support the generation of continuous shots, and ultimately integrate into a reusable, collaborative, and deliverable workflow.

It is against this backdrop that ByteDance’s Seedance 2.0 has garnered attention.

"Unlike many models that require highly refined prompts, Seedance 2.0 can internally expand even short or abstract prompts into more professional and detailed descriptions, translating everyday language into camera language the model can execute and thereby lowering the barrier for users," a short drama practitioner in Xi’an told Wall Street News · All-Weather Tech.

Meanwhile, Kuaishou’s Kling and Alibaba’s HappyHorse continue to accelerate their iterations. Players such as iQIYI’s Nadou and Qunhe Technology’s LuxReal are entering the market through workflows, digital assets, 3D spaces, and collaboration tools. Vertical specialists like Shengshu Technology, Aishi Technology, MiniMax, and SenseTime are also vying for position.

With players across models, platforms, and toolchains all entering the field, the AI video sector is becoming a crowded and rapidly developing track.

The Failure of "Score Chasing"

On the vendor side, the roster of competitors is rapidly lengthening across several tiers.

Among internet giants, ByteDance has Seedance (Jimeng), Kuaishou has Kling, and Alibaba has HappyHorse.

Beyond mainstream internet companies, long-form video platform iQIYI has also entered the arena, launching "Nadou," a full-process AI creation platform tailored for professional short drama production.

Beyond the giants, vertical players are pouring in: Shengshu Technology’s Vidu, Aishi Technology’s PixVerse (Paiwo AI), MiniMax’s Hailuo, Qunhe Technology’s LuxReal, and SenseTime’s Seko, among others, are all positioning themselves in this track.

Behind this excitement, however, the external criteria for judging model capabilities are changing as AI video moves from model demonstrations to real production lines.

Over the past year, leaderboards for AI video models have proliferated, and model rankings and sample comparisons abound. To some extent, these leaderboards have amplified industry hype while letting outsiders see the capability differences between models at a glance.

The problem is that once video generation enters real production processes such as short dramas, advertising, and content industrialization, models face more than the question of whether they can generate a good-looking sample. They must reliably produce footage with a cinematic texture, smooth motion, and consistent characters.

These capabilities are difficult to fully measure with an automated leaderboard.

Therefore, at this stage, many vendors have begun to downplay machine-based automatic video evaluation internally, placing greater emphasis on human assessment and feedback from real-world scenarios. For downstream creators, whether a model is truly useful is often not determined by its ranking on a leaderboard, but by its ability to reduce rework, improve production efficiency in continuous workflows, and genuinely integrate into industrialized processes.

To some extent, this mirrors the "failure of score chasing" already seen in the large model Agent sector.

When Agents first emerged, the industry was equally keen on using leaderboards to measure model capabilities. However, as Agents moved from conversation and demos to real workflows, it quickly became apparent that many leaderboard scores did not directly correlate with actual usability.

The reason is that once Agents enter the "working" phase, they often face multi-step, long-chain decision-making and execution processes, requiring them to understand goals, break down tasks, call tools, and continuously correct their path.

Existing evaluation systems struggle to comprehensively test capabilities in such long-task scenarios.

From this perspective, Seedance 2.0 has attracted attention precisely because it has begun to be embedded in real production flows.

From Usable to Production-Ready

According to All-Weather Tech’s visits to multiple downstream application providers, the changes Seedance 2.0 has brought are tangible and direct.

"Whether it is understanding video content, grasping the laws of the physical world, or the naturalness of performance, Seedance 2.0 has shown significant improvement," said Liu Cheng, head of content at Kemeng Intelligence (Beijing) Technology Co., Ltd., an AI short drama production company, in an interview with All-Weather Tech.

Regarding the understanding of video content, Liu believes Seedance 2.0 has made considerable progress in interpreting abstract semantics.

"Although the final generated results still carry some uncertainty, the performance is already quite good. For example, if the prompt is 'let these two people have an ambiguous interaction in the scene,' the AI will analyze and generate lighting effects and color tones between the two characters that convey ambiguity, and the camera movement may become slower. Essentially, it automatically supplements these elements based on the request," Liu stated.

He added that problems such as glitches, clipping, and misaligned faces, which frequently occurred in martial arts action or complex multi-person interaction scenes, have now been largely resolved with Seedance 2.0.

"In some videos, you really can't tell whether it's AI or a real person," Liu said bluntly.

A short drama practitioner in Chongqing holds a similar view.

"Since the release of Seedance 2.0, the consistency of characters, lip-syncing, and voice has indeed improved compared to before. Moreover, the 'oil painting' look of the images has weakened significantly, and the storyboard design has become smarter," the practitioner told All-Weather Tech.

According to an AI short drama industry insider in Xi’an who spoke to All-Weather Tech, with Seedance 2.0 and some prompt optimization, their team can now generate a clip of about 10 seconds in one or two attempts, and gets a satisfactory result within three tries at most.

"If one is skilled, a 50-episode live-action AI short drama could be completed in about two weeks," the insider revealed.

Xing Xi (pseudonym), a developer currently starting a business focused on AI short drama tools, believes that Jimeng, which integrates ByteDance’s Seedance 2.0 model, offers better ease of use than other vendors.

According to Xing Xi, Jimeng’s comprehensive reference mode for video generation understands nine-panel (3×3 grid) storyboard images well. After a user uploads a keyframe image containing nine storyboard panels, the system can infer the sequence and generate video following the order marked on the storyboard. However, iteration is fast across the industry, and other tools now offer this feature as well.

At least in the current round of AI video competition, Seedance 2.0 has taken the lead in pushing model capabilities from "usable" to "closer to production-grade," increasing the pressure on latecomers to catch up.

What Are the Main Pain Points?

Although Seedance 2.0 stands out significantly, common issues in the AI video industry persist.

First, as the duration of generated videos lengthens, maintaining character consistency becomes difficult. Especially when a character turns from a frontal view to a profile, the facial features may change.

Currently, the basic workaround adopted by vendors, Seedance 2.0 included, is to limit the duration of each generated clip, typically to between 5 and 15 seconds.

This forces users to generate videos segment by segment and then splice these clips into complete content through post-production editing.

However, segmented generation brings new problems: for each new shot, creators must re-input information such as character makeup references, clothing, scenes, and props into the model to maintain visual consistency as much as possible.
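The bookkeeping this forces on creators can be sketched in a few lines. This is a minimal, hypothetical illustration rather than any vendor's API: `ReferenceBundle`, `ShotRequest`, and `plan_shots` are invented names, and the default 5-to-15-second bounds are taken from the range described above. The sketch splits a scene into the fewest clips that fit under the cap, spreads the seconds evenly, and attaches the same reference bundle to every clip.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class ReferenceBundle:
    """Hypothetical bundle of assets re-entered for every shot to keep visuals consistent."""
    character_refs: tuple[str, ...]  # makeup / face reference images (paths are illustrative)
    costume: str
    scene: str
    props: tuple[str, ...]


@dataclass(frozen=True)
class ShotRequest:
    """One clip to generate: its order, length, prompt, and the shared references."""
    index: int
    duration_s: int
    prompt: str
    refs: ReferenceBundle


def plan_shots(total_s: int, prompt: str, refs: ReferenceBundle,
               min_s: int = 5, max_s: int = 15) -> list[ShotRequest]:
    """Split a scene of total_s seconds into clips of min_s..max_s seconds.

    Every clip carries the same ReferenceBundle, mirroring how creators
    re-supply makeup, costume, scene, and prop references for each new shot.
    """
    if total_s < min_s:
        raise ValueError(f"scene is shorter than the {min_s}s minimum clip length")
    n = ceil(total_s / max_s)          # fewest clips that respect the length cap
    base, extra = divmod(total_s, n)   # spread the seconds as evenly as possible
    durations = [base + 1 if i < extra else base for i in range(n)]
    if min(durations) < min_s:
        raise ValueError("cannot split the scene within the given clip bounds")
    return [ShotRequest(i, d, f"{prompt} (part {i + 1}/{n})", refs)
            for i, d in enumerate(durations)]


if __name__ == "__main__":
    refs = ReferenceBundle(("lead_face.png",), "period robe", "palace courtyard", ("sword",))
    for shot in plan_shots(40, "Duel at dusk", refs):
        print(shot.index, shot.duration_s, shot.prompt)
```

For a 40-second scene this yields three clips of 14, 13, and 13 seconds, each carrying identical references; whether the clips actually look consistent still depends on how well the model honors those references.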

Academia is also exploring corresponding solutions.

For instance, the paper "Identity-Preserving Text-to-Video Generation by Frequency Decomposition," published by a team led by Yuan Shenghai, a master’s student in Computer Science at Peking University, aims to solve the problem of "how to maintain character subject consistency across different frames, actions, and angles during text-to-video generation."

The ConsisID framework that Yuan Shenghai proposes in the paper splits facial features into high-frequency and low-frequency signals and lets the model learn them separately, which reduces the learning difficulty.

"Previously, the common approach was to feed the original image directly into a feature extractor. We believe this actually makes it harder for the model to learn," Yuan explained. "We then reviewed the literature and found that facial features can be divided into high-frequency and low-frequency types. High-frequency signals correspond to facial details, such as skin texture and the eyes. Low-frequency signals relate to global facial features, including the facial skeleton and the relative positions of the eyes, nose, and other features. If we separate these high- and low-frequency characteristics and let the model learn them individually, it becomes easier for the model to capture them."

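To make the high/low-frequency split concrete, here is a generic signal-processing sketch, not the ConsisID implementation (whose details are in the paper). It assumes a grayscale image stored as a NumPy array and separates it with a Gaussian low-pass filter in the Fourier domain: the low band keeps coarse layout, loosely analogous to the overall arrangement of facial features, and the residual high band keeps fine texture and edges. `split_frequencies` is a hypothetical helper and `sigma` an assumed cutoff.

```python
import numpy as np


def split_frequencies(img: np.ndarray, sigma: float = 0.1) -> tuple[np.ndarray, np.ndarray]:
    """Split a 2-D grayscale image into low- and high-frequency parts.

    low  = image filtered by a Gaussian low-pass mask in the Fourier domain
    high = img - low, i.e. the fine detail the low-pass filter removed
    sigma is the mask width in cycles per pixel (an assumed, tunable value).
    """
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]  # row frequencies, cycles per pixel
    fx = np.fft.fftfreq(w)[None, :]  # column frequencies, cycles per pixel
    mask = np.exp(-(fx**2 + fy**2) / (2 * sigma**2))  # 1 at DC, near 0 at fine scales
    low = np.real(np.fft.ifft2(np.fft.fft2(img) * mask))
    high = img - low
    return low, high


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    face = rng.random((64, 64))  # stand-in for a face crop
    low, high = split_frequencies(face)
    print(np.allclose(low + high, face))  # the two bands add back to the input
```

Because `high` is defined as the residual, the two bands sum back to the input (up to floating-point rounding), so the split loses no information; it only changes how the signal is presented to a learner.
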
Second is the "layer separation" between characters and backgrounds.

Many viewers can intuitively feel that characters in AI-generated videos often appear to "float" above the background, as if they are not on the same layer.

According to Xing Xi, the "AI look" in many images stems largely from how lighting and depth are handled. Many creators moving into AI video lack training in film aesthetics and do not know how to actively adjust lighting, so their images lack depth.

"Some practitioners do not coordinate light angles, shadows, focus, and depth of field well enough, which leads to a flat or disjointed look. As a result, many images look like two layers forcibly stitched together," Xing Xi pointed out. "Removing the 'AI look' depends largely on the producer’s grounding in cinematography; simply put, on their aesthetic sense and how they present the relationships within the frame."

An AI video researcher also told All-Weather Tech that, on the model side, this is essentially a multimodal reference-fusion problem: character reference images and scene images each carry their own color tones and lighting, which makes it hard for the two to blend seamlessly.

Third is the logic of shots and emotional tension in long narratives.

Xing Xi believes that even with the self-developed script generation and breakdown tools offered by major companies, scripts still tend to be "flat and purely narrative" and to have "stiff, clichéd plots."

"Generalization to specific genres and styles is insufficient, and the stories lack ups and downs," Xing Xi noted. "Although villains may be set up in the overall plot, the smaller plot points fail to evoke emotional resonance; they lack minor conflicts and logical rigor."

Liu Cheng takes a similar view: "Although the Seedance 2.0 upgrade has lowered the threshold for producing AI content, it will lead to a flood of work of uneven quality. Good works still need strong content to truly move audiences."

Differentiated Positioning

Against this backdrop, players outside the major tech giants are beginning to establish differentiated advantages in areas such as workflows and case libraries.

According to Liu Cheng, Kemeng has built AI-assisted functions into its production process. For example, the team developed storyboard-prompt and sketch features: once users modify the prompts, AI can complete 80% to 90% of the creation, and users who make flexible use of AI prompts can further improve efficiency by fine-tuning them.

Qunhe Technology has taken workflow optimization into 3D, launching a short drama version of LuxReal on May 27.

Built on Qunhe Technology’s self-developed spatial large models and other 3D technologies, LuxReal can turn 2D scene images into navigable virtual 3D spaces. Creators can freely adjust camera positions and place characters, and the system automatically renders the corresponding images from the same 3D scene.

However, the actual generation quality remains to be seen. For instance, although LuxReal’s settings for short drama workflows are fairly comprehensive, its proactive optimization needs improvement; issues such as character costumes that do not match the period still occur.
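How LuxReal renders shots is not described, but the reason a shared 3D scene helps consistency can be illustrated with a textbook pinhole camera: every shot projects the same world-space positions, so moving the camera reframes the set and the characters without redrawing them. `look_at_rotation` and `project` are hypothetical helpers, and the focal length and image center (a 1280×720 frame) are assumed values.

```python
import numpy as np


def look_at_rotation(cam_pos: np.ndarray, target: np.ndarray) -> np.ndarray:
    """World-to-camera rotation for a camera at cam_pos aimed at target (y is up).

    Undefined when the camera looks straight up or down.
    """
    forward = target - cam_pos
    forward = forward / np.linalg.norm(forward)
    right = np.cross(np.array([0.0, 1.0, 0.0]), forward)
    right = right / np.linalg.norm(right)
    up = np.cross(forward, right)
    return np.stack([right, up, forward])  # rows are the camera's x, y, z axes


def project(points, cam_pos, target, focal=800.0, cx=640.0, cy=360.0) -> np.ndarray:
    """Pinhole projection of N x 3 world points to N x 2 pixel coordinates."""
    points = np.asarray(points, dtype=float)
    cam_pos = np.asarray(cam_pos, dtype=float)
    target = np.asarray(target, dtype=float)
    R = look_at_rotation(cam_pos, target)
    cam = (points - cam_pos) @ R.T  # world -> camera coordinates
    z = cam[:, 2]
    if np.any(z <= 0):
        raise ValueError("a point lies behind the camera")
    u = cx + focal * cam[:, 0] / z
    v = cy - focal * cam[:, 1] / z  # pixel rows grow downward
    return np.stack([u, v], axis=1)


if __name__ == "__main__":
    # Hypothetical scene: an actor's head and a prop, in metres.
    scene = [[0.0, 1.7, 5.0], [0.5, 1.0, 5.0]]
    wide = project(scene, cam_pos=[0.0, 1.5, 0.0], target=[0.0, 1.5, 5.0])
    side = project(scene, cam_pos=[2.0, 1.5, 1.0], target=[0.0, 1.5, 5.0])
    print(wide.round(1), side.round(1), sep="\n")
```

Because both views read from the same coordinates, the actor's position relative to the prop stays geometrically consistent between shots, which is plausibly what converting 2D scene images into 3D space offers over generating each shot independently.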

iQIYI’s Nadou integrates self-developed models and external models like Seedance 2.0, combining iQIYI’s IP library, digital asset library, and creator community to form callable platform capabilities, providing creators with one-stop support for the entire chain from content production to operations.

Among these, the IP library and digital asset library are iQIYI’s unique advantages. For example, in the digital asset library, creators can access IP images such as scenes, weapons, and animals from TV series like "Cheng He Ti Tong" (Imperial Palace) and "Hua Rong" (Demon Realm Encyclopedia).

However, observations by All-Weather Tech indicate that although iQIYI possesses a rich IP and digital asset library, the quantity currently presented on the Nadou platform remains relatively limited.

Overall, after introducing Seedance 2.0, players outside the major giants are mainly building their differentiated advantages in dimensions such as engineering, knowledge accumulation, and process collaboration.

Unceasing Competition

Whether it is long-video stability, character consistency, or controllability, the current AI video industry indeed faces many pain points that need to be resolved, and the competitive landscape is far from converging.

In this context, turning to the capital markets has become an important way for some vendors to accelerate their efforts.

In May this year, rumors circulated that Kuaishou was accelerating a spin-off listing of Kling and planned to launch an independent IPO next year, with a Pre-IPO valuation expected to reach $20 billion.

Subsequently, Kuaishou confirmed in an announcement to the Hong Kong Stock Exchange that its board of directors was evaluating plans to restructure Kling-related assets and businesses.

Coincidentally, vertical players are also speeding up financing and IPO preparations. After completing two rounds of financing totaling over 2.6 billion yuan within two months, Shengshu Technology is rumored to plan a Hong Kong IPO in the first half of 2026, with its corporate entity completing joint-stock reform by the end of March.

These intensive capital moves mean that competition in this track is likely to escalate further rather than converge.

Behind these capital moves lies another reality of the AI video sector: model competition is not just a technological race, but a comprehensive competition involving capital, computing power, data, and scenario implementation capabilities.

Meanwhile, the commercialization of AI video is still in its early stages. Although scenarios such as short dramas, advertising, e-commerce, gaming, and film pre-visualization have begun to validate demand, it will take time to form stable, scalable, and high-margin revenue models.

Precisely for this reason, financial support from the capital markets has, to some extent, become an important chip that keeps many vendors in the game.

Competition in the AI video sector has not ended with Seedance 2.0’s current lead. On the contrary, as more vendors secure funding and accelerate product iterations, the industry may see a continued race in model capabilities, production tools, and commercialization efficiency.

Risk Warning and Disclaimer

The market carries risks; investment requires caution. This article does not constitute personal investment advice, nor does it take into account the specific investment objectives, financial status, or needs of individual users. Users should consider whether any opinions, views, or conclusions in this article align with their specific circumstances. Investment decisions made based on this content are the sole responsibility of the investor.