Google's Answer to Sora Released: Surprisingly, the Biggest Competitor Is Kling, and OpenAI Performed the Worst
Google has released its latest video generation model, Veo 2, and image generation model, Imagen 3, along with the experimental image generation tool Whisk. In Google's tests, OpenAI's Sora performed the worst, while Kling ranked among the top video generation models. Veo 2 generates high-quality video at resolutions of up to 4K, shows a better understanding of real-world physics, follows complex instructions faithfully, and improves on realism and fidelity, outperforming other AI video models.
Google has just released its latest video generation model, Veo 2, and image generation model, Imagen 3, along with a brand-new experimental image generation tool, Whisk. Interestingly, according to Google's own tests, OpenAI's Sora is the worst-performing mainstream video generation model, while Kling (可灵) has emerged as one of the top video generation models, and Hailuo (海螺) has also performed well.
Veo 2: Claimed to be the Most Advanced Video Generation Model
Veo 2 can generate high-quality videos across a wide range of subjects and styles. In side-by-side comparisons with leading models, judged by human evaluators, Veo 2 achieved state-of-the-art results. It has a better understanding of real-world physics and of the nuances of human movement and expression, which improves the overall detail and realism of its videos.
Veo 2 understands the language of cinematography. Users can specify a genre, a shot type, and cinematic effects, and Veo 2 can render the result at resolutions of up to 4K and at lengths of up to several minutes. Whether it is a low-angle tracking shot or a close-up of a scientist using a microscope, Veo 2 can produce it. Prompts such as “18mm lens” or “shallow depth of field” lead Veo 2 to generate, respectively, a wide-angle shot or a shot that blurs the background and keeps the subject in focus.
The core advantages of Veo 2 include:
High Quality and Control: Able to faithfully follow both simple and complex instructions and realistically simulate the physical laws of the real world as well as various visual styles.
Enhanced Realism and Fidelity: Significantly superior to other AI video models in terms of detail, realism, and artifact reduction.
Advanced Motion Capabilities: Because it understands physics and can follow detailed instructions, Veo 2 renders motion accurately.
More Powerful Camera Control Options: Accurately interprets instructions to create various shot styles, angles, movements, and their combinations.
Human raters evaluated Veo 2 on MovieGenBench, the benchmark dataset released by Meta, which covers 1003 prompts and their corresponding videos. The results show that Veo 2 performed best in both overall preference and prompt adherence when compared with Sora, Meta's Movie Gen, Kling (可灵), and Hailuo (海螺).
Google's tests show that OpenAI's Sora performed the worst among the mainstream video generation models compared. Kling is Google's biggest competitor: in both overall preference and instruction adherence, when tie votes and preference votes are combined, Kling is the only model among those tested to reach more than a 50% share against Veo 2. In effect, Kling has been certified by Google itself.
Video models often "hallucinate" unwanted details, such as extra fingers or unexpected objects. Veo 2 does this less often, which makes its output more realistic.
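The way tie votes and preference votes combine in a side-by-side evaluation can be sketched as follows. This is a minimal illustration only: the vote counts are made up, not Google's published figures, and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PairwiseResult:
    """Human votes from one side-by-side comparison (hypothetical counts)."""
    prefer_veo: int    # raters who preferred the Veo 2 video
    tie: int           # raters who saw no clear difference
    prefer_other: int  # raters who preferred the competing model

    @property
    def total(self) -> int:
        return self.prefer_veo + self.tie + self.prefer_other

    def share(self, votes: int) -> float:
        """Fraction of all votes represented by `votes`."""
        return votes / self.total

    def other_preferred_or_tied(self) -> float:
        """Combined share of tie votes and votes for the competitor."""
        return self.share(self.prefer_other + self.tie)


# Illustrative numbers only, NOT Google's published data.
example = PairwiseResult(prefer_veo=480, tie=120, prefer_other=400)

print(f"Veo 2 preferred:         {example.share(example.prefer_veo):.1%}")
print(f"Tie:                     {example.share(example.tie):.1%}")
print(f"Other preferred:         {example.share(example.prefer_other):.1%}")
print(f"Other preferred or tied: {example.other_preferred_or_tied():.1%}")
```

With these made-up counts the competitor is preferred outright in only 40% of votes, yet the combined "preferred or tied" share passes 50%. That is why it matters whether ties are counted when reading such a comparison.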
Google has also improved its Imagen 3 image generation model, which can now generate brighter and better-composed images. It can render a more diverse range of artistic styles with higher precision, from photorealism to impressionism, from abstract to anime. The upgraded Imagen 3 model can follow prompts more faithfully and present richer details and textures.
Prompt: Background with neon green lighting, shallow depth of field portrait of an Asian woman
Prompt: A close-up macro photography shot of a strawberry intricately carved into the shape of a hummingbird, in mid-flight, with wings blurred into dynamic effect, as if sipping nectar from a brightly colored tubular flower. The background is a lush, colorful garden, with a soft blur effect (bokeh) creating a dreamy atmosphere. The image is extremely detailed, using shallow depth of field to ensure sharp focus on the strawberry hummingbird, while the background softly fades out. High resolution, professional photographer style, soft lighting showcases the scene's details, and professional color grading further enhances the vibrant colors, resulting in exceptional clarity. The depth of field makes the hummingbird and flower stand out against the blurred background.
Prompt: Presented in a high-budget animated film style, the scene is filled with a vivid, painterly texture, showcasing a vast interstellar landscape, with glowing nebulae in purple, blue, and gold intertwining. The protagonist is a little girl draped in a star-patterned flowing cloak, standing at the edge of a crystal-clear cliff. Below the cliff, a river of melted stardust winds through the galaxy, with golden light flickering dynamically. In the background, towering constellations hover in the form of mythical creatures, outlined by glowing dotted lines. Meteors streak across the vast sky, adding dynamism and brilliant light to the scene. The camera angle is slightly elevated, capturing the grandeur of the vast galaxy while also conveying the solitude and mystery of the protagonist's journey.
Imagen 3 can generate high-quality images in various formats and styles, from realistic landscapes to richly textured oil paintings or whimsical clay animation scenes.
Prompt: A detailed illustration of a lion roaring majestically in a dreamy jungle, with a purple and white line art background, and a collage on light purple paper texture.
Prompt: A clay animation scene. A wide-angle shot of an elderly woman. She is wearing flowing clothes. She is standing in a lush garden, watering plants with an orange watering can.
In a side-by-side comparison with top image generation models, Imagen 3 achieved state-of-the-art results as judged by human evaluators.
Imagen 3 understands prompts written in natural, everyday language, allowing users to easily obtain desired output without complex prompt engineering.
Prompt: A close-up photo of an origami bird soaring in an urban landscape, with the bird flocking with others of different colors and patterns, casting intricate shadows on the buildings below.
The latest Imagen 3 model is rolling out globally in ImageFX, Google Labs' image generation tool, in more than 100 countries. Users can try it by visiting ImageFX.
Whisk: A New Tool to Inspire Creativity with Images
Whisk is the latest experiment from Google Labs. Instead of writing text prompts, users supply or create images to convey the subject, scene, and style they want, then combine and remix them to create something unique, such as a digital plush toy or an enamel pin.
Whisk combines the latest Imagen 3 model with Gemini's visual understanding and captioning capabilities. The Gemini model automatically writes a detailed description of each image the user provides and passes those descriptions to Imagen 3. This lets users easily remix subjects, scenes, and styles in fun new ways. Google really is going all out this time, and redeeming itself.
Source: AI Cambrian, original title: "Breaking! Google Version of Sora Released: The Biggest Competitor is Actually 'KeLing', OpenAI Performs the Worst"