"The World Model" - The next "battleground" for AI, with NVIDIA and Google both entering the fray
The "world model" is touted by the industry as the next key breakthrough in the field of AI. NVIDIA, Google, and many startups are pursuing world models. NVIDIA launched the Cosmos world model, Google's DeepMind formed a world model research team, and AI pioneer Fei-Fei Li's World Labs raised $230 million to build a "large world model."
Source: Hard AI
Author: Zhao Ying
Jensen Huang appeared in a new leather jacket at CES 2025, where he not only launched the blockbuster RTX 5090 GPU but also announced NVIDIA's entry into the most critical direction in the AI field: "world models."
On January 7, Huang announced the launch of the Cosmos World Foundation Models (Cosmos WFMs) at the 2025 Consumer Electronics Show (CES) in Las Vegas. These models are designed to understand the physical world and can predict and generate "physics-aware" videos.
Specifically, Cosmos WFMs are divided into three categories:
(1) Nano: suitable for low-latency and real-time applications; (2) Super: high-performance baseline model; (3) Ultra: highest quality and fidelity output.
The parameter scale of these models ranges from 4 billion to 14 billion, with Nano being the smallest and Ultra the largest. NVIDIA also released upsampling models, video decoders optimized for augmented reality, and guardrail models to ensure responsible use.
Besides NVIDIA, Google and many startups are also pursuing world models. Google's DeepMind has formed a world model research team and hired Tim Brooks, a core member of OpenAI's Sora team, to lead the effort. "AI godmother" Fei-Fei Li's World Labs and the startups Decart and Odyssey are also involved.
World models have not only attracted a host of tech companies but are also touted within the industry as the next key breakthrough in AI. So what exactly are "world models," and why do they matter?
NVIDIA Enters "World Models," Tech Giants Compete
According to NVIDIA, Cosmos WFMs have been trained on 90 trillion tokens, with data sourced from 20 million hours of real-world human interaction, environment, industrial, robotics, and driving data. The models can be fine-tuned for specific applications and are accessible through NVIDIA's API and NGC catalog, GitHub, and the AI development platform Hugging Face.
Several companies have begun trialing Cosmos, with NVIDIA stating that Waabi, Wayve, Foretellix, and Uber have all committed to testing Cosmos WFMs across various use cases, from video search and curation to building AI models for autonomous vehicles.
However, NVIDIA's refusal to disclose the specific sources of the training data has sparked copyright disputes, with analysts suggesting this is why NVIDIA refers to these models as "open" rather than "open source."
Meanwhile, Google DeepMind is also actively building out its strategy in the world model field. According to TechCrunch, DeepMind is forming a dedicated world model research team to extend its leading position in this area. The team will be led by former OpenAI researcher Tim Brooks, who joined DeepMind last October.
Last month, DeepMind released Genie, a model that can simulate virtual worlds, including realistic animations and physical effects, and supports interaction among all of these elements. For example, users can create a variety of sample worlds with Genie, such as sailing simulations and cyberpunk westerns, and can prompt Genie using text, images, or a combination of both.
In addition to tech giants like NVIDIA and Google, a number of prominent startups are in the race. "AI godmother" Fei-Fei Li's World Labs has raised $230 million to build a "large world model," and companies like Decart and Odyssey have also entered the fray. OpenAI's previously released Sora model can also be seen as a type of "world model," capable of simulating behaviors such as a painter leaving strokes on a canvas, as well as rendering UIs and game worlds similar to Minecraft.
The Next Key Breakthrough in AI: World Models
What are AI "world models"? Why are they important?
Specifically, world models refer to internal representations created by training on large amounts of image, audio, video, and text data, which capture how the world operates and can infer the consequences of actions. This allows them to better understand and simulate the laws of the real world.
The concept of world models originates from the mental models formed by the human brain, which can integrate abstract information obtained through the senses into a concrete understanding of the surrounding world, thus forming "models" that help us predict and perceive the world.
The characteristic of world models is that they attempt to go beyond data, simulating human subconscious reasoning. For example, a baseball batter can instinctively decide how to swing the bat within milliseconds because they can predict the trajectory of the ball. This ability for subconscious reasoning is considered one of the prerequisites for achieving human-level intelligence.
The significance of "world models" lies in their ability to enable complex reasoning and planning, as well as breakthroughs in generative video technology:
1. Breakthroughs in Generative Video Technology: World models show great potential in the field of generative video. Compared to traditional generative models, world models that understand basic physical laws can more accurately simulate the movement of objects. For instance, such a model can not only predict that a basketball will bounce but also understand why it bounces. Alex Mashrabov, former AI head at Snap and CEO of Higgsfield, stated that with powerful world models, creators do not need to define the expected movement of each object; the model itself understands it.
2. Complex Prediction and Planning: Yann LeCun, Chief AI Scientist at Meta, believes that world models may be used for complex prediction and planning in both digital and physical domains in the future. For example, given a messy room (initial state) and a tidy room (goal state), a world model can infer a series of cleaning actions, rather than just operating based on observed patterns.
With these capabilities, "world models" can widely empower industries such as film, gaming, autonomous driving, and robotics.
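The messy-room example can be made concrete with a toy sketch. The code below is only an illustration under strong assumptions: the room state, the action names, and the hand-written `predict` function are all hypothetical stand-ins for what a learned world model would provide. It shows the general idea of planning by searching over predicted outcomes of actions until the goal state is reached.

```python
from collections import deque

# Hypothetical toy setup: a room state is the set of things still out of place,
# and each action (names invented for illustration) fixes exactly one of them.
ACTIONS = {
    "pick_up_clothes": "clothes_on_floor",
    "make_bed": "unmade_bed",
    "wash_dishes": "dirty_dishes",
}

def predict(state, action):
    """Stand-in for a learned world model: predict the state after an action."""
    return state - {ACTIONS[action]}

def plan(initial, goal):
    """Breadth-first search over predicted states; returns a shortest action list."""
    queue = deque([(initial, [])])
    seen = {initial}
    while queue:
        state, actions = queue.popleft()
        if state == goal:
            return actions
        for action in ACTIONS:
            nxt = predict(state, action)
            if nxt not in seen:  # skip states the search has already reached
                seen.add(nxt)
                queue.append((nxt, actions + [action]))
    return None  # no sequence of actions reaches the goal

messy = frozenset({"clothes_on_floor", "dirty_dishes"})  # initial state
tidy = frozenset()                                        # goal state
print(plan(messy, tidy))  # ['pick_up_clothes', 'wash_dishes']
```

In a real system the prediction step would be a trained model operating on images or video rather than a lookup table, and the search would have to cope with uncertain predictions.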
Justin Johnson, co-founder of World Labs, predicts that future world models may be able to generate 3D worlds on demand for uses such as gaming and virtual photography, significantly reducing development costs and time. With world models, users will be able to obtain not just images or video clips but a fully simulated, vivid, and interactive 3D world.
A 2024 study by the Animation Guild, which represents Hollywood animators and cartoonists, estimates that artificial intelligence could disrupt over 100,000 film, television, and animation jobs in the United States within the next two years.
World models are also expected to drive advancements in robotics by enhancing robots' perception of their surroundings and themselves, helping them better understand the context they are in and reason about potential solutions.
Despite the promising outlook, the development of world models still faces numerous technical challenges:
Huge computational demands: Training and running "world models" require more computational power than current generative models.
Hallucination and bias issues: Like all AI models, "world models" can also produce hallucinations and internalize biases present in the training data.
Limitations of training data: A lack of sufficiently broad and specific training data may exacerbate the above issues.
Complex behavior simulation: Current models struggle to accurately capture the behaviors of world inhabitants (such as humans and animals).
Over the past year, AI technology has continued to break through in multiple directions, and world models are seen as the next major breakthrough. Although "world models" will take several more years to mature, the technology has already shown great potential. If the major obstacles can be overcome, "world models" are expected to bring significant breakthroughs in areas such as virtual world generation, robotics, and AI decision-making, opening new pathways for the integration of artificial intelligence with the real world.