Meta Reportedly Plans to Release New Image and Video AI Model in First Half of Next Year, Is Researching World Models

Media reports indicate that Meta is developing a next-generation image and video AI model codenamed Mango, as well as a large language model codenamed Avocado that focuses on enhancing programming capabilities. Last week, it was reported that Avocado may be released as a proprietary rather than open-source model, which would mark a significant departure from Meta's long-standing open-source strategy.

Social media giant Meta's latest moves in the AI race have made headlines, reflecting a shift in its strategic focus from open source toward cutting-edge models that can generate profit.

On Thursday the 18th, Eastern Time, media reported that Meta's Chief AI Officer Alexandr Wang had disclosed during an internal Q&A session the previous Thursday that Meta is developing a next-generation image and video AI model codenamed Mango, as well as a next-generation large language model (LLM) codenamed Avocado, both expected to be released in the first half of 2026.

Wang stated that one of the focuses of the Avocado model is to enhance programming capabilities, while the company is in the early stages of researching and developing world models. World models are AI technologies that learn about the environment by absorbing visual information.

This news further confirms Meta's strategic adjustment in the AI field. Last week, Wallstreetcn cited reports that Meta is developing a new cutting-edge AI model, Avocado, which was optimized during training using third-party models such as Alibaba's Tongyi Qianwen (Qwen), and which may be released as a proprietary rather than open-source model. This stands in stark contrast to the open-source Llama series that Meta previously promoted.

To promote AI research and development, Meta restructured its AI team this summer, hiring Alexandr Wang to lead the newly established Superintelligence Labs. CEO Mark Zuckerberg personally recruited over 20 researchers from OpenAI to form a team of more than 50 AI experts.

Dual Model Layout: Advancing Image Generation and Language Capabilities

According to reports on Thursday, Meta is simultaneously advancing the development of two core AI models. The image and video model Mango aims to enhance Meta's competitiveness in the generative AI field, while the text model Avocado focuses on improving key capabilities such as programming.

Image generation has become a key battleground for competition among large AI companies.

In late August, Google launched the AI image generation and editing tool Nano Banana based on the Gemini 2.5 Flash model, boosting Gemini's monthly active users from 450 million in July to over 650 million by the end of October.

On September 25, Meta launched the AI video generator Vibes, developed in collaboration with Midjourney, and within a week, OpenAI released its own video generation application Sora.

OpenAI CEO Sam Altman emphasized the importance of AI image generation to consumers during a meeting with reporters last week, stating that it is a primary interest for many users and a "sticky" feature that keeps them coming back.

Strategic Shift: From Open Source to Proprietary Model

According to reports from last week, Meta's AI strategy is undergoing a significant transformation. Many within the company originally expected the Avocado model to be released by the end of this year, but the plan has been postponed to the first quarter of 2026. Reports suggest that Avocado may be proprietary, meaning external developers will not be able to freely download its weights and related software components. If implemented, this move would mark a significant shift from the open-source strategy the company has long adhered to, bringing its approach closer to that of major competitors like Google and OpenAI.

One of the catalysts for this shift is the failure of Llama 4 to win developer favor after its release in April. Additionally, the R1 model released by the Chinese AI newcomer DeepSeek incorporates some elements of the Llama architecture, which has left some Meta employees dissatisfied, further highlighting the risks of the open-source strategy.

Last year, Zuckerberg predicted that the Llama series would become the "most advanced" model in the industry and specifically discussed Llama during the earnings call in January this year. However, in the latest earnings call in October, he only mentioned the brand once.

In June of this year, Meta invested $14.3 billion in the unicorn Scale AI and brought in its founder, Alexandr Wang. When it announced its third-quarter results at the end of October, Meta raised its capital expenditure guidance for the year to between $70 billion and $72 billion.

World Models: A New Frontier for AI Understanding the Physical World

As previously introduced by The Paper, the inspiration for world models comes from the human mental model of the world, where abstract information obtained through the senses is transformed in the brain into a concrete understanding of the surrounding world. Based on these models, the brain predicts the world, thereby influencing perception and action.

NVIDIA points out that world models are neural networks used to understand the dynamics of the real world, including physical and spatial properties. They can use input data such as text, images, videos, and motion to generate videos that simulate actual physical environments, providing AI with the ability to understand the real three-dimensional physical world, which is significant for the realization of embodied intelligence.

However, world models face significant technical challenges. Training and running them requires far more computational power than current generative models demand. World models also suffer from hallucination issues and may internalize biases present in the training data. If these obstacles are overcome, world models could bring breakthroughs to robotics and AI decision-making, enabling AI to form an understanding of the context it is in and reason out possible solutions.