
XPeng and Li Auto Fiercely Debate VLA: Who Is Running Naked, and Who Is Taking the Big Gamble?

XPeng and Li Auto are locked in a heated debate over the VLA model for autonomous driving. Lang Xianpeng, Li Auto's head of autonomous driving, claims VLA is the best solution, while XPeng founder He Xiaopeng is betting heavily on it, with future technological outcomes as the stakes. Both companies firmly back the VLA route, while Huawei and Nio have chosen the world model. The debate has drawn widespread attention across the intelligent-driving field.
In December, China's intelligent-driving circle hit a peak of public attention thanks to a "naked run" wager and a pointed rebuttal.
Yesterday, Lang Xianpeng, head of autonomous driving at Li Auto, published a long Weibo post responding to Wang Xingxing, founder of Unitree Robotics (Yushu Technology), who had earlier claimed that "the VLA model is a relatively foolish architecture."
In the long Weibo post, Lang Xianpeng mentioned two core points: "VLA is the best model solution for autonomous driving" and "embodied intelligence ultimately competes on system capabilities."
Lang Xianpeng's response was resolute and became the absolute hot topic in the intelligent driving public opinion arena that day.
Coincidentally, just 24 hours later, XPeng founder He Xiaopeng also posted a long Weibo, on the very same theme of VLA. What's more, He Xiaopeng raised his commitment to VLA to a new peak, going so far as to place a heavy bet on it.
He stated in his Weibo that he made a bet with the autonomous driving team.
The content of the bet is that if by August 30, 2026, XPeng's VLA can achieve the overall effect of FSD V14.2 in Silicon Valley, He Xiaopeng will build a very distinctive Chinese-style canteen in Silicon Valley, referencing XPeng's current headquarters restaurant.
Conversely, if this cannot be achieved, Liu Xianming, the head of XPeng's autonomous driving center, will have to promise to run naked on the Golden Gate Bridge.
Within 24 hours, two long Weibo posts ignited the topic of intelligent driving in the new car-making industry.
Both XPeng and Li Auto are currently staunch advocates of the VLA route. XPeng's VLA 2.0, just released at AI Day, will officially start pioneer internal testing in December; Li Auto will further upgrade the experience of the VLA large model driven by reinforcement learning in OTA 8.1.
On the other hand, Huawei and Nio are racing down the path of world models.
Huawei's Jin Yuzhi once mentioned, "We do not follow the VLA route; WA (world model) is the ultimate solution"; Nio's Li Bin promised, "Nio's world model will return to a top position in the industry."
Advanced driver assistance is still on the eve of a qualitative leap, yet automakers must keep proving how advanced their technology is.
This round of debate over VLA raises the question: is it innovation and breakthrough, or involution and infighting?
Here are some thoughts to ponder.
The Call for VLA
There are many commonalities in the two lengthy Weibo posts, both advocating for VLA and providing resounding conclusions.
To recap, Wang Xingxing's remarks on VLA and world models came at the World Robot Conference in Beijing on August 9 this year.
At the time, that statement effectively ignited the public debate over large models: not robotics large models, but intelligent-driving large models.
The backdrop is that, after a fresh round of AI transformation, it has become industry consensus that intelligent driving is a subset of embodied intelligence.
Because of this, Wang Xingxing's viewpoint was immediately used by netizens opposing VLA as evidence to refute the automakers and intelligent driving companies in the VLA camp.
Although Lang Xianpeng's long Weibo response came months after the August remarks, the delay takes nothing away from his resolve on Li Auto's behalf, as in the statement: "VLA is the best model solution for autonomous driving."
In fact, the core technological highlight of VLA lies in the title of the paper that defines the VLA route.
VLA, vision-language-action large model, was first mentioned in the Google DeepMind team's paper "RT-2: Vision-Language-Action Models" in July 2023.
The paper's subtitle is "Vision-Language-Action Models Transfer Web Knowledge to Robotic Control," meaning the VLA large model turns web knowledge into robotic control. That groundbreaking work of two and a half years ago maps directly onto Lang Xianpeng's statement two and a half years later: "We are using GPT's approach to achieve autonomous driving."
What this refers to is not that the Li Auto VLA large model has already matched GPT-5, but rather that the essence of VLA, and even the development direction of autonomous driving large models, lies in using general knowledge from the real world to continuously perfect solutions for "long-tail scenarios."
In simpler terms, VLA allows intelligent driving to possess social experiences closer to those of humans.
For example, the new feature that various VLAs will compete on in the second half of the year, "gesture recognition," is exactly such a manifestation of "social experience": it lets intelligent driving correctly judge whether someone is flagging down a ride or a traffic officer is waving you through.
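The vision-language-action pattern the RT-2 paper describes can be sketched in miniature: perception, language-level reasoning, and action all become token-style inputs and outputs. The following toy Python sketch is purely illustrative; every function, threshold, and token name is a made-up stand-in, not any automaker's real stack.

```python
# Toy illustration of the VLA (vision-language-action) pattern: a frame is
# encoded into scene tokens, combined with a language instruction, and decoded
# into a discrete action token. All names are hypothetical stand-ins.

def encode_vision(frame: list[list[int]]) -> list[str]:
    """Stand-in vision encoder: map a tiny 'image' to discrete scene tokens."""
    brightness = sum(sum(row) for row in frame) / (len(frame) * len(frame[0]))
    return ["<pedestrian>"] if brightness > 128 else ["<clear_road>"]

def reason_language(scene_tokens: list[str], instruction: str) -> str:
    """Stand-in 'L' step: combine scene tokens with a language instruction."""
    if "<pedestrian>" in scene_tokens:
        return "a person is ahead; check whether they are signalling"
    return f"road is clear; continue with instruction: {instruction}"

def decode_action(thought: str) -> str:
    """Stand-in action head: emit a discrete driving-action token."""
    return "<slow_down>" if "person" in thought else "<keep_speed>"

def vla_step(frame: list[list[int]], instruction: str) -> str:
    """One full vision -> language -> action pass."""
    scene = encode_vision(frame)
    thought = reason_language(scene, instruction)
    return decode_action(thought)

bright = [[200, 200], [200, 200]]  # pretend scene with a pedestrian
dark = [[10, 10], [10, 10]]        # pretend empty road
print(vla_step(bright, "drive to the exit"))  # <slow_down>
print(vla_step(dark, "drive to the exit"))    # <keep_speed>
```

The point of the pattern is the middle step: general language-level knowledge (what a raised hand usually means) sits between perception and control, which is what the article calls "social experience."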
The first car manufacturer to demonstrate this feature at a press conference was XPeng.
This afternoon, He Xiaopeng's long Weibo post, though shorter than Lang Xianpeng's, made his determination even clearer, especially with the "naked run" bet attached.
As for why He Xiaopeng himself is not the one doing the naked run: he already made such a bet seven years ago when the G3 was launched, and the G3 ultimately did sell quite well; otherwise...
Returning to a month ago at XPeng's AI Day, He Xiaopeng officially announced that XPeng's VLA 2.0 will officially start pioneer user testing in the fourth quarter.
In XPeng's PPT, the core evolution of VLA 2.0 lies in removing the traditional "L" and directly generating action instructions using implicit logic.
He Xiaopeng believes this will sharply reduce the VLA large model's latency and increase the mileage per takeover on narrow roads such as urban villages by a factor of 13.
However, XPeng's VLA 2.0 brings another question along with it: does removing the "L" make VLA more like a world model? That goes back to the essence of large-model evolution: minimizing information loss in transmission and improving the efficiency with which information moves through the model.
The VLA's move towards L-free is indeed consistent with the underlying optimization direction of world models—reducing reliance on thought chains, enhancing the information utilization efficiency of image tokenization, thereby achieving stronger intelligent driving performance.
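That optimization direction, cutting the explicit language step so fewer stages sit between perception and action, can be caricatured in a few lines. This is a hedged, toy comparison: the stage counts merely stand in for latency, and nothing here reflects XPeng's actual VLA 2.0 internals.

```python
# Toy contrast between the two pipelines discussed above: a classic VLA that
# routes perception through an explicit language description, versus an
# "L-free" variant that maps a latent scene feature straight to an action.
# Stage counts are a crude proxy for latency; all values are illustrative.

def explicit_vla(scene_latent: float) -> tuple[str, int]:
    """Scene -> language description -> action (three stages)."""
    description = "obstacle ahead" if scene_latent > 0.5 else "road clear"  # the 'L' step
    action = "<brake>" if description == "obstacle ahead" else "<cruise>"
    return action, 3  # encode + describe + act

def implicit_vla(scene_latent: float) -> tuple[str, int]:
    """Scene latent -> action directly, skipping the language bottleneck (two stages)."""
    action = "<brake>" if scene_latent > 0.5 else "<cruise>"
    return action, 2  # encode + act

a1, stages1 = explicit_vla(0.9)
a2, stages2 = implicit_vla(0.9)
print(a1, a2, stages2 < stages1)  # same action, fewer stages
```

The trade-off the article gestures at is visible even in this caricature: dropping the intermediate description removes one lossy re-encoding, but also removes the human-readable reasoning step that made classic VLA interpretable.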
However, this absolutely does not mean that VLA and world models have immediately determined a winner or that the VLA camp is converging towards world models.
In fact, VLA and world models are both showing a trend of "convergence."
For example, on the VLA side, XPeng is allowing the model to learn spatial-temporal patterns directly from video streams to cultivate intuition, while Li Auto is also using world models for data generation, simulation testing, and reinforcement training in the cloud.
Even Yann LeCun, the staunchest advocate of world models, who dismisses VLA as "rote memorization," has drawn heavily on the achievements of the VLA technical route in his latest AGI concept, JEPA (Joint Embedding Predictive Architecture).
For instance, he believes that VLA interacts with the real world very efficiently, and he also thinks that a true JEPA world model needs to have a part responsible for perception and evaluation, similar to VLA logic.
As for whether there is an "L," it has never mattered to Yann LeCun, to Google, or, now, to He Xiaopeng and company.
Remember the title of the paper with which Google pioneered the VLA track? Knowledge transfers into control; where is the language?
However, it is very important for Li Auto and XPeng to occupy a higher ground in public opinion regarding VLA.
Never Let Go of the Discourse Power
Stepping out of the alluring quagmire of technological innovation, let's finally turn to the overheated atmosphere of intelligent-driving marketing.
Time is the best touchstone. Elon Musk's statement at Tesla's 2021 AI Day, "To achieve FSD, Tesla must solve AI in the real world," now sounds plain and simple, yet it has long stood as the absolute peak of Musk's marketing claims.
Because whoever makes the resounding statement first earns the right to be cited later.

Whether it is Lang Xianpeng's long Weibo post, Li Xiang's 20-minute speech during the third-quarter earnings call, or He Xiaopeng's bet today, all reflect the urgency of China's new car manufacturers to firmly hold onto the discourse power of intelligent driving.
Technological progress dazzles after a qualitative change, but is more often mundane during the long stretch of quantitative change. After intelligent-driving marketing hit the brakes in 2025, the neologism movement came to a temporary halt, and the slogan competition paused as well.
However, the task of conveying the leading position of car companies to consumers remains heavy; on one side, FSD V14 is preparing for a comeback, while on the other side, robot companies have quickly taken center stage in the AI arena.
Even He Xiaopeng candidly admitted, "Our first version still does not reach the level of FSD V14.2"; perhaps that is the direct reason for the naked-run bet.
Li Auto and XPeng have shown a sense of urgency, but they are certainly not the only ones feeling it.
From a media perspective, we hope this sense of urgency can quickly translate into user surprises, and it must translate into user surprises.
Risk Warning and Disclaimer
The market has risks, and investment requires caution. This article does not constitute personal investment advice and does not take into account the specific investment goals, financial situation, or needs of individual users. Users should consider whether any opinions, views, or conclusions in this article align with their specific circumstances. Investing on this basis is at one's own risk.
