With its second-generation VLA, XPeng bets on L4 for mass-produced cars and Robotaxi at the same time

Robotaxi will begin global deliveries in 2027

At the start of the new year, XPeng Motors Chairman and CEO He Xiaopeng stated in an internal letter to employees that autonomous driving's DeepSeek moment has arrived.

XPeng has moved quickly on this core priority. On March 2, He Xiaopeng announced that XPeng's second-generation VLA (Vision-Language-Action model) will begin full-scale rollout later this month.

This is not just a routine software upgrade. In He Xiaopeng's view, today's L2 assisted driving is essentially a patchwork of different technical solutions, end-to-end small models have hit a capability ceiling, and intelligent driving research has reached a watershed.

"The second-generation VLA is the first version aimed at fully autonomous driving, and it will iterate at a speed never seen before by XPeng," He Xiaopeng predicted. "Fully autonomous driving will completely arrive within the next 1-3 years, and autonomous driving will truly become a part of people's daily travel habits."

At the same time, He Xiaopeng also announced that the Robotaxi equipped with this model has started public road testing, with trial operations set to begin within the year and global deliveries starting in 2027.

In the race toward L4, XPeng has already pulled the trigger.

Intelligent Driving Moving Beyond the "Geek Circle"

"L2 is just a patchwork," He Xiaopeng stated bluntly.

In his view, the mainstream assisted driving systems in the industry are essentially stitched together from different technical solutions: one logic for highway scenarios, another for urban scenarios, and yet another set of rules for parking. This modular stacking may perform excellently in a single scenario, but once faced with the complexities of the physical world, the system's disjointedness becomes glaringly apparent.

The second-generation VLA attempts to fundamentally answer a question: If the goal is L4, why follow the L2 path of patching things together?

The answer is a complete paradigm reconstruction. The second-generation VLA no longer relies on language models as an intermediate translation layer but directly achieves end-to-end mapping from visual input to action output.

This means the system can deduce the optimal driving strategy based on visual information like a human driver, rather than performing serial calculations across multiple modules such as perception, prediction, and planning.
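The structural difference can be sketched in a few lines of Python. This is a purely illustrative sketch, not XPeng's implementation: every name here (`Frame`, `Action`, `perceive`, `predict`, `plan`, `modular_drive`, `end_to_end_drive`, `toy_policy`) and every threshold is hypothetical, and the "policy" is a hand-written stand-in for a learned model. It only contrasts a serial chain of modules with a single mapping from visual input to action.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Frame:
    """Hypothetical camera frame, reduced to the distance (m) to the nearest obstacle."""
    obstacle_distance_m: float


@dataclass
class Action:
    """Hypothetical control output, each value in the range 0.0 to 1.0."""
    throttle: float
    brake: float


# --- Modular pipeline: perception -> prediction -> planning, run serially ---

def perceive(frame: Frame) -> float:
    # Perception module: extract the obstacle distance from the frame.
    return frame.obstacle_distance_m


def predict(distance_m: float, closing_speed_mps: float = 5.0,
            horizon_s: float = 2.0) -> float:
    # Prediction module: estimate the remaining distance after `horizon_s` seconds.
    return distance_m - closing_speed_mps * horizon_s


def plan(predicted_distance_m: float) -> Action:
    # Planning module: a hand-written rule applied to the predicted distance.
    if predicted_distance_m < 10.0:
        return Action(throttle=0.0, brake=1.0)
    return Action(throttle=0.5, brake=0.0)


def modular_drive(frame: Frame) -> Action:
    # Each stage sees only the previous stage's output.
    return plan(predict(perceive(frame)))


# --- End-to-end: one function maps the frame directly to an action ---

def end_to_end_drive(frame: Frame, policy: Callable[[Frame], Action]) -> Action:
    # `policy` stands in for a single learned model with no intermediate hand-offs.
    return policy(frame)


def toy_policy(frame: Frame) -> Action:
    # Stand-in for a trained network: brake harder as the obstacle gets closer.
    brake = max(0.0, min(1.0, (30.0 - frame.obstacle_distance_m) / 30.0))
    return Action(throttle=0.5 * (1.0 - brake), brake=brake)
```

In the modular version, a change to any one stage's rule or output format can ripple through the others. In the end-to-end version, behavior changes by retraining or swapping the single `policy`.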

This shift in technical direction is particularly evident at the data level.

According to Liu Xianming, head of XPeng Motors' General Intelligence Center, the second-generation VLA has accumulated 50PB of training data, processing approximately 5.3 billion bytes of visual information per second, and has completed 468 model iterations since the 2025 Technology Day.

Even more noteworthy, the token consumption for vehicle-side model inference is about 80 times the daily usage of digital AI tokens nationwide. Behind these figures is XPeng's belief in scaling laws: L4 capability = model × computing power × data × ontology.

On the computing power front, XPeng's fully self-developed Turing chip has begun to play a key role. Official data shows that its computing power utilization rate is as high as 82.5%, that model inference takes only 80 milliseconds, and that the effective computing power of one Turing chip is approximately equal to that of ten Orin-X chips. With joint optimization across chip, compiler, and model, the compilation efficiency of the base model has improved 12-fold.

This was also Liu Xianming's first public appearance since XPeng merged its Autonomous Driving Center and Intelligent Cockpit Center to form the General Intelligence Center.

He admitted that the research and development over the past period has been "very painful," stating that "all research and development in autonomous driving had to be completely redone, going back to the starting point, changing the autonomous driving problem into a physical AI problem."

This statement somewhat confirms the depth of this technological upgrade: it is not an optimization based on the existing architecture, but a complete restart from the underlying logic.

Furthermore, XPeng's second-generation VLA delivers a comprehensive upgrade across three dimensions of the driving experience: smoothness and peace of mind, all-scenario capability, and efficiency.

In terms of smoothness and peace of mind, the second-generation VLA can recognize various non-standard vehicles, navigate around accident scenes, slow down in advance on bumpy roads, and yield to small animals at night.

Actual test data shows that heavy braking has been reduced by 99%, rapid acceleration by 98%, and safety takeovers by 60%, while road obstacle recognition has improved by 124%, side and rear vehicle recognition by 118%, nighttime decision-making accuracy by 96%, and avoidance comfort by 95%.

All-scenario capability means that the second-generation VLA covers park paths, rural dirt roads, and roads without navigation data. It can handle complex scenarios such as squeezing through narrow lanes and avoiding potholes on rural roads, and it supports starting from a standstill in P gear, providing assistance throughout the entire journey.

In terms of efficiency, the second-generation VLA has improved overall driving efficiency by 23% while ensuring safety and stability. In an actual test in urban Guangzhou, the trip took 43 minutes, one minute less than the navigation estimate of 44 minutes.

The performance of XPeng's second-generation VLA has also received some positive evaluations from the outside world. Analysts at Morgan Stanley believe that "the second-generation VLA of XPeng is a bold leap," and that "Tesla will face more competition from Chinese companies like XPeng that have the capability to compete with its autonomous driving technology in the global market."

Ambitions for Robotaxi

Beyond passenger car intelligent driving, another focus of the second-generation VLA points to Robotaxi.

He Xiaopeng revealed that Robotaxi equipped with the second-generation VLA has begun public road testing, with trial operations set to start within this year and plans for global delivery to begin in 2027. Volkswagen will be the first customer for this model.

This timeline coincides with a sensitive moment for the Robotaxi industry. The year 2026 is widely regarded as the year of large-scale Robotaxi deployment, with players like Tesla, Baidu, Pony.ai, and WeRide stepping up their rollouts.

Tesla plans to achieve mass production of CyberCab in the first quarter of 2026; as of February, Baidu's Apollo Go has been operating in 26 cities; Pony.ai and its joint venture with Toyota China plan to deploy thousands of Robotaxis by 2026.

However, alongside the rising industry enthusiasm, public skepticism about safety has also increased.

Yu Qian, CEO of QCraft, recently stated, "China's Robotaxi may be slower than the U.S. due to multiple factors including labor costs and legal regulations."

Against this backdrop, XPeng's Robotaxi strategy is both cautious and aggressive. The caution lies in its choice to first validate the technology on passenger cars, accumulating data and scenarios through large-scale user usage. The aggressive part is that He Xiaopeng has set the timeline for fully autonomous driving to "within the next 1-3 years."

This judgment is based on a rethinking of the technological roadmap. He Xiaopeng believes that the industry will leap directly from L2 to L4, with L3 being merely a transition. In his proposal at this year's Two Sessions, he also suggested accelerating the regulatory and management policies to facilitate the transition from L2 to L4 in autonomous driving technology.

If this judgment holds, then the investment based on the second-generation VLA will determine XPeng's position in the next competitive cycle.

Another noteworthy signal is that XPeng's Robotaxi, equipped with the second-generation VLA, is planned to start global deliveries in 2027. This signifies not only an export of technological capability but also the need to adapt to overseas regulations and road scenarios.

He Xiaopeng has also made a firm pledge: August will be a major test for the autonomous driving team, and XPeng aims to deliver in China the same level of performance that Tesla achieves in Silicon Valley.

This benchmarking is not coincidental. The rollout of Tesla's FSD V12 in early 2024 greatly excited He Xiaopeng, who asked key team members to try it in the U.S. Smoothness, a human-like feel, and cognitive ability: these impressions of FSD have now become the goals pursued by XPeng's second-generation VLA.

From a more macro perspective, the launch of XPeng's second-generation VLA marks the entry of China's intelligent driving industry into a new competitive phase.

Industry insiders have told Wall Street Watch that as L2 assisted driving gradually becomes standard, and as price wars compress profit margins, technological breakthroughs become an inevitable choice for leading players.

Concepts like physical AI, L4 autonomous driving, and Robotaxi, which once seemed distant, are now being transformed into concrete product plans and delivery timelines, paving the way for XPeng to become a leading player.

He Xiaopeng's expectation for the future is: "Once policies and regulations are opened up, fully autonomous driving can ensure that everyone can safely get home after drinking at night, and it can allow family members to go out without having to drive themselves, enabling the car to come and serve them proactively."

This is undoubtedly a highly attractive vision of the future. However, the road to it runs through technical validation, regulatory improvement, user acceptance, and the shadow of safety incidents, all of which are hurdles that must be cleared. Whether the second-generation VLA can truly usher in autonomous driving's DeepSeek moment will depend on real-world feedback from users.