No code, only neural networks! Tesla FSD's major update, what's different about V12?

On November 25th, media reported that Tesla had started rolling out Full Self-Driving (FSD) V12 to its employees, under software version number 2023.38.10. Soon after, Tesla CEO Elon Musk confirmed the news on Twitter.

Earlier this month, Musk announced that the Tesla FSD V12 autonomous driving system would be available for beta testing within two weeks, a timeline the market doubted. FSD V12 now appears to be entering its final stage before release to customers, and it may launch this year.

Why is there so much anticipation for Tesla's FSD V12?

The most important reason is that Tesla has repeatedly emphasized that FSD V12 will deliver a brand-new "end-to-end autonomous driving" experience. For the first time, neural networks will handle vehicle control itself, including steering, acceleration, and braking, replacing more than 300,000 lines of hard-coded logic that were previously required.

In a previous test drive livestream, Musk stated that FSD Beta V12 is the first end-to-end AI autonomous driving system, driven by AI from start to finish. According to Musk, no programmer wrote a single line of code to recognize concepts such as roads and pedestrians; all of that was left to the neural network to figure out. He said the C++ code for V12 is only 2,000 lines, while V11 had 300,000 lines.

In simple terms, V12 feeds the image data captured by the cameras into the neural network, which directly outputs vehicle control commands such as steering, acceleration, and braking. It is more like a human brain, with 99% of the decisions made by the neural network itself. It doesn't require high-definition maps or LiDAR, relying solely on visual input from the vehicle's cameras to analyze the scene and output control strategies.

Media analysis suggests that the release of the FSD V12 version will be a crucial moment for Tesla in terms of AI and autonomous driving. It is not only about technological prowess but also about how AI can better integrate with human behavior.

The market is still divided over the technical details and potential impact of this new architecture. Based on the information disclosed by Tesla and Musk's posts on Twitter, CITIC Securities believes that Tesla currently has two "end-to-end" research paths in progress internally: 1) cascaded end-to-end neural networks, and 2) World Model. FSD V12 is more likely to be the former and is expected to be deployed early next year to better achieve Level 3 capabilities.

What is different about "end-to-end autonomous driving"?

Before FSD V12, Tesla's autonomous driving system relied on rule-based judgments. The car's cameras recognized lanes, pedestrians, vehicles, signs, and traffic lights, and Tesla engineers manually wrote hundreds of thousands of lines of C++ code to handle the resulting situations, such as stopping at red lights, proceeding at green lights, and crossing intersections only when there is no approaching vehicle.
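To make the rule-based approach concrete, here is a minimal Python sketch of the kind of hand-written logic described above. It is not Tesla's code (which was reportedly C++ and far larger); the `Scene` fields and the `rule_based_action` function are hypothetical names chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    """Hypothetical perception output; field names are illustrative only."""
    traffic_light: str        # "red", "green", or "none"
    at_intersection: bool
    oncoming_vehicle: bool


def rule_based_action(scene: Scene) -> str:
    """Return "stop" or "go" using hand-written rules like those listed above."""
    # Rule 1: always stop at a red light.
    if scene.traffic_light == "red":
        return "stop"
    # Rule 2: cross an intersection only when no vehicle is approaching.
    if scene.at_intersection and scene.oncoming_vehicle:
        return "stop"
    # Otherwise (green light or open road), proceed.
    return "go"
```

Every new situation needs another explicit branch written by an engineer, which is how such a codebase could plausibly grow to hundreds of thousands of lines.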

Now, in what is the most important upgrade to Tesla's autonomous driving system, FSD V12 simply feeds videos to the neural network, which continuously learns and optimizes its parameters by analyzing billions of frames of human driving videos.
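Tesla has not published its training procedure, so the following is only a generic sketch of the idea of learning parameters from human driving data, often called imitation learning. It assumes a toy setup in which each demonstration is a pair of (lane offset, human steering) and the "network" is a single weight `w`; all names and numbers are invented for illustration.

```python
# Toy demonstrations: (lane_offset, human_steering). Made-up data in which
# the human driver steers back toward the lane center with gain -0.8.
demos = [(x, -0.8 * x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]


def train_steering_weight(data, learning_rate=0.1, steps=200):
    """Fit steering = w * offset by gradient descent on mean squared error."""
    w = 0.0  # start with an untrained parameter
    for _ in range(steps):
        # Gradient of the mean squared error between prediction and human action.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= learning_rate * grad
    return w


w = train_steering_weight(demos)
```

No rule about steering was written by hand; the parameter is recovered from the demonstrations alone, which is the shift the article describes, here at a vastly smaller scale.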

CITIC Securities pointed out that from a technological perspective, the cascaded end-to-end neural network system uses neural network algorithms throughout the entire process from input to output, without any manual rules. Currently, most autonomous driving models are modular architectures, where perception, prediction, planning, control, and other tasks are handled by different small models, and rule-based approaches are still prevalent in downstream control.

In contrast, the "end-to-end" neural network can directly output control commands such as steering, braking, and acceleration after inputting an image. To improve training effectiveness, the "end-to-end" neural network may consist of multiple small sub-networks cascaded together.
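As a rough illustration of the data flow, the sketch below chains three tiny sub-networks so that pixel values go in and three control values come out. It is an assumption-laden toy: the layer sizes, the names `encoder`, `planner`, and `controller`, and the random untrained weights are all hypothetical and say nothing about the real V12 architecture.

```python
import math
import random


def dense(weights, inputs):
    """One fully connected layer with a tanh activation (outputs in [-1, 1])."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def random_layer(rng, n_out, n_in):
    """Untrained random weights; a real system would learn these from data."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


rng = random.Random(0)
PIXELS = 16                              # toy "image": 16 grayscale values
encoder = random_layer(rng, 8, PIXELS)   # sub-network 1: pixels -> features
planner = random_layer(rng, 4, 8)        # sub-network 2: features -> plan
controller = random_layer(rng, 3, 4)     # sub-network 3: plan -> controls


def drive(image):
    """Cascade the sub-networks: image in, control commands out."""
    features = dense(encoder, image)
    plan = dense(planner, features)
    steering, acceleration, braking = dense(controller, plan)
    return {"steering": steering, "acceleration": acceleration, "braking": braking}
```

Calling `drive([0.5] * PIXELS)` returns a dictionary with the three control values. The point is only that every stage between camera input and control output is a learned function rather than a hand-written rule.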

Unlike the traditional modular architecture that connects modules with "rules," the sub-modules of the cascaded neural network are trained and stacked in a "neural network" manner. Therefore, the entire end-to-end model can be optimized through data-driven approaches, avoiding the dilemma of "local optima rather than global optima."
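The "local optima rather than global optima" point can be shown with a deliberately tiny numeric example. Assume a two-stage pipeline whose output is `b * a * x`. In the modular version, stage 1 is fitted to its own intermediate label and stage 2 is a fixed hand-written gain; in the end-to-end version, both parameters are trained together against the final target. All data and parameter values here are invented for illustration.

```python
xs = [1.0, 2.0, 3.0]
targets = [3.0 * x for x in xs]   # desired final output (toy data)


def final_loss(a, b):
    """Mean squared error of the whole pipeline's output b * a * x."""
    return sum((b * a * x - t) ** 2 for x, t in zip(xs, targets)) / len(xs)


# Modular pipeline: stage 1 is fitted to its own proxy label (0.5 * x) by
# least squares, and stage 2 is a fixed gain chosen by hand.
proxy = [0.5 * x for x in xs]
a_mod = sum(x * m for x, m in zip(xs, proxy)) / sum(x * x for x in xs)
b_mod = 4.0
modular_loss = final_loss(a_mod, b_mod)

# End-to-end pipeline: both stages are updated by gradient descent on the
# final loss, starting from the modular solution.
a, b = a_mod, b_mod
learning_rate = 0.005
for _ in range(500):
    errors = [b * a * x - t for x, t in zip(xs, targets)]
    grad_a = sum(2 * e * b * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = sum(2 * e * a * x for e, x in zip(errors, xs)) / len(xs)
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
end_to_end_loss = final_loss(a, b)
```

Stage 1 is perfectly fitted to its own label in the modular version, yet the pipeline as a whole still misses the final target. Training both stages against the final loss removes that gap.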

The core advantage of the end-to-end/neural network approach is that the key to model iteration shifts from "engineers" to the more scalable "data and computing power." As a result, training efficiency and performance limits are expected to improve significantly. In practical terms, CITIC Securities believes that the performance potential demonstrated by the end-to-end solution is expected to greatly reduce how often the driver has to take over from the autonomous driving system, achieving truly undisputed L3 capabilities (such as one takeover per week).

However, the industry currently lacks a mature solution to the "black box" problem of end-to-end models. CITIC Securities therefore believes it remains to be seen whether this approach can ultimately reach the level of safety required for L4 fully autonomous driving.