CSC: Merger of Google's Two AI Labs Expected to Further Accelerate Progress in Robot Models

Google DeepMind recently released new research results, including RoboCat, a model capable of "self-improvement," and RT-2, a vision-language-action (VLA) model that integrates large language models. Robot intelligence is advancing rapidly and is expected to usher in a new wave of the AI revolution.

According to the Zhixin Finance APP, CITIC Securities has released a research report stating that on April 20, 2023, Google announced the merger of its two world-class AI laboratories, Google Brain and DeepMind, into a single unit, Google DeepMind. Backed by Google's computing resources, the combined unit is expected to accelerate the development and application of artificial intelligence. In June and July 2023, Google DeepMind released its latest research results, including RoboCat, which is capable of "self-improvement," and RT-2, a VLA model that integrates large language models. Robot intelligence is advancing further, and a new wave of the AI revolution is expected.

The main points of CITIC Securities are as follows:

From Gato to RoboCat: larger training datasets and an innovative self-improvement method help create more capable robot intelligence. The Gato model, proposed in May 2022, extended general-purpose agent capabilities to robot control, although there was still room for improvement in both "generality" and "intelligence." Its model architecture and its approach to serializing control-task data into token sequences laid important groundwork for later models. RoboCat, proposed in June 2023, builds on Gato and expands the training dataset to 4 million robot-related segments. It also introduces a novel "self-improvement" method that enriches the training data with data the model generates itself. Together, these two innovations let RoboCat perform better on its training tasks, give it a degree of generalization, and allow it to handle unseen tasks with only a small amount of fine-tuning.

From RT-1 to RT-2: large language models bring stronger generalization, logical reasoning, and world knowledge, empowering robot intelligence. The RT-1 model, proposed in December 2022, built a bridge from natural-language instructions and camera images to robot actions. PaLM-E, released in March 2023, processes text and image inputs and breaks complex tasks down into instructions that RT-1 can execute. RT-2, proposed in July 2023, fuses the two approaches. Drawing on the capabilities of large language models, RT-2 can decompose complex tasks, perform simple calculations, and recognize faces in real-world scenes, none of which earlier models could do. This marks a significant step up in intelligence.

Different technical routes drive development, and team integration promotes collaborative innovation. Google Brain and DeepMind have advanced AI robot models from two different directions. The DeepMind team has steadily improved robot capabilities from the perspective of intelligent agents, so most of RoboCat's training data comes from reinforcement learning. Its parameter count is kept relatively small, which enables higher-frequency robot control. Google Brain, by contrast, has sought to apply large language models to robot control. As a result, RT-2 has a larger parameter count, generalizes better, and performs more strongly on knowledge and reasoning. As the two teams integrate further and deepen their collaboration on data and models, Google's progress in robot models is expected to accelerate further.