Wallstreetcn
2023.07.23 01:17

Track Hyper | Meta Joins the Competition in Edge AI Models

Standing on the shoulders of Qualcomm, is the arrival of AI applications just around the corner?

ChatGPT is powerful, but it is closed source and has a high barrier to entry (it is not user-friendly), which makes consumer-end commercialization difficult.

However, on July 18th, Meta teamed up with Microsoft and Qualcomm to shake up the game: Microsoft's Azure provides cloud services for Llama 2, Meta AI's new generation of open-source large models; at the same time, Llama 2 can also run on Qualcomm chips, bringing AI capabilities to edge devices.

Wallstreetcn noticed that OpenAI announced on July 22nd that it will release the ChatGPT Android app next week, a move that squares up directly against consumer-end applications of Meta AI's large models.

If large-scale implementation of edge AI models can truly be achieved, then the spring of consumer electronics innovation, represented by intelligent terminals, may come again.

What do Meta, Microsoft, and Qualcomm want to do?

Open-source large models are nothing new in themselves.

Llama, in simple terms, is a large language model that accepts only text input; its dialogue-tuned variant is more precisely called "Llama-Chat". Llama's distinguishing features are that it is open source (GPT and PaLM are closed source) and free to use. The first version was released in February of this year; Llama 2's training ran from January to July, consuming 3.3 million GPU hours in total on Nvidia A100-80GB GPUs (each drawing 350W-400W), at a total training cost of up to 45 million US dollars.

Llama 2 has a context length of 4,096 tokens, double that of the first-generation Llama and on par with GPT-3.5. Its largest variant has 70 billion parameters (the released family spans 7 billion, 13 billion, and 70 billion; a 34-billion-parameter version was trained but not released), and its training corpus contains 2 trillion tokens. The 70-billion-parameter variant uses Grouped-Query Attention (GQA) to improve inference scalability.
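Grouped-query attention shrinks the key/value cache at inference time by letting several query heads share one key/value head. Below is a minimal, self-contained PyTorch sketch of the idea; the shapes and head counts are invented for illustration and are not Llama 2's actual configuration.

```python
# Sketch of grouped-query attention (GQA). Hypothetical dimensions,
# not Meta's implementation.
import torch
import torch.nn.functional as F

batch, seq_len, d_model = 1, 8, 64
n_q_heads, n_kv_heads = 8, 2          # several query heads share one KV head
head_dim = d_model // n_q_heads

q = torch.randn(batch, n_q_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)  # far fewer KV heads
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Repeat each KV head so each group of query heads attends to shared K/V.
group = n_q_heads // n_kv_heads
k = k.repeat_interleave(group, dim=1)  # -> (batch, n_q_heads, seq_len, head_dim)
v = v.repeat_interleave(group, dim=1)

scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
out = F.softmax(scores, dim=-1) @ v    # (batch, n_q_heads, seq_len, head_dim)
print(out.shape)
```

The memory saved is in the K/V tensors kept around during generation: here they are 4x smaller than in standard multi-head attention, which is what makes long-context inference cheaper.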

What is a token?

A token is the basic unit in which a large model processes and generates text. Roughly speaking, the more tokens a model is trained on, the more capable it tends to be.

As a comparison, the training corpus of Google's new-generation large model PaLM 2 contains 3.6 trillion tokens; GPT-3's contains 300 billion, and GPT-4's is speculated to exceed ten trillion.
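To make "token" concrete, here is a quick sketch using the open-source Hugging Face transformers library. It assumes you have been granted access to the gated Llama 2 tokenizer on the Hub; any other tokenizer illustrates the same point.

```python
# Count and inspect the tokens in a sentence. Assumes access to the
# gated meta-llama repo; swap in any public tokenizer otherwise.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
text = "Edge AI brings large models onto smartphones."
ids = tokenizer.encode(text)
print(len(ids))                              # number of tokens, not words
print(tokenizer.convert_ids_to_tokens(ids))  # sub-word pieces the model sees
```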

In terms of AI capability, Llama 2 still lags behind GPT-4 and cannot match Google's PaLM 2. Its performance alone is unlikely to shake OpenAI's market position, but by offering free commercial use, Meta may leapfrog its rivals through the open-source ecosystem.

The drawbacks of closed-source models deserve further explanation.

The most pressing concern is security. Because ChatGPT is closed source, the content of a dialogue with it essentially enters a black box.

When privacy or sensitive information is involved, such as financial data, personal details, or trade secrets, conversations with ChatGPT may be absorbed into later training and leak out as "public information". The famous "ChatGPT grandma exploit", for example, coaxed the model into producing valid Windows 11 serial numbers.

In the B2B sector, the consequences of this black box effect may be even more severe.

Many companies go beyond standard LLM capabilities and tailor LLM datasets to their own business scenarios to solve specific problems. But because of the closed-source black box, it is hard to guarantee that this scenario-specific business data stays private; if it leaks, these companies may suffer heavy losses or forfeit their competitive advantage.

The collaboration between Meta, Microsoft, and Qualcomm in deploying on-device models has far greater significance than model upgrades. If we consider Qualcomm's demonstration of on-device AI model capabilities in February this year, it is not difficult to imagine that a new wave of technological innovation in consumer electronics, especially smart mobile terminals (including smartphones and IoT), is brewing rapidly.

The main collaboration between Meta AI and Microsoft is to provide Azure cloud services to Llama 2 developers worldwide. In other words, application developers on the Windows platform will be able to tap Llama 2's AI capabilities. This significantly lowers the threshold for consumer-end applications of large AI models, sparing users from configuring software environments themselves.

According to Microsoft, Llama 2 has been optimized for Windows and can be deployed and run directly on Windows locally.
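Microsoft's Windows integration is not shown in detail here, but as a rough illustration of what "running Llama 2 locally" looks like today, below is a sketch using the open-source Hugging Face transformers library. It assumes the gated meta-llama weights have been approved for your account and that a GPU with enough memory is available; it is not Microsoft's Windows-optimized path.

```python
# Load Llama 2 Chat locally and generate a reply. Requires approved
# access to the gated meta-llama weights and the accelerate package
# (for device_map="auto").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Explain edge AI in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```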

Once Microsoft ships a Windows update built on the Llama 2 model (Windows has the largest operating-system market share worldwide), PC users everywhere will gain on-device large-model AI capabilities with little effort, and a wave of personalized AI applications will follow.

On-device and hybrid AI: which is more important?

The collaboration between Meta AI and Qualcomm holds even more potential.

According to Wallstreetcn, Qualcomm and Meta are working together to optimize on-device execution of the Llama 2 large language model, so that it need not rely solely on cloud services. This lets generative AI models like Llama 2 run on smartphones, PCs, VR/AR headsets, and cars, helping developers save on cloud costs while giving users a more private, reliable, and personalized experience.

Qualcomm plans to support Llama 2-based on-device AI deployment for building new AI applications, helping enterprises, partners, and developers create use cases such as intelligent virtual assistants, productivity applications, content-creation tools, and entertainment. These new AI experiences, running on Snapdragon-powered terminals, can work even in areas with no network connectivity, or in airplane mode.

Starting in 2024, Qualcomm plans to support Llama 2-based AI deployment on terminals equipped with Snapdragon platforms. Developers can already begin optimizing applications for on-device AI using the Qualcomm AI Stack, a set of dedicated tools that make AI processing on Snapdragon more efficient, so that even lightweight, compact terminals can run on-device AI.
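The Qualcomm AI Stack's own tooling is not detailed in this article. As a stand-in illustration of one core technique behind fitting models into a phone's memory and compute budget, here is a PyTorch dynamic-quantization sketch on a toy model; it is generic, not Qualcomm's API.

```python
# Post-training dynamic quantization: convert a model's linear layers
# from 32-bit floats to 8-bit integers, shrinking size and speeding up
# CPU inference. Toy model for illustration only.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # same interface, roughly 4x smaller weights
```

On-device stacks typically combine quantization like this with compilation and operator fusion for the target NPU; the principle of trading a little precision for a large footprint reduction is the same.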

Unlike some application-technology companies that try to deploy single-point AI models on the edge, Qualcomm has laid deep groundwork in this field.

In February of this year, Qualcomm's Snapdragon 8 Gen 2 mobile platform ran an AI model with over 1 billion parameters (Stable Diffusion), the world's first demonstration of a 1-billion-plus-parameter model executing on the edge.

On the scale of models the edge can effectively support, Hou Jilei, Qualcomm Vice President and Head of Qualcomm AI, believes many use cases are built on models with billions of parameters: models ranging from 1 billion to 10 billion parameters can cover the large majority of generative AI scenarios and deliver excellent results.

In mid-June, Qualcomm also demonstrated the ControlNet image-generation model, which has 1.5 billion parameters, running entirely on a smartphone. ControlNet is a generative AI solution known as a language-vision model (LVM); it generates images more precisely by conditioning on both an input image and an input text description.

In this demonstration, Qualcomm generated AI images on a mobile terminal in under 12 seconds without touching any cloud service, delivering an efficient, engaging, reliable, and private interactive experience.
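For a sense of what ControlNet-style conditioned generation involves, here is a hedged sketch using the open-source diffusers library. Qualcomm's demo ran a quantized variant entirely on a Snapdragon phone, which this desktop example does not replicate; the edge-map URL is a placeholder, not a real asset.

```python
# ControlNet-conditioned Stable Diffusion via diffusers: the input image
# steers composition, the text prompt steers content. Desktop sketch,
# not Qualcomm's on-device build.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

edge_map = load_image("https://example.com/canny_edges.png")  # placeholder URL
image = pipe("a cozy cabin at dusk", image=edge_map,
             num_inference_steps=20).images[0]
image.save("controlnet_out.png")
```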

According to Hou Jilei, in the coming months, Qualcomm is expected to support models with over 10 billion parameters running on the edge, and by 2024, models with over 20 billion parameters will be supported. In addition, through full-stack AI optimization, the inference time of large models will be further reduced in the future.

Qualcomm's technological innovations in edge-side AI model deployment mainly include the Qualcomm AI Model Efficiency Toolkit (AIMET), the Qualcomm AI Stack, and the Qualcomm AI Engine. In addition, another groundbreaking technology in Qualcomm's AI research is the 1080p video encoding and decoding process on mobile terminals.

Neural network codecs have broad applications: they can be customized for specific video requirements, use generative AI to optimize perceptual quality, and extend to new modalities on general-purpose AI hardware. But they also pose challenges that are hard to meet on compute-constrained terminals, which is why Qualcomm designed a neural network video-frame-compression architecture that supports 1080p video encoding on terminals.
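Qualcomm has not published its codec here, but the core idea of neural frame compression can be sketched with a toy autoencoder: an encoder squeezes each frame into a compact latent, and a decoder reconstructs it. The architecture below is invented for illustration; a real codec adds entropy coding, motion modeling, and rate control.

```python
# Toy neural frame compression: encode a 1080p frame to a small latent,
# then decode it back. Purely illustrative architecture.
import torch
import torch.nn as nn

class FrameAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Strided convolutions downsample 8x in each spatial dimension.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=4, padding=2), nn.ReLU(),
            nn.Conv2d(32, 8, 3, stride=2, padding=1),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(8, 32, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=4), nn.Sigmoid(),
        )

    def forward(self, frame):
        latent = self.encoder(frame)          # compressed representation
        return self.decoder(latent), latent

model = FrameAutoencoder()
frame = torch.rand(1, 3, 1080, 1920)          # one 1080p RGB frame
recon, latent = model(frame)
print(frame.numel(), latent.numel())          # ~24x fewer elements in latent
```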

Although Qualcomm's progress in deploying AI models on the edge is rapid, Qualcomm believes that hybrid AI is the future of AI: a hybrid AI architecture distributes and coordinates AI workloads between the cloud and edge terminals, letting the cloud work in synergy with smartphones, cars, personal computers, and IoT terminals to achieve more powerful, efficient, and highly optimized AI.

Cost savings are the main driving factor for hybrid AI to dominate the future.

For example, each generative-AI-based web search query is estimated to cost 10 times as much as a traditional search. Hybrid AI lets generative AI developers and providers tap the computing power of edge terminals to cut costs, while the hybrid or edge AI architecture also brings advantages in performance, personalization, privacy, and security at global scale.
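The cost logic is easy to make concrete. The sketch below uses the article's 10x figure; the dollar amounts and the share of queries served on-device are hypothetical placeholders.

```python
# Back-of-the-envelope hybrid-AI economics. Only the 10x multiplier
# comes from the article; every other number is made up.
TRADITIONAL_COST = 0.001              # hypothetical $/query
GENAI_CLOUD_COST = 10 * TRADITIONAL_COST
ON_DEVICE_SHARE = 0.7                 # hypothetical fraction handled on-device

queries_per_day = 1_000_000
cloud_only = queries_per_day * GENAI_CLOUD_COST
hybrid = queries_per_day * (1 - ON_DEVICE_SHARE) * GENAI_CLOUD_COST

print(f"cloud-only: ${cloud_only:,.0f}/day, hybrid: ${hybrid:,.0f}/day")
```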

If Qualcomm's judgment matches the future direction of AI applications, the fusion of cloud and edge AI computing is bound to materialize. The continued rollout of edge AI models, whether at the system level or at individual nodes, also opens new technological opportunities for new industries and business models. Either way, for consumer electronics represented by smartphones and IoT, a new wave of technological innovation is already within reach.