SLMs Counterattack LLMs? Microsoft Bets on Smaller, Cheaper "Large Models"
Analysis suggests that Microsoft's close collaboration with OpenAI has pushed OpenAI's models ahead of rivals in the large-model market, while Microsoft's smaller-parameter Phi series can also capture the long-tail market served by open-source models.
Over the past year or so, large language models (LLMs), from GPT-3.5 and GPT-4 to open-source models such as LLaMA, have attracted global attention. There are signs, however, that small language models (SLMs) are now gaining ground.
On January 23, media reports citing two people familiar with the matter said that Microsoft has formed a new team to develop conversational AI that requires less computing power than the OpenAI software it currently uses. The sources said Microsoft has moved several top AI developers from its research group to the new GenAI team.
Last month, Microsoft laid out its small-model strategy by releasing Phi-2, a small language model with 2.7 billion parameters. In some benchmark tests it outperformed Google's Gemini Nano 2, and it can run on devices such as laptops and smartphones.
Microsoft is pursuing both large and small models. The aforementioned sources said the GenAI team is separate from Turing, another Microsoft team, which develops large models to improve Bing and other Microsoft products; GenAI is dedicated to developing small models.
Microsoft's Phi model has a modest parameter count yet can rival GPT-4 on certain tasks. To achieve that, researchers last year used GPT-4 to generate millions of high-quality texts and trained Phi on that data.
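The article does not describe Microsoft's actual data pipeline, but the general "teacher model generates training text" pattern is easy to sketch with the OpenAI API. In the minimal sketch below, the prompts, topics, and output file are illustrative assumptions, not the real Phi recipe.

```python
# A minimal sketch of the "teacher generates training text" idea: use a strong
# model (GPT-4 via the OpenAI API) to produce textbook-quality passages and
# store them for later fine-tuning of a small model. The prompts, topics, and
# output file are illustrative assumptions, not Microsoft's actual Phi pipeline.
import json
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

TOPICS = ["sorting algorithms", "basic probability", "HTTP caching"]

def generate_passage(topic: str) -> str:
    """Ask the teacher model for a short, self-contained explanation."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Write a clear, self-contained textbook passage."},
            {"role": "user", "content": f"Explain {topic} to a beginner."},
        ],
    )
    return response.choices[0].message.content

# Write one JSON record per passage, a common format for training corpora.
with open("synthetic_corpus.jsonl", "w", encoding="utf-8") as f:
    for topic in TOPICS:
        record = {"topic": topic, "text": generate_passage(topic)}
        f.write(json.dumps(record) + "\n")
```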
Phi caused a sensation in the AI research community, and Microsoft subsequently released the latest version, Phi-2, as an open model that Azure customers can use to build their own AI applications. Companies such as Goldman Sachs have been testing Phi in recent months.
Meanwhile, Microsoft has been researching how small models could handle basic queries from users of the Bing AI chatbot and Windows Copilot, in order to reduce computing costs.
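The article does not say how Microsoft decides which queries a small model should handle, but the cost-saving pattern itself is simple to illustrate: a cheap gate routes easy queries to a small model and escalates the rest. Everything in the sketch below, including the heuristic and both model stubs, is hypothetical.

```python
# Hypothetical cost-driven routing between a small and a large model.
# looks_simple, call_small_model, and call_large_model are invented
# stand-ins for illustration; the article describes no such interface.
def call_small_model(query: str) -> str:
    return f"[small model answers] {query}"   # e.g. a Phi-class model

def call_large_model(query: str) -> str:
    return f"[large model answers] {query}"   # e.g. a GPT-4-class model

def looks_simple(query: str) -> bool:
    """Crude placeholder gate: short, single-question prompts."""
    return len(query) < 120 and query.count("?") <= 1

def answer(query: str) -> str:
    # Only escalate to the expensive model when the gate says the query is hard.
    return call_small_model(query) if looks_simple(query) else call_large_model(query)

print(answer("What time zone is Seattle in?"))
```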
Earlier this month at the Davos Forum, Microsoft CEO Satya Nadella praised the company's work on small models as a way to "control our own destiny."
Nadella said, "We value having the best cutting-edge models, and currently, the most advanced large model is GPT-4. We also have Phi, which is Microsoft's best small model. Therefore, we will have a diverse range of models."
"Small Models" Open up New Battlefields
In addition to Microsoft, French startup Mistral AI caused a sensation last month with the release of its open-source model, Mixtral 8x7B.
As noted in a previous Wall Street News article, Mixtral 8x7B has a relatively small parameter count yet reaches roughly GPT-3.5-level performance.
It is called Mixtral 8x7B because it is a sparse mixture-of-experts model: eight expert networks of roughly 7 billion parameters each sit behind a router that activates only a couple of them per token, which keeps inference cost well below that of a dense model of the same total size.
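As a rough illustration of that routing idea, here is a minimal PyTorch sketch with toy dimensions; the layer sizes and routing details are simplified assumptions, not Mixtral's actual implementation.

```python
# Minimal sparse mixture-of-experts layer in the spirit of Mixtral's routing:
# a router scores the experts for each token and only the top-2 are run.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, n_experts)  # gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                          nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        ])

    def forward(self, x):                               # x: (n_tokens, dim)
        scores = self.router(x)                         # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # choose top-2 experts
        weights = F.softmax(weights, dim=-1)            # renormalize their gates
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):   # run only selected experts
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(SparseMoE()(tokens).shape)  # torch.Size([10, 64])
```

Because only two of the eight experts run per token, per-token compute tracks the active parameters rather than the full model: Mistral reports roughly 46.7 billion total parameters for Mixtral but only about 12.9 billion used per token.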
On performance, Mixtral outperforms Llama 2 70B with roughly six times faster inference, and it matches, and in places slightly exceeds, GPT-3.5 on most standard benchmarks.
On cost, Mixtral's smaller active parameter count makes it cheaper to run; compared with Llama 2, Mixtral 8x7B shows a clear efficiency advantage.
Smaller models can clearly cut the cost of running AI applications at scale and greatly broaden where generative AI technology can be applied.
It is worth noting that Mistral AI has just closed a $415 million funding round that values the company at more than $2 billion, a more-than-sevenfold increase in just six months.