Within a single day, several industry giants announced new AI advances, pushing the fierce competition among large models to new heights.
Competition among China's domestic large models has recently grown so intense that it has been described as "a fight between immortals," and this Friday the battle reached new heights. According to incomplete statistics compiled by Wall Street News, Huawei, Alibaba, Tencent, SenseTime, JD.com, and other companies released or updated large models today alone.
In the midst of the "battle of a hundred models," who is most likely to create a Chinese version of GPT-4?
Huawei Cloud Releases the PanGu 3.0 Large Model
On July 7th, Huawei Cloud released PanGu 3.0 at its Developer Conference 2023. Zhang Ping'an, Executive Director of Huawei and CEO of Huawei Cloud, stated that PanGu 3.0 is a family of large models designed entirely for industry, built on a three-tier "5+N+X" architecture.
Zhang Ping'an said at the conference that PanGu "does not write poetry; it just gets work done." PanGu will continue to build its core competitiveness around three innovation directions, "reshaping industries," "deep technical roots," and "open collaboration," to better serve industry customers, partners, and developers.
The three-tier architecture consists of:
The L0 layer consists of five foundation large models: natural language, vision, multimodal, prediction, and scientific computing. Together they provide the skills that industry scenarios require. PanGu 3.0 offers a series of foundation models with 10 billion, 38 billion, 71 billion, and 100 billion parameters, so that customers in different industries can choose according to their latency and response-speed requirements. It also adds new capabilities: knowledge Q&A, copywriting, and code generation for the NLP models, and image generation and image understanding for the multimodal models. Customers and partner companies can access these skills directly, and PanGu provides a consistent set of capabilities regardless of model size.
The L1 layer consists of N industry-specific large models. Huawei Cloud offers industry models trained on publicly available industry data, including models for government affairs, finance, manufacturing, mining, and meteorology. It can also train proprietary large models for customers on their own data, building on PanGu's L0 and L1 layers.
The L2 layer provides more refined models focused on specific industry applications or business scenarios, such as government service hotlines, bank branch assistants, lead-compound drug screening, foreign object detection on conveyor belts, and typhoon path prediction. These are offered to customers as "out-of-the-box" model services.
PanGu uses a fully layered and decoupled design, so it can quickly adapt to the industry's ever-changing needs. Customers can load their own datasets into their large models, and can upgrade the base models and the capability sets independently of each other.
Beyond the L0 and L1 models, Huawei Cloud also provides a large-model industry development kit. By retraining on their proprietary data, customers can obtain their own exclusive industry large models. In addition, to meet customers' differing data security and compliance requirements, PanGu offers a range of deployment options, including public cloud, a dedicated large-model cloud zone, and hybrid cloud.
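To make the layering concrete, here is a minimal Python sketch of how a "5+N+X" stack might be represented, assuming each L1 industry model builds on one L0 capability and each L2 scenario model builds on one L1 industry model. All names here (`LayeredStack`, `upgrade_base`, `resolve`, the version tags, and the mapping of meteorology to prediction and mining to vision) are hypothetical illustrations, not Huawei Cloud's actual API or design.

```python
from dataclasses import dataclass


@dataclass
class LayeredStack:
    """Hypothetical '5+N+X' stack. Names and mappings are illustrative
    assumptions, not Huawei Cloud's actual API."""

    l0: dict[str, str]  # L0: base capability -> current base-model version
    l1: dict[str, str]  # L1: industry model -> L0 capability it builds on
    l2: dict[str, str]  # L2: scenario model -> L1 industry model it builds on

    def upgrade_base(self, capability: str, new_version: str) -> None:
        # Decoupling: upgrading a base model touches only its L0 entry;
        # L1 and L2 refer to the layer below by name, so they stay unchanged.
        if capability not in self.l0:
            raise KeyError(f"unknown L0 capability: {capability}")
        self.l0[capability] = new_version

    def resolve(self, scenario: str) -> tuple[str, str, str]:
        # Walk down the layers: scenario (L2) -> industry (L1) -> base version (L0).
        industry = self.l2[scenario]
        capability = self.l1[industry]
        return scenario, industry, self.l0[capability]


stack = LayeredStack(
    # The five L0 capabilities named in the text, with made-up version tags.
    l0={
        "natural language": "v1",
        "vision": "v1",
        "multimodal": "v1",
        "prediction": "v1",
        "scientific computing": "v1",
    },
    # Assumed mapping of two of the N industry models onto L0 capabilities.
    l1={"meteorology": "prediction", "mining": "vision"},
    # Two of the X scenario models mentioned in the text.
    l2={
        "typhoon path prediction": "meteorology",
        "conveyor belt foreign object detection": "mining",
    },
)

print(stack.resolve("typhoon path prediction"))
# → ('typhoon path prediction', 'meteorology', 'v1')
stack.upgrade_base("prediction", "v2")
print(stack.resolve("typhoon path prediction"))
# → ('typhoon path prediction', 'meteorology', 'v2')
```

Because the L1 and L2 entries refer to the layer below by name rather than by version, upgrading a base model changes what every dependent scenario resolves to without editing those entries. This is one possible reading of the claim that base models and capabilities can be upgraded separately.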
Alibaba AIGC Application "Tongyi Wanxiang"
At the 2023 World Artificial Intelligence Conference, Alibaba Cloud officially launched the AI painting product "Tongyi Wanxiang".
Tongyi Wanxiang is built on Composer, a compositional generation model developed by Alibaba, and uses a diffusion-based "compositional generation" framework. By decomposing image design elements such as color, layout, and style and then recombining them, it offers highly controllable and flexible image generation.
Users enter a prompt and Tongyi Wanxiang generates the corresponding images. Beyond text-to-image generation, it also offers style transfer and similar-image generation.
This is expected to greatly lower the barrier to image design and to bring about a revolution in art design, gaming, and the creative industries.
Tongyi Wanxiang currently has three major functions: text-to-image generation, similar-image generation, and style transfer.
Text-to-image is the basic function: users simply enter a prompt and choose a style (watercolor, oil painting, Chinese painting, flat illustration, anime, sketch, 3D cartoon, etc.), and Tongyi Wanxiang automatically generates a wealth of creative ideas. Tongyi Wanxiang has been officially launched and is available to the public.
Similar-image generation lets users quickly expand on existing material: given a reference image, it produces images that are similar in content and style.
Style transfer generates a new image in a specified style from an original image.
The following image comes from a test by "New Intelligence," in which Tongyi Wanxiang restyled a picture of a woman wearing a white veil in the manner of the French Impressionist painter Renoir.
The result is an Impressionist-style portrait.
According to the evaluation by "New Intelligence," on some images Tongyi Wanxiang's output approaches that of Midjourney, widely regarded as the world's most powerful AI painting tool.
Tencent MaaS Platform Upgrade
During the World Artificial Intelligence Conference, Tencent Cloud announced an upgrade of its MaaS (Model-as-a-Service) platform, bringing the capabilities of industry large models to new scenarios such as financial risk control, simultaneous interpretation, and intelligent customer service. The newly unveiled large model for financial risk control is reported to be 10 times more efficient than traditional risk control methods.
On the infrastructure side, Tencent's self-developed Xingmai high-performance computing network and vector database provide stronger computing infrastructure for industrial applications of large models. The latest upgrade of Tencent Cloud's Xingmai network can raise GPU utilization by 40%, cut model training costs by 30% to 60%, and deliver a 10-fold improvement in communication performance for large AI models. Built on Tencent Cloud's new-generation HCC computing cluster, it supports clusters of up to 100,000 GPUs. Tencent Cloud's AI-native vector database supports retrieval across up to 1 billion vectors with millisecond-level latency. Its retrieval scale is 10 times that of traditional single-machine plug-in databases, and it can handle peak loads of millions of queries per second (QPS).
On the application side, Tencent Cloud has applied industry large models to scenarios such as financial risk control, simultaneous interpretation, and intelligent customer service, greatly improving the efficiency of intelligent applications.
The financial risk control solution powered by industry large models is 10 times more efficient than before. Drawing on Tencent's more than 20 years of experience fighting fraud and other illicit ("black and gray") industries, and on thousands of real business scenarios, its overall anti-fraud performance is about 20% better than that of traditional methods. Enterprises can iterate on their risk control capabilities through prompts, fully automating the pipeline from sample collection and model training to deployment and cutting modeling time from two weeks to just two days. Even with few accumulated samples, a model can be built quickly, skipping the "cold start" phase.
In simultaneous interpretation, industry large models mean that training no longer requires millions of samples. Good results can be achieved with small-sample training, which reduces the manual tuning needed for professional translation and maintains translation quality across multiple vertical industries. Tencent's AI simultaneous interpretation service has covered the main forum of the World Artificial Intelligence Conference for six consecutive years.
In intelligent digital humans, Tencent Cloud launched a small-sample digital human factory this year that can create a 2D digital avatar from a small amount of data within 24 hours, greatly reducing the cost for enterprises of deploying human-like services. AI generation algorithms have now also greatly accelerated the creation of 3D digital avatars. Through generative motion driving and the capabilities of industry large models, enterprises can obtain more "personalized, professional, and naturally realistic" digital employees, making "face-to-face" professional services possible.
SenseTime Comprehensively Upgrades Its Large Models
During the World Artificial Intelligence Conference, at the "Boundless Love, Daily Innovation" AI Forum, SenseTime announced a comprehensive upgrade of its SenseNova large model system, together with a series of updates and achievements for the large-model products built on it. SenseChat 2.0, SenseTime's natural language model with hundreds of billions of parameters, has overcome the input-length limitation of large language models. SenseTime has also introduced versions at different parameter scales to suit application scenarios ranging from mobile devices to the cloud, reducing deployment costs. SenseMirage 3.0, SenseTime's self-developed generative model, has grown from 1 billion parameters at its initial release in April this year to 7 billion, enabling professional-grade rendering of image detail.
In addition, SenseTime's digital human generation platform SenseAvatar 2.0 improves the fluency of speech and lip synchronization by over 30% compared with version 1.0, supports 4K high-definition video, and adds AI-generated images and digital singing. SenseTime's spatial reconstruction platform SenseSpace 2.0 improves spatial reconstruction efficiency by 20% and rendering performance by 50%; mapping a 100-square-kilometer scene takes only 38 hours with 1,200 TFLOPS of computing power. SenseThings 2.0, SenseTime's object modeling platform, restores the texture and material of small objects with millimeter-level precision and has overcome the challenge of capturing highly reflective and mirror-like objects.
In the financial sector, SenseTime has worked with banks, insurers, securities firms, and other clients to use digital humans for intelligent customer service, smart marketing, and other tasks. By integrating large language models, SenseTime offers new functions such as investment research analysis and report writing, lowering costs and improving efficiency. With financial knowledge bases integrated, its question-answering output is based entirely on the customer's own product descriptions, and the information is kept up to date.
In the medical field, SenseTime has developed "Da Yi," a Chinese medical language model built on massive medical knowledge and clinical data. It supports patient guidance, consultation, health advice, and decision support across multiple scenarios and multi-turn conversations. In the future, it will also support comprehensive analysis of multimodal data such as medical images, text, and structured data, steadily strengthening its medical language understanding and reasoning and helping hospitals improve diagnostic and treatment efficiency and patient services.
Progress of Other AI Companies
Homegrown AI Unicorn Mobvoi Releases "Sequence Monkey"
At the World Artificial Intelligence Conference, Mobvoi unveiled the internal beta of "Sequence Monkey" along with its AI CoPilot solutions. According to reports, "Sequence Monkey" is a large language model with multimodal generation capabilities. Its capability system, centered on language, covers six dimensions: knowledge, dialogue, mathematics, logic, reasoning, and planning. It can support a range of tasks, including text generation, image generation, 3D content generation, language generation, and speech recognition, and it can hold conversations that draw on its natural language understanding, knowledge, logic, and reasoning abilities.
JD.com: Confident in the Future of Large Models and Their Applications
He Xiaodong, Vice President of JD Group and Dean of the JD Explore Academy, said that JD is currently training a large model; training will take about two months at an estimated cost of several tens of millions of yuan. He expressed strong confidence in the commercial prospects and practical applications of large models, and advised startups entering the field to find their own "moat." Faced with the current "battle of a hundred models," He Xiaodong believes that pressure and competition are good for the market because they will effectively drive the industry's development.