AI Weekly News: Microsoft launches AI transformation with Windows 11; NVIDIA reduces office software costs by 23%; Video training becomes a crucial learning path for robots

1. "Copilot" settles in Windows 11, marking the moment of AI transformation for the operating system. 2. Kingsoft Office introduces NVIDIA inference servers and GPUs, reducing the cost of image tasks by 23%. 3. Midjourney 5.2 receives another update, this time bringing AI-generated memes into the mix. 4. With the Whisper model, anyone can become a "video editor" in just ten seconds. 5. Robots learn household chores by watching YouTube, making large-scale video training a crucial pathway. 6. Unity launches the AI Hub platform, resulting in a 15% surge in stock price, accelerating the progress of the AI revolution in gaming.

Insights from an AI Perspective

This week, there has been significant market potential demonstrated in the areas of the Windows operating system, office software applications, and private deployment of large models. The transformation of Microsoft's operating system will bring users a more intelligent and efficient experience. The AI technology in office software applications will enhance image processing capabilities. The development of private deployment of large models will make AI more secure and controllable in B2B applications. These trends will have a significant impact on the development of related industries and companies, providing more opportunities for the widespread application of AI.

Key Highlights of the Week

"Copilot" settles in Windows 11, marking the moment of AI transformation for the operating system.

Kingsoft Office introduces NVIDIA inference servers and GPUs, reducing the cost of image tasks by 23%.

Midjourney 5.2 receives another update, this time with AI-generated memes.

Ten-second video editing made possible with the Whisper model, enabling everyone to become a "video editor."

Robots learn household chores by watching YouTube, with large-scale video training becoming an important pathway.

Unity launches the AI Hub platform, resulting in a 15% surge in stock price, accelerating the progress of the AI revolution in gaming.

Google DeepMind invests millions of dollars in creating a competitor to ChatGPT, with multimodal video training becoming a distinctive feature.

PrivateGPT open-source model enables offline querying, offering great potential for local deployment.

Hundsun releases LightGPT for the financial industry, catering to diverse financial model scenarios with strong demand certainty.

Zhongke AI takes a step further in large model applications, becoming the foundation for specialized legal models.

AI security becomes the next focus of capital, with CalypsoAI raising $23 million in funding.

AI Applications

Microsoft has officially announced the early preview of Windows Copilot to developers in the Windows Insiders channel.

The first preview version focuses on integrated UI experience. Windows Copilot will appear as a sidebar docked on the right side, avoiding overlap with desktop content. It will run alongside open application windows, allowing interaction with Windows Copilot at any time.

In the preview version of Copilot for Windows 11, users can ask the following types of questions:

Topics covered include personalized system settings, screenshots, and the functionality of text and image generation. In addition, the Win11 update also includes native support for reading other archive file formats, such as common .rar and .7z compressed files.

Insightful Comment:

Microsoft has fulfilled its promise made at the Build conference in May. The preview version of Copilot has landed on Win11, marking an important step for the operating system to enter the AI era. In the future, Win11 will become the first commercially available version of an AI system on a large scale. Additionally, Microsoft has announced that it will end support for Win10, including the Professional and Home editions, in 25 years. This means that the operating system will fully enter the AI era at that time. This will provide users with a more intelligent and efficient operating experience and bring new business opportunities for Microsoft.

Kingsoft Office has announced its collaboration with the NVIDIA team to address the issues of time-consuming and costly image recognition and understanding tasks. They have introduced the NVIDIA T4 Tensor Core GPU for inference, NVIDIA TensorRT 8.2.4 for model acceleration, and NVIDIA Triton Inference Server 22.04 for model deployment and orchestration on K8S.

Through GPU inference and TensorRT acceleration, the processing time has been reduced from 15 seconds to approximately 2.4 seconds, resulting in a 23% cost savings in deployment.

Insightful Comment:

The deployment of NVIDIA inference servers has successfully optimized GPU utilization and improved the efficiency of office software in image document recognition and inference. This collaboration provides a more efficient solution for the implementation of WPS AI, with a focus on reading comprehension, question answering, and human-computer interaction.

WPS AI has entered the internal testing phase, and once it is commercially available on a large scale, it will bring a comprehensive upgrade to the user experience of domestic office software. This collaboration will enhance Kingsoft Office's image processing capabilities and give it a competitive advantage in the office software market.

Midjourney 5.2 update introduces a new "weird" feature that allows users to customize the level of weirdness. According to the official website, adjusting the size of the weird parameter controls the strange style of the generated photos, with larger parameters resulting in stranger photos. However, this feature is currently only available to paid users.

In addition, a "turbo" mode has been added, which can accelerate image generation by four times the speed. However, compared to the traditional fast operation mode, it still requires twice the GPU consumption.

Review by Jianzhi:

The generation of images under the influence of AI is transitioning from a traditional mode to a new paradigm. The weird mode increases the possibility of creating images that go beyond conventional cognition. This mode has more entertainment value and is likely to explode in social circles if it is freely available. This update will further promote the automation and intelligence of creating reaction images, providing users with more creative and entertainment options.

Dutch developer Matthijs Hollemans has developed a new video editing feature based on Whisper on HuggingFace. Now video editing can be done down to each word with precision.

On the platform, uploaded video content can be synchronized with text conversion. Just select the desired text and the required segments can be generated directly. The process is very simple, comparable to a "point-and-shoot camera" in the camera industry.

Review by Jianzhi:

AI applications are emerging one after another. Previously, AI-generated images were highly repetitive, with important updates almost every week. Now this iteration speed has begun to spread to the field of videos. This beginner-friendly video editing feature greatly reduces the threshold for video editing and saves a lot of production time. The efficiency optimization exceeds 90%, which has a significant impact on the video creation industry. This will further promote users' creative and sharing activities on social media, and commercial opportunities will also increase for the developers of the Whisper model.

Deepak Pathak, an assistant professor at CMU Robotics Institute, demonstrated a method called Visual Robot Bridging (VRB), which uses videos of human behavior to simulate and validate the behavior of robots. After watching videos of humans opening drawers, the robot can imitate human behavior and open the drawer.

Review by Jianzhi:

The key to this method is to use a large amount of video data to train robots and learn human behavior and operations from it. This provides robots with a wider range of possibilities and allows them to obtain more training data by observing videos on the Internet and YouTube. This method can improve the operational capabilities of robots and provide more opportunities for their application in daily life.

Video training will become an important path for robot learning, further promoting the application and development of robot technology in the home and service fields.

Leading global 3D content platform Unity announced the launch of the AI Hub platform, which allows AI software developers to directly supply development software to game developers through AI Hub and charge fees through Unity's Asset Store.

At the same time, 10 verified solutions have been launched, and two new AI products, "Unity Sentis" and "Unity Muse," have been officially launched in the Unity Asset Store for internal testing. It is expected to enhance the efficiency of AI-driven game development and upgrade gameplay. Review:

AI will change the way game production works and reduce costs at a very rapid pace, which has become an irreversible path for the game industry. The landscape of the game industry may also undergo tremendous changes due to generative AI, and the AI revolution in the game industry is accelerating.

The launch of the new AI platform confirms the significant demand from developers for AI tools, and Unity seems to have found a new revenue model. The AI Hub platform will soon become the most popular resource aggregation platform for developers and game companies, and it appears that Unity will profit greatly from this new transformation.

Large-scale Models

Google DeepMind CEO Hassabis recently told Wired that Gemini is still under development and will take a few more months. Google DeepMind is prepared to invest tens of millions, or even hundreds of millions, of dollars.

According to a recent report by The Information, Google researchers have been using YouTube to develop Gemini.

AI practitioners believe that this may be an advantage for Google DeepMind because it can "access video data more comprehensively than its competitors who only scrape videos."

Review:

Google may have been stimulated by the significant mistakes made during the previous Bard chatbot launch, and it now has high hopes for Gemini. They hope that Gemini will not only compete with ChatGPT but also surpass it. Therefore, when training the next generation of chatbots, they focus more on multimodal training, especially feeding video data. This is something that ChatGPT cannot currently achieve, and it is the differentiation that Google desires in the market competition for large-scale models. The battle among major companies for large-scale models is still ongoing.

Developer Iván Martínez Toro has released the open-source model PrivateGPT, which allows users to ask questions to the model without an internet connection by providing their own documents.

PrivateGPT can be run on home devices and requires the download of an open-source large language model (LLM) called "gpt4all." Users need to place all relevant files in one directory for the model to import all the data. After training the LLM, users can ask the model any questions, and it will use the provided documents as context to answer. PrivateGPT can handle over 58,000 words and currently requires a significant amount of local computing resources (high-end CPUs are recommended) for setup.

Toro stated that PrivateGPT is currently in the proof-of-concept (PoC) stage, and it has at least demonstrated the ability to create large-scale models similar to ChatGPT locally. It can be foreseen that once this PoC becomes an actual product, PrivateGPT will have the potential to provide companies with personalized, secure, and private ChatGPT to enhance productivity. Review:

PrivateGPT's emphasis on localized deployment is crucial for industries and individuals concerned about data privacy and security. With localized deployment, users have better control and protection over their data, reducing the risks of data leaks and privacy breaches. Open-source models and localized deployment will have a positive impact in the future.

LightGPT provides underlying AI services for various financial business scenarios such as investment consulting, customer service, investment research, operations, risk management, compliance, and development. It supports over 80+ financial-specific task instructions for fine-tuning. It possesses capabilities in financial professional Q&A, logical reasoning, long-text processing, multimodal interaction, and code handling.

The model utilizes over 400 billion tokens of financial domain data (including news, announcements, research reports, structured data, etc.) and over 40 billion tokens of language reinforcement data (including financial textbooks, financial encyclopedias, government reports, regulations, etc.) as secondary pre-training corpora for the large model.

LightGPT offers a more versatile and lightweight deployment, supporting private/cloud deployment and flexible API calls. Inference only requires deployment on a single machine with 2 GPUs.

Review:

The demand for large financial models is driven by both data security and diversified application scenarios.

Previously, Bloomberg launched the financial industry's large language model, BloombergGPT. Tencent Cloud is also collaborating with Shenzhou Information on the development of large financial models. We have previously provided dynamic reviews on the development of large financial models, and many securities firms, banks, and fund companies are investing in AI research and launching AI products. Based on Hundsun's years of IT service experience in the financial industry, the release of LightGPT is of great significance, and we will continue to follow up on the feedback after the open testing of LightGPT.

PowerLaw AI and Zhizhi AI have released PowerLawGLM, a legal vertical large model based on a Chinese trillion-token model. It focuses on the legal field and has unique advantages in applying to Chinese legal scenarios, with rich legal knowledge and understanding of legal language.

Based on the capabilities of the PowerLawGLM large model, PowerLaw has also developed a legal dialogue product called ChatMe, which has officially launched and has opened 50 beta testing slots.

PowerLawGLM is jointly developed based on Zhizhi's ChatGLM 130B general trillion-dialogue large model. After multiple rounds of high-quality legal text data cleaning (including judgments, laws and regulations, legal Q&A, etc.) and incremental training of the model, the legal version of the base model, LawGLM 130B, is obtained. In the evaluation of 100 questions, PowerLawGLM achieves around 70% optimal answers.

Review:

If large models are directly applied in the legal field, there will be a significant mismatch between the output results and the requirements. This is because the data results of large models are generated based on training data, but the legal rules in different countries are completely different, with strong specialization and regional limitations. Therefore, it is difficult to obtain satisfactory content using globally applicable large models. The legal vertical model PowerLawGLM of the Chinese billion-scale model is able to adapt well to the case situations in our country. The Chinese legal model is just the beginning, and many industries will gradually launch professional domain models in the future, while also developing AI dialogue products based on vertical models. As the underlying model, Zhiju AI plays an important role in developing professional domain models. Previously, a smaller capacity ChatGLM-6B model was also open-sourced, which is particularly suitable for learning and lightweight development. In addition, the balance between training stability and efficiency in large-scale model training is worth paying attention to.

AI Financing

CalypsoAI's products can be compared to 360 Security Guard, mainly releasing products that provide security barriers using large language models such as ChatGPT, including features such as malicious code detection and jailbreak prevention. It aims to accelerate the scenario-based implementation of AI products such as ChatGPT in industries such as finance, healthcare, and law, while addressing challenges such as data privacy, security protection, and the generation of illegal information.

CalypsoAI announced on its official website that it has raised $23 million (approximately 160 million yuan) in Series A-1 financing. This round was led by Paladin Capital Group, with follow-on investments from Lockheed Martin Ventures, Hakluyt Capital, and others.

Zhijudianping (Zhiju Review):

The focus of capital attention is no longer limited to large models and AI applications; now they have started to invest in AI security products. After all, as an underlying tool, once a large model is contaminated or attacked, the output content will deviate completely from expectations, which undoubtedly wastes computing resources and may also lead to security issues such as data leakage. Therefore, AI security products will inevitably become an important field of future market demand.

Next week's focus: Artificial Intelligence Conference

AI Weekly News: Microsoft launches AI transformation with Windows 11; NVIDIA reduces office software costs by 23%; Video training becomes a crucial learning path for robots | Insight Research

Insights from an AI Perspective

Key Highlights of the Week

AI Applications

Large-scale Models

AI Financing