Google's AI "killer app" is coming! Here's everything we know about Gemini so far.

With a team of hundreds of engineers behind it, Google's Gemini model is gaining momentum. With powerful multimodal capabilities and strong problem-solving and planning abilities, will Google, armed with its TPUs, crush OpenAI?

Google's magnum opus, Gemini, is coming: the model billed as the ultimate GPT-4 killer is about to be unveiled.

On September 14, three insiders reported that Google had provided an early version of Gemini to a small number of companies, a sign that Google is considering incorporating it into consumer services. Google also plans to sell it to enterprises through its cloud computing services, which suggests that Gemini's release is drawing near.

According to insiders, Google will release Gemini in several sizes, so developers can buy smaller versions to handle simpler tasks, as well as versions small enough to run on personal devices.

To compete with OpenAI and speed up Gemini's development cycle, Google CEO Sundar Pichai took a crucial step in April this year: he merged two teams with very different cultures and codebases, Google Brain and DeepMind. DeepMind co-founder Demis Hassabis became CEO of the combined team.

Hassabis is clearly confident in the newly merged team. He said it brings together two forces that have been crucial to recent advances in artificial intelligence. Google co-founder Sergey Brin, drawn back into the fray by the AI boom, is personally involved in training Gemini.

Over the following months, details about Gemini gradually emerged. Here is everything known about it so far.

Gemini's Multimodal Capability

The next leap for language models may be performing more tasks on computers. As mentioned in a previous article, Gemini's greatest advantage lies in its multimodal capability: it can understand and generate not only text and code but also images. By comparison, ChatGPT is currently a text-only model that can only understand and generate text.

In addition, an important step in building language models with ChatGPT-like capabilities is using reinforcement learning from human feedback (RLHF) to improve their performance. DeepMind's deep expertise in reinforcement learning could give Gemini new capabilities.

At its I/O developer conference in May, Google said that Gemini was designed from the start to be multimodal and highly efficient at integrating with tools and APIs. Google's preview at the time said: "Although it is still early, we are already seeing impressive multimodal capabilities in Gemini that we have never seen in previous models."

Gemini merges with AlphaGo

Google DeepMind CEO Hassabis revealed that the new Gemini model will combine techniques from AlphaGo with large language models.

Gemini will combine the strengths of AlphaGo-style systems with the language capabilities of large models like GPT-4, which is expected to greatly enhance its problem-solving and planning abilities. Some AI experts believe that the main limitation of language models is that they learn only indirectly, through text. AlphaGo, by contrast, does not share this limitation. In 2016, DeepMind's AlphaGo defeated world Go champion Lee Sedol 4 to 1, becoming the first computer program to defeat a Go world champion.

AlphaGo is built on DeepMind's pioneering reinforcement learning techniques, which let it learn to handle complex problems that require choosing the right actions: it tries moves repeatedly and receives feedback on how well it performed. AlphaGo also uses Monte Carlo tree search to explore and remember possible moves on the board.
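To make the idea of Monte Carlo tree search more concrete, here is a minimal sketch. It is not AlphaGo's algorithm, which guides the search with learned policy and value networks; it assumes plain UCT with random playouts on a hypothetical toy game instead of Go (Nim: players alternately take 1 to 3 stones, and whoever takes the last stone wins). The names `Node`, `mcts`, `rollout_wins`, and the iteration count are illustrative choices, not DeepMind code. It shows the four phases of the search: selection, expansion, simulation, and backpropagation of the win/loss feedback.

```python
import math
import random

# Hypothetical toy game (Nim), standing in for Go purely for illustration.
def legal_moves(stones):
    """A player may take 1, 2, or 3 stones, never more than remain."""
    return [m for m in (1, 2, 3) if m <= stones]

class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones              # stones left in this position
        self.parent = parent
        self.move = move                  # move that led here (None at the root)
        self.children = []
        self.untried = legal_moves(stones)
        self.visits = 0
        self.wins = 0.0                   # wins for the player who played `move`

def uct_select(node, c=1.4):
    # UCB1: trade off average reward (exploitation) against rarely tried moves (exploration)
    return max(
        node.children,
        key=lambda ch: ch.wins / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )

def rollout_wins(stones, rng):
    """Play random moves; return True if the player to move now ends up winning."""
    if stones == 0:
        return False                      # the previous player took the last stone
    mover = 0                             # 0 = the player to move at the start
    while True:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return mover == 0             # whoever takes the last stone wins
        mover ^= 1

def mcts(root_stones, iterations=3000, seed=0):
    rng = random.Random(seed)
    root = Node(root_stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes using UCT
        while not node.untried and node.children:
            node = uct_select(node)
        # 2. Expansion: add one untried move as a new child
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: reward from the view of the player who moved into `node`
        reward = 0.0 if rollout_wins(node.stones, rng) else 1.0
        # 4. Backpropagation: flip the reward at each level because players alternate
        while node is not None:
            node.visits += 1
            node.wins += reward
            reward = 1.0 - reward
            node = node.parent
    # The most-visited move is the most robust choice
    return max(root.children, key=lambda ch: ch.visits).move

print(mcts(5))
```

With 5 stones, taking 1 leaves the opponent facing 4, a losing position, and the search should converge on that move. The "trial and feedback" described above appears here as the visit and win counts that accumulate in the tree; AlphaGo replaces the random playouts and uniform expansion with neural-network estimates.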

It will come in various sizes and capabilities

Google says that Gemini is still in training and that, once fine-tuned, it will be available "in various sizes and capabilities," just like PaLM 2, so it can be deployed across different products to benefit everyone.

Beyond enterprise services, Gemini has tremendous potential in medical use cases. Google has been testing an AI tool called Med-PaLM 2, which could be enhanced with Gemini's capabilities. Such a model could power medical chatbots or robotic systems that assist with surgeries and other medical procedures.

Furthermore, the lessons Google learned building DeepMind's Gato (a "generalist" agent) and the recently launched RT-2 (Robotic Transformer 2) could also be integrated into Gemini. The collaboration between Google Brain and DeepMind poses a significant challenge to OpenAI and other competitors in artificial intelligence.

Gemini integrated into various Google applications

In an interview in September, Musk revealed information about plans to integrate Gemini into Google products. He said that conversational AI systems like Bard are not the end state but an intermediate step toward more advanced chatbots.

He said that the eventual combination of Gemini and Bard will become an "amazing universal personal assistant" woven into many aspects of people's daily lives, such as travel, work, and entertainment.

He emphasized that Gemini will combine the strengths of text and images, and claimed that today's AI chatbots will seem "insignificant" within a few years.

Compared with existing models, Gemini is expected to give software developers stronger code generation. Google aims to use it to surpass Microsoft's GitHub Copilot coding assistant.

Enterprise (B2B) sales as a focus, Google Cloud catching up with Microsoft Azure

Google hopes to use Gemini to attract more users to its products, especially its cloud computing business.

Google plans to offer Gemini to enterprises through Google Cloud's Vertex AI service and will release versions with different parameter counts, indirectly promoting its cloud business. In May, Google announced that it would give Google Cloud customers access to a set of PaLM 2-based LLMs through Vertex AI. Recently, Google also offered customers a free one-month trial of its large models through the coding-platform startup Replit.