The highly anticipated Gemini model: Stronger than GPT-4, but not by much?

Alphabet-C has released the Gemini model, claiming that it can understand the world as humans do and handle multiple kinds of tasks. Gemini outperformed GPT-4 in benchmark tests, but technology analysts believe the difference is not significant. In MMLU testing, Gemini scored 90 against GPT-4's 86 on text-only questions and 59 against GPT-4's 57 on multimodal questions. The release has attracted market attention, but whether Gemini can truly surpass GPT-4 remains to be seen.

Alphabet-C has finally released Gemini, its highly anticipated AI model that has been in development for several months. According to Alphabet-C, Gemini is the most powerful AI model to date and can understand the world around us just like humans, effortlessly handling code, text, audio, images, and videos. The Google DeepMind team claims that Gemini outperforms GPT-4 in 30 out of 32 benchmark performance tests.

However, several technology analysts believe that while Gemini's performance is indeed superior to existing multimodal models, the gap between Gemini and GPT-4 is not as significant as claimed. From the demonstration video released by Alphabet-C, there is little that we haven't seen in the AI hype of the past year.

If Alphabet-C, with all its computational resources, research capabilities, and abundant data, can manage only a marginal victory over GPT-4, the bigger concern is that Gemini may mark the upper limit of what current technology can achieve in building large-scale models.

Stronger than GPT-4, but not by much

Based on the demonstration video released by Alphabet-C, Gemini excels at playing "Pictionary," accurately describing the drawings made by testers on paper and even guessing what they are based on the outlines.

In another example, a tester showed Gemini a picture of an omelette cooking in a frying pan and asked aloud whether the omelette was cooked. Gemini answered by voice: "It's not cooked yet because the eggs are still liquid."

It seems impressive, but does Gemini truly surpass GPT-4 in every aspect as claimed by Alphabet-C?

Not necessarily.

The MMLU benchmark, used by Alphabet-C to evaluate AI models' performance in text and image tasks, consists of multiple-choice questions covering reading comprehension, university-level mathematics, physics, economics, and the social sciences. Alphabet-C CEO Sundar Pichai claimed that Gemini outperformed GPT-4 across the board in the MMLU test. On text-only questions, Gemini scored 90, human experts 89, and GPT-4 86. On multimodal questions, Gemini scored 59 and GPT-4 57.
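
To put those numbers in perspective, the short sketch below computes Gemini's lead over GPT-4 in both categories, using only the scores quoted above. It assumes the scores are percentages; the `scores` dictionary and the `margin` helper are illustrative names, not part of any published evaluation code.

```python
# Scores as reported in this article (assumed to be percentages).
scores = {
    "text-only": {"Gemini": 90, "GPT-4": 86, "human experts": 89},
    "multimodal": {"Gemini": 59, "GPT-4": 57},
}


def margin(category):
    """Return Gemini's lead over GPT-4 as (absolute points, relative percent)."""
    gemini = scores[category]["Gemini"]
    gpt4 = scores[category]["GPT-4"]
    absolute = gemini - gpt4
    relative = round(100 * absolute / gpt4, 1)
    return absolute, relative


for category in scores:
    absolute, relative = margin(category)
    print(f"{category}: +{absolute} points ({relative}% relative to GPT-4)")
```

On these figures the lead is 4 points on text-only questions and 2 points on multimodal questions, which is the narrow margin the analysts quoted below are reacting to.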

Melanie Mitchell, an AI researcher at the Santa Fe Institute, told the media that Gemini's performance in the benchmark test is impressive and demonstrates that Gemini is indeed a highly complex AI system. However, she pointed out that she did not perceive a significant difference in actual capabilities between Gemini and GPT-4.

Mitchell also noted that Gemini performs better in language and code benchmark tests than in image and video tests, adding that multimodal foundation models still have a long way to go before they can be applied widely and reliably across many tasks.

Percy Liang, director of the Center for Research on Foundation Models at Stanford University, also told the media that although Gemini has good benchmark scores, the numbers are difficult to interpret because we do not know what was in the training data.

Google DeepMind also stated that, with the help of human testers, Gemini hallucinates less often, answers questions more accurately, can provide sources when asked, and no longer fabricates answers when faced with difficult questions.

However, until Alphabet-C releases more data, these claims are difficult to verify.

Hasty Deployment

Geoffrey Hinton, the father of deep learning, told the media when he left Alphabet-C in April:

"Alphabet-C has always been very cautious in releasing AI products to the public. There are too many potential risks, and Alphabet-C does not want to ruin its reputation. Faced with seemingly unreliable or unsellable technologies, Alphabet-C has taken a cautious approach and therefore missed out on more crucial opportunities."

Alphabet-C may have realized this, which would explain why it was in such a hurry to launch Gemini.

Gemini Ultra, the most powerful, full-strength version of Gemini, will not be available to the public for several months. Alphabet-C stated that the Ultra version will initially be provided only to select customers, developers, partners, and safety and responsibility experts.

Some analysts have pointed out that Alphabet-C itself may not fully understand all the new features of Gemini Ultra, nor has it developed a monetization strategy for Gemini. Considering the high cost of training and inference for AI models, Alphabet-C may take a long time to come up with a profitable strategy.

Could it be that Alphabet-C's marketing strategy led to the failure of today's product launch? Perhaps. Or maybe building the most advanced generative AI models really is that difficult: even if you restructure the entire AI department to speed up the process, the results may not be satisfactory.