A Review of OpenAI's Series of Press Conferences: From Tools to AGI, OpenAI's 12-Day Evolution Theory
In the first four days, the official version of o1, Sora, and Canvas launched with great fanfare. After several days of filler, the last day delivered the game-changer: o3, which dispelled the skepticism that AI development had hit a bottleneck.
OpenAI's 12 consecutive days of year-end announcements have finally come to an end. Watching the release events every day felt like opening a chocolate blind box, never knowing what the next flavor would be.
In the first 11 days of the conferences, most of the updates were quite bland, with only three products offering some exciting "flavors."
In summary, the significant updates are the official version of o1, Sora, and Canvas, all released in the first four days.
Among them, the official o1 has indeed improved significantly, Sora has added several product modes for modifying AI-generated videos, and Canvas can be seen as OpenAI's first attempt to take on the AI workstation.
There were also some noteworthy items: deep collaboration with Apple, video calling, and reinforcement fine-tuning of o1-mini.
Reinforcement fine-tuning of o1-mini has great potential in professional fields, where small adjustments yield noticeable improvements. The video calling feature is the official launch of the stunning "Her"-style demo. The deep collaboration with Apple is also a significant event for OpenAI, further solidifying its position as the leader of the AI industry.
Some smaller product updates made one wonder—"Is this even worth a release conference?"
These include the "Projects" feature, the official opening of o1 image input and the 4o advanced voice API, the ChatGPT Search upgrade, and the ability to phone ChatGPT. They are relatively minor updates and differ little from what competitors offer.
On the final day, OpenAI finally dropped a bombshell: o3. It dispelled the skepticism about a bottleneck in AI development, with benchmark results that point straight toward AGI.
We have created a table based on the importance of the released products, summarizing these rollercoaster-like twelve days of announcements.
Next, let's delve into the core points of these updates.
Important Product Updates
o1 Complete Version (Day 1)
In terms of capability, o1 has indeed made significant progress over the Preview version. Compared with o1-preview, it improved by 50% on the American Invitational Mathematics Examination (AIME 2024), a qualifying exam on the path to the International Mathematical Olympiad, and on the programming benchmark Codeforces. The rate of major errors on complex problems has decreased by 34%.
It can also adjust processing time based on the difficulty of the questions, reducing user wait times by over 50%.
More importantly, o1 now supports multimodal recognition, which greatly enhances its practicality. Doctors can use it to analyze medical images, engineers can have it help with blueprints, and designers can ask it for creative suggestions. But it is expensive: only subscribers to the $200 ChatGPT Pro plan get unlimited use, while regular $20 subscribers get only 20 uses per day.
As a product debuting on the first day, o1 indeed stands out.
Sora (Day 3)
After waiting for 10 months, Sora has finally arrived.
However, this is not a model upgrade but more of a product refinement. The official version of Sora can generate videos up to 20 seconds long at a maximum resolution of 1080p. The generation quality is not much different from what was shown in February.
But OpenAI has indeed put some thought into the product. The storyboard is the most innovative feature of this release and also Sora's most ambitious attempt. It gives users a timeline interface similar to professional video editing software, where they can add multiple scene cards and string together multiple prompts, and the system automatically handles the transitions between scenes.
In addition, OpenAI offers three professional tools: Remix, Blend, and Loop. Users can replace elements in a video, blend two videos, and even have the system fill in footage automatically to create infinitely looping videos.
The product is quite good, but the model, which has not been upgraded, is not very powerful. In post-release evaluations, Sora frequently ran into problems, with motion, interaction, and physics often handled poorly. There were also instances of people and ghostly figures appearing out of nowhere.
OpenAI is also quite stingy with usage: Plus users at $20 can generate 50 times per month. Only Pro users paying $200 per month get unlimited "slow" generation.
Sora has finally arrived, but it is quite disappointing.
Canvas (Day 4)
In a nutshell, Canvas is OpenAI's AI version of Google Docs.
It has evolved into a complete workstation that integrates intelligent writing, code collaboration, and AI agents, and it shows OpenAI's ambition to go beyond chatbots.
As a writing assistant, it can provide editing suggestions.
In terms of programming, Canvas offers an almost latency-free coding environment through its built-in WebAssembly Python emulator. It also shows an ability to understand the intent behind code.
Like the recently updated Cursor and Devin, it now offers customizable AI agents, which can carry out a series of operations, for example helping you send Christmas letters to your friends.
These three dimensions of Canvas do not operate in isolation. In practical use, they often work together, and this seamless integration makes Canvas a multifunctional AI-driven creative studio prototype.
However, from the perspective of front-end display, it is not as good as Claude's Artifacts. The convenience of programming is also not as good as Cursor. Therefore, integration is its highlight.
General Product Updates
o1-mini Reinforcement Fine-Tuning (Day 2)
If this product were not so narrowly practical, it would be considered a major release.
It changes the logic of fine-tuning: instead of merely adding specialized data, it applies reinforcement learning to models with reasoning capabilities, guiding the model to think more deeply when facing complex problems.
Now, with just "dozens of examples," or even 12, the model can effectively learn to reason in a specific field. According to OpenAI's research data, the test pass rate of the reinforcement fine-tuned o1-mini is 24% higher than that of the standard o1 model and a full 82% higher than that of o1-mini without reinforcement fine-tuning.
Unfortunately, it can only fine-tune o1-mini, and its applicability is limited to complex domain tasks such as healthcare, law, or finance and insurance. Its versatility is relatively poor.
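The idea of training against graded answers rather than simply adding more data can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the 1/rank scoring rule, and the placeholder answer strings are hypothetical and are not OpenAI's API or its actual grader.

```python
from typing import Sequence, Tuple


def grade_ranked_answers(ranked: Sequence[str], correct: str) -> float:
    """Hypothetical grader: score a ranked list of candidate answers in [0, 1].

    Full credit if the correct answer is ranked first, 1/rank credit if it
    appears further down the list, and 0 if it is missing.
    """
    target = correct.strip().lower()
    for rank, answer in enumerate(ranked, start=1):
        if answer.strip().lower() == target:
            return 1.0 / rank
    return 0.0


def mean_reward(samples: Sequence[Tuple[Sequence[str], str]]) -> float:
    """Average grader score over (ranked_answers, correct_answer) pairs."""
    scores = [grade_ranked_answers(ranked, correct) for ranked, correct in samples]
    return sum(scores) / len(scores) if scores else 0.0


# Placeholder examples; a real dataset would hold the dozen or so
# domain-specific cases the article mentions.
examples = [
    (["ANSWER_A", "ANSWER_B"], "ANSWER_A"),  # correct at rank 1 -> 1.0
    (["ANSWER_C", "ANSWER_D"], "ANSWER_D"),  # correct at rank 2 -> 0.5
    (["ANSWER_E"], "ANSWER_F"),              # correct answer missing -> 0.0
]
print(mean_reward(examples))  # 0.5
```

In a reinforcement setup, a score like this would serve as the reward signal that nudges the model toward reasoning paths that end in correct answers, which is why a small number of well-graded examples can go a long way.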
Advanced Video Voice Mode (Day 6)
This is another old feature being reintroduced. On May 13, during the GPT-4o demonstration, OpenAI staff held a video call with 4o, which could see the contents of a phone screen in real time or chat based on live images from the camera.
This time it has actually shipped, with no upgrades. Still, the feature itself is very important.
However, because the feature took so long to ship, Microsoft's recently launched Copilot Vision and Google's still-in-development Astra have already caught up. OpenAI's lead is gradually being eroded.
Cooperation with Apple (Day 5, Day 11)
The collaboration between ChatGPT and Apple Intelligence feels more like the official announcement of a deep partnership's results. What Apple can't handle is simply handed to OpenAI.
The integration has three main parts. First is the collaboration with Siri: when Siri determines that a task may need ChatGPT's help, it can hand the task over to ChatGPT.
Second is an enhancement of the writing tools, allowing users to have ChatGPT write documents from scratch as well as refine and summarize them.
Third is the Camera Control feature of the iPhone 16, which lets users learn more about the subjects they are photographing through visual intelligence.
The later Mac integration on Day 11 gave ChatGPT more access to Mac tools.
What I don't understand is why these two couldn't be announced on the same day and had to be split into two days?
Capability Complements and Minor Feature Updates (Day 7, 8, 9, 10)
The remaining updates can at most be considered filler. Each can be summed up in a sentence.
“Projects” feature: It allows users to create specific projects, upload relevant files, set custom instructions, and consolidate all conversations related to that project in one place. Basically no different from Claude's.
ChatGPT search upgrade: It can search within conversations and supports multimodal output. Perplexity's Pro mode has supported this for a while.
4o hotline: U.S. users can now phone 4o! Very considerate toward the elderly; I see it as a Double Ninth Festival gift for them.
o1 image input and 4o advanced voice API officially opened: this could have been the last sentence of the o1 release on Day 1.
These days really felt like an endless loop of stalling for time.
Final Showdown
o3 (Day 12)
If it weren't for the grand finale of o3 on the last day, I would really have thought OpenAI was just padding things out with a 12-day event.
During this period, Google released Gemini 2.0 Flash, which is super fast and powerful; Astra, which really does look like an agent; Veo 2, which crushes Sora; and Gemini 2.0 Flash Thinking, its own answer to o1. With just three announcements and a few videos, Google upstaged all of OpenAI's previous 11 days of releases.
But on Day 12, OpenAI regained its momentum. With o3, it proved to the industry: Scaling Law is not dead, OpenAI reigns supreme.
o3 is the next version of o1. Just three months after o1's release in September, the new version significantly surpasses OpenAI's previous o1 model on coding, mathematics, and the ARC-AGI benchmark. Let's look at some data comparisons:
Codeforces rating: 2727, equivalent to ranking 175th among human programmers in global coding competitions. Exceeds 99% of human programmers.
PhD-level scientific questions (GPQA): 87.7%. PhD students generally score around 70%.
The hardest frontier mathematics test: 25.2%. Other models did not exceed 2%, and mathematical genius Terence Tao said this test "could stump AI for years."
ARC-AGI, a test meant to probe progress toward AGI: 87.5%, versus 25% for o1.
The most noteworthy is the last test, ARC-AGI, which measures a model's ability to adapt to new tasks. By comparison, scores on ARC-AGI-1 had risen from 0% with GPT-3 in 2020 to only 5% with GPT-4o in 2024. This indicates that the model is not just memorizing but is genuinely capable of solving problems.
Although it performed excellently in the ARC-AGI test, this does not mean o3 has reached AGI level, as it still fails in some very simple tasks, showing a fundamental difference from human intelligence.
Nevertheless, this proves that OpenAI's choice to enhance reasoning as a paradigm shift has been successful. There are no signs of slowing down in the development of artificial intelligence. The Scaling Law remains effective.
Concerns about AI stagnation have been swept away by OpenAI's year-end Christmas gift.
Admittedly, a low-compute run of o3 can cost up to $20 and a high-compute run may even reach $3000, so using it at this stage is nearly impossible. However, compute costs will fall, and the Scaling Law will continue.
With two top models in three months, OpenAI has, in these last 12 days, once again made us feel the rapid pace of AI we saw between the end of 2022 and the beginning of 2023, from ChatGPT to GPT-4.
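To put the quoted figures in perspective, here is a minimal sketch of the arithmetic. It assumes the $20 and $3000 figures are costs per task, and the batch of 100 tasks is a hypothetical workload chosen only for illustration; neither assumption comes from OpenAI's pricing.

```python
# Figures quoted in the article, assumed here to be USD per task.
LOW_COMPUTE_COST_PER_TASK = 20
HIGH_COMPUTE_COST_PER_TASK = 3000


def total_cost(cost_per_task: float, num_tasks: int) -> float:
    """Total spend for a batch of tasks at a fixed per-task cost."""
    return cost_per_task * num_tasks


num_tasks = 100  # hypothetical workload size

low_total = total_cost(LOW_COMPUTE_COST_PER_TASK, num_tasks)
high_total = total_cost(HIGH_COMPUTE_COST_PER_TASK, num_tasks)
ratio = HIGH_COMPUTE_COST_PER_TASK / LOW_COMPUTE_COST_PER_TASK

print(f"low compute:  ${low_total:,.0f}")    # $2,000
print(f"high compute: ${high_total:,.0f}")   # $300,000
print(f"high/low ratio: {ratio:.0f}x")       # 150x
```

Under these assumptions, even a modest batch in the high-compute setting runs into six figures, which is the practical barrier the paragraph above describes.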
Perhaps, as Noam Brown, a scientist at OpenAI who previously participated in the development of o1, said in an interview, "In 2024, OpenAI is experimenting, and 2025 will be the year of full-speed advancement."
OpenAI's 12-day conference had its ups and downs but ended perfectly, laying hope for AI in 2025.
Author of this article: Hao Boyang, source: Tencent Technology, Original title: "A Review of the OpenAI Series of Press Conferences: From Tools to AGI, OpenAI's 12-Day Evolution Theory"