A Comprehensive Look at the GPT-5 Launch Event | Aggressive Pricing, Impressive Programming, Lackluster New Features

Wallstreetcn
2025.08.08 00:20

At the August 8 press conference, OpenAI launched GPT-5. Although it seems less impressive than previous generational leaps, there are still real improvements, such as a very low hallucination rate and enhanced long-context capabilities. GPT-5's API price is only 1/15 that of Claude Opus 4.1, making it highly competitive. GPT-5 comes in several versions, including GPT-5, GPT-5 mini, GPT-5 nano, and GPT-5 Pro, the last of which provides greater computing power and parallel compute for enterprise users.

At the press conference held at 1 AM Beijing time on August 8, GPT-5 finally made its long-awaited debut, two and a half years after the release of GPT-4.

However, compared to the stunning debut of ChatGPT, the leap in capability of GPT-4, and the shock of the o1 release, this press conference felt particularly bland: unimpressive benchmarks and no sign of a new paradigm; use-case demonstrations that hardly sparked interest or highlighted differences from competitors; and even a chart error on the slides, caught by netizens. All of this marred the 1-hour-20-minute event.

But this does not mean that GPT-5 has not made progress. Extremely low hallucination rates, enhanced front-end capabilities, a leap in contextual abilities, and highly competitive pricing are all rare highlights of this release.

The pricing stands out in particular: given GPT-5's impressive programming performance, its API price is only 1/15 that of the recently released Claude Opus 4.1, and also lower than Gemini 2.5 Pro's.

This can be seen as a fatal blow to Anthropic.

OpenAI may have lost its magical rhythm, but this morning it showed it still stands firm in the fierce competition with other vendors.

The GPT-5 Model Itself: Limited Upgrades, a Marginal SOTA

This time, GPT-5 comes in four versions: GPT-5, GPT-5 mini, GPT-5 nano, and the GPT-5 Pro mode, which is available only to enterprise customers and subscribers of the $200-per-month premium tier.

For general users, the default is the unified model GPT-5, which is a system composed of multiple models, including the "smart and fast" model (GPT-5-main) for most questions and the "deeper reasoning" model (GPT-5-thinking) for more complex issues.

This unified implementation is determined by a real-time router that decides which model to use for specific queries.
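OpenAI has not published the router's internals, but the idea can be sketched as a simple dispatch layer. The function names, heuristic, and threshold below are illustrative assumptions, not OpenAI's actual implementation:

```python
# Hypothetical sketch of a per-query model router, in the spirit of the
# unified GPT-5 system described above. The complexity heuristic is a
# stand-in for whatever signals the real router uses.

def estimate_complexity(query: str) -> float:
    """Toy complexity score: longer queries and reasoning-flavored
    keywords push the score toward 1.0."""
    keywords = ("prove", "debug", "step by step", "analyze")
    score = min(len(query) / 500, 1.0)
    score += sum(0.3 for k in keywords if k in query.lower())
    return min(score, 1.0)

def route(query: str, threshold: float = 0.5) -> str:
    """Send easy queries to the fast model, hard ones to the reasoner."""
    if estimate_complexity(query) >= threshold:
        return "gpt-5-thinking"   # deeper reasoning model
    return "gpt-5-main"           # smart-and-fast default

print(route("What's the capital of France?"))                   # gpt-5-main
print(route("Prove step by step that sqrt(2) is irrational."))  # gpt-5-thinking
```

The key design point the article describes is that the user never picks a model; the dispatch decision happens per query, in real time.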

API users can choose the mini and nano variants directly. The GPT-5 Pro mode is similar to Grok 4's Heavy mode, using parallel test-time compute: multiple model instances work on a problem together for longer, trading extra computing power for the most comprehensive and accurate answers. It set a new record on the ultra-high-difficulty scientific benchmark GPQA, and in blind tests against human experts its answers were judged better nearly 7 times out of 10.

In terms of capability scores, GPT-5 has improved in almost every respect, but only slightly above the current SOTA and only marginally stronger than o3.

Intelligence Level: Best Experience, but Not the Best Intelligence

In terms of intelligence level, GPT-5 surpasses o3 in various mainstream evaluation sets, but the overall gap is not very large.

Breaking it down, we can see that on the frontier math benchmark, GPT-5 performs worse than ChatGPT Agent, pulling ahead only in Pro mode.

When compared with other models, we find that GPT-5's "intelligence" capabilities are only slightly higher than its competitors, with some abilities not even being SOTA, indicating an overall slight lead. It is hard to say that there is a leap in capability improvement.

Overall, in Artificial Analysis's ranking, GPT-5 currently sits in first place, but its overall score is only two points higher than o3's and just one point higher than Grok 4's.

Another sign that GPT-5's capabilities fell short of expectations is the ARC Prize test, where GPT-5 was outperformed by Grok 4, and by a significant margin.

However, compared to o3, GPT-5 has indeed improved in computational efficiency, achieving better results with lower token consumption; its efficiency also surpasses that of Anthropic's models.

According to OpenAI's introduction, GPT-5 thinking can reduce the number of tokens used by 50%-80% when solving complex problems.

This has even excited Musk enough to tweet about it.

Recently, Grok has been making waves in AI chess competitions, and this time it has outperformed OpenAI again, making it feel as if Grok benefited most from this release event.

However, in terms of user experience, GPT-5 has made a comeback.

On the LMArena leaderboard, which primarily compares different models through user blind tests, GPT-5 ranked first in all categories.

Programming: Solving Pain Points, Making Vibe Coding "Worry-Free" with Agent

In the programming field, which OpenAI emphasized this time, GPT-5 has shown a noticeable improvement in the thinking mode compared to its predecessor.

However, once the latest Claude Opus 4.1 from programming-focused competitor Anthropic is included, the advantage becomes extremely small, with only a 0.3-percentage-point gap between the two.

Although GPT-5's overall performance in programming benchmarks is not particularly outstanding, OpenAI has indeed made many optimizations in the actual programming experience. During the launch event, OpenAI introduced several important enhancements in programming, mainly reflected in understanding programming requirements, correcting errors, and utilizing more tools.

This is mainly attributed to the maturity of the Agentic Coding system. GPT-5 excels at handling "agentic" coding tasks, capable of invoking multiple tools and working continuously for several minutes or even longer to complete a complex instruction.

The model actively communicates during coding, explaining its plans, steps, and findings, acting like a collaborative teammate.

To achieve this partner-like behavior, OpenAI's team specifically fine-tuned the model for several features, enhancing its capabilities in autonomy, collaboration and communication, and testing.

The improvements in understanding programming requirements and following instructions allow GPT-5 to transform vague or detailed instructions into usable code, helping even those who do not understand programming to realize their ideas.

Some users on Twitter have also provided corresponding feedback.

The tool invocation capability, after special fine-tuning by OpenAI, has also been highlighted.

This is particularly evident on the Tau benchmark, which evaluates an AI model's ability to hold dynamic conversations with users in simulated real-world scenarios and to use external tools (i.e., API or function calls) effectively to complete tasks. In the telecom scenario in particular, its capability has improved significantly.

Another very important update is the significant enhancement of the "bug fixing" capability.

In the demonstration, GPT-5 was able to delve into a real codebase (OpenAI Python SDK), understanding the structure and logic of the code by searching and reading files, ultimately pinpointing the root cause of issues. It could even comprehend the deeper reasons behind certain architectural decisions made by human engineers, such as enhancing security.

Moreover, it can automatically fix its own bugs. During a demonstration of a frontend application development task, after writing the code, GPT-5 attempted to build the project itself. When errors occurred during the build process, it could provide feedback on these error messages to itself and then modify and iterate its code based on these errors. This was described by OpenAI's demonstrators as a "profound moment" and a "self-improvement loop."
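The build-and-fix loop the demonstrators described can be sketched as below. Since OpenAI has not published the agent's actual scaffolding, the `model` callable is a stub, and "building" is simulated by byte-compiling Python; these are assumptions for illustration only:

```python
import os
import subprocess
import sys
import tempfile

def try_build(source: str) -> tuple[bool, str]:
    """Write the code to a temp file and 'build' it by byte-compiling it.
    Returns (success, error_output)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    result = subprocess.run(
        [sys.executable, "-m", "py_compile", path],
        capture_output=True, text=True,
    )
    os.unlink(path)
    return result.returncode == 0, result.stderr

def self_repair_loop(model, task: str, max_iters: int = 3) -> str:
    """Ask the model for code, feed build errors back to it, iterate."""
    feedback = ""
    for _ in range(max_iters):
        code = model(task, feedback)   # the model sees its own errors
        ok, errors = try_build(code)
        if ok:
            return code
        feedback = errors              # close the self-improvement loop
    raise RuntimeError("could not produce a building program")

# Stub 'model': the first attempt has a syntax error, the second fixes it.
attempts = iter(["def f(:\n    pass\n", "def f():\n    pass\n"])
stub_model = lambda task, feedback: next(attempts)
print(self_repair_loop(stub_model, "write a function f"))
```

This is the "self-improvement loop" in miniature: the error messages, not a human, drive the next revision.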

In the process of fixing specific bugs, the model also demonstrated high intelligence. For example, while running code checks (lints), it discovered other issues but could determine that these problems were unrelated to the current bug being fixed, thus avoiding unnecessary modifications.

This matters especially for today's vibe coding. A paper published this year noted a counterintuitive fact: AI-assisted programming can actually reduce, rather than increase, productivity. The main reason is that working programmers rarely face a brand-new project; far more often they need to iterate on a pile of legacy code.

Therefore, without a comprehensive grasp of complex programs and self-bug-fixing capabilities, AI programming would be significantly limited in such projects.

It is evident that OpenAI has applied a product manager's mindset to programming, making substantial adjustments and upgrades to address real pain points. In his own tests, Wharton professor Ethan Mollick also experienced the "worry-free" character of GPT-5 programming.

(Ethan Mollick's blog post)

Another improvement of GPT-5 in programming is its front-end capabilities. During a live demonstration, OpenAI researchers had GPT-5 generate a series of content on-site, including a dynamic display of aircraft aerodynamics.

This content consisted of 400 lines of code, which GPT-5 wrote in 2 minutes.

There was also a snake game for teaching French, which performed quite well overall.

Multimodal: Still a Weakness

In terms of multimodal capabilities, which were widely believed to be significantly improved based on various leaks, the enhancement in GPT-5 is not very significant.

Moreover, unlike unified models like Gemini, GPT-5 remains primarily a model capable of text and image understanding. Currently, it still does not support audio input/output or image generation, let alone video.

It seems too difficult for OpenAI to catch up with the recently released Genie 3 in the short term.

Some Surprises: Ultra-Low Hallucination, Contextual Leap

GPT-5's overall strength, then, is not stunning; it can only be said to barely hold on to the top position.

However, in some smaller respects, GPT-5's improvements are genuinely noteworthy. And these small points may prove decisive.

First is hallucination and safety. GPT-5 significantly reduces the occurrence of hallucinations, with a probability of factual errors about 45% lower than GPT-4o and about 80% lower than OpenAI o3.

This is quite an impressive achievement: a hallucination rate below 1% matters enormously in practice, since hallucinations can be fatal in industrial and real-world work settings.

So it's no wonder that OpenAI's star researcher Noam Brown devoted his only public comment on the event to GPT-5's progress in eliminating hallucinations.

In the System Card, OpenAI briefly describes the general methods it used.

On one hand, they strengthened the training of the model to effectively use browsing tools to obtain the latest information. On the other hand, when the model does not use browsing tools and relies on its own internal knowledge, the focus of training is to reduce the hallucinations that occur in such cases.

The underlying reason may lie in the reinforcement learning GPT-5 underwent. In that training, OpenAI appears to have used some of its newest methods, teaching the models to "refine their thinking processes, try different strategies, and recognize their mistakes."

It is precisely because of this training model that the "deceptive" behavior of the GPT-5 model has also significantly decreased, with reductions of nearly 90% in some dimensions. (Deception here refers to the model potentially misleading users about its behavior or quietly not executing tasks when it cannot complete them or lacks sufficient information. This is also directly related to the decrease in hallucinations.)

Another very important advancement is the contextual capability.

First, all GPT-5 versions now support a context window expanded to 400k tokens, far exceeding the 128k default of o3 and GPT-4o. It cannot match Gemini's 1M context, but it is already a step ahead of the other competitors.

Moreover, testing shows a remarkable leap in long-context accuracy: in needle-in-a-haystack tests, GPT-5's accuracy nearly doubled compared to o3. This means GPT-5's handling of long texts will be significantly stronger, with considerable impact on programming, writing, and analysis workloads involving complex tasks.
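The needle-in-a-haystack setup is easy to reproduce in outline: bury one "needle" fact at a chosen depth in filler text and check whether the model can retrieve it. The sketch below is a simplified illustration of the test design, not OpenAI's actual evaluation, and the "model" is a stub that searches the prompt:

```python
def make_haystack(needle: str, filler: str, n_fillers: int, depth: float) -> str:
    """Bury the needle at a relative depth (0.0..1.0) inside repeated filler."""
    parts = [filler] * n_fillers
    parts.insert(int(depth * n_fillers), needle)
    return " ".join(parts)

def score(answer: str, secret: str) -> bool:
    """Pass/fail: did the model's answer contain the secret fact?"""
    return secret in answer

# Stub 'model' that just scans the prompt, standing in for a real LLM call.
stub_model = lambda prompt, question: next(
    (s for s in prompt.split(". ") if "magic number" in s), "")

haystack = make_haystack(
    needle="The magic number is 7241.",
    filler="The sky was clear and the market was calm.",
    n_fillers=200, depth=0.5)
answer = stub_model(haystack, "What is the magic number?")
print(score(answer, "7241"))  # True
```

Real evaluations sweep both context length and needle depth, then report retrieval accuracy over the grid.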

These two seemingly small points do not raise GPT-5's overall intelligence, but they may give GPT-5 a moat of excellent user experience.

New Features: Lackluster

If we can still find some highlights in programming and hallucinations, the new features of GPT-5 can basically be described as bland.

First is the optimization in writing. OpenAI demonstrated that compared to previous models, GPT-5 has significantly improved writing quality, better assisting users in polishing drafts, emails, and even stories.

Most importantly, GPT-5 feels more human and less AI-like. Its responses have more rhythm and cadence, with language that feels more sincere and capable of evoking emotional resonance. And thanks to its overall capability gains, it better grasps the subtle nuances of context, making replies feel less machine-generated.

During the demonstration, however, this was not very easy to see. Much like a few days earlier, when Altman showed off GPT-5's movie recommendations, viewers could not really tell what the major difference from GPT-4 was.

Next is the voice function. GPT-5's voice sounds extremely natural, as if having a conversation with a real person. It also introduced video input functionality, allowing the voice assistant to see what you see. This is basically standard, but the on-site experience of Grok 4's ultra-fast voice response speed was even more impressive.

Memory upgrades: although OpenAI spoke of a significant enhancement to the memory function, the actual demonstration only showed integration with Gmail and Google Calendar, letting ChatGPT read users' email and calendars to help plan schedules. This, too, is likely to become table stakes, but it has little to do with "memory" per se.

Finally, the personalization feature: GPT-5 now allows users to customize the color of the chat interface. It inevitably makes one think that when a cutting-edge technology company starts to focus on these gimmicks, it can only mean that it really has nothing else to showcase.

Responding to Data Bottleneck Doubts: A "Left Foot Stepping on Right Foot" Bootstrap That Works

An earlier leak reported by The Information said one of the main reasons GPT-5's development stalled was a data bottleneck. OpenAI offered an explanation at the press conference.

They revealed that while training GPT-5, OpenAI experimented with new techniques that let the model use data generated by the previous generation of models. Rather than filler data, OpenAI focused on generating "the right kind of data," aimed at "teaching" the model: a high-quality synthetic pipeline that uses their own models to generate complex data to teach GPT-5.

Moreover, this cross-generation interaction between models suggests a recursive improvement loop, in which each generation increasingly helps improve and generate the training data for the next.

In other words, OpenAI has now confirmed a speculation dating back to the o1 launch: high-quality data generated by reasoning models can strengthen the next pre-trained base model, which reinforcement learning then turns into a stronger next-generation reasoning model, a "left foot stepping on right foot" bootstrap.

Judging by the results, however, this method clearly does not scale as well, and the data dilemma has not yet been fully resolved.

Price: The Absolute Killer

If GPT-5's performance gains are unsatisfying, on price at least it has delivered a win on both fronts.

First, for end users: free users can also use GPT-5, with a usage cap that is nonetheless quite generous, allowing "several hours" of chat per day; once the limit is reached, the app automatically switches to GPT-5 mini. Plus users get a "much higher" quota than free users, basically enough for daily use.

For API users, GPT-5 is offered at an almost irresistible price: $1.25 per million input tokens and $10 per million output tokens.

This price is even cheaper than GPT-4o. It is even less expensive than Gemini 2.5 Pro, which has always been known for its "low price." The prices for mini and nano are also lower than those of major competitors' models at the same level.
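At those announced rates, per-request costs are easy to estimate. The workload in the example below (a long-context call near the 400k window) is purely illustrative:

```python
# Cost estimate at GPT-5's announced API prices (USD per 1M tokens).
INPUT_PRICE = 1.25
OUTPUT_PRICE = 10.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one API call at the announced rates."""
    return (input_tokens / 1_000_000) * INPUT_PRICE + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE

# Illustrative workload: 400k tokens in, 20k tokens out.
cost = request_cost(input_tokens=400_000, output_tokens=20_000)
print(f"${cost:.2f}")  # $0.70
```

Even a request that fills most of the context window costs well under a dollar, which is the economic point the article is making.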

If GPT-5's programming capabilities are indeed as powerful as the tests suggest, this will be a devastating blow to Anthropic, whose prices are 15 times higher.

Yet who would have thought that a company that has always defined itself by technological leadership would be the one to start a price war? This is both the biggest highlight of OpenAI's launch event and its most lamentable aspect.

The pragmatism of the pioneer may be the most obvious sign that the rapid development phase of technology is coming to an end.

Conference: Catastrophic Mistakes, Non-intuitive Presentation

Compared with GPT-5's mediocre showing, the 1-hour-20-minute event itself can only be described as a disaster.

First, there was "chart fraud" during the conference. Shortly after the conference began, sharp-eyed netizens discovered that the SWE Benchmark data was presented in a disproportionate manner to highlight the improvements of GPT-5.

The proportions here were completely incorrect, and soon netizens restored a true proportion.

Nor was this an isolated error. On the Tau 2 Benchmark slide, a bar labeled 55% was drawn taller than one labeled 58.1%.

Netizens quickly piled on with mockery of these embarrassing errors, for example by making mock charts out of GPT version numbers to deride OpenAI's "chart magic."

For OpenAI, which had already weathered the IMO gold-medal controversy and become entrenched in the image of a "hype master," such errors were fuel on the fire, further cementing its reputation as unreliable and over-hyped.

Beyond that, the only somewhat impressive demonstration was the castle mini-game GPT-5 generated in Cursor at the end. Everything else was lengthy, overly technical, and produced mediocre results.

Compared with Anthropic's Claude-runs-a-vending-machine experiment and Gemini's Pokémon playthrough, both of which showcased agent performance with far more impact and better displayed cutting-edge exploration, this event lacked highlights.

The presentation was further marred by boring cold jokes and lengthy reasoning wait times, making the event feel unprecedentedly dull.

If Altman is a marketing master, this launch event did not live up to that name.

Altman had raised expectations before the event by declaring that "GPT-5 is even stronger than me," so the contrast with the bland presentation, plus the string of errors, meant public opinion clearly turned against OpenAI. According to Polymarket, assessments of OpenAI's model capabilities actually declined after the event.

Behind the Launch Event, the AI Industry is Losing Speed

For the entire AI industry, this launch event may signify a future shrouded in shadows.

From the failure of the GPT-4.5 (Orion) project, we had already seen signs that the parameter Scaling Law is gradually slowing. And although Grok 4, which used ten times the compute for reinforcement learning, performed impressively on certain tests, it showed no revolutionary leap overall, suggesting that the Test-Time Compute (TTC) Scaling Law may also be nearing its ceiling.

As of today, GPT-5's "small step forward" suggests the low-hanging fruit has already been picked.

The invisible wall of rapid growth in AI has never been as apparent as it is today.

This may mean waking up from the frenzy of "exponential growth" and entering a more pragmatic, more competitive phase. The AI industry may truly need a new breakthrough to return to the dreamlike pace of generational leaps.

However, when and in what form this breakthrough will come has become quite unpredictable.

The only certainty is that GPT-5 is still far from AGI.

Risk Warning and Disclaimer

The market has risks, and investment requires caution. This article does not constitute personal investment advice and does not take into account the specific investment goals, financial situation, or needs of individual users. Users should consider whether any opinions, views, or conclusions in this article align with their specific circumstances. Investing based on this is at one's own risk.