
Large models begin to "batch solve" mathematical problems

OpenAI's latest released GPT 5.2 model has achieved significant improvements in mathematical reasoning capabilities. Software engineer and former quantitative researcher Neel Somani found in tests that the model can provide complete mathematical proofs within 15 minutes and verify them accurately using the formal tool Harmonic. This performance far exceeds previous versions, marking a new stage where AI tools transition from a supportive role to independently tackling high-difficulty mathematical problems
Breakthroughs in artificial intelligence in the field of mathematics are accelerating. Since Christmas, 15 of the more than 1,000 unsolved problems left by the famous mathematician Paul Erdős have transitioned from "unsolved" to "solved," with 11 of these explicitly marked as having involved AI models in the solving process. This progress marks an unprecedented capability of large language models in advancing the frontiers of human knowledge.
According to a report by TechCrunch on Thursday, OpenAI's latest GPT 5.2 model has achieved significant improvements in mathematical reasoning capabilities. Software engineer and former quantitative researcher Neel Somani discovered in tests that the model could provide a complete mathematical proof within 15 minutes and verify its correctness using the formal tool Harmonic. This performance far exceeds previous versions, shifting AI tools from a supportive role to a new stage where they can independently tackle high-difficulty mathematical problems.
Fields Medal winner Terence Tao has documented on his GitHub page that AI models have made substantial autonomous progress on 8 different Erdős problems, with another 6 breakthroughs achieved by locating and expanding upon previous research. Although there is still a distance to fully autonomous mathematical research, the role of large models in the field of mathematics can no longer be ignored.
This progress has a direct impact on both the mathematical research ecosystem and the AI application market. Formal tools such as the open-source proof assistant Lean developed by Microsoft Research, as well as AI tools like Aristotle from Harmonic, are being widely adopted by top mathematicians and computer science professors, signaling a profound transformation in academic research workflows.
From Accidental Discovery to Systematic Breakthrough
Somani's discovery began with a routine test. He input a mathematical problem into ChatGPT, allowing the model to think for 15 minutes before returning a complete answer. The proof referenced mathematical axioms such as the Legendre formula, Bertrand's conjecture, and the Star of David theorem, ultimately finding an elegant solution to a similar problem published by Harvard mathematician Noam Elkies on the Math Overflow forum in 2013. However, ChatGPT's final proof differed in key aspects from Elkies' work and provided a more complete answer to a certain version of the Erdős problem.
"I wanted to establish a benchmark to understand when large language models can effectively solve open mathematical problems and where they still face difficulties," Somani stated. Surprisingly, after using the latest model, this frontier began to advance.
The Erdős problem set contains over 1,000 conjectures proposed and maintained online by the Hungarian mathematician. These problems vary significantly in theme and difficulty, making them an enticing target for AI-driven mathematical research. The first autonomous solutions were produced last November by the Gemini-powered AlphaEvolve model, but recently GPT 5.2 has performed even better in advanced mathematics. Somani described it as "more proficient in mathematical reasoning than previous versions."
Cautious Assessment by Top Mathematicians
Terence Tao holds a more nuanced view on this development. He speculated on Mastodon that the scalability of AI systems makes them "better suited for systematically applying to the 'long tail' of lesser-known Erdős problems, many of which actually have direct solutions."
"Thus, many simpler Erdős problems are now more likely to be solved through pure AI methods rather than human or hybrid approaches," Tao added.
This assessment reveals the positioning of AI in mathematical research: not to replace human mathematicians in tackling the most cutting-edge complex problems, but to efficiently handle a large number of medium-difficulty problems that have long been overlooked due to limited human resources. This division of labor could reshape the allocation of resources in mathematical research.
Formal Tools Drive Application Implementation
Another key driving factor is the recent shift towards formalization in the mathematical community. Formalization is a labor-intensive task that makes mathematical reasoning easier to verify and extend. While formalization does not have to rely on AI or computers, the new generation of automated tools has significantly reduced the difficulty of the work.
The open-source "proof assistant" Lean developed by Microsoft Research in 2013 has been widely used in this field, while AI tools like Aristotle from Harmonic promise to automate much of the formalization work.
Harmonic founder Tudor Achim believes that the sudden increase in the number of Erdős problem solutions is less important than top mathematicians starting to take these tools seriously. "I am more concerned about mathematics and computer science professors using these AI tools," Achim stated, "These individuals need to protect their reputations, so when they say they are using Aristotle or ChatGPT, that is real evidence."
This trend indicates that AI tools have moved from the experimental stage into mainstream academic applications, potentially opening new business opportunities for related technology companies while also challenging traditional methodologies in mathematical research
