
"Language mixing" caused? OpenAI o3-mini exposed for extensively using Chinese reasoning

On the 1st, OpenAI launched the lightweight AI model o3-mini, but netizens discovered that it reasons extensively in Chinese without any user prompting; even when asked in Russian, it would think in Chinese. This raised questions about whether OpenAI borrowed from mainland China's DeepSeek model. Experts point out that AI models do not understand the differences between languages; they only process text as tokens, which gives rise to the phenomenon of "language mixing." Similar issues have been observed in other AI models.
OpenAI launched its latest lightweight artificial intelligence model o3-mini on the 1st, but foreign netizens discovered that it reasons extensively in Chinese without user intervention. Interestingly, even when asked in Russian, o3-mini-high would still think in Chinese. This has led foreign netizens to suspect that OpenAI is "borrowing" from China's DeepSeek model.
Chinese financial media outlet "Wall Street Insight" reported that netizens questioned OpenAI CEO Sam Altman and OpenAI about why o3-mini uses Chinese for reasoning. Netizen Annalisa Fernandez suggested that perhaps Chinese is the "soul language" of LLMs (large language models).
The report noted that this is not the first time such a phenomenon has appeared in OpenAI's models. As early as February 2024, developers raised similar questions in the OpenAI developer community, though the mixing there involved other languages; OpenAI's reasoning model o1 exhibited similar issues. In fact, this "language mixing" phenomenon has also been observed in other AI models, such as Google's Gemini, which mixes in German.
Matthew Guzdial, an assistant professor and AI researcher at the University of Alberta, pointed out that "the model does not know what a language is, or what the differences between languages are, because to it, these are all just text."
In reality, the way models perceive language is completely different from how most people understand it. Models do not process words directly; they process tokens. For example, "fantastic" might be a single complete token, it might be broken into three tokens such as "fan," "tas," and "tic," or it might be split all the way down so that each letter is its own token.
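To make the idea concrete, here is a minimal sketch of how a subword tokenizer splits text, using the open-source tiktoken library (assumed to be installed); the exact splits depend on the tokenizer's vocabulary and are not taken from the report.

```python
# Minimal sketch: a BPE tokenizer turns text into numeric tokens,
# and the same word can land as one token or several depending on context.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a tokenizer published by OpenAI

for text in ["fantastic", "a fantastic reply"]:
    token_ids = enc.encode(text)
    pieces = [enc.decode([tid]) for tid in token_ids]  # map each id back to its text piece
    print(f"{text!r} -> {len(token_ids)} tokens: {pieces}")
```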
However, this kind of splitting can lead to misunderstandings. Many tokenizers assume that a space marks the start of a new word, but not all languages separate words with spaces; Chinese is one example. DeepSeek analyzed this phenomenon in its paper: the research team found that when reinforcement learning prompts involve multiple languages, the reasoning chain often exhibits language mixing.
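A rough illustration of that space assumption follows, again using tiktoken purely as an example tokenizer: splitting on whitespace works for an English sentence but leaves a Chinese sentence as one undivided chunk, while a subword tokenizer still breaks both into several tokens.

```python
# Sketch: whitespace word-splitting vs. subword tokenization
# for a language that does not use spaces between words.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

english = "the answer is fantastic"
chinese = "答案非常精彩"  # no spaces between words

print(english.split())            # ['the', 'answer', 'is', 'fantastic']
print(chinese.split())            # ['答案非常精彩'] -- whitespace splitting sees one "word"
print(len(enc.encode(english)))   # several tokens
print(len(enc.encode(chinese)))   # also several tokens, despite the absence of spaces
```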
Currently, "language mixing" remains an urgent issue to resolve. After all, DeepSeek-R1 is only optimized for Chinese and English, and it may also encounter language mixing issues when handling queries in other languages.

