
Humanity's Last Exam (HLE) passes 60 points for the first time! Eigen-1, built on DeepSeek V3.1, significantly outperforms Grok 4 and GPT-5
The Eigen-1 multi-agent system recently achieved a historic breakthrough on the HLE Bio/Chem Gold test set: a Pass@1 accuracy of 48.3% and a Pass@5 accuracy of 61.74%, crossing the 60-point threshold for the first time. The system was jointly developed by Tang Xiangru and Wang Yujie of Yale University, Xu Wanghan of Shanghai Jiao Tong University, Wan Guancheng of UCLA, Yin Zhenfei of Oxford University, and Jin Di and Wang Hanrui of Eigen AI. The result far surpasses those of Google Gemini 2.5 Pro, OpenAI GPT-5, and Grok 4. Most excitingly, it does not rely on closed-source, ultra-large models; it is built entirely on the open-source DeepSeek V3.1.
