<p>Recently, the Eigen-1 multi-agent system achieved a historic breakthrough on the HLE (Humanity's Last Exam) Bio/Chem Gold test set. The system was jointly developed by researchers including Tang Xiangru and Wang Yujie (Yale University), Xu Wanghan (Shanghai Jiao Tong University), Wan Guancheng (UCLA), Yin Zhenfei (University of Oxford), and Jin Di and Wang Hanrui (Eigen AI). It reached a Pass@1 accuracy of 48.3% and a Pass@5 accuracy of 61.74%, crossing the 60-point threshold for the first time. This result far surpasses those of Google Gemini 2.5 Pro, OpenAI GPT-5, and Grok 4. Most notably, the accomplishment does not rely on a closed-source, very large model. It is built entirely on the open-source DeepSeek V3.1.</p>
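<p>The article does not define Pass@1 and Pass@5 or describe how they were computed for Eigen-1. One common convention, borrowed from code-generation benchmarks, samples n answers per question, counts the c correct ones, and estimates the probability that at least one of k randomly chosen answers is correct as 1 - C(n-c, k) / C(n, k). When n = k = 5, this reduces to "at least one of five answers is correct." The sketch below illustrates that convention only. The function names <code>pass_at_k</code> and <code>benchmark_pass_at_k</code> and the toy data are hypothetical and are not taken from the Eigen-1 evaluation.</p>

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one question (assumed convention, not Eigen-1's code).

    n: number of sampled answers, c: how many of them are correct,
    k: number of answers the metric is allowed to "try".
    Returns the probability that a random size-k subset of the n answers
    contains at least one correct answer.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Too few wrong answers to fill a size-k subset, so every subset hits a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average per-question pass@k over (n, c) pairs, reported as a percentage."""
    if not results:
        raise ValueError("results must be non-empty")
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical toy data: three questions, five sampled answers each,
    # with 5, 1 and 0 correct answers respectively.
    toy = [(5, 5), (5, 1), (5, 0)]
    print(f"Pass@1 = {benchmark_pass_at_k(toy, 1):.2f}%")  # prints Pass@1 = 40.00%
    print(f"Pass@5 = {benchmark_pass_at_k(toy, 5):.2f}%")  # prints Pass@5 = 66.67%
```

<p>The toy run shows why Pass@5 is always at least as high as Pass@1. A question answered correctly only once in five samples contributes 0.2 to Pass@1 but a full 1.0 to Pass@5.</p>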



HLE "Humanity's Last Exam" surpasses 60 points for the first time! Eigen-1, built on DeepSeek V3.1, significantly outperforms Grok 4 and GPT-5