
Meituan's LongCat team releases VitaBench
Meituan's LongCat team has officially released VitaBench, a benchmark for evaluating large-model agents that is designed to closely mirror real-life scenarios and the complex problems they pose. According to the introduction, VitaBench is built around three high-frequency real-life scenarios: food delivery ordering, in-restaurant dining, and travel. On top of these it constructs an interactive evaluation environment with 66 tools, along with comprehensive task designs that span the different scenarios.
