MMLU-Pro holds steady at 85.0, AIME 2025 improves slightly to 89.3, and GPQA-Diamond dips from 80.7 to 79.9. Coding and agent benchmarks tell a similar story, with Codeforces ratings rising from ...
In today’s news, Deutsche Bank appoints receivers for Jon Adgemis’ pubs, a major investor slams Seven’s merger as “diworsification”, and Singtel’s boss blames Optus’ failings on “people issues”.
Discover Immuneering’s latest breakthrough: atebimetinib delivers unprecedented survival rates in first-line pancreatic cancer.