GPT-5.5 Beats Claude Fable 5 on New Agents’ Last Exam Benchmark

OpenAI’s GPT-5.5 has outperformed Anthropic’s Claude Fable 5 on the newly released Agents’ Last Exam (ALE) benchmark. Designed by a coalition of researchers, ALE is considered one of the most rigorous tests of AI capabilities, evaluating models on complex reasoning, multi-step problem solving, and real-world task execution. GPT-5.5 achieved a significantly higher score, a result that caught many in the AI community off guard, since Claude Fable 5 had been widely regarded as the leading model in several previous evaluations.

Researchers noted that GPT-5.5 excelled particularly in tasks requiring long-term planning and adaptive decision-making, areas where Claude Fable 5 had previously shown strength. The results highlight the rapid pace of improvement in AI models and have sparked discussion about the competitive landscape among AI developers, with OpenAI and Anthropic now in a tight race. ALE is expected to become a standard reference point for future model comparisons, and both companies are likely to accelerate their development cycles in response.

