GPT-5 and Top AI Models Fail New FormulaOne Benchmark

August 15, 2025. FormulaOne, a new AI evaluation benchmark, has exposed significant limitations in today's most advanced artificial intelligence systems. Developed by AAI, a research institution specializing in superintelligence, the benchmark found that models including GPT-5, Grok4, and o3Pro failed to solve any of its most challenging problems.

The FormulaOne Challenge

The benchmark consists of 220 novel graph-structured dynamic programming problems, spanning moderate to research-level difficulty. These problems incorporate complex domains such as:

  • Topology
  • Geometry
  • Combinatorics
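
For readers unfamiliar with the term, the following is a minimal sketch of graph-structured dynamic programming in its simplest setting, a plain tree. It is not taken from the benchmark: the function name `max_independent_set_size` and the adjacency-dictionary input format are assumptions chosen for illustration. The idea is that each vertex stores a small table summarizing its subtree, and a parent combines the tables of its children.

```python
def max_independent_set_size(tree_adjacency, root):
    """Size of the largest set of pairwise non-adjacent vertices in a tree.

    tree_adjacency: dict mapping each vertex to a list of its neighbors
    root: any vertex of the tree, used only to orient the traversal
    """
    # Depth-first traversal that records parents and the visit order.
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for child in tree_adjacency[v]:
            if child != parent[v]:
                parent[child] = v
                stack.append(child)

    # best[v] = (best size in v's subtree with v excluded, with v included)
    best = {}
    # Walking the visit order backwards handles children before parents.
    for v in reversed(order):
        excluded, included = 0, 1
        for child in tree_adjacency[v]:
            if child != parent[v]:
                child_excluded, child_included = best[child]
                # If v is excluded, each child is free to be in or out.
                excluded += max(child_excluded, child_included)
                # If v is included, every child must be excluded.
                included += child_excluded
        best[v] = (excluded, included)
    return max(best[root])


# A path on five vertices: 0 - 1 - 2 - 3 - 4
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(max_independent_set_size(path, 0))  # 3, e.g. vertices {0, 2, 4}
```

On general graphs the same pattern applies, but the per-vertex table is replaced by a table per group of vertices, which is where the difficulty of the benchmark's problems comes from.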

The problems build on Courcelle's algorithmic meta-theorem, which states that any graph property definable in monadic second-order logic can be decided in linear time on graphs of bounded treewidth (that is, tree-like graphs) using dynamic programming. The dynamic programming runs over a tree decomposition, which organizes the graph's vertices into overlapping sets, called bags, arranged in a tree.
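
To make the notion of a tree decomposition concrete, here is a small sketch that checks the three standard conditions: every vertex lies in some bag, every edge lies inside some bag, and the bags containing any given vertex form a connected part of the tree. The function name `is_valid_tree_decomposition` and the input format are assumptions for illustration, not part of the benchmark, and the graph is described by its edges only, so isolated vertices are ignored.

```python
from collections import defaultdict, deque


def _connected(nodes, adjacency):
    """Breadth-first search restricted to `nodes`; True if all are reached."""
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in adjacency[current]:
            if nxt in nodes and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen == nodes


def is_valid_tree_decomposition(graph_edges, bags, tree_edges):
    """Check the standard tree-decomposition conditions.

    graph_edges: list of (u, v) pairs for the input graph
    bags: dict mapping a bag id to a set of graph vertices
    tree_edges: list of (bag_id, bag_id) pairs joining the bags
    """
    graph_edges = list(graph_edges)
    tree_edges = list(tree_edges)
    adjacency = defaultdict(set)
    for a, b in tree_edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    # The bags must form a tree: connected, with len(bags) - 1 edges.
    if not bags or len(tree_edges) != len(bags) - 1:
        return False
    if not _connected(set(bags), adjacency):
        return False

    # Condition 1: every graph vertex appears in at least one bag.
    vertices = {v for edge in graph_edges for v in edge}
    covered = set().union(*bags.values())
    if not vertices <= covered:
        return False

    # Condition 2: both endpoints of every edge share some bag.
    for u, v in graph_edges:
        if not any(u in bag and v in bag for bag in bags.values()):
            return False

    # Condition 3: the bags holding each vertex form a connected subtree.
    for v in covered:
        holding = {b for b, content in bags.items() if v in content}
        if not _connected(holding, adjacency):
            return False
    return True


# A 4-cycle a-b-c-d-a split into two overlapping bags of three vertices.
cycle = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
bags = {0: {"a", "b", "c"}, 1: {"a", "c", "d"}}
print(is_valid_tree_decomposition(cycle, bags, [(0, 1)]))  # True
```

The width of a decomposition is the size of its largest bag minus one, so the example above has width 2. Dynamic programming over a decomposition keeps one table per bag, and the table size grows quickly with the width.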

Performance Breakdown

While current AI models demonstrated moderate success on simpler problems (50%-70% accuracy), their performance plummeted with increased complexity:

[Table: Model | Shallow-Level Success | Deep-Level Success | Doctoral-Level Success]

Academic Reactions

The results have sparked debate about whether AI can truly achieve doctoral-level reasoning. Some researchers propose including human PhD students in future evaluations for comparison.

"This benchmark highlights a critical gap in AI's ability to handle deeply abstract problems," noted an AAI spokesperson. "While models excel at pattern recognition, structured logical deduction remains a challenge."

The full leaderboard is available at: FormulaOne-Leaderboard

Key Points:

  • All top AI models scored zero on FormulaOne's most difficult problems.
  • The benchmark tests 220 high-difficulty dynamic programming questions.
  • Performance declines sharply with problem complexity, revealing AI's reasoning limitations.
