GPT-5 and Top AI Models Fail New FormulaOne Benchmark

August 15, 2025: A new AI evaluation benchmark called FormulaOne has exposed significant limitations in today's most advanced artificial intelligence systems. Developed by AAI, a research institution specializing in superintelligence, the benchmark found that models including GPT-5, Grok 4, and o3-pro failed to solve any of its most challenging problems.

The FormulaOne Challenge

The benchmark consists of 220 novel graph-structured dynamic programming problems, spanning moderate to research-level difficulty. These problems incorporate complex domains such as:

  • Topology
  • Geometry
  • Combinatorics

The problems are based on Courcelle's algorithmic meta-theorem, which states that any graph property definable in monadic second-order logic can be decided in linear time on graphs of bounded treewidth, that is, on tree-like graphs, typically by dynamic programming. Applying it requires a tree decomposition: the graph's vertices are organized into overlapping sets, called bags, which are themselves arranged as a tree.
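
To make the idea of a tree decomposition concrete, here is a minimal Python sketch that checks the three standard conditions and computes the width. It is not taken from the FormulaOne benchmark; the function names, the input layout, and the 4-cycle example are assumptions chosen for illustration, and the code assumes the supplied tree edges really form a tree.

```python
from collections import defaultdict


def is_tree_decomposition(vertices, edges, bags, tree_edges):
    """Check the three standard tree-decomposition conditions.

    vertices:   graph vertices
    edges:      (u, v) graph edges
    bags:       dict mapping a tree-node id to a set of graph vertices
    tree_edges: (a, b) pairs of tree-node ids, assumed to form a tree
    """
    vertices = list(vertices)

    # 1. Every graph vertex appears in at least one bag.
    covered = set().union(*bags.values()) if bags else set()
    if not set(vertices) <= covered:
        return False

    # 2. Both endpoints of every graph edge share some bag.
    for u, v in edges:
        if not any(u in bag and v in bag for bag in bags.values()):
            return False

    # 3. For each vertex, the bags containing it form a connected subtree.
    adjacent = defaultdict(set)
    for a, b in tree_edges:
        adjacent[a].add(b)
        adjacent[b].add(a)
    for x in vertices:
        holders = {t for t, bag in bags.items() if x in bag}
        start = next(iter(holders))
        seen, stack = {start}, [start]
        while stack:
            t = stack.pop()
            for n in adjacent[t]:
                if n in holders and n not in seen:
                    seen.add(n)
                    stack.append(n)
        if seen != holders:
            return False
    return True


def width(bags):
    """Width of a decomposition: size of the largest bag minus one."""
    return max(len(bag) for bag in bags.values()) - 1


# Hypothetical example: the 4-cycle a-b-c-d-a.
cycle_vertices = ["a", "b", "c", "d"]
cycle_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]

# Two overlapping bags joined by one tree edge: a valid decomposition.
good_bags = {0: {"a", "b", "c"}, 1: {"a", "c", "d"}}
good_tree = [(0, 1)]

# One bag per edge along a path: "a" sits in bags 0 and 3 only,
# which are not connected through bags containing "a".
bad_bags = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c", "d"}, 3: {"d", "a"}}
bad_tree = [(0, 1), (1, 2), (2, 3)]

print(is_tree_decomposition(cycle_vertices, cycle_edges, good_bags, good_tree))
print(width(good_bags))
print(is_tree_decomposition(cycle_vertices, cycle_edges, bad_bags, bad_tree))
```

A dynamic programming algorithm in the style the theorem describes would then process the bags from the leaves of this tree toward the root, keeping a table of partial solutions per bag. The smaller the width, the smaller those tables.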

Performance Breakdown

While current AI models demonstrated moderate success on simpler problems (50%-70% accuracy), their performance plummeted with increased complexity:

[Table not reproduced: per-model success rates at the shallow, deep, and doctoral levels]

Academic Reactions

The results have sparked debate about whether AI can truly achieve doctoral-level reasoning. Some researchers propose including human PhD students in future evaluations for comparison.

"This benchmark highlights a critical gap in AI's ability to handle deeply abstract problems," noted an AAI spokesperson. "While models excel at pattern recognition, structured logical deduction remains a challenge."

The full leaderboard is available at: FormulaOne-Leaderboard

Key Points:

  • All top AI models scored zero on FormulaOne's most difficult problems.
  • The benchmark tests 220 high-difficulty dynamic programming questions.
  • Performance declines sharply with problem complexity, revealing AI's reasoning limitations.
