Grok4 Outperforms GPT-5 in Reasoning, But at Higher Cost

AI Model Showdown: Performance vs. Cost in Latest Benchmarks

New testing data from the ARC Prize reveals stark differences in performance and operating cost between leading language models. The evaluation compared xAI's Grok4 against OpenAI's GPT-5 on two benchmarks of general reasoning, ARC-AGI-1 and ARC-AGI-2.

Benchmark Breakdown: Reasoning Capabilities Tested

In the demanding ARC-AGI-2 assessment, which evaluates complex reasoning:

  • Grok4 (Thinking) achieved 16% accuracy at $2-$4 per task
  • GPT-5 (Advanced) scored 9.9% at just $0.73 per task

[Image: Performance and cost comparison of leading language models on the ARC-AGI benchmark. Credit: ARC-AGI]

The less intensive ARC-AGI-1 test showed:

  • Grok4 reached 68% accuracy ($1 per task)
  • GPT-5 achieved 65.7% ($0.51 per task)

"While Grok4 demonstrates superior reasoning capabilities, its cost structure makes GPT-5 more economically viable for many applications," noted an ARC Prize spokesperson.
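One rough way to put the spokesperson's trade-off into numbers is to divide cost per task by accuracy, giving dollars per percentage point of accuracy. The sketch below does that for the four results quoted above. This ratio is an illustrative assumption, not a metric ARC Prize reports, and the names `RESULTS`, `cost_per_point`, and `summarize` are hypothetical.

```python
# Figures as quoted in the article. Grok4's ARC-AGI-2 cost is given as a
# $2-$4 range, so every cost is stored as a (low, high) pair in USD per task.
RESULTS = {
    "ARC-AGI-2": {
        "Grok4 (Thinking)": {"accuracy_pct": 16.0, "cost_usd": (2.00, 4.00)},
        "GPT-5 (Advanced)": {"accuracy_pct": 9.9, "cost_usd": (0.73, 0.73)},
    },
    "ARC-AGI-1": {
        "Grok4": {"accuracy_pct": 68.0, "cost_usd": (1.00, 1.00)},
        "GPT-5": {"accuracy_pct": 65.7, "cost_usd": (0.51, 0.51)},
    },
}


def cost_per_point(accuracy_pct, cost_usd):
    """Return (low, high) USD per task per percentage point of accuracy.

    Hypothetical efficiency ratio: a lower value means cheaper accuracy.
    """
    low, high = cost_usd
    return low / accuracy_pct, high / accuracy_pct


def summarize(results):
    """Flatten the nested results into (benchmark, model, low, high) rows."""
    rows = []
    for benchmark, models in results.items():
        for model, r in models.items():
            low, high = cost_per_point(r["accuracy_pct"], r["cost_usd"])
            rows.append((benchmark, model, low, high))
    return rows


for benchmark, model, low, high in summarize(RESULTS):
    # Show a single value when the cost was a point estimate, else a range.
    ratio = f"${low:.4f}" if low == high else f"${low:.4f}-${high:.4f}"
    print(f"{benchmark:10} {model:18} {ratio} per accuracy point")
```

On these figures GPT-5 comes out cheaper per accuracy point on both benchmarks, even at the low end of Grok4's $2-$4 range. The ratio ignores that the last few points of accuracy on a hard benchmark may be worth far more than the first few, so it is best read as a crude comparison rather than a ranking.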

Lightweight Contenders Emerge

The study also evaluated smaller model variants:

Model | AGI-1 Score | AGI-1 Cost | AGI-2 Score | AGI-2 Cost

[Image: Test results for Grok4, GPT-5, and smaller model variants on ARC-AGI-1. Credit: ARC Prize]

Surprise Performer and Future Tests

The discontinued o3-preview model from December 2024 surprisingly outperformed all current models on AGI-1 with nearly 80% accuracy, though at premium pricing. Meanwhile, development continues on ARC-AGI-3, which will test AI agents in interactive game environments, a challenge where most models still struggle compared to humans.

Key Points:

  1. Performance lead: Grok4 outperforms GPT-5 on both reasoning benchmarks, by a wide margin on AGI-2 (16% vs 9.9%) and a narrow one on AGI-1 (68% vs 65.7%)
  2. Cost efficiency: GPT-5 costs less per task on both tests ($0.51 vs $1 on AGI-1, $0.73 vs $2-$4 on AGI-2)
  3. Lightweight options: Smaller GPT variants show promise for cost-sensitive applications
  4. Future benchmarks: New interactive testing environments may reshape performance rankings
