
Grok4 Outperforms GPT-5 in Reasoning, But at Higher Cost

AI Model Showdown: Performance vs. Cost in Latest Benchmarks

New test results from the ARC Prize show stark differences in both performance and operating cost among leading language models. The evaluation compared xAI's Grok4 with OpenAI's GPT-5 on benchmarks designed to measure general reasoning ability.

Benchmark Breakdown: Reasoning Capabilities Tested

In the demanding ARC-AGI-2 assessment, which evaluates complex reasoning:

Grok4 (Thinking) achieved 16% accuracy at $2-$4 per task
GPT-5 (Advanced) scored 9.9% at just $0.73 per task

Performance and cost comparison of leading language models on the ARC-AGI benchmark. | Image: ARC-AGI

On the less demanding ARC-AGI-1 benchmark:

Grok4 reached 68% accuracy ($1 per task)
GPT-5 achieved 65.7% ($0.51 per task)
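One rough way to combine accuracy and price is the expected cost per correctly solved task, that is, cost per task divided by accuracy. This is a derived illustration, not a metric the ARC Prize results report. It assumes every attempt is billed at the stated per-task cost and succeeds at the stated accuracy rate. The sketch below uses a hypothetical `cost_per_solved_task` helper and the figures quoted above. Because Grok4's ARC-AGI-2 cost is given as a $2-$4 range, it computes both ends.

```python
# Hypothetical helper (not from the ARC Prize): cost per solved task = cost per task / accuracy.
# Assumes each attempt costs the stated per-task amount and succeeds at the stated rate.

def cost_per_solved_task(cost_per_task: float, accuracy: float) -> float:
    """Expected spend to obtain one correct answer at a fixed accuracy rate."""
    if not 0 < accuracy <= 1:
        raise ValueError("accuracy must be in (0, 1]")
    return cost_per_task / accuracy

# (cost per task in USD, accuracy as a fraction), taken from the figures above
results = {
    "ARC-AGI-2": {
        "Grok4 (Thinking), low estimate": (2.00, 0.16),
        "Grok4 (Thinking), high estimate": (4.00, 0.16),
        "GPT-5 (Advanced)": (0.73, 0.099),
    },
    "ARC-AGI-1": {
        "Grok4": (1.00, 0.68),
        "GPT-5": (0.51, 0.657),
    },
}

for benchmark, models in results.items():
    print(benchmark)
    for name, (cost, acc) in models.items():
        print(f"  {name}: ${cost_per_solved_task(cost, acc):.2f} per solved task")
```

With these inputs, Grok4 comes to roughly $12.50-$25.00 per solved ARC-AGI-2 task, compared with about $7.37 for GPT-5. On ARC-AGI-1 the figures are about $1.47 versus $0.78. Under this simple assumption, GPT-5 remains cheaper per correct answer even after its lower accuracy is taken into account.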

"While Grok4 demonstrates superior reasoning capabilities, its cost structure makes GPT-5 more economically viable for many applications," noted an ARC Prize spokesperson.

Lightweight Contenders Emerge

The study also evaluated smaller model variants:

Test results for Grok4, GPT-5, and smaller model variants: score and cost per task on ARC-AGI-1 and ARC-AGI-2. | Image: ARC Prize

Surprise Performer and Future Tests

The o3-preview model from December 2024, since discontinued, outperformed every current model on ARC-AGI-1 with nearly 80% accuracy, though at a much higher cost per task. Meanwhile, work continues on ARC-AGI-3, which will test AI agents in interactive game environments, a setting where most models still lag well behind humans.

Key Points:

Performance lead: Grok4 beats GPT-5 on both benchmarks, by a wide margin on ARC-AGI-2 (16% vs 9.9%) and narrowly on ARC-AGI-1 (68% vs 65.7%)
Cost efficiency: GPT-5 is cheaper per task on both tests ($0.51 vs $1 on ARC-AGI-1; $0.73 vs $2-$4 on ARC-AGI-2)
Lightweight options: Smaller GPT variants show promise for cost-sensitive applications
Future benchmarks: ARC-AGI-3's interactive game environments may reshape the performance rankings


