AI Struggles with PhD-Level Physics Tests

AI Hits Physics Wall: Top Models Score Below 10% on Doctoral-Level Tests

Imagine handing your toughest physics homework to the smartest AI available today. The results might surprise you, and not in a good way. A new benchmark called CritPt reveals that even the most advanced AI models struggle with the entry-level research skills expected of physics PhD students.

The Ultimate Physics Exam for AI

More than 50 physicists from leading institutions worldwide built CritPt specifically to test whether AI can handle original, unpublished research problems. Forget textbook questions: these are the kinds of challenges working scientists face daily in quantum physics, astrophysics, and other cutting-edge fields.

The test includes:

71 complete research challenges
Divided into 190 smaller checkpoints
All based on unpublished material, so models cannot simply recall answers they saw during training

"We wanted to see if AI could think like a researcher," explains one physicist involved in the project. "Not just recall information, but solve problems nobody's tackled before."

Shockingly Low Scores

The numbers tell a sobering story:

Gemini 3 Pro Preview: 9.1% accuracy (Google's best effort)
GPT-5.1 (high): just 4.9% accuracy (OpenAI's top model)

The tests revealed fundamental weaknesses:

Models do slightly better on smaller, well-defined checkpoints
Full research problems? Near-total failure
"Consistent resolution" scores (which require the correct answer on repeated attempts) were even lower
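To show why a stricter "consistent resolution" metric can come out below plain accuracy, here is a minimal Python sketch using made-up grading records. It assumes each problem gets five independent attempts and counts as consistently solved only when at least four are correct; the attempt count, that threshold, and the function names are illustrative assumptions, not CritPt's published scoring code.

```python
# Hypothetical grading records: for each problem, whether each of five
# independent attempts was judged correct. Illustrative only, not CritPt data.
attempts = {
    "problem_a": [True, True, True, True, False],
    "problem_b": [True, True, True, False, False],
    "problem_c": [False, False, False, False, False],
    "problem_d": [True, True, True, True, True],
}


def average_accuracy(records):
    """Fraction of correct attempts per problem, averaged over all problems."""
    per_problem = [sum(r) / len(r) for r in records.values()]
    return sum(per_problem) / len(per_problem)


def consistently_solved_rate(records, min_correct=4):
    """Fraction of problems answered correctly in at least `min_correct` attempts.

    The default threshold of 4 (out of 5) is an assumption for illustration.
    """
    solved = [sum(r) >= min_correct for r in records.values()]
    return sum(solved) / len(solved)


print(f"average accuracy:         {average_accuracy(attempts):.0%}")
print(f"consistently solved rate: {consistently_solved_rate(attempts):.0%}")
```

In this toy example, average accuracy is 60% while the consistently solved rate is 50%: partial credit on problem_b counts toward accuracy, but a model that gets it right only three times out of five does not count as having reliably solved it.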

The most concerning finding? These advanced systems often produce answers that look reasonable at first glance but contain subtle errors that could derail real research.

Why Can't AI Crack Physics?

The core issue appears to be reasoning ability. Current models:

Lack true understanding of physical principles
Struggle with multi-step problem solving
Can't maintain logical consistency across complex calculations

"It's like having a brilliant student who keeps making careless mistakes," one researcher noted. "You wouldn't trust them with your lab work."

The implications are serious:

Human experts must double-check all AI output
Potential time savings evaporate during error correction
Autonomous scientific discovery remains distant

Companies aren't giving up, though: OpenAI still plans to launch an "AI research intern" system by September 2026.

Key Points:

1️⃣ Current Limitations: Top AI models score under 10% on doctoral-level physics tests
2️⃣ Hidden Dangers: Seemingly correct answers often contain subtle errors
3️⃣ Practical Role: Better suited as assistants than as independent researchers
4️⃣ Future Outlook: Significant breakthroughs needed before Nobel-worthy work
