AI Hits Physics Wall: Top Models Score Below 10% on Doctoral-Level Tests

Imagine handing your toughest physics homework to the smartest AI available today. The results might surprise you, and not in a good way. A new benchmark called CritPt shows that even the most advanced AI models struggle with the basic research skills expected of physics PhD students.

The Ultimate Physics Exam for AI

More than 50 physicists from leading institutions worldwide created CritPt specifically to test whether AI can handle original, unpublished research problems. These are not textbook questions; they are the kind of challenges scientists face daily across quantum physics, astrophysics, and other cutting-edge fields.

The test includes:

  • 71 complete research challenges
  • Divided into 190 smaller checkpoints
  • All based on unpublished materials to prevent cheating

"We wanted to see if AI could think like a researcher," explains one physicist involved in the project. "Not just recall information, but solve problems nobody's tackled before."

Shockingly Low Scores

The numbers tell a sobering story:

  • Gemini 3 Pro Preview: 9.1% accuracy (Google's best effort)
  • GPT-5.1 (high): Just 4.9% correct (OpenAI's top model)

The tests revealed fundamental weaknesses:

  1. Models perform slightly better on well-defined sub-tasks
  2. On complete research problems, they fail almost entirely
  3. "Consistent resolution" scores, which require the correct answer across repeated attempts, were even worse
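
To see why a consistency score can fall below raw accuracy, here is a minimal Python sketch. The run data, the function names, and the threshold of 4 correct attempts out of 5 are illustrative assumptions, not details taken from CritPt or from the scores reported above.

```python
from typing import Dict, List


def average_accuracy(runs: Dict[str, List[bool]]) -> float:
    """Fraction of all attempts, across all problems, that were correct."""
    total = sum(len(attempts) for attempts in runs.values())
    correct = sum(sum(attempts) for attempts in runs.values())
    return correct / total if total else 0.0


def consistent_rate(runs: Dict[str, List[bool]], min_correct: int) -> float:
    """Fraction of problems answered correctly in at least `min_correct` attempts.

    The threshold is a hypothetical parameter chosen for illustration.
    """
    if not runs:
        return 0.0
    solved = sum(1 for attempts in runs.values() if sum(attempts) >= min_correct)
    return solved / len(runs)


# Made-up results: five attempts per problem, True means a correct answer.
runs = {
    "problem_a": [True, True, True, True, False],
    "problem_b": [True, False, False, False, False],
    "problem_c": [False, False, False, False, False],
    "problem_d": [False, True, False, False, False],
}

print(f"average accuracy: {average_accuracy(runs):.2f}")    # 0.30
print(f"consistent rate:  {consistent_rate(runs, 4):.2f}")  # 0.25
```

In this toy data, the occasional lucky answers on problem_b and problem_d raise the average accuracy, but neither problem counts as consistently solved, so the stricter score is lower.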

The most concerning finding? These advanced systems often produce answers that look reasonable at first glance but contain subtle errors that could derail real research.

Why Can't AI Crack Physics?

The core issue appears to be reasoning ability. Current models:

  • Lack true understanding of physical principles
  • Struggle with multi-step problem solving
  • Can't maintain logical consistency across complex calculations

"It's like having a brilliant student who keeps making careless mistakes," one researcher noted. "You wouldn't trust them with your lab work."

The implications are serious:

  • Human experts must double-check all AI output
  • Potential time savings evaporate during error correction
  • Autonomous scientific discovery remains distant

Companies aren't giving up though - OpenAI still plans to launch an "AI research intern" system by September 2026.

Key Points:

1️⃣ Current Limitations: Top AI models score under 10% on doctoral-level physics tests
2️⃣ Hidden Dangers: Seemingly correct answers often contain subtle errors
3️⃣ Practical Role: Better suited as assistants than independent researchers
4️⃣ Future Outlook: Significant breakthroughs needed before Nobel-worthy work
