AI Struggles with Physics Puzzles: Top Models Score Below 10%

When AI Meets Advanced Physics: The Reality Check

Image source note: The image is AI-generated, and the image licensing service provider is Midjourney.

Imagine handing a stack of cutting-edge physics problems to a bright PhD candidate - then discovering they can't solve even one in ten correctly. That's essentially what happened when researchers tested today's most advanced AI systems against real scientific challenges.

The Tough Test Behind the Numbers

The "CritPt" benchmark wasn't playing nice. Developed by over 50 physicists worldwide, it presented 71 unpublished research problems across quantum physics, astrophysics, and other demanding fields. These weren't textbook exercises but fresh challenges designed to mimic what early-career researchers actually face.

"We wanted to eliminate any advantage from memorization or pattern recognition," explains the team behind CritPt. "Every question tests genuine understanding and problem-solving."

Surprising Shortfalls

When the scores came in, even optimists raised eyebrows:

  • Google's Gemini 3 Pro: 9.1% accuracy
  • OpenAI's GPT-5: Just 4.9%

The numbers got worse under stricter evaluation. When a model had to answer the same problem correctly in four out of five tries (the "continuous resolution rate"), performance plummeted further.
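As a rough illustration of why a consistency requirement lowers scores, the sketch below compares plain per-attempt accuracy with a stricter rate that only counts a problem as solved when enough of its repeated attempts are correct. The problem IDs, the outcome data, and the function names are invented for illustration and are not CritPt results. The threshold of four correct runs out of five follows the description above, and treating it as "at least four" is an assumption.

```python
from typing import Dict, List

# Hypothetical outcomes: each problem maps to five pass/fail attempts.
# These values are made up for illustration, not taken from CritPt.
RESULTS: Dict[str, List[bool]] = {
    "problem_a": [True, True, True, True, True],
    "problem_b": [True, True, True, False, False],
    "problem_c": [False, False, False, False, False],
    "problem_d": [True, True, True, True, False],
}


def average_accuracy(results: Dict[str, List[bool]]) -> float:
    """Fraction of all individual attempts that were correct."""
    attempts = [ok for runs in results.values() for ok in runs]
    return sum(attempts) / len(attempts)


def consistent_solve_rate(results: Dict[str, List[bool]], min_correct: int = 4) -> float:
    """Fraction of problems with at least `min_correct` correct attempts.

    The default of 4 assumes five attempts per problem.
    """
    solved = sum(1 for runs in results.values() if sum(runs) >= min_correct)
    return solved / len(results)


if __name__ == "__main__":
    print(f"average accuracy:      {average_accuracy(RESULTS):.2f}")
    print(f"consistent solve rate: {consistent_solve_rate(RESULTS):.2f}")
```

In this toy data, problem_b is right three times out of five. Those attempts still count toward average accuracy, but the problem does not count as consistently solved, so the stricter number comes out lower.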

"These systems can produce answers that look convincing at first glance," notes one physicist involved in testing. "But peer closer, and you'll find subtle errors that could derail real research if unchecked."

Why This Matters Beyond Labs

The implications stretch far beyond theoretical physics:

  1. Research workflows: Scientists using AI tools must budget extra time for verification
  2. Public perception: Expectations that AI will replace human experts anytime soon should be tempered
  3. Development priorities: The results highlight where future AI training should focus

A More Realistic Role Emerges

Rather than replacing researchers, leading labs now see AI as a sophisticated assistant:

  • OpenAI plans a "research intern" system by 2026
  • Fully autonomous research isn't expected before 2028
  • Current models already help save time on routine tasks

"Think of them like brilliant but error-prone grad students," suggests one team member. "Their ideas can spark breakthroughs, but you'd never let them run unsupervised."

Key Points:

  • 🔬 Top AI models scored under 10% on unpublished physics challenges
  • 🤯 Performance dropped further when requiring consistent accuracy
  • 🛠️ Future role likely as assistive tools rather than independent researchers
  • ⏳ Full autonomy in complex sciences remains years away
