ChatGPT's Scientific Inconsistencies Exposed in New Study

When it comes to complex scientific judgments, ChatGPT might not be as reliable as its confident tone suggests. A recent Washington State University study paints a concerning picture of the AI's limitations in this critical area.

The Flaws Beneath the Surface

Professor Mesut Cicek's team put ChatGPT through rigorous testing, analyzing its responses to 719 research hypotheses from business journals. The results? While initial accuracy appeared decent at around 80%, deeper analysis revealed serious problems:

  • Accuracy barely better than guessing: Once random chance was accounted for, performance landed only slightly above 50/50 odds - what the researchers called a "low D-grade" showing.
  • Particularly poor at spotting falsehoods: The model correctly identified false statements just 16.4% of the time.
  • Version upgrades didn't help: Even newer iterations like ChatGPT-5 mini showed no significant improvement on these tasks.

The Consistency Problem

The study uncovered another troubling pattern: ChatGPT often couldn't stick to its own answers. When researchers submitted each hypothesis multiple times, they found:

"In some cases, we'd get completely contradictory responses using identical prompts," noted Professor Cicek. "One query might alternate between 'true' and 'false' answers like flipping a coin."

While the model maintained consistent conclusions about 73% of the time, that still leaves significant room for error in professional settings where reliability matters most.

Why This Matters for Businesses

The research team issued clear warnings for corporate decision-makers:

  1. Don't mistake fluency for expertise: ChatGPT's polished language can mask its lack of true understanding.
  2. Always verify outputs: Never treat AI conclusions as final without human review.
  3. Train staff appropriately: Employees need education about both AI capabilities and limitations.

"These tools don't actually 'know' anything in the human sense," Cicek explained. "They're matching patterns from training data, not reasoning through problems."

Key Points:

  • ChatGPT struggles with scientific truth verification, performing only slightly better than random guessing
  • Consistency issues plague responses, with answers sometimes flip-flopping completely
  • Newer versions show little improvement on these specific tasks
  • Business leaders cautioned against over-reliance on AI for complex judgments
  • Human verification remains essential despite AI's convincing presentation
