
Shanghai Researchers Boost AI Reflection Capabilities

Researchers from Shanghai Jiao Tong University and the Shanghai Artificial Intelligence Laboratory have made significant progress in enhancing the reflective abilities of multimodal large language models (MLLMs). Their MM-HELIX project addresses a critical limitation of current AI systems: the inability to backtrack and reconsider an approach when a complex problem resists the first attempt.

The Reflection Challenge in AI

While MLLMs demonstrate impressive capabilities on complex problems, they often behave rigidly during reasoning. Unlike humans, who can step back and reconsider their approach after hitting an obstacle, current models struggle with this metacognitive skill. The limitation is particularly evident in tasks that require multiple solution attempts or adaptive strategies.


Building MM-HELIX: A Comprehensive Solution

The research team took a three-pronged approach:

  1. The Ultimate Exam Benchmark: Developed to evaluate reflective reasoning across 42 highly complex tasks spanning algorithms, graph theory, puzzles, and strategy games.
  2. MM-HELIX-100K Dataset: Contains 100,000 high-quality samples that teach models reflection, generated through Step-Elicited Response Generation (SERG).
  3. Adaptive Hybrid Policy Optimization (AHPO): An intelligent tutoring algorithm that gradually shifts models from expert guidance to independent exploration.

Benchmark testing revealed that even state-of-the-art models perform poorly on reflective tasks, particularly under multimodal input conditions.
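AHPO's shift from expert guidance to independent exploration can be sketched as a simple gate on the training objective. The following is a toy illustration, not the paper's implementation; the function names, the scalar-loss simplification, and the zero-reward threshold are assumptions. The idea: when none of the policy's own rollouts earns any reward, the on-policy learning signal is empty, so training falls back on expert (supervised) loss; once rollouts start succeeding, the model learns from its own exploration.

```python
def ahpo_loss(sft_loss: float, rl_loss: float, rollout_rewards: list[float],
              success_threshold: float = 0.0) -> float:
    """Hybrid objective: fall back on expert (SFT) supervision when the
    policy's own rollouts earn no reward, otherwise learn on-policy.

    A toy scalar sketch; a real trainer would mix per-token losses
    inside the optimization step.
    """
    if max(rollout_rewards, default=0.0) <= success_threshold:
        # No rollout solved the task: the RL signal is uninformative,
        # so lean entirely on the expert demonstration.
        return sft_loss
    # At least one rollout succeeded: let the model explore independently.
    return rl_loss
```

In a real trainer the gate could blend the two losses rather than switch between them, but the control flow captures the guided-to-independent progression the article describes.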


Measurable Improvements

The implementation showed promising results:

  • The SERG process reduced problem-solving time significantly while minimizing redundant thinking
  • Models equipped with MM-HELIX demonstrated stronger generalization capabilities
  • The Qwen2.5-VL-7B model achieved an 18.6% accuracy increase on benchmark tests

Key Points:

  • Current MLLMs lack effective reflection capabilities for complex reasoning tasks
  • MM-HELIX provides tools for evaluation (benchmark), training (dataset), and optimization (algorithm)
  • The system mimics human learning progression from guided to independent problem-solving
  • Demonstrated performance improvements validate the approach's effectiveness


Related Articles

News

Robots Get a Sense of Touch with Groundbreaking New Dataset

A major leap forward in robotics arrived this week with the release of Baihu-VTouch, the world's first cross-body visual-tactile dataset. Developed collaboratively by China's National-Local Co-built Humanoid Robot Innovation Center and multiple research teams, the dataset contains over 60,000 minutes of real robot interaction data. What makes it special? It captures not just what robots see but how objects feel, enabling machines to develop human-like tactile sensitivity across different hardware platforms.

January 27, 2026
robotics, AI research, tactile sensing
News

Robots Get a Sense of Touch: Groundbreaking Dataset Bridges Vision and Feeling

Scientists have unveiled Baihu-VTouch, the world's most comprehensive dataset combining robotic vision and touch. The collection spans over 60,000 minutes of interactions across various robot types, capturing delicate contact details with remarkable precision. The breakthrough could revolutionize how robots handle delicate tasks: imagine machines that can actually 'feel' what they're doing.

January 26, 2026
robotics, AI research, tactile sensors
News

AI cracks famous math puzzle with a fresh approach

OpenAI's latest model has made waves in mathematics by solving a long-standing number theory problem. The solution to the Erdős problem caught the attention of Fields Medalist Terence Tao, who praised its originality. But behind this success lies a sobering reality: AI's overall success rate on such problems remains low, a reminder that these tools are assistants rather than replacements for human mathematicians.

January 19, 2026
AI research, mathematics, machine learning
News

AI's Scientific Breakthrough: How FrontierScience Tests the Next Generation of Research Assistants

Artificial intelligence is making waves in scientific research, but how do we measure its true reasoning capabilities? The new FrontierScience benchmark puts AI models through rigorous testing in physics, chemistry, and biology. Early results show GPT-5.2 leading the pack, though human scientists still outperform when it comes to open-ended problem solving. This development could reshape how research gets done in labs worldwide.

December 17, 2025
AI research, scientific computing, machine learning benchmarks
News

AI2's Molmo 2 Brings Open-Source Video Intelligence to Your Fingertips

The Allen Institute for AI has unveiled Molmo 2, a game-changing open-source video language model that puts powerful visual understanding tools directly in developers' hands. With versions ranging from 4B to 8B parameters, these lightweight yet capable models can analyze videos, track objects, and explain what's happening on screen. What makes this release special? Complete transparency: you get full access to both the models and their training data, a rare find in today's proprietary AI landscape.

December 17, 2025
AI research, computer vision, open source AI
News

Alibaba's New AI Training Method Promises More Stable, Powerful Language Models

Alibaba's Tongyi Qwen team has unveiled an innovative reinforcement learning technique called SAPO that tackles stability issues in large language model training. Unlike traditional methods that risk losing valuable learning signals, SAPO uses a smarter approach to preserve important gradients while maintaining stability. Early tests show significant improvements across various AI tasks, from coding to complex reasoning.

December 10, 2025
AI research, machine learning, Alibaba