Alibaba's New Algorithm Helps AI Think More Like Humans

Alibaba's Tongyi Lab Breaks New Ground in AI Reasoning

Researchers at Alibaba's Tongyi Lab have developed an innovative algorithm that could change how artificial intelligence handles complex reasoning tasks. The new approach, called FIPO (Future-KL Influenced Policy Optimization), addresses a fundamental challenge in large language models: identifying which pieces of information truly matter when working through multi-step problems.

The Reasoning Bottleneck

Current reinforcement learning methods often give every token the same credit when processing long chains of reasoning, whether or not it influenced the outcome. "Imagine trying to solve a math problem where you can't tell which numbers actually affect the final answer," explains one researcher familiar with the project. "That's essentially the challenge these models face."

The FIPO algorithm introduces what the team calls a "Future-KL" mechanism. It specifically rewards tokens (the basic units of text that a language model reads and generates) that prove crucial for subsequent reasoning steps. It's like giving bonus points to the parts of a calculation that actually lead to the solution, rather than treating all steps as equally important.
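
To make the idea concrete, here is a minimal sketch of future-aware credit assignment. It is an illustration under stated assumptions, not the formula from the FIPO paper: the function names, the discount factor `gamma`, and the clipping bounds `low` and `high` are all hypothetical. The sketch assumes we already know how much each token's log probability shifted between the old and new policy, and it gives more weight to tokens whose following tokens shifted the most.

```python
import math


def future_kl(delta_logp, gamma=0.9):
    """Discounted sum of log-probability shifts from each token onward.

    delta_logp[t] is the (assumed) change in log probability of token t
    between the old and the new policy. gamma is a hypothetical discount
    that makes nearby future tokens count more than distant ones.
    """
    out = [0.0] * len(delta_logp)
    running = 0.0
    # Walk backwards so each position accumulates everything after it.
    for t in range(len(delta_logp) - 1, -1, -1):
        running = delta_logp[t] + gamma * running
        out[t] = running
    return out


def token_advantages(delta_logp, advantage, gamma=0.9, low=0.5, high=2.0):
    """Turn one sequence-level advantage into per-token advantages.

    Each token's weight is exp(future signal), clipped to [low, high]
    so that no single token dominates the update (an assumption made
    here for numerical stability).
    """
    weights = [
        min(max(math.exp(f), low), high)
        for f in future_kl(delta_logp, gamma)
    ]
    return [advantage * w for w in weights]


# Token 1 shifted strongly; token 0 comes before it and inherits part
# of that signal, while token 2 comes after it and inherits none.
print(future_kl([0.0, 0.5, 0.0], gamma=0.5))
print(token_advantages([0.0, 0.5, 0.0], advantage=1.0, gamma=0.5))
```

With a uniform scheme, all three tokens would receive the same advantage of 1.0. In this sketch, the tokens that lead into the large shift receive more credit than the one that follows it.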

Real-World Performance

In practical testing, FIPO has shown remarkable results. When applied to Alibaba's Qwen2.5-32B-Base model, it achieved average reasoning lengths exceeding 10,000 tokens, a significant leap forward. More importantly, it didn't just handle longer reasoning chains; it did so more accurately, particularly on complex mathematical problems.

The algorithm outperformed comparable models like o1-mini and DeepSeek-Zero-MATH in pure reinforcement learning setups. What makes these results particularly interesting is how they were achieved: by focusing on what researchers call "the directionality of optimization" - essentially teaching the AI to recognize which paths through a problem actually lead to solutions.

Why This Matters

In traditional AI training, the probabilities a model assigns to most tokens change very little before and after learning sessions, an impact that researchers describe as "extremely sparse." Common evaluation metrics often miss subtle but crucial changes in key tokens. FIPO introduces a new way to measure progress through something called Δlog p (the difference in a token's log probability before and after training), giving developers better visibility into how their models are learning.
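
As a rough illustration of the measurement described above, the sketch below computes Δlog p for each token and reports how many tokens moved at all. The helper names and the `threshold` value are assumptions chosen for this example, and the log probabilities are made-up numbers, not results from the paper.

```python
def delta_logp(logp_before, logp_after):
    """Per-token change in log probability: after minus before."""
    if len(logp_before) != len(logp_after):
        raise ValueError("both sequences must cover the same tokens")
    return [after - before for before, after in zip(logp_before, logp_after)]


def sparsity_report(deltas, threshold=0.1):
    """Find the tokens whose log probability moved by more than threshold.

    threshold is a hypothetical cutoff; a real analysis would need to
    choose it based on the model and the data.
    """
    changed = [i for i, d in enumerate(deltas) if abs(d) > threshold]
    return {
        "changed_idx": changed,
        "fraction_changed": len(changed) / len(deltas),
    }


# Made-up log probabilities for four tokens, before and after training.
before = [-1.0, -2.0, -0.5, -3.0]
after = [-1.0, -0.5, -0.5, -3.0]

deltas = delta_logp(before, after)
print(deltas)                   # [0.0, 1.5, 0.0, 0.0]
print(sparsity_report(deltas))  # only token 1 changed: fraction 0.25
```

An average over all four tokens would blur the single large shift at token 1, which is the kind of change the article says common metrics tend to miss.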

This breakthrough comes at a time when AI systems are increasingly being asked to handle complex, multi-step reasoning tasks - from scientific research to financial analysis. The ability to distinguish critical from incidental information could be key to developing more reliable and capable AI assistants.

Key Points:

  • Smarter Focus: FIPO helps AI identify and prioritize the most important information in reasoning tasks
  • Longer Reasoning: Enables handling of reasoning chains over 10,000 tokens long
  • Better Accuracy: Shows significant improvements in complex mathematical problem-solving
  • New Measurement: Introduces Δlog p as a more effective way to track learning progress
