
Alibaba's AI Team Unveils Ovis2.5: Breakthrough in Visual Reasoning

Alibaba Advances Multimodal AI with Ovis2.5 Release

The AI team (AIDC-AI) of Alibaba International Digital Commerce Group has introduced Ovis2.5, a multimodal large language model released in two sizes: 9B and 2B parameters. The release pairs a compact footprint with industry-leading performance, making capable visual reasoning more cost-efficient.


Key Innovations in Ovis2.5

  1. Native-Resolution Recognition: Using a NaViT visual encoder, Ovis2.5 processes images at their native resolution, preserving fine detail that fixed-resolution resizing would discard and enabling stronger visual perception.

  2. Advanced Reasoning Capabilities: The model offers a "thinking mode", reportedly building on Alibaba's Qwen3 technology. Beyond standard chain-of-thought (CoT) reasoning, it supports self-correction and a configurable thinking budget for improved accuracy.

  3. Industry-Leading Document Analysis: Ovis2.5 outperforms competitors in complex diagram interpretation, document understanding (including tables/forms), and optical character recognition (OCR) at both parameter sizes.

  4. Broad Task Competency: Demonstrates strong performance across image reasoning, video understanding, and visual localization benchmarks, showcasing versatile multimodal abilities.

Strategic Impact

The open-source release on GitHub and Hugging Face positions Ovis2.5 as an accessible option for developers who need combined visual and textual analysis. Alibaba frames the release as part of its ongoing work in multimodal AI.

Key Points:

  • Two model sizes (9B/2B parameters) balance performance with efficiency
  • Native resolution handling via NaViT encoder technology
  • Self-correcting reasoning capabilities with configurable thinking budgets
  • Open-source availability accelerates industry adoption

