vivo's BlueLM-2.5-3B: A New Era in Multimodal AI

July 10, 2025: vivo AI Lab has introduced BlueLM-2.5-3B, its latest edge-side multimodal model, which processes both text and images. The compact model is also designed to understand graphical user interfaces (GUIs), a capability that sets it apart from many competitors.

Key Features of BlueLM-2.5-3B

The model's standout feature is its ability to switch between short and long thinking modes, paired with a mechanism for controlling the thinking budget. Together, these let the model trade reasoning depth against efficiency: it can reason at length on tasks that require complex analysis and answer quickly when they do not.
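To make the idea of a thinking budget concrete, here is a minimal sketch in Python. It is not vivo's implementation, and the article does not describe the actual mechanism. The function name `collect_reasoning`, the `END_THINK` marker, and the token-count budget are all assumptions chosen for illustration.

```python
from typing import Iterable, List

# Hypothetical marker that ends the reasoning phase; not taken from BlueLM.
END_THINK = "</think>"


def collect_reasoning(token_stream: Iterable[str], long_mode: bool, budget: int) -> List[str]:
    """Collect reasoning tokens until the model stops thinking or the budget runs out.

    In short mode no reasoning tokens are kept. In long mode, tokens are kept
    until END_THINK appears or `budget` tokens have been collected.
    """
    if not long_mode:
        return []
    kept: List[str] = []
    for token in token_stream:
        if token == END_THINK or len(kept) >= budget:
            break
        kept.append(token)
    return kept


# Toy stream standing in for model output (illustrative only).
stream = ["step1", "step2", "step3", END_THINK, "answer"]
print(collect_reasoning(stream, long_mode=True, budget=2))
print(collect_reasoning(stream, long_mode=False, budget=2))
```

With a budget of 2, the long-mode call keeps only the first two reasoning steps, while the short-mode call keeps none. A real system would apply the cap during generation rather than after it.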

Performance Metrics

In more than 20 evaluations, BlueLM-2.5-3B showed strong text-only performance, mitigating the "forgetting problem" common in multimodal models, in which text capability degrades once a model is trained to handle images as well. In long thinking mode, the model outperformed similar-scale models on mathematical and logical reasoning tasks. Its multimodal understanding was also strong, rivaling that of larger models.

GUI Understanding: A Game-Changer

One of the most notable achievements of BlueLM-2.5-3B is its proficiency in understanding GUIs. Trained on a large dataset of Chinese application screenshots, the model scored higher than many competitors in this domain.

Efficiency and Cost-Effectiveness

Despite its advanced capabilities, BlueLM-2.5-3B boasts only 2.9 billion parameters, making it relatively lightweight. The model's optimized data utilization strategies and efficient training processes have significantly reduced both training and inference costs, paving the way for broader AI adoption.
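For a rough sense of what 2.9 billion parameters means on a device, the following sketch estimates the memory needed for the weights alone at several numeric precisions. The precisions are assumptions for illustration, since the article does not say which format vivo deploys. The figures exclude activations, caches, and runtime overhead.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 2.9e9  # parameter count stated in the article

# Assumed precisions: 16-bit floats, plus 8-bit and 4-bit quantization.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: about {weight_memory_gb(PARAMS, bits):.2f} GB")
```

Each halving of precision halves the weight footprint, which is part of why models of this size are considered practical for edge-side use.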

Future Implications

The release of BlueLM-2.5-3B not only enhances user experiences with smarter applications but also propels the field of artificial intelligence forward. Its blend of performance and efficiency makes it a formidable player in the AI landscape.

Key Points

Adaptive Thinking Modes: Balances depth and efficiency in reasoning.
Superior GUI Understanding: Scores higher than many competitors in processing graphical interfaces.
Cost-Effective: Low training and inference costs due to optimized strategies.
Compact Design: Only 2.9 billion parameters, yet highly capable.
