Alibaba Unveils Next-Gen GUI Automation Tools
Alibaba's Qwen Team Introduces Breakthrough GUI Automation Solutions
September 1, 2025 - Alibaba's Qwen research team has unveiled two tools for graphical user interface (GUI) automation: GUI-Owl, a multimodal agent model, and Mobile-Agent-v3, its companion multi-agent framework. Both aim to overcome longstanding challenges in automating interactions with modern computing interfaces.
The Challenge of GUI Automation
While graphical interfaces dominate modern computing, existing automation methods have relied heavily on complex scripts and manual rules with limited effectiveness. Traditional approaches often struggle with the dynamic nature of real-world applications and varying screen layouts.

Introducing GUI-Owl: A Multimodal Solution
GUI-Owl is a multimodal agent built on Alibaba's Qwen2.5-VL foundation model. It receives extensive additional training on GUI interaction data to improve both task comprehension and execution.
Key features include:
- Integrated perception, reasoning, planning, and execution functions
- Unified policy network for consistent decision-making
- Explicit reasoning steps that remain visible during operation
- Adaptability to real-world application changes
The development team created a sophisticated self-evolving data production pipeline to ensure high-quality training material. This system generates realistic application navigation workflows that undergo human validation before being incorporated into the model's training regimen.
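The generate, validate, retrain cycle described above can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not the team's pipeline: the names `Trajectory`, `evolve`, `make_policy`, `is_valid`, and `retrain`, the task list, and the "skill" counter are all hypothetical, and the validation step here is a simple stand-in check.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Trajectory:
    task: str
    actions: Tuple[str, ...]  # an empty tuple means the rollout gave up


Policy = Callable[[str], Trajectory]


def evolve(
    tasks: List[str],
    policy: Policy,
    is_valid: Callable[[Trajectory], bool],
    retrain: Callable[[List[Trajectory]], Policy],
    rounds: int = 3,
) -> List[Trajectory]:
    """Grow a dataset round by round: roll out, validate, keep, retrain."""
    dataset: List[Trajectory] = []
    for _ in range(rounds):
        candidates = [policy(task) for task in tasks]
        # Keep only validated trajectories that are not already in the dataset.
        accepted = [t for t in candidates if is_valid(t) and t not in dataset]
        dataset.extend(accepted)
        # The next round uses a policy derived from the enlarged dataset.
        policy = retrain(dataset)
    return dataset


# Toy stand-ins (assumptions for illustration only).
DIFFICULTY = {"open settings": 1, "toggle wifi": 2, "export report": 3}


def make_policy(skill: int) -> Policy:
    def policy(task: str) -> Trajectory:
        if DIFFICULTY[task] <= skill:
            return Trajectory(task, ("navigate", "act", "confirm"))
        return Trajectory(task, ())
    return policy


def is_valid(trajectory: Trajectory) -> bool:
    # Stand-in for the validation step: accept any non-empty trajectory.
    return len(trajectory.actions) > 0


def retrain(dataset: List[Trajectory]) -> Policy:
    # Toy "training": each accepted trajectory raises the skill level by one.
    return make_policy(1 + len(dataset))


if __name__ == "__main__":
    data = evolve(list(DIFFICULTY), make_policy(1), is_valid, retrain)
    print([t.task for t in data])
```

In this toy setup each round unlocks one harder task, which mirrors the idea that validated trajectories feed back into a more capable generator.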

Mobile-Agent-v3: Multi-Agent Collaboration Framework
The companion Mobile-Agent-v3 framework introduces an innovative approach to complex task automation through specialized agent collaboration:
- Manager Agent: Oversees task decomposition and coordination
- Worker Agent: Handles direct interface interactions
- Reflection Agent: Analyzes execution results for improvements
- Note Agent: Maintains context across operations
This architecture enables dynamic plan updates based on execution feedback, significantly improving success rates for complex workflows.
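The division of labor among the four roles can be sketched as a simple control loop. This is a hedged illustration, not the Mobile-Agent-v3 implementation: every function name, the `TaskState` structure, the `;`-separated goal format, and the simulated executor are assumptions, and a real system would call a vision-language model and a device interface at each step.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TaskState:
    goal: str
    plan: List[str] = field(default_factory=list)   # pending subgoals
    notes: List[str] = field(default_factory=list)  # context carried across steps
    attempts: int = 0                                # total worker calls so far


def manager_plan(state: TaskState) -> None:
    # Hypothetical decomposition: split the goal on ';' into ordered subgoals.
    state.plan = [part.strip() for part in state.goal.split(";") if part.strip()]


def manager_replan(state: TaskState, failed_step: str) -> None:
    # Hypothetical plan update: queue the failed subgoal again at the front.
    state.plan.insert(0, failed_step)


def worker_act(step: str, execute: Callable[[str], str]) -> str:
    # The worker is the only role that touches the (here simulated) interface.
    return execute(step)


def reflect(observation: str) -> bool:
    # Stand-in success check; a real reflector would compare screen states.
    return observation.startswith("ok")


def take_note(state: TaskState, step: str, observation: str) -> None:
    state.notes.append(f"{step} -> {observation}")


def run_task(goal: str, execute: Callable[[str], str], max_attempts: int = 10) -> TaskState:
    state = TaskState(goal=goal)
    manager_plan(state)
    while state.plan and state.attempts < max_attempts:
        step = state.plan.pop(0)
        observation = worker_act(step, execute)
        state.attempts += 1
        take_note(state, step, observation)
        if not reflect(observation):
            manager_replan(state, step)  # feedback changes the plan
    return state


def make_flaky_executor() -> Callable[[str], str]:
    # Simulated UI: the first "tap search" fails, everything else succeeds.
    seen = set()

    def execute(step: str) -> str:
        if step == "tap search" and step not in seen:
            seen.add(step)
            return "error: element not found"
        return f"ok: {step}"

    return execute


if __name__ == "__main__":
    final = run_task("open app; tap search; type query", make_flaky_executor())
    for note in final.notes:
        print(note)
```

The `max_attempts` cap is a design choice of this sketch: it keeps a step that never succeeds from looping forever.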
Performance and Applications
According to the team, early benchmark testing shows strong results across multiple GUI automation tasks, particularly in cross-platform scenarios. Potential applications span:
- Enterprise software automation
- Mobile app testing frameworks
- Accessibility technology enhancements
- Robotic process automation systems
The team has published its research in a technical paper and has open-sourced components on GitHub.
Key Points:
- 🚀 GUI-Owl combines multimodal perception with adaptive reasoning for robust GUI interaction
- 🤖 Mobile-Agent-v3's specialized agents enable complex task decomposition and dynamic planning
- 📈 Both solutions demonstrate superior performance in benchmark testing compared to existing methods
- 🔍 Alibaba's self-evolving data pipeline is designed to keep improving the quality of training data
- 🌐 Open-source availability promotes wider adoption and community development





