Tsinghua University Unveils AutoDroid-V2 for Mobile AI Control
On December 24, 2024, Tsinghua University’s Institute for AI Industry Research (AIR) introduced AutoDroid-V2, an AI model for automating the control of mobile devices. Built on small language models, it lets users carry out operations on their devices through natural language more efficiently.
Key Innovations
Unlike conventional automation methods, which depend on large cloud-based language models (LLMs), AutoDroid-V2 takes a script-based approach. User commands are turned into scripts and executed directly on the mobile device, reducing reliance on cloud services. This improves user privacy and security while lowering data consumption on the user side and operating costs on the server side. By making automation cheaper and more private, the approach is intended to support broader adoption of natural language control on mobile devices.
Background of the Project
The rapid development of large language models and vision-language models has made it possible to control mobile devices with natural language commands, offering new solutions for complex user tasks. However, the conventional “step-by-step GUI agent” approach, in which the model is queried again at every GUI step, brings high data consumption and privacy concerns that hinder large-scale deployment.
AutoDroid-V2 addresses these issues by generating a multi-step script from each user command, so that a single model query can drive several GUI operations. This significantly reduces query frequency and resource consumption, and it allows task scripts to be generated and executed directly on the user's device. In addition, the system builds documentation for each application in an offline stage, which lays the foundation for later script generation.
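The difference in query frequency can be illustrated with a toy sketch. Everything in it is hypothetical and is not AutoDroid-V2's actual API: `FakeDevice` stands in for a phone UI, `CountingModel` stands in for a language model, and the hard-coded `PLAN` replaces real model output. It only shows why one script per task needs fewer model queries than one query per GUI step.

```python
from dataclasses import dataclass, field

# Hypothetical task plan; a real model would produce this from the command.
PLAN = {
    "send hello to Alice": [
        ("tap", "contacts"),
        ("tap", "Alice"),
        ("set_text", "message_box", "hello"),
        ("tap", "send"),
    ]
}


@dataclass
class FakeDevice:
    """Stand-in for a phone UI; records every GUI action it receives."""
    log: list = field(default_factory=list)

    def tap(self, element: str) -> None:
        self.log.append(("tap", element))

    def set_text(self, element: str, text: str) -> None:
        self.log.append(("set_text", element, text))


class CountingModel:
    """Stand-in for a language model; counts how often it is queried."""

    def __init__(self) -> None:
        self.queries = 0

    def next_action(self, task: str, step: int):
        # Step-by-step agent: every call is one model query.
        self.queries += 1
        plan = PLAN[task]
        return plan[step] if step < len(plan) else None

    def generate_script(self, task: str) -> list:
        # Script-based agent: one query returns the whole multi-step script.
        self.queries += 1
        return list(PLAN[task])


def execute(device: FakeDevice, action: tuple) -> None:
    """Dispatch an (operation, *arguments) tuple to the device."""
    name, *args = action
    getattr(device, name)(*args)


def run_stepwise(model: CountingModel, device: FakeDevice, task: str) -> None:
    step = 0
    # Query the model before each GUI action until it reports completion.
    while (action := model.next_action(task, step)) is not None:
        execute(device, action)
        step += 1


def run_script(model: CountingModel, device: FakeDevice, task: str) -> None:
    # Query the model once, then run the returned script locally.
    for action in model.generate_script(task):
        execute(device, action)


task = "send hello to Alice"
stepwise_model, stepwise_device = CountingModel(), FakeDevice()
run_stepwise(stepwise_model, stepwise_device, task)
script_model, script_device = CountingModel(), FakeDevice()
run_script(script_model, script_device, task)

print("step-by-step queries:", stepwise_model.queries)
print("script-based queries:", script_model.queries)
```

In this toy run both agents perform the same four GUI actions, but the step-by-step agent queries the model five times (four actions plus a final query that reports completion), while the script-based agent queries it once.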
Performance Testing
AutoDroid-V2 was evaluated on 226 tasks across 23 mobile applications. Compared with earlier approaches such as AutoDroid and SeeClick, it improved the task completion rate by 10.5% to 51.7%. Input and output token consumption fell to 1/43.5 and 1/5.8 of the baseline levels, respectively, and model inference latency fell to between 1/5.7 and 1/13.4 of previous figures.
These findings highlight the efficiency and reliability of AutoDroid-V2 in practical applications, showcasing its potential for widespread use in enhancing mobile device functionality.
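The reduction factors reported for the benchmark can be restated as the share of the baseline that is saved. The short calculation below is only arithmetic on the figures quoted in this article; the function name `fraction_saved` and the dictionary labels are illustrative assumptions.

```python
def fraction_saved(reduction_factor: float) -> float:
    """Share of the baseline removed when usage falls to 1/reduction_factor."""
    return 1.0 - 1.0 / reduction_factor


# Reduction factors as quoted in the article (usage falls to 1/factor).
REPORTED = {
    "input tokens": 43.5,
    "output tokens": 5.8,
    "inference latency (low end)": 5.7,
    "inference latency (high end)": 13.4,
}

for label, factor in REPORTED.items():
    saved = fraction_saved(factor)
    print(f"{label}: 1/{factor} of baseline, about {saved:.1%} saved")
```

Read this way, the reported figures correspond to roughly 97.7% fewer input tokens, 82.8% fewer output tokens, and between about 82.5% and 92.5% lower inference latency than the baselines.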
Conclusion
The launch of AutoDroid-V2 marks a significant advancement in the field of AI automation control for mobile devices. By improving user experience through natural language processing and reducing dependency on cloud infrastructure, Tsinghua University is setting new standards for privacy, efficiency, and usability in mobile technology.
Key Points
- AutoDroid-V2 is a new AI model launched by Tsinghua University, enhancing the efficiency of natural language control for mobile devices.
- By running small language models on the device, the model reduces dependence on cloud services and improves user privacy and security.
- Benchmark tests show significant improvements in task completion rates and resource consumption for AutoDroid-V2, showcasing its strong application potential.