Tsinghua University Unveils AutoDroid-V2 for Mobile AI Control

Tsinghua University's AutoDroid-V2 Launch

On December 24, 2024, Tsinghua University's Institute for AI Industry Research (AIR) introduced AutoDroid-V2, an AI model for automating control of mobile devices. It lets users carry out tasks by issuing natural language commands, and it relies on small language models rather than large ones.

Innovations in AI Automation

Unlike traditional systems that depend on large, cloud-hosted language models (LLMs), AutoDroid-V2 takes a script-based approach. Because commands can be handled on the device itself, the model relies less on cloud services, which improves privacy and security. It also reduces data consumption for users and lowers server-side operating costs, which could make mobile automation practical at a larger scale.

Background and Development

Recent advances in large language models and vision-language models have made it possible to control mobile devices through natural language commands, offering new ways to handle complex user tasks. However, conventional methods such as the "step-by-step GUI agent" approach consume large amounts of data and raise privacy concerns, both of which hinder large-scale deployment.

The key innovation of AutoDroid-V2 is that it generates a multi-step script directly from a user command. A single script covers several GUI operations at once, so the model is queried far less often and uses fewer resources than an agent that asks for one action at a time. Task scripts are generated and executed on the user's device. The model also builds application documentation offline, ahead of time, and this documentation serves as the basis for later script generation.
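The difference in query frequency between a step-by-step agent and a script-based one can be illustrated with a small sketch. This is not AutoDroid-V2's code: `CountingModel`, `run_step_by_step`, `run_script_based`, and the actions in `PLAN` are all hypothetical names invented for illustration, and the "model" simply counts how often it is called.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CountingModel:
    """Hypothetical stand-in for a language model; it only counts queries."""
    queries: int = 0

    def next_action(self, plan: List[str], step: int) -> str:
        # Step-by-step style: each query yields a single GUI action.
        self.queries += 1
        return plan[step]

    def full_script(self, plan: List[str]) -> List[str]:
        # Script-based style: one query yields the whole multi-step script.
        self.queries += 1
        return list(plan)


def run_step_by_step(model: CountingModel, plan: List[str]) -> List[str]:
    """Ask the model for one action at a time and 'execute' it."""
    executed = []
    for step in range(len(plan)):
        executed.append(model.next_action(plan, step))
    return executed


def run_script_based(model: CountingModel, plan: List[str]) -> List[str]:
    """Ask the model once for a script, then 'execute' every action in it."""
    return [action for action in model.full_script(plan)]


# Illustrative actions for a "send a message" task (assumed, not from the source).
PLAN = ["tap('Compose')", "type_text('Hello')", "tap('Send')"]

step_model, script_model = CountingModel(), CountingModel()
assert run_step_by_step(step_model, PLAN) == run_script_based(script_model, PLAN)
print(step_model.queries, script_model.queries)  # prints "3 1"
```

Both runs perform the same three actions, but the step-by-step run queries the model once per action while the script-based run queries it once per task. In this toy setup the gap grows linearly with the number of GUI operations a task needs.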

Performance Testing Results

In performance evaluations, AutoDroid-V2 was benchmarked on 226 tasks across 23 mobile applications. Compared with baselines including AutoDroid and SeeClick, it improved the task completion rate by 10.5% to 51.7%. It also cut input and output token consumption to 1/43.5 and 1/5.8 of the baseline levels, respectively, and reduced model inference latency to between 1/5.7 and 1/13.4 of those baselines. These results suggest that AutoDroid-V2 is both efficient and reliable in practical use.
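To make the reported ratios easier to read, the short sketch below converts each "reduced to 1/x" figure into a percentage saved. The factors are the ones reported above; the helper name `percent_reduction` and the dictionary labels are assumptions made for this example.

```python
def percent_reduction(factor: float) -> float:
    """Convert 'reduced to 1/factor of the baseline' into a percentage saved."""
    return (1.0 - 1.0 / factor) * 100.0


# Reduction factors as reported in the benchmark summary.
REPORTED_FACTORS = {
    "input tokens": 43.5,
    "output tokens": 5.8,
    "latency (smallest gain)": 5.7,
    "latency (largest gain)": 13.4,
}

for name, factor in REPORTED_FACTORS.items():
    print(f"{name}: {percent_reduction(factor):.1f}% lower")
# input tokens: 97.7% lower
# output tokens: 82.8% lower
# latency (smallest gain): 82.5% lower
# latency (largest gain): 92.5% lower
```

In other words, a reduction to 1/43.5 means roughly 97.7% fewer input tokens than the baseline, and the latency range corresponds to savings of roughly 82.5% to 92.5%.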

Implications for the Future

The launch of AutoDroid-V2 marks a notable step forward for AI-driven mobile automation. By making natural language control more efficient and reducing dependence on cloud infrastructure, the model improves the user experience while addressing concerns about data privacy and operating costs.

Key Points

  1. AutoDroid-V2 is a new AI model from Tsinghua University that makes natural language control of mobile devices more efficient.
  2. The model uses small language models to reduce dependence on cloud services, improving user privacy and security.
  3. Benchmark tests show higher task completion rates and lower resource consumption than earlier approaches, indicating strong potential for practical use.
