SJTU and SII Unveil Open-Source AI Agent with 241% Performance Boost
In a notable advance for AI-powered computer-use agents, researchers from Shanghai Jiao Tong University (SJTU) and SII have developed PC Agent-E, a new open-source model that demonstrates remarkable data efficiency. The approach requires only 312 human-labeled operation trajectories yet delivers a reported 241% performance improvement, surpassing established models like Claude 3.7 Sonnet in Windows environments.
The development challenges conventional wisdom in AI training. While industry leaders like Anthropic and OpenAI have relied on massive datasets and complex reinforcement learning algorithms, the SJTU-SII team showed that a small, high-quality dataset can go a long way. Their secret? A carefully curated collection of real user interactions captured through their own PC Tracker tool, which records keystrokes, mouse actions, and screen changes along with their context.
What makes this dataset special goes beyond its compact size. The researchers performed "chain-of-thought completion" for each trajectory, adding the reasoning behind every action. This enrichment transforms simple activity logs into intelligent training material that teaches the AI not just what to do, but why to do it.
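To make the idea concrete, here is a minimal Python sketch of what attaching reasoning to a recorded trajectory could look like. It is an illustration only, not the team's implementation: the `Step` fields, the `complete_thoughts` function, and the `stub_reasoner` placeholder (standing in for a real model call) are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Step:
    """One recorded step: what the screen showed and what the user did."""
    observation: str               # hypothetical: a description of the screen
    action: str                    # hypothetical format, e.g. "click(Start)"
    thought: Optional[str] = None  # reasoning; absent in the raw activity log


def complete_thoughts(
    task: str,
    steps: List[Step],
    reasoner: Callable[[str, List[Step], Step], str],
) -> List[Step]:
    """Return a copy of the trajectory with a thought attached to every step.

    `reasoner` stands in for whatever model writes the rationale. It sees the
    task, the steps completed so far, and the step being explained.
    """
    completed: List[Step] = []
    for step in steps:
        thought = reasoner(task, completed, step)
        completed.append(replace(step, thought=thought))
    return completed


def stub_reasoner(task: str, history: List[Step], step: Step) -> str:
    # Placeholder for a model call; deterministic so the sketch is runnable.
    return f"Step {len(history) + 1} toward '{task}': do {step.action}"


raw = [
    Step("Desktop with Start button", "click(Start)"),
    Step("Start menu open", "type('notepad')"),
]
enriched = complete_thoughts("Open Notepad", raw, stub_reasoner)
for s in enriched:
    print(s.thought, "->", s.action)
```

The raw log is left untouched and a new, enriched trajectory is returned, so the original human recording stays available for other processing.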
The team then applied trajectory enhancement techniques, using Claude 3.7 Sonnet itself to generate multiple plausible action sequences for each step. This synthetic expansion created richer training scenarios without requiring additional human input. When tested on the WindowsAgentArena-V2 benchmark, PC Agent-E outperformed even Claude 3.7 Sonnet's advanced "extended thinking" mode.
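One way to picture this expansion is sketched below in Python. It assumes, for illustration, that each human step is kept as the backbone and that a model proposes a few alternative actions at that step, each becoming an extra training example. The function `boost_trajectory`, the `propose` callback, and the dictionary layout are hypothetical and are not taken from the released code.

```python
from typing import Callable, Dict, List


def boost_trajectory(
    task: str,
    human_actions: List[str],
    propose: Callable[[str, List[str], int], List[str]],
    k: int = 2,
) -> List[Dict[str, object]]:
    """Expand one human trajectory into several training examples per step.

    At each step the history is always the human's own earlier actions, and
    the targets are the human action plus up to `k` proposed alternatives.
    """
    examples: List[Dict[str, object]] = []
    for i, human_action in enumerate(human_actions):
        history = human_actions[:i]
        candidates = [human_action]
        for alt in propose(task, history, k):
            if alt not in candidates:  # skip duplicates of existing targets
                candidates.append(alt)
        for action in candidates:
            examples.append(
                {"task": task, "history": list(history), "action": action}
            )
    return examples


def stub_propose(task: str, history: List[str], k: int) -> List[str]:
    # Placeholder for a model call; deterministic so the sketch is runnable.
    return [f"alt{j}-at-step{len(history)}" for j in range(k)]


human = ["click(Start)", "type('notepad')"]
examples = boost_trajectory("Open Notepad", human, stub_propose, k=2)
print(len(examples), "training examples from", len(human), "human steps")
```

In this sketch, two human steps with two alternatives each yield six training examples, which shows how a small set of recordings can be stretched without further human labeling.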
This research opens new possibilities for efficient AI development. By demonstrating that small but meticulously prepared datasets can outperform brute-force approaches, the work could accelerate innovation while reducing computational costs. The team has made all resources publicly available, including the models, code, and datasets.
Key Points
- PC Agent-E achieves a reported 241% performance improvement using just 312 curated trajectories
- The model outperforms Claude 3.7 Sonnet on Windows systems despite requiring far less training data
- Innovative "chain-of-thought completion" adds reasoning context to each recorded action
- Trajectory enhancement techniques multiply training value without additional human input
- Full open-source release includes models, code, and datasets for community use