Ali WebShaper Sets New Benchmark in AI Information Retrieval

Alibaba's Tongyi Lab has unveiled WebShaper, the fourth installment in its WebAgent series, which takes a "formal-driven" approach to information retrieval. The framework scored 60.19 on the GAIA benchmark, ahead of Claude 3.5 Sonnet and GPT-4o.

A Paradigm Shift: From Information-Driven to Formal-Driven

Traditional, information-driven methods often struggle with two problems: the structure of the collected information does not line up with the reasoning a question requires, and knowledge coverage is limited. WebShaper's formal-driven paradigm addresses both by first formalizing the task and then using that formalization to guide data generation and model training.

The framework employs an Agentic Expander to iteratively generate and verify questions, ensuring high semantic consistency between knowledge and reasoning structures. This method not only improves data quality but also significantly boosts performance in complex retrieval tasks.
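The generate-and-verify loop described above can be sketched in a few lines of Python. This is only an illustrative toy, not the WebShaper implementation: the Question class, the expand_and_verify function, and the toy_propose and toy_verify helpers are all hypothetical names, and a small fact table stands in for the model-backed proposer and verifier a real system would use.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Question:
    """A toy formalized question: a target plus the constraints that identify it."""
    target: str
    constraints: List[str] = field(default_factory=list)


def expand_and_verify(
    seed: Question,
    propose: Callable[[Question], str],
    verify: Callable[[Question], bool],
    rounds: int = 3,
) -> Question:
    """Add one proposed constraint per round, keeping it only if verification passes."""
    current = seed
    for _ in range(rounds):
        candidate = Question(current.target, current.constraints + [propose(current)])
        if verify(candidate):
            current = candidate  # accept the expanded, still-consistent question
        # otherwise the candidate is discarded and the next round tries again
    return current


# Hypothetical stand-ins for a model-backed proposer and verifier.
FACTS = {"Paris": ["is a capital city", "lies on the Seine", "hosted the 1900 Olympics"]}


def toy_propose(q: Question) -> str:
    """Return the first unused fact, or a deliberately wrong claim once facts run out."""
    unused = [f for f in FACTS[q.target] if f not in q.constraints]
    return unused[0] if unused else "is located on Mars"


def toy_verify(q: Question) -> bool:
    """Accept a question only if every constraint is a known fact about the target."""
    return all(c in FACTS[q.target] for c in q.constraints)


result = expand_and_verify(Question("Paris"), toy_propose, toy_verify, rounds=5)
print(result.constraints)
```

The point of the design is that the verifier, not the proposer, decides what enters the dataset: in the run above, the wrong fallback constraint is proposed in the last two rounds and rejected both times.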

Impressive Benchmark Results

WebShaper's results extend beyond GAIA. It also scored 52.50 on the WebWalkerQA benchmark, which tests web traversal and information retrieval. Together, these scores position WebShaper as a new reference point among open-source models.

Innovative Dataset Generation

At its core, WebShaper introduces a logic-driven training paradigm. Unlike traditional methods, it generates structured training data from a formalization of the task, which keeps the data semantically consistent. The SailorFog-QA dataset, built with graph sampling and information blurring, is cited as an example of this approach.
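To give a rough feel for what graph sampling and information blurring can mean in practice, here is a minimal Python sketch. It is an assumption-laden toy rather than the procedure used to build any Tongyi Lab dataset: the GRAPH contents, the sample_path and blur functions, and the blurring rules (years become decades, names become a vague reference) are all invented for illustration.

```python
import random
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (subject, relation, object)

# Hypothetical toy knowledge graph; the entities are placeholders.
GRAPH: Dict[str, List[Edge]] = {
    "AuthorA": [("AuthorA", "wrote", "BookB"), ("AuthorA", "born_in", "1871")],
    "BookB": [("BookB", "published_by", "PressC")],
    "PressC": [("PressC", "founded_in", "1905")],
}


def sample_path(graph: Dict[str, List[Edge]], start: str, max_hops: int,
                rng: random.Random) -> List[Edge]:
    """Random walk from `start`, collecting at most `max_hops` edges."""
    path: List[Edge] = []
    node = start
    for _ in range(max_hops):
        edges = graph.get(node, [])
        if not edges:
            break  # dead end: the current node has no outgoing edges
        edge = rng.choice(edges)
        path.append(edge)
        node = edge[2]
    return path


def blur(value: str) -> str:
    """Turn a four-digit year into its decade; turn anything else into a vague reference."""
    if value.isdigit() and len(value) == 4:
        return f"the {value[:3]}0s"
    return "a certain entity"


rng = random.Random(0)  # fixed seed so the walk is repeatable
path = sample_path(GRAPH, "AuthorA", max_hops=3, rng=rng)
clues = [(relation, blur(obj)) for _, relation, obj in path]
print(path)
print(clues)
```

The sampled path supplies the facts a question is built from, and the blurred clues are what the question would actually reveal, which forces a solver to search rather than look up an exact value.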

Community-Driven Evolution

WebShaper is part of the broader WebAgent ecosystem, which includes tools like WebWalker and WebDancer, and is released as open source to encourage community contributions. The project has already garnered over 4,000 stars on GitHub, reflecting strong developer interest.

Future Prospects

Tongyi Lab plans to expand WebAgent's capabilities, including enhanced multimodal processing and broader language support. Early feedback on social media praises WebShaper's multi-step reasoning and cross-modal understanding.

Key Points:

WebShaper introduces a formal-driven paradigm for information retrieval
Scores 60.19 on GAIA benchmark, outperforming Claude 3.5 Sonnet and GPT-4o
Innovative dataset generation ensures semantic consistency
Part of Alibaba's growing WebAgent ecosystem
Open-source nature fosters community-driven development

Project Address: https://github.com/Alibaba-NLP/WebAgent
