New Benchmark Aims to Make AI Phone Calls Sound More Human

AI Phone Calls Get a Reality Check with New Evaluation Standard

For years, companies using AI for customer calls have faced a frustrating problem: how do you measure whether these digital agents actually sound natural? Now, tech firm Agora and food delivery giant Meituan have developed what they believe is the solution: VoiceAgentEval, which they bill as the first comprehensive benchmark for evaluating AI outbound calls.

Moving Beyond the Lab

Unlike traditional tests that rely on scripted interactions in controlled environments, VoiceAgentEval throws AI into realistic business situations. "We wanted to create something that reflects what actually happens when people pick up the phone," explains one developer involved in the project.

The system evaluates performance across six major business areas, divided into 30 specific scenarios. It analyzes not just whether the AI follows logical conversation paths but also how natural it sounds while doing so, a crucial factor that previous standards often overlooked.

Putting AI Through Its Paces

To test these digital callers thoroughly, the developers built 150 different dialogue simulations. Imagine giving an AI 150 pop quizzes, each presenting its own challenges; that is essentially what VoiceAgentEval does. The system checks how well the technology can:

  • Stay on track with its intended purpose
  • Handle unexpected user responses
  • Maintain smooth conversation flow
  • Deliver information clearly and naturally
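
For readers who want a concrete picture, here is a minimal sketch of how per-dialogue ratings on the four checks above could be rolled up into per-scenario scores. It is not VoiceAgentEval's actual scoring method, which the article does not describe. The criterion names, the 0-to-1 scale, the equal weighting, and the scenario labels are all assumptions made for illustration. (If the 150 dialogues were spread evenly over the 30 scenarios, each scenario would get five, but the article does not say how they are divided.)

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# Hypothetical criterion names, paraphrasing the four checks in the list above.
CRITERIA = ("goal_adherence", "robustness", "flow", "clarity")


@dataclass
class DialogueResult:
    scenario: str   # scenario label (made up for this example)
    scores: dict    # criterion name -> rating in [0, 1] (assumed scale)


def dialogue_score(result):
    """Average the four criteria with equal weight (an assumption)."""
    return mean(result.scores[c] for c in CRITERIA)


def scenario_averages(results):
    """Group dialogue scores by scenario and average within each group."""
    by_scenario = defaultdict(list)
    for r in results:
        by_scenario[r.scenario].append(dialogue_score(r))
    return {name: mean(vals) for name, vals in by_scenario.items()}


# Illustrative data only; these are not real benchmark results.
results = [
    DialogueResult("reservation_confirmation",
                   {"goal_adherence": 1.0, "robustness": 0.5,
                    "flow": 0.5, "clarity": 1.0}),
    DialogueResult("reservation_confirmation",
                   {"goal_adherence": 0.25, "robustness": 0.25,
                    "flow": 0.25, "clarity": 0.25}),
    DialogueResult("delivery_follow_up",
                   {"goal_adherence": 1.0, "robustness": 1.0,
                    "flow": 1.0, "clarity": 1.0}),
]

summary = scenario_averages(results)
for name, score in summary.items():
    print(f"{name}: {score:.2f}")
```

Running the same roll-up for each candidate system would give the kind of side-by-side comparison a benchmark like this is meant to enable; a real benchmark would likely weight criteria differently and score naturalness from audio rather than from hand-entered numbers.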

The benchmark has already identified three top-performing models through preliminary testing. While the companies behind these models haven't been officially named yet, industry insiders suggest Beijing San Kuai Technology is among the leaders.

Why This Matters for Businesses

For companies considering AI call solutions, this new standard provides something invaluable: apples-to-apples comparisons between different systems. No more guessing which solution will perform best in real-world conditions.

The restaurant booking industry offers a perfect example. When an AI calls to confirm reservations, it needs to handle everything from simple "yes" responses to complex questions about menu changes or parking availability. VoiceAgentEval tests all these scenarios and more.

As one restaurant chain manager noted: "We've tried three different call systems this year alone. Having an objective way to compare them before we commit would save us thousands in implementation costs."

What's Next?

The team behind VoiceAgentEval plans regular updates to keep pace with evolving technology and business needs. Future versions may incorporate regional dialect recognition and even emotional intelligence metrics.

For now, though, the focus remains on establishing this benchmark as the gold standard for an industry that is rapidly moving from experimental to essential.

Key Points:

  • Billed as the first comprehensive benchmark for evaluating AI outbound calls
  • Tests real-world performance across 30 business scenarios
  • Evaluates both conversation logic and voice quality
  • Uses 150 simulated dialogues to thoroughly test AI systems
  • Already identified top-performing models in initial testing
