Alibaba Unveils FunAudio-ASR with Breakthrough Noise Reduction

Alibaba's FunAudio-ASR Redefines Speech Recognition Standards

Alibaba Group's TONGYI Lab has introduced FunAudio-ASR, an end-to-end speech recognition model that improves accuracy in noisy environments through its Context module. According to the announcement, the module cuts the hallucination rate from 78.5% to 10.7%, a drop of nearly 68 percentage points, or roughly an 86% relative reduction.
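The headline figure can be read two ways, so it helps to work the arithmetic. The short sketch below uses only the two rates quoted above (78.5% and 10.7%). The function name `reduction` is illustrative and does not come from Alibaba's materials.

```python
def reduction(before: float, after: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute = before - after            # difference between the two rates
    relative = absolute / before * 100   # share of the original rate removed
    return absolute, relative


# Hallucination rates quoted in the article, in percent.
abs_drop, rel_drop = reduction(78.5, 10.7)
print(f"{abs_drop:.1f} percentage points, {rel_drop:.1f}% relative")
# prints "67.8 percentage points, 86.4% relative"
```

The "nearly 70%" figure matches the absolute drop in percentage points. Measured relative to the starting rate, the reduction is larger, at about 86%.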

Technical Breakthroughs

The model was trained on tens of millions of hours of audio data and integrates large language models' semantic understanding capabilities. Testing shows superior performance compared to competitors like Seed-ASR and KimiAudio-8B in challenging scenarios including:

  • Far-field audio capture
  • High-noise environments
  • Multi-speaker situations

The system is reported to be particularly effective in settings such as business meetings and public spaces, where background noise traditionally degrades recognition quality.

Deployment Options

Recognizing diverse user needs, Alibaba offers:

  1. Full version: Maximum accuracy for enterprise applications
  2. FunAudio-ASR-nano: Lightweight version maintaining core functionality while reducing computational requirements

The nano variant enables cost-effective deployment across various hardware configurations without significant performance compromises.

Current Implementations

The technology already powers several real-world applications:

  • DingTalk's "AI Note-taking" feature
  • Video conferencing systems
  • DingTalk A1 hardware devices

Developers can access the API through Alibaba Cloud's BaiLian platform, facilitating seamless integration into existing systems.

Industry Impact

The launch represents a significant leap forward for:

  • Business communication tools
  • Accessibility technologies
  • AI-powered transcription services

By dramatically improving reliability in noisy conditions, FunAudio-ASR removes a major barrier to widespread speech recognition adoption.

Key Points:

  • Hallucination rate cut from 78.5% to 10.7%, a drop of nearly 68 percentage points
  • The Context module drives the accuracy improvements
  • Dual deployment options accommodate different resource requirements
  • Already implemented across Alibaba's business communication ecosystem
  • API availability accelerates third-party adoption
