Alibaba's New Voice Tech Lets You Command Sounds Like Magic

Alibaba's Voice Tech Breakthrough: Speak Your Sound Into Existence

Imagine telling your computer "Make this voice sound like a confident professor" or "Create battlefield sounds with distant explosions" and having it happen almost instantly. That is the promise of the voice generation models newly launched by Alibaba's Tongyi Lab.

Your Voice, Your Rules

The team unveiled two specialized tools:

Fun-CosyVoice3.5: The Multilingual Maestro

This upgraded model follows natural-language instructions much as a seasoned actor takes direction:

  • Natural Language Control: Describe the delivery you want, such as "slow down and add emotion," and it adjusts accordingly
  • Global Reach: Now handles Thai, Indonesian and 11 other languages with impressive accuracy
  • Precision Boost: Reduces pronunciation errors on rare characters by nearly 70%
  • Speed Demon: Cuts first-response latency by 35%, which is crucial for live interactions

Fun-AudioGen-VD: The Sound Architect

Think of this as your personal Foley artist:

  • Character Creation: Specify age, accent, even "hoarse but cheerful" tones
  • Emotional Depth: Captures subtle states like "calm outside, nervous inside"
  • Immersive Environments: Layers background noise from cafés to cathedrals with spatial effects

The potential uses are wide-ranging. Podcasters could refine narration without booking expensive studios. Game developers might prototype character voices during a lunch break, and film editors could experiment with atmospheric sound before committing to pricey recording sessions.

The Tongyi Lab team emphasizes that these tools aim to democratize audio production. As one developer put it: "We're removing the technical barriers so creators can focus on what matters: their vision."

The models are currently being tested with select partners ahead of wider release later this year.

Key Points:

  • Two new AI models respond to natural-language instructions
  • Fun-CosyVoice3.5 specializes in vocal expression across 13 languages
  • Fun-AudioGen-VD creates complete audio scenes with characters and environments
  • Potential applications span entertainment, education and customer service
  • Represents a significant step toward making professional audio tools accessible
