Qwen-TTS Expands Support to Three Chinese Dialects

Qwen-TTS Enhances Speech Synthesis with Dialect Support

The Qwen-TTS text-to-speech model has received a significant update, expanding its linguistic capabilities to three prominent Chinese dialects: Beijing, Shanghai, and Sichuan. This development marks a leap forward in making AI-generated speech more regionally inclusive and culturally resonant.

Technical Advancements

Trained on a corpus of more than 3 million hours of speech data, Qwen-TTS achieves human-like naturalness in its synthesized speech. The model dynamically adjusts intonation, rhythm, and emotional inflection based on the input text, producing remarkably expressive output.

Voice Options and Applications

The update introduces seven distinct voice profiles:

  • Standard voices: Cherry (female) and Ethan (male)
  • Dialect-specific voices:
    • Dylan (Beijing dialect)
    • Jada (Shanghai dialect)
    • Sunny (Sichuan dialect)

Early demonstrations showcase the model's versatility: a childhood narrative synthesized in the Beijing dialect captures playful nostalgia, while Shanghai-dialect synthesis delivers authentic local flavor in everyday conversation scenarios.

Future Roadmap

The development team plans to:

  1. Add support for additional languages
  2. Introduce more voice styles
  3. Further refine emotional expressiveness

The Qwen API enables seamless integration for developers, opening possibilities for applications in education, entertainment, and customer service.
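As a rough illustration, the sketch below shows how a developer might call Qwen-TTS through the DashScope Python SDK. The module path, model identifier, and response fields here are assumptions based on common DashScope conventions rather than details confirmed by this announcement; the Model Studio documentation linked below is the authoritative reference.

```python
import os

import requests
import dashscope  # DashScope SDK: pip install dashscope

# NOTE: the module path, model name, and response fields below are
# assumptions based on typical DashScope usage; check the Model Studio
# docs linked in this article for the authoritative API.
response = dashscope.audio.qwen_tts.SpeechSynthesizer.call(
    model="qwen-tts",                       # assumed model identifier
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # "As a kid, I loved buying candied hawthorn at the alley entrance."
    text="我小时候最爱在胡同口买糖葫芦。",
    voice="Dylan",                          # Beijing-dialect voice profile
)

# The response is assumed to carry a URL pointing to the synthesized audio.
audio_url = response.output.audio["url"]
audio = requests.get(audio_url, timeout=30)
audio.raise_for_status()

with open("output.wav", "wb") as f:
    f.write(audio.content)
```

Under the same assumptions, passing voice="Jada" or voice="Sunny" would select the Shanghai or Sichuan profile, while voice="Cherry" or voice="Ethan" would select the standard voices.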

Model Studio: https://help.aliyun.com/zh/model-studio/qwen-tts

Key Points

  • Supports three new Chinese dialects: Beijing, Shanghai, Sichuan
  • Trained on 3M+ hours of speech data
  • Offers seven voice profiles with regional authenticity
  • Features dynamic emotional expression in synthesized speech
  • Provides developer-friendly API integration