Tencent ARC Unveils Open-Source AudioStory Model for Long-Form Audio Generation
Tencent's Applied Research Center (ARC) has open-sourced AudioStory, a model that uses large language models (LLMs) to generate long-form narrative audio. The project targets a known weakness of text-to-audio systems: keeping extended content temporally coherent and structurally consistent across many scenes.

Technical Framework and Capabilities
The model is built on a unified understanding-and-generation framework that supports applications including:
- Video dubbing
- Audio continuation
- Long narrative synthesis
By coupling an LLM with an audio generator, AudioStory aims to keep scene transitions smooth and emotional tone consistent over long timelines. Its instruction-following design breaks a complex narrative request into chronologically ordered subtasks.
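
The article does not specify how subtasks are represented. As a minimal sketch, assuming each subtask is one audio event with a text description and a planned duration, the ordering step can be pictured as below. `Subtask`, `decompose`, `timeline`, and the fixed 8-second default are hypothetical; in AudioStory the planning is done by the LLM, not by a simple split over story beats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subtask:
    # Hypothetical representation of one audio event in the narrative.
    order: int          # position in the story
    description: str    # caption an audio generator would receive
    duration_s: float   # planned length in seconds


def decompose(story_beats: list[str], seconds_per_beat: float = 8.0) -> list[Subtask]:
    """Toy stand-in for the LLM planning step: one subtask per beat, in story order."""
    return [Subtask(i, beat.strip(), seconds_per_beat) for i, beat in enumerate(story_beats)]


def timeline(subtasks: list[Subtask]) -> list[tuple[float, float, str]]:
    """Sort subtasks chronologically and lay them end to end,
    so each scene starts exactly where the previous one ends."""
    spans, t = [], 0.0
    for task in sorted(subtasks, key=lambda s: s.order):
        spans.append((t, t + task.duration_s, task.description))
        t += task.duration_s
    return spans


if __name__ == "__main__":
    beats = [
        "Tom tiptoes into the kitchen",
        "A plate crashes to the floor",
        "Jerry scurries into his hole",
    ]
    for start, end, desc in timeline(decompose(beats)):
        print(f"{start:5.1f}-{end:5.1f}s  {desc}")
```

Sorting by `order` rather than trusting list position keeps the timeline chronological even if subtasks arrive out of sequence.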

Key Innovations
AudioStory's design rests on two main features:
- Decoupled bridging mechanism: Splits the collaboration between the LLM and the audio generator into specialized components
- End-to-end training: Unifies instruction comprehension and audio generation in a single framework, improving synergy between components
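
The article does not describe the bridging interface in detail. As a loose illustration of what keeping separate roles between a language model and an audio generator can look like, the sketch below passes two distinct fields to an interchangeable generator: the conditioning for the current event and context carried over from earlier events. `BridgeSignal`, `EchoGenerator`, and `run_story` are hypothetical names, the generator returns text labels instead of audio, and a real model would likely pass learned representations rather than strings. This is not AudioStory's implementation.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class BridgeSignal:
    # Hypothetical two-part interface from the planner to the generator.
    event_condition: str  # what the current event should sound like
    carry_context: str    # summary of earlier events, for cross-scene continuity


class AudioGenerator(Protocol):
    def generate(self, signal: BridgeSignal) -> str: ...


class EchoGenerator:
    """Placeholder generator: returns a text label instead of audio."""

    def generate(self, signal: BridgeSignal) -> str:
        return f"[{signal.event_condition} | after: {signal.carry_context or 'start'}]"


def run_story(events: list[str], generator: AudioGenerator) -> list[str]:
    """Generate events in order, feeding each one the context left by the previous event."""
    outputs, context = [], ""
    for event in events:
        signal = BridgeSignal(event_condition=event, carry_context=context)
        outputs.append(generator.generate(signal))
        context = event  # toy context: remember only the previous event
    return outputs


if __name__ == "__main__":
    for clip in run_story(["rain starts", "thunder rolls", "rain fades"], EchoGenerator()):
        print(clip)
```

Because `run_story` depends only on the `AudioGenerator` protocol, the generator can be swapped without touching the planning side, which is the general benefit of decoupling the two components.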
The team has also released the AudioStory-10K benchmark dataset, which spans domains from animated soundscapes to natural sound narratives. In the team's comparisons, AudioStory outperforms conventional text-to-audio models on both single-clip generation and extended narrative audio.

Practical Applications
Current implementations include:
- Dubbing for classic animations (demonstrated with Tom and Jerry samples)
- Text-based long audio generation
- Multi-scene narrative construction
The project's GitHub repository contains inference code alongside extensive documentation of use cases.

Key Points:
🎧 Combines LLMs with audio generation for coherent long-form narratives
📊 Reported to outperform existing text-to-audio models in temporal coherence and instruction fidelity
🛠️ Open-sourced alongside the AudioStory-10K benchmark dataset for community development
🌐 Demonstrated applications in entertainment and media production

