Tongyi Qianwen's New AI Model Lets You Edit Photos Like Never Before
A New Era for Photo Editing: Qwen-Image-Layered Unveiled
Imagine being able to peel a photograph like an onion, separating each element into its own editable layer. That's what Tongyi Qianwen's new Qwen-Image-Layered model sets out to deliver, tackling two of digital editing's most stubborn headaches: edits that spill over into the rest of the image, and object removals that leave blurry edges and artifacts behind.

The End of Editing Frustrations
Traditional AI editing tools often create more problems than they solve. Want to change a shirt color? The whole image might shift tones. Trying to remove an object? Blurry edges and awkward artifacts frequently ruin the result. Qwen-Image-Layered approaches these challenges differently - by fundamentally rethinking how we deconstruct images.
"This isn't just another filter or masking tool," explains the development team. "We're giving images actual structure - breaking them into semantic layers that maintain their independence while preserving the whole composition."
How the Magic Works
The secret sauce lies in two key innovations:
- RGBA-VAE Technology: Encodes ordinary RGB images and transparent RGBA layers into one shared latent space, closing the distribution gap between the two that plagues other systems.
- VLD-MMDiT Architecture: Handles anywhere from 3 to 10+ layers simultaneously, with attention mechanisms coordinating between them - no more tedious layer-by-layer processing.
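The model internals are beyond a short snippet, but the relationship between a stack of RGBA layers and the flat image they form can be sketched with standard alpha compositing (the Porter-Duff "over" operator). This is an illustration only, not code from the Qwen-Image-Layered release: the names `over` and `composite` are hypothetical, and pixels are assumed to be straight-alpha `(r, g, b, a)` tuples with values in [0, 1].

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float, float]  # (r, g, b, a), straight alpha, each in [0, 1]
Layer = List[List[Pixel]]                  # rows of pixels


def over(top: Pixel, bottom: Pixel) -> Pixel:
    """Porter-Duff 'over' for straight (non-premultiplied) alpha."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        return (0.0, 0.0, 0.0, 0.0)  # fully transparent result

    def blend(t: float, b: float) -> float:
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)


def composite(layers: List[Layer]) -> Layer:
    """Flatten same-sized layers, ordered back to front, into one image."""
    height, width = len(layers[0]), len(layers[0][0])
    result: Layer = [[(0.0, 0.0, 0.0, 0.0)] * width for _ in range(height)]
    for layer in layers:
        result = [
            [over(layer[y][x], result[y][x]) for x in range(width)]
            for y in range(height)
        ]
    return result


# Toy 1x2 image: an opaque blue background and a red object covering only the first pixel.
background: Layer = [[(0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 1.0, 1.0)]]
red_object: Layer = [[(1.0, 0.0, 0.0, 1.0), (0.0, 0.0, 0.0, 0.0)]]
flat = composite([background, red_object])
```

Because the object's layer is fully transparent outside its own pixels, the background shows through untouched wherever the object is absent, which is the property that makes per-layer edits safe.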
What does this mean in practice? Users can now:
- Recolor elements without affecting surrounding areas
- Swap objects while maintaining realistic lighting and shadows
- Edit text within existing images naturally
- Scale or move components without distortion artifacts
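Once an image exists as separate RGBA layers, edits like these reduce to operations on a single layer. The sketch below is a toy illustration under that assumption, not the model's own editing pipeline: `recolor` and `shift` are hypothetical helper names, and pixels are assumed to be `(r, g, b, a)` tuples with values in [0, 1].

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float, float]  # (r, g, b, a), each in [0, 1]
Layer = List[List[Pixel]]                  # rows of pixels

TRANSPARENT: Pixel = (0.0, 0.0, 0.0, 0.0)


def recolor(layer: Layer, rgb: Tuple[float, float, float]) -> Layer:
    """Return a copy with every visible pixel set to `rgb`; alpha is preserved."""
    return [[(*rgb, px[3]) if px[3] > 0.0 else px for px in row] for row in layer]


def shift(layer: Layer, dx: int, dy: int) -> Layer:
    """Return a copy translated by (dx, dy); uncovered pixels become transparent."""
    height, width = len(layer), len(layer[0])
    moved: Layer = [[TRANSPARENT] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                moved[ny][nx] = layer[y][x]
    return moved


# Toy 1x3 scene: a red "shirt" pixel on its own layer above a blue background layer.
background: Layer = [[(0.0, 0.0, 1.0, 1.0)] * 3]
shirt: Layer = [[(1.0, 0.0, 0.0, 1.0), TRANSPARENT, TRANSPARENT]]

green_shirt = recolor(shirt, (0.0, 1.0, 0.0))  # change the color of the shirt only
moved_shirt = shift(shirt, dx=1, dy=0)         # slide the shirt one pixel to the right
```

Both helpers return new layers and never read or write the background, so the rest of the scene cannot shift tone or pick up artifacts from the edit.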
The system even supports recursive decomposition - any layer can be broken down further for microscopic adjustments.
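One way to picture recursive decomposition is as a tree in which any layer can be split into child layers. The sketch below assumes that picture; `LayerNode`, `refine`, and the toy `decompose` lookup are hypothetical stand-ins, with the lookup table playing the role the model would play in practice.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LayerNode:
    """A layer that may itself be split into finer child layers."""
    name: str
    children: List["LayerNode"] = field(default_factory=list)

    def leaves(self) -> List["LayerNode"]:
        """Return the finest-grained layers, in order."""
        if not self.children:
            return [self]
        found: List["LayerNode"] = []
        for child in self.children:
            found.extend(child.leaves())
        return found


def refine(node: LayerNode, decompose: Callable[[str], List[str]], depth: int) -> None:
    """Recursively split `node`, at most `depth` levels deep, using `decompose`."""
    if depth == 0:
        return
    parts = decompose(node.name)
    if len(parts) <= 1:  # nothing further to separate
        return
    node.children = [LayerNode(part) for part in parts]
    for child in node.children:
        refine(child, decompose, depth - 1)


# Toy stand-in for the model: a fixed table of how each layer splits.
TOY_SPLITS: Dict[str, List[str]] = {
    "photo": ["background", "person"],
    "person": ["shirt", "face"],
}

root = LayerNode("photo")
refine(root, lambda name: TOY_SPLITS.get(name, [name]), depth=2)
leaf_names = [leaf.name for leaf in root.leaves()]
# leaf_names == ["background", "shirt", "face"]
```

Here the first pass separates the person from the background, and the second pass splits only the person layer further, leaving the background as a single layer.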
From Labs to Your Laptop
The team has made everything publicly available:
- Technical Report: arxiv.org/abs/2512.15603
- Code & Models: GitHub | ModelScope | Hugging Face
- Live Demo: ModelScope Studio
"We see this as more than a tool," shares the Qwen team. "It's a new language for interacting with visual content - one where every element becomes as editable as text in a document."
Key Points:
- Layer Revolution: Images decompose into clean RGBA layers like onion skins
- Precision Editing: Modify colors, objects or text without affecting other elements
- Flexible Architecture: Handles 3 to 10+ layers simultaneously with attention coordination
- Recursive Power: Any layer can be broken down further for microscopic adjustments
- Open Access: Full technical details and implementations available across major platforms