
Ant Group and Renmin University Unveil First Native MoE Diffusion Language Model

At the 2025 Inclusion·Bund Conference, Ant Group and Renmin University jointly introduced LLaDA-MoE, the industry's first diffusion language model (dLLM) built natively on a Mixture-of-Experts (MoE) architecture. The release challenges the conventional belief that large language models must be autoregressive.

Key Innovations

The LLaDA-MoE model was trained from scratch on roughly 20 trillion tokens, demonstrating the scalability and stability of diffusion language models at industrial training scale. It outperforms earlier dense diffusion language models such as LLaDA 1.0/1.5 and Dream-7B, and matches comparable autoregressive models such as Qwen2.5-3B-Instruct. Notably, it does so while activating only 1.4 billion of its 7 billion total parameters per token.
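The 1.4B-of-7B figure reflects how MoE layers work in general: a learned router sends each token to a small subset of expert feed-forward networks, so only that subset's parameters participate in the forward pass. The following is a minimal illustrative sketch of top-k expert routing in NumPy; the dimensions, expert count, and gating scheme are toy values chosen for clarity, not LLaDA-MoE's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_layer(x, experts, gate_w, top_k=2):
    """Toy Mixture-of-Experts feed-forward layer.

    A router scores all experts, but only the top_k selected experts
    run for this token, so most expert parameters stay inactive.
    """
    logits = x @ gate_w                      # router score per expert
    top = np.argsort(logits)[-top_k:]        # indices of selected experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over selected experts only
    y = sum(w * (experts[i] @ x) for w, i in zip(weights, top))
    return y, top

d, n_experts = 16, 8
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
gate_w = rng.standard_normal((d, n_experts))
x = rng.standard_normal(d)

y, chosen = moe_layer(x, experts, gate_w, top_k=2)
active, total = 2 * d * d, n_experts * d * d
print(f"active expert params this token: {active}/{total}")
```

With 2 of 8 experts selected, only a quarter of the expert parameters are touched per token, which is the same mechanism that lets LLaDA-MoE deliver 7B-scale capacity at a 1.4B activation cost.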

Caption: Renmin University and Ant Group jointly launched LLaDA-MoE, the first MoE-architecture diffusion language model.

Performance Highlights

Under Ant's unified evaluation framework, LLaDA-MoE showed an average improvement of 8.4% across 17 benchmarks, including HumanEval, MBPP, and GSM8K. It leads LLaDA-1.5 by 13.2% and ties with Qwen2.5-3B-Instruct, validating the "MoE amplifier" effect in the dLLM field.

Caption: Performance metrics of LLaDA-MoE compared with other models.

Technical Breakthroughs

Lan Zhenzhong, Director of Ant Group's General AI Research Center, emphasized that this model represents a significant step toward scaling dLLMs to larger sizes. The team rewrote training code based on LLaDA-1.0 and utilized Ant's distributed framework ATorch for parallel acceleration.

Assistant Professor Li Chongxuan from Renmin University highlighted that autoregressive models, which generate strictly left to right, struggle to capture bidirectional token dependencies, a limitation that LLaDA-MoE's parallel, bidirectional decoding approach is designed to address.
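Masked-diffusion language models typically decode by starting from a fully masked sequence and iteratively unmasking the positions the denoiser is most confident about, with every masked position scored in parallel using full bidirectional context. The sketch below shows only that outer decoding loop; `toy_score` is a hypothetical stand-in for the real denoiser, and the confidence schedule (unmask half the remaining masks per step) is an assumption for illustration, not LLaDA-MoE's actual sampler.

```python
import numpy as np

rng = np.random.default_rng(0)
MASK = -1  # sentinel for a not-yet-decoded position

def toy_score(seq):
    """Stand-in for a bidirectional denoiser: proposes a (position,
    token, confidence) triple for every masked slot simultaneously.
    A real dLLM derives these from attention over the whole sequence."""
    return [(i, int(rng.integers(0, 100)), rng.random())
            for i, t in enumerate(seq) if t == MASK]

def diffusion_decode(length=8, steps=4):
    seq = [MASK] * length
    for _ in range(steps):
        proposals = toy_score(seq)
        if not proposals:
            break
        # Commit the most confident half of the remaining masked positions
        proposals.sort(key=lambda p: p[2], reverse=True)
        for i, tok, _ in proposals[: max(1, len(proposals) // 2)]:
            seq[i] = tok
    # Fill any positions still masked after the fixed step budget
    for i, tok, _ in toy_score(seq):
        seq[i] = tok
    return seq

out = diffusion_decode()
print(out)
```

Because many tokens are committed per step rather than one at a time, the number of model calls can be far smaller than the sequence length, which is where dLLM inference engines (including the one Ant plans to open-source) look for their speedups.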

Open-Source Commitment

Ant Group plans to open-source not only the model weights but also a custom inference engine optimized for dLLM parallelism, which reportedly outperforms NVIDIA's Fast-dLLM solution. Technical reports and code will be released on GitHub and Hugging Face.

Key Points:

  • First native MoE architecture diffusion language model (dLLM)
  • Trained on ~20T tokens with 7B total parameters (1.4B activated)
  • Outperforms dense diffusion models; matches autoregressive counterparts
  • Open-sourcing model weights and inference framework soon

