Ant Group Introduces Cost-Efficient MoE Language Models
Ant Group's Ling team has introduced two new Mixture-of-Experts (MoE) large language models, Ling-Lite and Ling-Plus. Detailed in a technical paper published on the preprint platform arXiv, the models are designed to significantly reduce training costs while maintaining high performance, even when trained on lower-performance hardware.
The Models: Ling-Lite and Ling-Plus
Ling-Lite has 16.8 billion parameters, of which 2.75 billion are activated per token; its larger counterpart, Ling-Plus, has 290 billion parameters with 28.8 billion activated. Notably, Ling-Plus, an MoE model at roughly the 300-billion-parameter scale, achieves performance comparable to models trained on high-end Nvidia GPUs, despite being trained on domestically produced, lower-spec hardware.
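The gap between total and activated parameters is the defining feature of MoE architectures: a router sends each token to only a small subset of expert sub-networks, so most of the model's weights sit idle on any given forward pass. The toy layer below is a generic top-k routing sketch in PyTorch, with all names and sizes chosen purely for illustration; it is not Ant Group's published Ling architecture.

```python
# Generic top-k expert routing sketch (illustrative only, not the Ling architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # scores each token against each expert
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                         # x: (tokens, d_model)
        scores = self.router(x)                   # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)      # normalize over the selected experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):               # only top_k experts run per token
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * self.experts[e](x[mask])
        return out

layer = TinyMoELayer()
tokens = torch.randn(4, 512)
print(layer(tokens).shape)  # torch.Size([4, 512]); only 2 of 8 experts run per token
```

With `num_experts=8` and `top_k=2`, only a quarter of the expert weights are exercised per token; Ling-Plus applies the same principle at far larger scale, activating 28.8 billion of its 290 billion parameters.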
Breaking Resource Barriers
Traditionally, training MoE models requires expensive high-performance GPUs like Nvidia's H100 and H800. This not only drives up costs but also limits accessibility due to chip shortages. To address these challenges, Ant Group's Ling team set an ambitious goal: scaling models without relying on high-end GPUs. Their innovative approach includes:
- Dynamic parameter allocation: Optimizing resource usage during training.
- Mixed-precision scheduling: Reducing computational overhead by running parts of training at lower numeric precision (a generic sketch follows this list).
- Upgraded training exception handling: Cutting interruption response time and compressing the verification cycle by over 50%.
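For readers unfamiliar with the second item, mixed-precision training keeps master weights in full precision while running the matmul-heavy forward and backward passes in a lower-precision format such as bfloat16. The loop below is a standard PyTorch sketch of that general idea, not the Ling team's specific scheduling scheme; the model, sizes, and data are placeholders.

```python
# Standard mixed-precision training loop with torch.autocast (generic sketch,
# not Ant Group's "mixed-precision scheduling" implementation).
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for step in range(10):
    x = torch.randn(32, 1024, device=device)
    target = torch.randn(32, 1024, device=device)

    optimizer.zero_grad(set_to_none=True)
    # Matmul-heavy ops run in bfloat16 inside autocast; master weights stay in fp32.
    with torch.autocast(device_type=device, dtype=torch.bfloat16):
        loss = nn.functional.mse_loss(model(x), target)
    loss.backward()
    optimizer.step()
    print(f"step {step}: loss {loss.item():.4f}")
```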
Cost Efficiency and Performance
In experiments, the team pre-trained Ling-Plus on 9 trillion tokens. Training on 1 trillion tokens using high-performance hardware typically costs approximately 6.35 million RMB; with Ant Group's optimized methods and lower-spec hardware, the cost fell to around 5.08 million RMB, a saving of about 20%. Performance-wise, the models rival established systems such as Alibaba's Tongyi Qwen2.5-72B-Instruct and DeepSeek-V2.5-1210-Chat.
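The savings figure follows directly from the two reported per-trillion-token costs; the snippet below simply reproduces that arithmetic using the article's numbers.

```python
# Savings implied by the reported per-trillion-token training costs (figures from the article).
baseline_cost_rmb = 6.35e6   # high-performance GPU hardware, per 1T tokens
optimized_cost_rmb = 5.08e6  # Ant Group's optimized setup, per 1T tokens

savings = (baseline_cost_rmb - optimized_cost_rmb) / baseline_cost_rmb
print(f"Cost reduction: {savings:.1%}")  # -> Cost reduction: 20.0%
```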
Implications for AI Development
The success of these models could revolutionize the AI industry by providing a more cost-effective solution for developing large language models. By reducing reliance on Nvidia chips and enabling efficient training on lower-spec hardware, Ant Group is paving the way for broader adoption of advanced AI technologies in resource-constrained environments.
Key Points
- Ant Group introduced two MoE large language models: Ling-Lite (16.8B parameters) and Ling-Plus (290B parameters).
- These models achieve high performance on lower-performance hardware, reducing training costs by about 20%.
- Innovations include dynamic parameter allocation, mixed-precision scheduling, and improved exception handling.
- The technology reduces reliance on Nvidia GPUs, offering a cost-effective alternative for AI development.
- The models' performance rivals established systems like Alibaba's Tongyi Qwen2.5 and DeepSeek-V2.5.



