Ant Group's Bai Ling Open-Sources AI Models to Slash Inference Costs
Ant Group's Bai Ling Large Model team has released two open-source models designed to improve deep-reasoning efficiency: Ring-flash-linear-2.0 and Ring-mini-linear-2.0. Accompanying these releases are two self-developed high-performance fusion operators: an FP8 fusion operator and a linear-attention inference fusion operator.
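For context, linear attention replaces the quadratic softmax-attention computation with a kernelized recurrence whose cost grows linearly in sequence length, which is what makes fused, low-cost inference kernels attractive. The sketch below is a minimal, generic illustration of that recurrence in PyTorch; it is not Ant Group's operator, and the feature map and tensor shapes are illustrative assumptions.

```python
import torch

def linear_attention(q, k, v):
    """Minimal linear-attention sketch (illustrative, not Ant's fused kernel).

    q, k, v: (seq_len, d) tensors. Uses elu+1 as a simple positive
    feature map; production operators fuse these steps into one kernel.
    """
    phi = lambda x: torch.nn.functional.elu(x) + 1.0  # positive feature map
    q, k = phi(q), phi(k)
    state = torch.zeros(q.shape[-1], v.shape[-1])  # running sum of k_t v_t^T
    norm = torch.zeros(q.shape[-1])                # running normalizer
    out = torch.empty_like(v)
    for t in range(q.shape[0]):
        state += torch.outer(k[t], v[t])   # O(d^2) per step, O(n) overall
        norm += k[t]
        out[t] = (q[t] @ state) / (q[t] @ norm + 1e-6)
    return out

q, k, v = (torch.randn(16, 8) for _ in range(3))
print(linear_attention(q, k, v).shape)  # torch.Size([16, 8])
```

Because the per-token state is fixed-size rather than growing with the sequence, this style of attention is also what underpins the ultra-long-context support mentioned later in this article.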
Technical Breakthroughs
The new models employ a "large parameters, low activation" architecture optimized for complex reasoning tasks. Through architectural refinements and specialized operator integration, the team reports:
- 90% cost reduction versus comparable dense models
- Over 50% lower inference costs than previous Ring-series versions
- Enhanced stability during reinforcement learning phases
The models demonstrate state-of-the-art (SOTA) performance across multiple challenging reasoning benchmarks while maintaining exceptional computational efficiency.
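The "large parameters, low activation" phrasing typically describes a sparse Mixture-of-Experts design: the model holds many expert parameters, but a router activates only a few experts per token, so per-token compute stays far below the total parameter count. The sketch below illustrates generic top-k expert routing in PyTorch; the expert count, top-k value, and layer sizes are illustrative assumptions, not the Ring models' actual configuration.

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Illustrative top-k MoE layer: many parameters, few active per token."""
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Linear(d_model, d_model) for _ in range(n_experts))
        self.router = nn.Linear(d_model, n_experts)
        self.k = k

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                     # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e            # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot+1] * expert(x[mask])
        return out

layer = TopKMoE()
print(layer(torch.randn(10, 64)).shape)  # only 2 of 8 experts run per token
```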
Deployment Advantages
Key operational benefits include:
- Tight alignment between training and inference engines enables consistent optimization
- Support for ultra-long context processing expands application potential
- Reduced hardware requirements lower barriers to implementation
The team emphasizes these advancements will particularly benefit organizations running intensive reasoning workloads where computational costs represent significant operational expenses.
Availability
The complete package is now accessible through major AI platforms including:
- Hugging Face
- ModelScope
Developers can immediately begin experimenting with these tools in their projects.
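As a starting point, the models can presumably be pulled with the standard transformers loading pattern. The repository id below and the generation settings are assumptions to verify against the official model cards.

```python
# Hypothetical loading sketch -- verify the repo id and any
# trust_remote_code requirement against the model card before use.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "inclusionAI/Ring-mini-linear-2.0"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # let the checkpoint choose its precision
    device_map="auto",       # place weights on available GPUs/CPU
    trust_remote_code=True,  # custom linear-attention architecture
)

prompt = "Explain why linear attention scales better than softmax attention."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```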
Strategic Impact
This release underscores Ant Group's growing influence in AI infrastructure development while providing the broader community with production-grade tools previously limited to internal use.
Key Points:
- Two new open-source reasoning models released (Ring-flash-linear-2.0/Ring-mini-linear-2.0)
- Includes novel FP8 and linear Attention fusion operators
- Achieves 90% cost reduction versus comparable dense models
- Maintains SOTA performance across benchmarks
- Available on Hugging Face and ModelScope platforms