Huawei Unveils UCM Tech to Reduce HBM Reliance in AI
Huawei's UCM Technology Aims to Revolutionize AI Inference
On August 12, 2025, Huawei unveiled its groundbreaking UCM (Inference Memory Data Manager) technology at the 2025 Financial AI Inference Application Implementation and Development Forum. This innovation is set to reduce China's reliance on High Bandwidth Memory (HBM) for AI inference while significantly boosting the performance of large-scale AI models.
How UCM Works
UCM centers on the KV cache, integrating multiple cache-acceleration algorithms. By managing the memory data generated during inference across a hierarchy of storage tiers, it expands the effective context window, delivering high throughput and low latency while lowering the cost per token. This approach mitigates common problems such as task stalls and response delays caused by insufficient HBM capacity.
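Huawei has not published UCM's internal architecture, so the following is only a minimal sketch of the general idea behind tiered KV-cache management, not Huawei's implementation. It assumes a two-tier design: a small "fast" tier standing in for HBM, backed by a larger, slower tier standing in for DRAM or SSD. The TieredKVCache class, its capacities, and the bytes placeholder for KV blocks are all illustrative assumptions.

```python
from collections import OrderedDict


class TieredKVCache:
    """Illustrative two-tier KV cache (not Huawei's UCM API).

    A small 'fast' tier (standing in for HBM) is backed by a larger
    'slow' tier (standing in for DRAM/SSD). When the fast tier fills
    up, the least-recently-used entry is demoted rather than discarded,
    so it can later be reused instead of recomputed.
    """

    def __init__(self, fast_capacity: int, slow_capacity: int):
        self.fast: OrderedDict = OrderedDict()  # token_id -> KV block
        self.slow: OrderedDict = OrderedDict()
        self.fast_capacity = fast_capacity
        self.slow_capacity = slow_capacity

    def put(self, token_id: int, kv_block: bytes) -> None:
        # New entries land in the fast tier; overflow demotes the LRU entry.
        self.fast[token_id] = kv_block
        self.fast.move_to_end(token_id)
        if len(self.fast) > self.fast_capacity:
            old_id, old_block = self.fast.popitem(last=False)
            self.slow[old_id] = old_block
            if len(self.slow) > self.slow_capacity:
                self.slow.popitem(last=False)  # fully evicted: must recompute

    def get(self, token_id: int):
        if token_id in self.fast:
            self.fast.move_to_end(token_id)  # refresh recency
            return self.fast[token_id]
        if token_id in self.slow:
            # Promote on access: a slow-tier hit is slower than HBM but
            # still avoids recomputing this token's attention KV.
            block = self.slow.pop(token_id)
            self.put(token_id, block)
            return block
        return None  # cache miss


cache = TieredKVCache(fast_capacity=2, slow_capacity=4)
for t in range(5):
    cache.put(t, f"kv-{t}".encode())
assert cache.get(0) is not None  # demoted earlier, promoted on access
```

The point of the demote-and-promote path is that even a slow-tier hit is far cheaper than recomputing attention keys and values for a long context, which is what allows the effective context window to grow beyond what HBM alone can hold.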

Industry Collaboration and Expert Insights
At the forum, Huawei partnered with China UnionPay to showcase the latest advancements in AI inference applications. Experts from institutions such as the China Academy of Information and Communications Technology, Tsinghua University, and iFlytek also shared their experiences in optimizing large model inference.
Fan Jie, Vice President of Huawei's Data Storage Product Line, emphasized that future AI breakthroughs will heavily depend on high-quality industry data. "High-performance AI storage can reduce data loading time from hours to minutes," he noted, "and improve computing cluster efficiency from 30% to 60%."
Market Implications
The launch of UCM arrives as the AI industry shifts focus from "pursuing model capability limits" to "optimizing inference experiences." Analysts highlight that inference performance is now a key metric for assessing AI's commercial value. According to Great Wall Securities, advancements in large models and expanding commercial applications present new opportunities for companies in the computing power sector.
Key Points:
- UCM technology reduces HBM dependency for AI inference.
- Enhances performance with high throughput and low latency.
- Industry leaders collaborate to advance AI applications.
- Future AI progress hinges on data quality and storage efficiency.
- Market trends favor optimization over raw model capabilities.