Zhipu AI Unveils Smarter Voice Typing with Open-Sourced Speech Tech
Zhipu AI Raises the Bar for Voice Recognition
Chinese AI firm Zhipu has just dropped a major upgrade that could change how we interact with our computers. Its new GLM-ASR speech recognition models aren't just smarter - they're also being shared with the world under an open-source license.

The star of the show is the cloud-based GLM-ASR-2512, which Zhipu says achieves industry-leading accuracy with a character error rate below 0.072%. By that figure, it gets more than 99.9% of characters right, even when dealing with different accents or noisy environments.
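For readers curious how a character error rate is derived, here is a minimal Python sketch of the conventional definition: the edit distance (substitutions, insertions and deletions) between the reference transcript and the model's output, divided by the number of reference characters. The function names and sample strings are illustrative assumptions, and this is not Zhipu's evaluation code.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the reference prefix processed so far
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = character edits needed / characters in the reference."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Illustrative strings only: 2 edits over 18 reference characters.
    cer = character_error_rate("open source speech", "open sauce speech")
    print(f"CER: {cer:.1%}")
```

A rate below 0.072% would correspond to fewer than one wrong character per thousand in the reference text.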
"We wanted to create something that works as well in a busy café as it does in a quiet office," explains Zhipu's technical lead. The model handles multiple languages seamlessly, making it ideal for global users.
Power in a Small Package
For those concerned about privacy or needing offline access, Zhipu offers GLM-ASR-Nano-2512, a compact version that packs a surprising punch despite its modest 1.5 billion parameters. Tests show it outperforms some proprietary systems while running directly on your device.
This local processing means your voice data stays private rather than being sent to distant servers. It also cuts down on lag: your words appear almost instantly as you speak them.
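To put the 1.5 billion parameter figure in perspective, the short Python sketch below estimates how much memory the model weights alone would occupy at a few common numeric precisions. The precisions listed are assumptions for illustration; the article does not say which format the released model uses, and real-world usage would be higher once activations and audio buffers are included.

```python
# Parameter count taken from the article; everything else is an assumption.
PARAMETERS = 1_500_000_000

# Bytes needed to store one parameter at each (assumed) precision.
BYTES_PER_PARAMETER = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(parameters: int, bytes_per_parameter: int) -> float:
    """Approximate size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return parameters * bytes_per_parameter / 1e9


if __name__ == "__main__":
    for precision, size in BYTES_PER_PARAMETER.items():
        print(f"{precision}: about {weight_memory_gb(PARAMETERS, size):.1f} GB")
```

Under these assumptions the weights come to roughly 1.5 to 6 GB, which is within reach of many recent laptops and desktops.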
Your Computer Just Got More Conversational
The technology powers Zhipu's refreshed AI Input Method, transforming PCs into responsive voice assistants. Beyond simple dictation, it can translate spoken words between languages or rephrase text on command - think of it as having a secretary living in your keyboard.
Early adopters get 2,000 free points (about four weeks of typical use) to explore features including:
- Real-time speech-to-text conversion
- Multi-language translation
- Smart text rewriting
- Cross-platform synchronization
The desktop app currently supports Windows and macOS, with mobile versions reportedly in development.
Why This Matters
By open-sourcing its technology, Zhipu invites developers worldwide to build upon its work rather than keeping innovations locked away. This approach could accelerate progress across everything from accessibility tools to smart home devices.
The new input method also hints at where computing interfaces might be heading - toward systems that understand natural speech as effortlessly as they process mouse clicks.
Key Points:
- 🎙️ Two new speech models: cloud-based powerhouse + privacy-focused local version
- 💻 Revamped input method adds translation and text editing by voice
- 🆓 Generous free trial lets users test premium features
- 🔓 Open-source approach encourages wider innovation