Meituan's New AI Model Packs Big Performance in Small Package
In the world of AI models, bigger hasn't always meant better. Traditional Mixture of Experts (MoE) architectures often hit diminishing returns as they scale up expert counts. Meituan's LongCat team flipped the script with their new LongCat-Flash-Lite model, achieving remarkable results through an innovative approach they call "Embedding Expansion."

Rethinking How Models Scale

The breakthrough came when researchers discovered something counterintuitive: expanding the embedding layers could outperform simply adding more experts. The numbers tell the story: while the full model contains 68.5 billion parameters, each inference activates just 2.9 to 4.5 billion of them, thanks to clever N-gram embedding layers.
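To see why an enormous embedding table can coexist with a small activated-parameter count, consider a minimal sketch of hashed N-gram embeddings. Only the table rows for the n-grams actually present in the input are read on a forward pass, so the vast majority of embedding parameters stay inactive. All names and sizes here are illustrative assumptions, not LongCat's actual configuration:

```python
class NGramEmbedding:
    """Toy hashed n-gram embedding table (illustrative only)."""

    def __init__(self, table_size, dim, n=2):
        self.table_size = table_size  # number of hashed buckets
        self.dim = dim
        self.n = n
        # In a real model this is a learned weight matrix; zeros suffice here.
        self.table = [[0.0] * dim for _ in range(table_size)]

    def bucket(self, ngram):
        # Hash an n-gram of token ids into one table row.
        return hash(ngram) % self.table_size

    def active_rows(self, token_ids):
        # The rows actually read for this input -- the "activated" parameters.
        ngrams = zip(*(token_ids[i:] for i in range(self.n)))
        return {self.bucket(g) for g in ngrams}


emb = NGramEmbedding(table_size=1_000_000, dim=8, n=2)
rows = emb.active_rows([11, 42, 42, 7])
# At most len(tokens) - n + 1 rows are touched, regardless of table size.
print(len(rows) <= 3)
```

The point of the sketch: the activated parameter count scales with input length, not with the size of the embedding table, which is how tens of billions of embedding parameters can sit mostly idle per inference.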

"We've allocated over 30 billion parameters specifically to embedding," explains the technical report. "This lets us capture local semantics precisely - crucial for recognizing specialized contexts like programming commands."

Engineering Efficiency at Every Level

Theoretical advantages don't always translate to real-world performance. Meituan addressed this through three key optimizations:

  1. Smart Parameter Use: Nearly half (46%) of parameters go to embedding layers, keeping computational growth manageable.
  2. Custom Hardware Tricks: Specialized caching (similar to KV Cache) and fused CUDA kernels slash I/O delays.
  3. Predictive Processing: A three-step speculative decoding approach expands batch sizes efficiently.
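The predictive-processing idea above can be sketched as a classic draft-verify-accept speculative decoding loop: a cheap draft model proposes several tokens, the full model verifies them in one batched pass, and verified tokens are accepted in bulk. This is a generic illustration of the technique; the `draft_next`/`target_next` callables and the toy models are assumptions, and LongCat's actual three-step pipeline is not detailed in the article:

```python
def speculative_step(draft_next, target_next, context, k=3):
    """One draft/verify/accept round.

    draft_next and target_next each map a token sequence to its
    predicted next token (greedy decoding, for simplicity).
    """
    # Step 1: the cheap draft model proposes k tokens autoregressively.
    proposal = []
    ctx = list(context)
    for _ in range(k):
        t = draft_next(ctx)
        proposal.append(t)
        ctx.append(t)

    # Step 2: the full model verifies the proposals (in a real system,
    # all k positions are scored in a single batched forward pass).
    accepted = []
    ctx = list(context)
    for t in proposal:
        if target_next(ctx) == t:
            accepted.append(t)
            ctx.append(t)
        else:
            # Step 3: on the first mismatch, keep the full model's own
            # token and stop -- later draft tokens are now invalid.
            accepted.append(target_next(ctx))
            break
    return accepted


# Toy models that simply echo the last token, for demonstration.
draft = lambda ctx: ctx[-1]
target = lambda ctx: ctx[-1]
print(speculative_step(draft, target, [5, 5, 5], k=3))  # → [5, 5, 5]
```

When draft and target agree, each round emits several tokens for roughly the cost of one full-model pass, which is what lets the system expand effective batch sizes and throughput.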

The result? Throughput of 500-700 tokens per second on substantial inputs (4K tokens) with outputs of up to 1K tokens, while supporting context windows as long as 256K tokens.

Benchmark-Busting Performance

The proof comes in testing where LongCat-Flash-Lite punches above its weight:

  • Excels at practical applications like telecom support and retail scenarios on τ²-Bench
  • Shows particular strength in coding (54.4% on SWE-Bench) and command execution (33.75 on TerminalBench)
  • Holds its own on general benchmarks (85.52 on MMLU) against larger models like Gemini 2.5 Flash-Lite

The entire package - weights, technical documentation, and the SGLang-FluentLLM inference engine - is now open source, and Meituan's LongCat API Open Platform offers developers generous daily testing allowances.

