Meituan's New AI Model Packs Big Performance in Small Package

In the world of AI models, bigger hasn't always meant better. Traditional Mixture of Experts (MoE) architectures often hit diminishing returns as they scale up expert counts. Meituan's LongCat team flipped the script with their new LongCat-Flash-Lite model, achieving remarkable results through an innovative approach they call "Embedding Expansion."

Rethinking How Models Scale

The breakthrough came when researchers discovered something counterintuitive: expanding embedding layers could outperform simply adding more experts. The numbers tell the story: while the full model contains 68.5 billion parameters, each inference activates just 2.9 to 4.5 billion of them, thanks to clever N-gram embedding layers.
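For scale, a quick back-of-the-envelope check (ours, not from the report) of what those parameter counts imply per token:

```python
# Figures from the article: 68.5B total parameters, 2.9-4.5B active per inference.
total_params = 68.5e9
active_lo, active_hi = 2.9e9, 4.5e9

# Fraction of the model actually exercised on each forward pass.
ratio_lo = active_lo / total_params
ratio_hi = active_hi / total_params
print(f"active fraction per token: {ratio_lo:.1%} to {ratio_hi:.1%}")
```

In other words, only roughly 4-7% of the model does arithmetic on any given token; the rest is capacity sitting in lookup tables.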

"We've allocated over 30 billion parameters specifically to embedding," explains the technical report. "This lets us capture local semantics precisely - crucial for recognizing specialized contexts like programming commands."

Engineering Efficiency at Every Level

Theoretical advantages don't always translate to real-world performance. Meituan addressed this through three key optimizations:

  1. Smart Parameter Use: Nearly half (46%) of all parameters sit in embedding layers, which are lookup tables rather than matrix multiplies, so capacity grows without compute growing in step.
  2. Custom Hardware Tricks: Specialized caching (similar to KV Cache) and fused CUDA kernels slash I/O delays.
  3. Predictive Processing: A three-step speculative decoding approach expands batch sizes efficiently.
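The speculative decoding in step 3 follows the general draft-then-verify pattern: a cheap draft model proposes a short run of tokens, the target model checks them, and everything up to the first mismatch is kept. A toy greedy sketch with stand-in draft/target functions (not Meituan's pipeline):

```python
def speculative_decode(prompt, draft, target, steps=3, max_len=12):
    """Toy greedy speculative decoding: draft proposes `steps` tokens,
    target verifies them, accepted tokens are kept up to the first miss."""
    out = list(prompt)
    while len(out) < max_len:
        proposal, ctx = [], list(out)
        for _ in range(steps):            # draft proposes a short run
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        accepted = 0
        for i, t in enumerate(proposal):  # target verifies each token
            if target(out + proposal[:i]) == t:
                accepted += 1
            else:
                break
        out += proposal[:accepted]
        if accepted < len(proposal):      # fall back to target at the miss
            out.append(target(out))
    return out[:max_len]

# Stand-in models: target counts up mod 5; draft agrees except after a 4.
target = lambda seq: (seq[-1] + 1) % 5
draft = lambda seq: (seq[-1] + 1) % 5 if seq[-1] != 4 else 1
print(speculative_decode([0], draft, target, steps=3, max_len=10))
```

When draft and target mostly agree, each expensive target pass yields several tokens instead of one, which is what makes larger effective batch sizes (and throughput numbers like those below) possible.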

The result? Blazing speeds of 500 to 700 tokens per second on substantial workloads (4K-token inputs with outputs up to 1K tokens), all while supporting contexts as long as 256K tokens.

Benchmark-Busting Performance

The proof comes in testing, where LongCat-Flash-Lite punches above its weight:

  • Excels at practical applications like telecom support and retail scenarios on τ²-Bench
  • Shows particular strength in coding (54.4% on SWE-Bench) and command execution (33.75 on TerminalBench)
  • Holds its own on general knowledge (85.52 on MMLU) against larger models like Gemini 2.5 Flash-Lite

The entire package (weights, technical documentation, and the SGLang-FluentLLM inference engine) is now open source, and Meituan's LongCat API Open Platform offers developers generous daily testing allowances.

