Meituan's LongCat-Flash-Lite: A Lean AI That Packs a Punch

Meituan Rewrites the Rules of Efficient AI

In an industry obsessed with ever-larger models, Meituan's LongCat team has taken a different path. Their newly released LongCat-Flash-Lite proves that smarter architecture can outperform brute-force scaling. "We kept hitting diminishing returns with traditional MoE approaches," explains the team's technical lead. "Then we asked - what if we invested those parameters differently?"

The Embedding Layer Breakthrough

The secret sauce? A technique they call "Embedding Expansion." While most mixture-of-experts models keep adding specialists (think: hiring more consultants), LongCat-Flash-Lite supercharges its vocabulary understanding instead (like giving existing consultants better reference manuals).

Here's why it works:

  • 68.5 billion parameters total, but only 2.9-4.5 billion activate per query
  • Over 30 billion parameters dedicated to N-gram embeddings that grasp technical jargon effortlessly
  • Specialized understanding for domains like programming commands (try confusing it with obscure terminal inputs)
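The N-gram trick above can be sketched in a few lines. This is a hypothetical illustration, not LongCat's released architecture: the hashing scheme, table size, and function names are all assumptions. The point it demonstrates is that each position's recent N-gram hashes straight into a huge embedding table, so vocabulary knowledge scales with table size, not compute.

```python
# Illustrative sketch of hashed N-gram embedding lookup.
# Hashing scheme and table size are assumptions, not LongCat's actual design.

def ngram_ids(tokens, n, table_size):
    """Map each position's trailing n-gram to a slot in a large embedding table."""
    ids = []
    for i in range(len(tokens)):
        gram = tuple(tokens[max(0, i - n + 1): i + 1])  # last n tokens at position i
        ids.append(hash(gram) % table_size)             # one O(1) lookup per position
    return ids

token_stream = [101, 7, 42, 42, 7]
print(ngram_ids(token_stream, n=2, table_size=30_000_000_000 // 1024))
```

Because the lookup is a hash plus an index, growing the table to tens of billions of parameters adds capacity without adding per-token FLOPs.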

Engineering Magic Behind the Speed

Theoretical efficiency means nothing without real-world performance. Meituan's engineers delivered three clever optimizations:

  1. Parameter Diet Plan: Nearly half the model lives in lightweight embedding lookups (O(1) complexity - each lookup takes constant time no matter how large the table grows)
  2. Memory Tricks: A custom N-gram Cache system plus fused CUDA kernels cut down on computational paperwork
  3. Guessing Game: Speculative decoding lets the model anticipate likely outputs, like a chess player thinking several moves ahead
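The "guessing game" of step 3 can be illustrated with a toy draft-and-verify loop. Everything here is a stand-in, assuming nothing about Meituan's implementation: a cheap draft model proposes several tokens at once, an expensive target model checks them, and the longest verified prefix is accepted in a single step.

```python
# Toy draft-and-verify loop illustrating speculative decoding.
# draft_model and target_model are trivial stand-ins, not LongCat's models.

def draft_model(context, k):
    """Cheap model proposes k tokens ahead (here: a trivial counting rule)."""
    return [context[-1] + i + 1 for i in range(k)]

def target_model(context, token):
    """Expensive model's acceptance check (here: a trivial threshold rule)."""
    return token < 20

def speculative_step(context, k=4):
    """Accept the longest verified prefix of the draft proposal."""
    proposal = draft_model(context, k)
    accepted = []
    for tok in proposal:
        if target_model(context + accepted, tok):
            accepted.append(tok)
        else:
            break
    # Emit one token anyway so decoding always advances
    # (a real system would resample it from the target model).
    return accepted or [proposal[0]]

print(speculative_step([10, 12]))  # all four proposals pass -> 4 tokens in one step
```

When the draft model guesses well, several tokens clear verification per target-model pass, which is how speculative decoding multiplies effective throughput.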

The payoff? Try 500-700 tokens per second - fast enough to generate Shakespeare's Hamlet in about 90 seconds while handling contexts up to 256K tokens.
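The Hamlet figure checks out on the back of an envelope. The word count and tokens-per-word ratio below are rough public estimates, not numbers from Meituan:

```python
# Sanity check of the throughput claim; Hamlet's length is an estimate.
hamlet_words = 30_000        # approximate word count of the full play
tokens_per_word = 1.5        # rough English tokenization ratio (assumption)
hamlet_tokens = hamlet_words * tokens_per_word

for rate in (500, 700):      # claimed tokens/second range
    print(f"{rate} tok/s -> {hamlet_tokens / rate:.0f} s")
```

At the low end of the claimed range, 45,000 tokens at 500 tokens/second comes out to exactly 90 seconds.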

Benchmark Dominance Across Fields

The numbers don't lie:

  • Code Whisperer: Scores 54.4% on SWE-Bench (software engineering tasks) and dominates terminal command tests
  • Mathlete: Holds its own against Gemini 2.5 Flash-Lite on MMLU (85.52) and competition-level math problems
  • Specialist Agent: Tops charts for telecom, retail, and aviation scenarios on τ²-Benchmark

The kicker? Meituan has open-sourced everything - weights, technical papers, even their optimized inference engine. Developers can apply today via the LongCat API Open Platform with a generous 50 million token daily free tier. Because sometimes, the best things in AI don't come in the biggest packages.

