
Meituan's New AI Model Packs a Punch with Smart Parameter Tricks

Meituan Rewrites the Rules for Efficient AI Models

In the world of AI, bigger hasn't always meant better. While most teams keep stacking more 'experts' into their models, Meituan's LongCat crew took a different path. Their newly launched LongCat-Flash-Lite makes a strong case that smarter use of parameters can outperform brute-force scaling.

The Embedding Layer Breakthrough

Traditional MoE (Mixture of Experts) architectures run into diminishing returns as more experts are added. Meituan's approach stands out here: instead of simply adding experts, the team scaled up the embedding layer. The result is a model that packs 68.5 billion total parameters yet activates only 2.9 to 4.5 billion of them per token.
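
A quick calculation with the article's own figures shows how sparse the model is. It assumes, as is usual for MoE models, that the activated count refers to the parameters used for each token; the helper name `active_fraction` is purely illustrative.

```python
# Figures quoted in the article: 68.5B total parameters,
# 2.9B to 4.5B activated per token (assumed per-token accounting).
TOTAL_PARAMS_B = 68.5
ACTIVE_MIN_B = 2.9
ACTIVE_MAX_B = 4.5


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's weights touched in one forward step."""
    return active_b / total_b


low = active_fraction(ACTIVE_MIN_B, TOTAL_PARAMS_B)
high = active_fraction(ACTIVE_MAX_B, TOTAL_PARAMS_B)
print(f"{low:.1%} to {high:.1%} of parameters active")
# prints: 4.2% to 6.6% of parameters active
```

In other words, more than 93% of the weights sit idle for any given token, which is where the efficiency comes from.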

The secret sauce? An N-gram embedding system that gives embeddings not just to single tokens but to short runs of consecutive tokens (n-grams), so local patterns are captured directly. Need to understand programming commands or technical jargon? The model spots these recurring token sequences like a seasoned coder recognizing familiar syntax.
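
The article does not spell out how the N-gram embedding is built. A common way to realize the idea, assumed here rather than taken from Meituan's report, is to hash the last n token IDs at each position into a large table and add the looked-up vectors together. The table sizes, the polynomial hash, `MAX_N`, and the function names below are all hypothetical.

```python
import random
from typing import List, Sequence

DIM = 4             # hypothetical embedding width (real models use thousands)
NUM_BUCKETS = 1024  # hypothetical number of rows per hashed n-gram table
MAX_N = 3           # combine unigram, bigram and trigram context

rng = random.Random(0)
# One table per n-gram order; in a trained model these rows are learned weights.
tables = {
    n: [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_BUCKETS)]
    for n in range(1, MAX_N + 1)
}


def ngram_bucket(ngram: Sequence[int]) -> int:
    """Map a run of token IDs to a table row with a simple polynomial hash."""
    h = 0
    for tok in ngram:
        h = (h * 1_000_003 + tok + 1) % NUM_BUCKETS
    return h


def embed(tokens: List[int]) -> List[List[float]]:
    """For each position, sum the embeddings of the n-grams ending there."""
    out = []
    for i in range(len(tokens)):
        vec = [0.0] * DIM
        for n in range(1, MAX_N + 1):
            if i - n + 1 < 0:
                break  # not enough preceding tokens for this n-gram order
            row = tables[n][ngram_bucket(tokens[i - n + 1 : i + 1])]
            vec = [a + b for a, b in zip(vec, row)]
        out.append(vec)
    return out
```

Each position costs a fixed number of table lookups no matter how many rows the tables hold, so the tables can grow very large without adding per-token compute. For example, `embed([5, 7, 9])[-1]` and `embed([1, 5, 7, 9])[-1]` come out identical, because both positions end with the same unigram, bigram and trigram.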

Engineering Magic Behind the Scenes

Turning theoretical advantages into real-world speed required three clever optimizations:

  • Smart Parameter Budgeting: Nearly half of the model's parameters live in its embedding layer, where each access is an O(1) table lookup rather than a costly matrix computation.
  • Custom System Tricks: The team built a specialized cache (think of it as a supercharged N-gram memory) and fused key operations into combined kernels to cut processing latency.
  • Prediction Teamwork: By combining speculative decoding with this architecture, the model reaches 500 to 700 tokens per second while supporting context windows of up to 256K tokens.
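
Speculative decoding is a general technique, so it can be illustrated without knowing LongCat's internals. The sketch below shows the greedy variant: a cheap draft model proposes `k` tokens, the large target model verifies them, and every accepted token is one the target would have produced on its own. The draft and target functions and the value of `k` are hypothetical; the article does not say which draft model or acceptance rule LongCat-Flash-Lite uses.

```python
from typing import Callable, List

# A "model" here is any function mapping a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the draft proposes k tokens, the target
    keeps the longest agreeing prefix plus one token of its own."""
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new:
        # 1. The cheap draft model guesses k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2. The target checks each guess (one batched forward pass in practice).
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first mismatch, then stop
                break
            accepted.append(tok)
        else:
            # All k guesses matched: the target contributes one bonus token.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq[: len(prompt) + max_new]
```

The output is always identical to greedy decoding with the target alone; the speedup comes from accepting several draft tokens per verification step. This sketch calls the target sequentially for clarity, whereas a real engine scores all `k` proposals in a single forward pass.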

Performance That Turns Heads

The numbers tell an impressive story:

  • Coding Prowess: Scored 54.4% on SWE-Bench and 33.75 on TerminalBench, which tests terminal-command tasks
  • Agent Excellence: Topped industry-specific agent benchmarks covering telecom, retail and aviation scenarios
  • General Smarts: Matches Gemini 2.5 Flash-Lite on MMLU (85.52) while holding its own in advanced math

The best part? Meituan is releasing everything openly: model weights, technical reports, and even its custom inference engine (SGLang-FluentLLM). Developers can claim 50 million free tokens per day through the LongCat API platform to test-drive this approach.

Key Points:

  • Breaks from traditional MoE scaling by optimizing embedding layers instead of just adding experts
  • Achieves large-model performance while activating at most 4.5B parameters per token
  • Specialized caching and kernel fusion deliver exceptional speed (500+ tokens/sec)
  • Open-source release includes weights, technical reports and inference engine
