Meituan's New AI Model Packs a Punch with Smart Parameter Trick

Rethinking How AI Models Grow

Most AI models try to get smarter by adding more "experts" - specialized sub-models that handle different tasks. But Meituan's LongCat team noticed this approach hits diminishing returns fast. Their solution? A clever workaround they call "Embedding Expansion" that makes every parameter work harder.

The Numbers Behind the Magic

At first glance, LongCat-Flash-Lite seems massive - 68.5 billion parameters total. But here's where it gets interesting: during actual use, only 2.9 to 4.5 billion parameters activate at once. That's like having a sports car that only uses the horsepower it needs for each stretch of road.

The secret sauce? Over 30 billion parameters dedicated to an N-gram embedding layer that's exceptionally good at picking up context clues. Need to understand programming commands or technical jargon? This model nails it with surgical precision.

Engineering for Real-World Speed

All this clever architecture wouldn't mean much if the model crawled along. Meituan's engineers made sure that didn't happen with three key optimizations:

Smart Parameter Management: Nearly half the model's brainpower sits in its embedding layer, which works more like a quick dictionary lookup than heavy computation.
Custom Hardware Tricks: They built specialized caching (think of it as short-term memory) and fused operations together to cut down on processing delays.
Predictive Processing: The model guesses what might come next to work more efficiently, like a chess player thinking several moves ahead.

The payoff? Blazing speeds of 500-700 tokens per second (that's several paragraphs generated in the blink of an eye) and the ability to handle documents up to 256,000 words long - perfect for analyzing lengthy reports or codebases.

Benchmark Buster

When put through its paces, LongCat-Flash-Lite surprised even its creators:

Specialized Tasks: Outperformed rivals in telecom, retail, and aviation scenarios on industry-standard tests.
Coding Prowess: Solved over half the problems in SWE-Bench (a tough coding challenge) and crushed terminal command tests with a score nearly double some competitors'.
General Smarts: Held its own against Google's Gemini2.5Flash-Lite on broad knowledge tests and tackled advanced math problems with ease.

The best part? Meituan has open-sourced everything - the model itself, detailed technical papers, even their custom inference engine. Developers can try it out today with a generous daily free allowance of 50 million tokens.

Key Points:

Meituan challenges conventional AI scaling with innovative "Embedding Expansion"
Model activates only 4.5B of its 68.5B parameters per task for efficiency
Excels at technical domains like programming while maintaining broad competence
Open-source release includes weights, research papers, and optimized inference tools

Meituan's New AI Model Packs a Punch with Smart Parameter Trick

Rethinking How AI Models Grow

The Numbers Behind the Magic

Engineering for Real-World Speed

Benchmark Buster

Key Points:

Enjoyed this article?

Related Articles

AI Industry Buzz: Claude's Big Leap, Qwen's Red Envelope Rush & Tencent's Manga Move

Zhipu's GLM-4.7-Flash Hits 1 Million Downloads in Just Two Weeks

China Telecom Spearheads AI Revolution Across Industries

Kunlun Tech Brings AI Power Directly to Your Desktop with TianGong Skywork

AI's Reality Check: Top Models Flunk Expert Exam

Carnegie Mellon's AI Conductors Fix 3D Printing Flaws Mid-Creation

Popular Articles

TSMC Reports Record Revenue, AI Growth Fuels Optimism for 2025

Nano Banana 2 Redefines AI Art with Pinpoint Precision

DeepSeek V3 Surpasses Claude 3.5 in AI Performance Tests

ASUS Unveils NUC AI Mini PC Featuring Color E Ink Display

Claude AI Assistant Launches on Slack to Boost Team Productivity

Main Pages

Content

Others