
Meituan Unveils LongCat-Flash-Chat: A 560B-Parameter Breakthrough in AI Efficiency

Meituan has officially released LongCat-Flash-Chat, a 560-billion-parameter large language model that sets a new bar for computational efficiency. The model, now open-sourced, uses a Mixture of Experts (MoE) architecture with a "zero-computation expert" mechanism, so each token activates only 18.6B to 31.3B of the model's parameters.
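The article does not detail the routing mechanics, but the "zero-computation expert" idea can be sketched: the router may send a token either to a real feed-forward expert or to an "expert" that simply passes the token through unchanged, so the activated parameter count varies per token. Below is a minimal, hypothetical PyTorch sketch; the layer sizes, expert counts, and names are illustrative assumptions, not Meituan's implementation.

```python
import torch
import torch.nn as nn

class ZeroComputeMoE(nn.Module):
    """Toy MoE layer where some routing slots are zero-computation experts."""

    def __init__(self, d_model=64, n_ffn_experts=6, n_zero_experts=2, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.n_ffn = n_ffn_experts
        # Real experts: small feed-forward networks.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_ffn_experts)
        )
        # The router scores FFN experts and zero-computation experts together.
        self.router = nn.Linear(d_model, n_ffn_experts + n_zero_experts)

    def forward(self, x):  # x: (n_tokens, d_model)
        scores = self.router(x).softmax(dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)  # (n_tokens, top_k)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(self.n_ffn):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
            # Indices past the FFN experts are zero-computation experts:
            # the token passes through unchanged, spending no expert FLOPs.
            zmask = idx[:, slot] >= self.n_ffn
            if zmask.any():
                out[zmask] += weights[zmask, slot].unsqueeze(-1) * x[zmask]
        return out

# Tokens routed mostly to zero-computation experts activate far fewer
# parameters, which is how per-token activation can vary so widely.
moe = ZeroComputeMoE()
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```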

Architectural Innovations

The model introduces a cross-layer channel design that significantly improves training and inference parallelism. LongCat-Flash was trained in just 30 days and reaches 100 tokens per second per user at inference time on H800 GPUs. During training, a PID controller dynamically adjusts expert biases, holding the average at roughly 27B activated parameters per token to keep compute usage on target.
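To make the bias adjustment concrete, here is a minimal PID sketch, assuming the controller compares a measured average of activated parameters against the 27B target and emits a router-bias correction. The gains, the normalization, and all names are hypothetical, not taken from Meituan's training code.

```python
class PIDBiasController:
    """Toy PID loop nudging a router bias toward a target activation level."""

    def __init__(self, target, kp=0.5, ki=0.05, kd=0.0):
        self.target = target             # e.g. 27e9 activated parameters
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, measured):
        """Return a bias update from one measurement of activated params."""
        error = (self.target - measured) / self.target  # dimensionless
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# After each training step, measure mean activated parameters and adjust.
# A positive bias would steer routing toward real FFN experts; a negative
# one toward zero-computation experts, pulling the average back to 27B.
controller = PIDBiasController(target=27e9)
router_bias = 0.0
for measured in (25.8e9, 26.4e9, 26.9e9):   # hypothetical measurements
    router_bias += controller.step(measured)
```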

Superior Agent Capabilities

LongCat-Flash stands out in agentic performance, built on a self-developed agentic evaluation set and a multi-agent data-generation strategy. It ranked first on the VitaBench benchmark for complex scenarios and outperforms larger models on tool-use tasks.

Benchmark Dominance

The model excels in general knowledge assessments:

  • 86.50 on ArenaHard-V2 (2nd place overall)
  • 89.71 on MMLU (language understanding)
  • 90.44 on CEval (Chinese proficiency)

Open-Source Access

Meituan’s decision to open-source LongCat-Flash-Chat gives developers direct access to the model for research and application development.

Key Points:

  • 560B-parameter model with MoE architecture
  • 100 tokens/second inference speed
  • PID-controlled training for efficiency
  • Top-tier agent performance in benchmarks
  • Open-sourced for community development

Project GitHub | Demo Site

