Meituan's New AI Model Sees and Hears Like Humans Do

Meituan Breaks New Ground With Unified AI Perception

Imagine an AI that doesn't just read text but sees images and hears speech with the same natural ease. That's exactly what Meituan has achieved with its newly released LongCat-Next model, marking a significant leap in how machines understand our world.

The Technology Behind the Breakthrough

At the heart of this innovation lies the DiNA (Discrete Native Autoregressive) architecture, which treats every type of input - whether words, pictures, or sounds - as sequences of the same basic building blocks: discrete tokens. Here's what makes it special:

  • One System Fits All: Instead of separate mechanisms for different media types, LongCat-Next uses identical processing methods across the board
  • Dual Capabilities: The same mathematical approach allows the model to both interpret information and create new content seamlessly
  • Space-Saving Design: Its visual compression technique shrinks image data by a factor of 28 without losing crucial detail - particularly valuable for tasks like document analysis
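The article doesn't describe DiNA's internals, but the usual way to make continuous inputs "speak" the same discrete language as text is vector quantization: each image patch (or audio frame) embedding is snapped to its nearest entry in a learned codebook, and the entry's index becomes a token in one shared autoregressive stream. The sketch below illustrates that idea only - the codebook size, embedding dimension, patch grid, and vocabulary offsets are all invented for illustration, not taken from LongCat-Next.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy learned codebook: 256 entries, each a 32-dim vector (sizes illustrative).
codebook = rng.normal(size=(256, 32))

def quantize(embeddings: np.ndarray) -> np.ndarray:
    """Map each continuous embedding to the index of its nearest
    codebook entry - that index is the 'discrete token'."""
    # (N, 1, D) - (1, K, D) -> (N, K) squared distances
    dists = ((embeddings[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)

# Stand-in for an image encoder's output: a 14x14 grid of patch
# embeddings (196 patches), each projected to 32 dims.
patch_embeddings = rng.normal(size=(196, 32))
image_tokens = quantize(patch_embeddings)

# Giving each modality its own ID range lets text, image, and audio
# tokens share one vocabulary - and one autoregressive model.
IMAGE_OFFSET, AUDIO_OFFSET = 50_000, 60_000
stream = [101, 2023]                                  # text token IDs
stream += (image_tokens + IMAGE_OFFSET).tolist()      # image tokens
stream += [AUDIO_OFFSET + 7, AUDIO_OFFSET + 42]       # audio token IDs

print(len(image_tokens))  # the whole image is now 196 discrete tokens
```

The payoff of this framing is the "dual capability" the article mentions: because every modality is just tokens, the same next-token predictor that reads an image can also generate one by emitting image-range tokens and decoding them back through the codebook.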

Real-World Performance That Surprises Experts

LongCat-Next isn't just theoretically impressive - it's outperforming specialized models in practical tests:

  • Document Understanding: Beats dedicated visual models at extracting information from complex layouts and dense text
  • Math Skills: Scores an impressive 83.1 on visual math problem-solving tests
  • Voice Mimicry: Can generate speech in real time while maintaining industry-leading text comprehension (scoring 86.80 on the C-Eval benchmark)

"What's remarkable," observes one industry analyst, "is how it challenges the assumption that converting continuous data like images into discrete tokens must sacrifice quality. These results prove otherwise."

Why This Matters for Future AI

The true significance lies in creating a universal language for AI perception. When machines can process visual and auditory information as naturally as they handle text, we're looking at:

  • More intuitive human-AI interactions
  • Smarter assistants that truly understand their environment
  • Systems capable of interpreting complex charts or diagrams without special programming

Meituan has made both the LongCat-Next model and its dNaViT tokenizer publicly available, giving developers powerful new tools to build AI that interacts with our physical world more naturally than ever before.

Key Points:

  • Native Multimodal Processing: Treats vision, speech, and text as equal inputs within a single architecture
  • Proven Performance: Outperforms specialized models in multiple benchmark tests
  • Open Access: Technology now available for developers to build upon
