Xiaohongshu's Open-Source Multimodal Model Rivals Top AI

Chinese social media platform Xiaohongshu has entered the AI arms race with the release of dots.vlm1, its first self-developed multimodal large model. The open-source system pairs a 1.2B-parameter NaViT visual encoder with the DeepSeek V3 large language model and achieves performance comparable to proprietary models such as Google's Gemini 2.5 Pro.

Native Architecture Breaks New Ground

The model's standout feature is its self-developed NaViT visual encoder, trained from scratch rather than fine-tuned from an existing model. The encoder supports dynamic resolution processing, so images are handled at their native sizes and aspect ratios instead of being forced into one fixed input size. Through dual supervision combining pure visual and text-visual training, the system handles non-standard content well, including:

  • Tables and charts
  • Mathematical formulas
  • Document structures
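
To make "dynamic resolution" concrete, here is a minimal sketch of how a native-resolution encoder might turn images of different sizes into patch sequences of different lengths and pack them into one batch. It is an illustration under stated assumptions, not Xiaohongshu's implementation: the patch size of 14 and the helper names `patch_grid` and `pack_images` are hypothetical and do not come from the dots.vlm1 release.

```python
import math
from typing import List, Tuple

PATCH = 14  # hypothetical patch size in pixels; the article does not state one


def patch_grid(width: int, height: int, patch: int = PATCH) -> Tuple[int, int]:
    """Return (rows, cols) of patches covering an image, rounding up to keep the edges."""
    return math.ceil(height / patch), math.ceil(width / patch)


def pack_images(
    sizes: List[Tuple[int, int]], patch: int = PATCH
) -> Tuple[int, List[Tuple[int, int]]]:
    """Lay variable-length patch sequences back to back in one packed sequence.

    Returns the total token count and a (start, end) span per image. An
    attention mask could use these spans to keep images from attending to
    each other.
    """
    spans = []
    cursor = 0
    for width, height in sizes:
        rows, cols = patch_grid(width, height, patch)
        n_tokens = rows * cols
        spans.append((cursor, cursor + n_tokens))
        cursor += n_tokens
    return cursor, spans


# A square photo, a wide chart screenshot, and a tall document page (width, height).
total, spans = pack_images([(224, 224), (1280, 336), (448, 896)])
print(total, spans)
```

In this sketch the wide chart keeps its aspect ratio and simply produces more tokens than the small photo, which is the property that helps with tables, formulas, and document pages.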

"We rebuilt our entire training pipeline," explained the Hi Lab team. "From data collection using our dots.ocr tool for PDF processing to manual rewriting of web-sourced text, every component was optimized for cross-modal understanding."

Benchmark Performance Analysis

In testing across international evaluation sets, dots.vlm1 posts results close to those of leading proprietary models.

The model particularly shines in complex analytical tasks, solving Olympiad-level math problems and demonstrating strong STEM reasoning capabilities. While it trails slightly in advanced textual reasoning, its mathematical and coding performance is on par with leading LLMs.

Future Development Roadmap

The Hi Lab team outlined three key focus areas for future development:

  1. Data expansion: Scaling cross-modal training datasets
  2. Algorithm enhancement: Implementing reinforcement learning techniques
  3. Reasoning improvement: Boosting generalization capabilities

By open-sourcing dots.vlm1, Xiaohongshu aims to stimulate innovation in the multimodal AI space while establishing itself as a serious player in foundational model development.

Key Points:

  • First complete open-source multimodal model from Xiaohongshu
  • Self-developed NaViT encoder handles dynamic resolution natively
  • Matches proprietary models in 6/8 benchmark categories
  • Exceptional performance on STEM and analytical tasks
  • Planned enhancements through RL and data scaling
