Shengshu's Vidu Q1 Revolutionizes Video Production with AI
AI-Powered Video Generation Breakthrough
At the 2025 World Artificial Intelligence Conference (WAIC), Shengshu Technology made waves with the launch of its "Reference Video" feature for the Vidu Q1 platform. The innovation marks a significant leap in video production technology, using algorithmic advances to bypass the traditional storyboarding process.
Streamlined Production Workflow
The new feature allows creators to:
- Upload reference images of characters, props, and scenes
- Input text prompts describing desired actions
- Generate complete video content in one click
The feature replaces the storyboarding stage of the traditional "storyboarding → video generation → editing → final video" pipeline with direct reference-image input, yielding a "reference images → video generation → editing → final video" workflow; a minimal sketch of the idea follows.
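To make the one-click flow concrete, here is a minimal Python sketch of what such a call could look like. The endpoint URL, function name, and field names are hypothetical illustrations of the described workflow, not Shengshu's published API.

```python
# Hypothetical illustration of the "reference images -> video" workflow.
# The URL and field names below are placeholders, not a documented API.
import requests

API_URL = "https://api.example.com/vidu/q1/reference-video"  # placeholder

def generate_reference_video(reference_paths, prompt):
    """Upload reference images plus a text prompt; return video bytes."""
    files = [("references", open(path, "rb")) for path in reference_paths]
    try:
        resp = requests.post(API_URL, files=files,
                             data={"prompt": prompt}, timeout=300)
        resp.raise_for_status()
        return resp.content
    finally:
        for _, handle in files:
            handle.close()

# One step from references to footage -- no storyboard stage in between.
video = generate_reference_video(
    ["character.png", "prop.png", "scene.png"],
    "The character picks up the prop and walks across the scene.",
)
with open("output.mp4", "wb") as f:
    f.write(video)
```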
Solving Commercialization Challenges
Vidu Q1 addresses a critical bottleneck in AI video generation: subject consistency. The system currently supports:
- Up to seven simultaneous subjects
- Consistent character representation across frames
- Complex multi-character interactions
Example: inputting "Zhuge Liang discussing with Churchill and Napoleon in a meeting room" together with reference images for each figure yields a coherent conversation scene among the three historical figures; a sketch of how such a request might look follows.
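As an illustration only, a multi-subject request for that scene might be assembled as below. The seven-subject cap comes from the article; the request schema and function name are assumptions.

```python
# Illustrative request builder for a multi-subject scene. The schema is
# an assumption; only the seven-subject limit is reported for Vidu Q1.
MAX_SUBJECTS = 7

def build_request(subjects, prompt):
    """Map subject names to reference image paths, enforcing the cap."""
    if len(subjects) > MAX_SUBJECTS:
        raise ValueError(f"at most {MAX_SUBJECTS} simultaneous subjects")
    return {"prompt": prompt, "references": subjects}

request = build_request(
    {
        "Zhuge Liang": "zhuge_liang.png",
        "Churchill": "churchill.png",
        "Napoleon": "napoleon.png",
    },
    "Zhuge Liang discussing with Churchill and Napoleon in a meeting room",
)
```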
Industrial Applications
CEO Lu Yihang highlighted diverse commercial use cases:
- Advertising campaigns
- Animation production
- Film/TV previsualization
- Cultural tourism experiences
- Educational content creation
The technology enables a fundamental shift from physical shooting to AI-powered digital creation.
Technical Architecture
Shengshu's approach combines:
- U-ViT architecture (a Transformer backbone for diffusion models)
- Multimodal understanding capabilities
- Industrial-first optimization philosophy
"Industry clients care more about content quality than technical approaches," Lu noted, emphasizing practical applications over theoretical purity.
Expanding into Embodied Intelligence
The company recently partnered with Tsinghua University to launch the Vidar model, which:
- Connects video generation with robotic control
- Requires minimal training data
- Translates generated (virtual) video into physical robot movements
This demonstrates the platform's potential beyond pure video creation.
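The article does not detail Vidar's internals, but the general "video → action" pattern it gestures at is often realized with an inverse-dynamics model that infers the command taking the robot from one predicted frame to the next. The sketch below illustrates that pattern; the architecture and all dimensions are assumptions, not Vidar's published design.

```python
# Hedged sketch of the generic "video -> action" pattern: an
# inverse-dynamics head predicts the action that moves the robot
# between consecutive predicted frames. Sizes are assumptions.
import torch
import torch.nn as nn

class InverseDynamicsHead(nn.Module):
    def __init__(self, frame_dim=1024, action_dim=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * frame_dim, 512),
            nn.ReLU(),
            nn.Linear(512, action_dim),  # e.g., a 7-DoF arm command
        )

    def forward(self, frame_t, frame_next):
        """Predict the action that transitions frame_t -> frame_next."""
        return self.mlp(torch.cat([frame_t, frame_next], dim=-1))

# A generated video (as frame embeddings) becomes an action sequence.
frames = torch.randn(16, 1024)  # 16 frame embeddings from a video model
head = InverseDynamicsHead()
actions = head(frames[:-1], frames[1:])  # one action per frame pair
print(actions.shape)  # torch.Size([15, 7])
```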
Key Points:
- Eliminates traditional storyboarding requirements
- Maintains character consistency across complex scenes
- Supports up to seven simultaneous subjects
- Builds on the U-ViT (diffusion + Transformer) architecture, optimized for industrial use
- Expands into embodied intelligence through Vidar model