TuSimple Unveils 'Ruyi' Image-to-Video Model and Ruyi-Mini-7B
Date: Dec 18, 2024
Tags: Image-to-Video, Ruyi, AI Technology, Animation, Gaming
Summary: TuSimple Future Technology Co., Ltd. has launched its first large model, 'Ruyi', designed for transforming images into videos. The company also open-sourced the Ruyi-Mini-7B model, available for download on Hugging Face. Ruyi aims to enhance visual storytelling and is optimized for consumer-grade hardware, making it accessible for creators in animation and gaming.
Beijing, China — On December 17, 2024, TuSimple Future Technology Co., Ltd. officially announced the release of its first large model, Ruyi, as part of its TuSheng Video series. The company also open-sourced the Ruyi-Mini-7B version, which can be downloaded from the Hugging Face platform. Founded in 2015 and headquartered in San Diego, California, TuSimple focuses on applying AI technology across various industries, including animation, gaming, and transportation.
Features of the Ruyi Model
The Ruyi model is designed to run on consumer-grade graphics cards. TuSimple provides detailed deployment instructions and a ComfyUI workflow so that users can set it up quickly. The model's strengths are frame consistency, motion fluidity, color representation, and composition, which make it a promising tool for visual storytelling. It has also been trained extensively on anime and gaming content to serve creators and enthusiasts in those domains.
Ruyi supports multi-resolution and multi-duration video generation, producing outputs from 384×384 to 1024×1024 pixels at any aspect ratio. Videos can run up to 120 frames, or 5 seconds, and users can control both generation from a first frame and transitions between keyframes. The model also offers motion amplitude control and five types of shot control. Built on the DiT architecture, Ruyi comprises a Causal VAE module and a Diffusion Transformer, totaling approximately 7.1 billion parameters, and was trained on around 200 million video clips.
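The published limits can be read as a simple set of checks on a generation request. The Python sketch below is illustrative only: `ClipRequest`, `check_request`, and `duration_seconds` are hypothetical names and not part of any Ruyi API. It assumes the 384 to 1024 pixel range applies to each side independently, and that the 120-frame and 5-second figures describe the same clip, which implies 24 frames per second.

```python
from dataclasses import dataclass

# Limits quoted in the article; how they combine is an assumption of this sketch.
MIN_SIDE = 384
MAX_SIDE = 1024
MAX_FRAMES = 120
MAX_SECONDS = 5

# Frame rate implied if 120 frames and 5 seconds describe the same clip.
IMPLIED_FPS = MAX_FRAMES / MAX_SECONDS


@dataclass(frozen=True)
class ClipRequest:
    """Hypothetical request shape; not part of any Ruyi API."""
    width: int
    height: int
    frames: int


def check_request(req: ClipRequest) -> list[str]:
    """Return human-readable problems; an empty list means within limits."""
    problems = []
    for name, side in (("width", req.width), ("height", req.height)):
        if not MIN_SIDE <= side <= MAX_SIDE:
            problems.append(f"{name} {side} outside {MIN_SIDE}-{MAX_SIDE}")
    if not 1 <= req.frames <= MAX_FRAMES:
        problems.append(f"frames {req.frames} outside 1-{MAX_FRAMES}")
    return problems


def duration_seconds(req: ClipRequest, fps: float = IMPLIED_FPS) -> float:
    """Clip length in seconds at the given frame rate."""
    return req.frames / fps
```

For example, a 768×512 request with 120 frames passes every check and lasts 5 seconds at the implied frame rate, while a 2048×256 request with 121 frames fails on width, height, and frame count.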
Challenges and Future Improvements
Despite these advances, Ruyi still has known limitations, including hand distortion, collapsing facial detail in multi-person scenes, and uncontrollable transitions. TuSimple is working on these issues and plans to address them in future updates.
Looking ahead, TuSimple plans to maintain its focus on scene requirements and achieve breakthroughs in direct CUT generation. The company intends to offer two versions of the model in its next release, catering to the diverse needs of creators. By using large models like Ruyi, TuSimple aims to shorten the development cycle and reduce the cost of creating anime and game content. The Ruyi model can already generate five seconds of footage from input keyframes, or generate the transitions between them, which significantly speeds up development.
Accessing Ruyi-Mini-7B
Developers and creators interested in exploring the Ruyi-Mini-7B model can download it from the Hugging Face platform.
Key Points
- TuSimple launched its first large model, 'Ruyi', for image-to-video transformation.
- The Ruyi model is compatible with consumer-grade hardware, promoting accessibility.
- Future updates will address existing challenges and introduce new features to enhance performance.