TuSimple Unveils 'Ruyi' Image-to-Video Model and Ruyi-Mini-7B

Date: Dec 18, 2024

URL: https://www.aibase.com/news/14024

Beijing, China. On December 17, 2024, TuSimple Future Technology Co., Ltd. officially released Ruyi, its first large model for image-to-video generation. It also open-sourced the Ruyi-Mini-7B version, which is available for download on Hugging Face. TuSimple was founded in 2015, is headquartered in San Diego, California, and applies AI technology across industries including animation, gaming, and transportation.

Features of the Ruyi Model

Ruyi is designed to run on consumer-grade graphics cards, and TuSimple provides detailed deployment instructions and ComfyUI workflows so that users can set it up and start generating quickly. According to TuSimple, the model performs well in frame consistency, motion fluidity, color rendering, and composition, which makes it a promising tool for visual storytelling. Because it targets anime and gaming enthusiasts, the model was trained extensively on content from these domains.
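Whether a model fits on a consumer card depends first on how much memory its weights occupy. The arithmetic below is a back-of-the-envelope sketch that assumes the roughly 7.1 billion parameters TuSimple reports for Ruyi and standard byte sizes per numeric type. It counts weights only; activations and framework overhead during video generation are not included, so real peak usage will be higher.

```python
# Back-of-the-envelope weight-memory estimate (weights only).
# Assumption: ~7.1e9 parameters, the figure reported for Ruyi.
# Activations and framework overhead are NOT included.

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2}
RUYI_PARAMS = 7.1e9  # approximate total parameter count quoted in the article


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Memory needed just to hold the weights, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in ("fp32", "bf16", "fp16"):
        print(f"{dtype}: {weight_memory_gib(RUYI_PARAMS, dtype):.1f} GiB")
```

At bf16 or fp16 the weights alone come to about 13.2 GiB, roughly half the fp32 figure, which is why half-precision loading matters for running a 7B-class video model on consumer hardware.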

Ruyi supports multi-resolution, multi-duration video generation at resolutions from 384×384 to 1024×1024 pixels, with any aspect ratio. Videos can be up to 120 frames (5 seconds) long. Users can condition generation on a first frame, or on both a first and a last frame to produce a transition between keyframes. The model also offers motion-amplitude control and five types of camera (shot) control. Ruyi is built on the DiT architecture and consists of a Causal VAE module and a Diffusion Transformer, with about 7.1 billion parameters in total. It was trained on roughly 200 million video clips.
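The stated limits can be expressed as a simple pre-flight check. This is a minimal sketch, not Ruyi's actual interface: `GenerationRequest`, `validate`, and `duration_seconds` are hypothetical names, reading the resolution range as an inclusive per-side bound is an assumption, and the 24 fps rate is only what 120 frames over 5 seconds implies.

```python
from dataclasses import dataclass

# Limits as stated in the announcement. Treating the resolution range as an
# inclusive per-side bound is an assumption, not a documented rule.
MIN_SIDE = 384
MAX_SIDE = 1024
MAX_FRAMES = 120
MAX_SECONDS = 5.0
FPS = MAX_FRAMES / MAX_SECONDS  # implied frame rate: 120 / 5 = 24 fps


@dataclass
class GenerationRequest:
    """Hypothetical request shape for illustration; not Ruyi's actual API."""
    width: int
    height: int
    num_frames: int


def validate(req: GenerationRequest) -> list[str]:
    """Return human-readable problems; an empty list means the request fits."""
    problems = []
    for name, side in (("width", req.width), ("height", req.height)):
        if not MIN_SIDE <= side <= MAX_SIDE:
            problems.append(f"{name}={side} not in [{MIN_SIDE}, {MAX_SIDE}]")
    if not 1 <= req.num_frames <= MAX_FRAMES:
        problems.append(f"num_frames={req.num_frames} not in [1, {MAX_FRAMES}]")
    return problems


def duration_seconds(num_frames: int) -> float:
    """Clip length at the implied frame rate."""
    return num_frames / FPS
```

For example, `validate(GenerationRequest(width=1024, height=576, num_frames=120))` returns an empty list, and `duration_seconds(120)` returns `5.0`. Returning a list of problems rather than raising on the first one lets a UI report every out-of-range setting at once.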

Challenges and Future Improvements

Ruyi still has limitations, including hand distortion, collapse of facial detail in scenes with multiple people, and uncontrollable transitions. TuSimple says it is working to address these issues in future updates.

Looking ahead, TuSimple plans to keep focusing on the needs of real application scenarios and to pursue breakthroughs in direct CUT generation. The company intends to offer two versions of the model in its next release to meet creators' different needs. TuSimple aims to use large models such as Ruyi to shorten the development cycle and lower the cost of producing anime and game content. Given an input keyframe, Ruyi can already generate up to five seconds of footage, or it can generate the transition between two keyframes, which can significantly speed up production.

Accessing Ruyi-Mini-7B

Developers and creators interested in exploring Ruyi-Mini-7B can download it from Hugging Face.

Key Points

TuSimple launched its first large model, 'Ruyi', for image-to-video generation.

The Ruyi model is compatible with consumer-grade hardware, promoting accessibility.

Future updates aim to fix known issues such as hand distortion and uncontrollable transitions, and the next release is planned to include two model versions.