BAAI Unveils See3D: A Breakthrough in 3D Video Learning

Date: Dec 10, 2024
Type: News
Tags: See3D, 3DGenerationModel, VideoLearning, ArtificialIntelligence
Summary: The Beijing Academy of Artificial Intelligence has launched See3D, a model that learns 3D generation from unlabeled video data. It relies on visual cues instead of traditional camera parameters, enabling efficient learning and broader applications in 3D generation technology.

 
The Beijing Academy of Artificial Intelligence (BAAI) has announced the launch of See3D, an innovative 3D generation model designed to learn from large-scale unlabeled internet videos. This technological advancement aligns with the concept of "See Video, Get 3D" and represents a significant step forward in the field of 3D learning and generation.
 

Technical Innovations of See3D

 
See3D distinguishes itself by not relying on traditional camera parameters. Instead, it uses visual conditioning to generate multi-view images that are geometrically consistent and whose camera direction can be controlled, based solely on visual cues obtained from videos. This approach removes the need for costly 3D or camera annotations, making it practical to learn 3D priors from abundant internet video data.
 
The model supports various forms of generation including:
  • Text-to-3D generation
  • Single view to 3D
  • Sparse views to 3D
Additionally, it is capable of performing 3D editing and Gaussian rendering. BAAI has made the model, code, and a demo available as open-source resources, facilitating broader technical reference and experimentation.
Demonstrations of See3D's capabilities include:
  • Unlocking 3D interactive worlds
  • 3D reconstruction based on sparse images
  • Open-world 3D generation
  • 3D generation from single views
These features highlight the extensive applicability of See3D in various creative 3D applications, enabling users to engage with 3D environments more dynamically.
 

Motivation Behind the Development

 
The impetus for developing See3D arises from the challenges associated with traditional 3D data collection methods, which are often time-consuming and expensive. In contrast, videos provide a wealth of multi-view correlations and camera motion information, making them valuable for revealing intricate 3D structures.
 
To support this, the See3D team built a large dataset named WebVi3D, comprising 16 million video clips and 320 million frames. Because these videos carry no camera annotations, the model is conditioned on a purely 2D visual signal instead, produced by adding time-dependent noise to masked video data. This makes it possible to train a multi-view diffusion model at scale and to achieve 3D generation without relying on camera conditions.
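The idea of adding time-dependent noise to masked video data can be illustrated with a toy sketch. This is not See3D's implementation: the linear noise schedule, the function names `noise_scale` and `make_visual_condition`, and the representation of frames as flat lists of pixel values are all assumptions made for illustration.

```python
import random


def noise_scale(t: int, num_steps: int) -> float:
    """Hypothetical schedule: noise strength grows linearly from 0 to 1."""
    return t / num_steps


def make_visual_condition(frames, keep_mask, t, num_steps, rng):
    """Build a toy 2D visual condition from a video.

    frames:    list of frames, each a list of pixel values (floats)
    keep_mask: same shape as frames; 1 keeps a pixel, 0 masks it out
    t:         current diffusion timestep, 0 <= t <= num_steps
    rng:       random.Random instance used to draw Gaussian noise
    """
    sigma = noise_scale(t, num_steps)
    condition = []
    for frame, mask in zip(frames, keep_mask):
        noisy_frame = []
        for pixel, keep in zip(frame, mask):
            masked = pixel * keep  # masked-out pixels become 0
            # Noise strength depends on the timestep t (assumed schedule).
            noisy_frame.append(masked + sigma * rng.gauss(0.0, 1.0))
        condition.append(noisy_frame)
    return condition


if __name__ == "__main__":
    video = [[0.5, 0.2], [0.9, 0.4]]  # two tiny "frames"
    mask = [[1, 0], [1, 1]]           # hide one pixel of the first frame
    rng = random.Random(0)
    print(make_visual_condition(video, mask, t=0, num_steps=10, rng=rng))
    print(make_visual_condition(video, mask, t=10, num_steps=10, rng=rng))
```

At `t = 0` the condition is just the masked video, and at larger `t` it is increasingly dominated by noise. In this sketch the condition is built only from pixels and never from camera poses.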
 

Key Advantages of See3D

 
See3D offers several key advantages:
  • Data Scalability: The training data comes from a vast array of internet videos, which greatly increases the scale of the multi-view dataset.
  • Camera Controllability: The model supports scene generation under complex camera trajectories, ensuring geometric consistency across frames.
  • Geometric Consistency: The model maintains geometric integrity when generating multi-view images, which is crucial for realistic 3D representations.
By expanding the scale of available datasets, See3D aims to provide new insights and methodologies for advancing 3D generation technology. The research team hopes this initiative will encourage the 3D research community to focus on large-scale data without camera annotations, lowering the cost of 3D data collection and narrowing the gap with existing closed-source 3D solutions.
 
Project Address: See3D Project
 
Key Points
  1. See3D learns 3D generation from unlabeled video data.
  2. The model eliminates the need for traditional camera parameters.
  3. It supports multiple forms of 3D generation and editing.
  4. The initiative aims to reduce costs in 3D data collection and promote research in the field.
