Introduction

Stable Video 4D (SV4D) is a generative model developed by Stability AI that creates novel-view videos from a single-view input video. It builds upon Stable Video Diffusion (SVD) and Stable Video 3D (SV3D) to generate an image matrix of an object spanning both time and camera view, hence "4D".

Architecture

SV4D is a generative video-to-video model that produces a 5x8 image matrix from a single-view input video. It generates 40 frames (5 video frames x 8 camera views) at 576x576 resolution: SV3D is first used to create an orbital video of the object, and this orbital video, together with the input video, serves as the reference frames for 4D sampling.
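
The output layout can be pictured as a tensor indexed by time and camera view. The following minimal sketch illustrates the 5x8 image matrix described above; the function and variable names are hypothetical placeholders, not the official API.

```python
# Conceptual sketch of the SV4D output layout (not the official API).
# `sample_sv4d` is a hypothetical placeholder for the real diffusion sampler.
import torch

NUM_FRAMES = 5      # video frames sampled per pass
NUM_VIEWS = 8       # camera views per frame
RESOLUTION = 576    # output height and width in pixels

def sample_sv4d(input_video: torch.Tensor, orbital_video: torch.Tensor) -> torch.Tensor:
    """Return a (frames, views, 3, H, W) image matrix.

    input_video   - frames of the single-view input video (anchors the time axis)
    orbital_video - SV3D renders of the first frame (anchors the view axis)
    """
    # In the real model this is a diffusion sampling loop conditioned on both
    # reference sequences; here we only illustrate the shape of the result.
    return torch.zeros(NUM_FRAMES, NUM_VIEWS, 3, RESOLUTION, RESOLUTION)

input_video = torch.zeros(NUM_FRAMES, 3, RESOLUTION, RESOLUTION)    # single-view input
orbital_video = torch.zeros(NUM_VIEWS, 3, RESOLUTION, RESOLUTION)   # SV3D orbit of frame 0

image_matrix = sample_sv4d(input_video, orbital_video)
print(image_matrix.shape)  # torch.Size([5, 8, 3, 576, 576]) -> 40 images in total
```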

Training

SV4D is trained on renders from the Objaverse dataset, which is released under the Open Data Commons Attribution License. The training process involves filtering objects based on license reviews and using an enhanced rendering method to improve the model's generalization.

Guide: Running Locally

  1. Clone the Repository: Obtain the code from the Stability AI GitHub repository.
  2. Install Dependencies: Install the required libraries and tools listed in the repository.
  3. Prepare Data: Use the Objaverse dataset or your own dataset, ensuring compatibility with the model's input requirements.
  4. Run the Model: Execute the provided sampling scripts to generate novel-view videos from your input videos (see the sketch after this list).
  5. Utilize Cloud GPUs: For optimal performance, consider cloud-based GPU services such as AWS, Google Cloud, or Azure.
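
As a rough illustration of step 4, the snippet below drives a sampling script from Python. The script path and command-line flags are assumptions about the repository layout, so consult the repository README for the exact entry point and arguments.

```python
# Hedged sketch of step 4: invoking the SV4D sampling script from Python.
# The script name and flags below are illustrative placeholders; check the
# Stability AI repository's README for the actual command.
import subprocess
from pathlib import Path

INPUT_VIDEO = Path("assets/my_object.mp4")   # single-view input video
OUTPUT_DIR = Path("outputs/sv4d")            # where novel-view videos are written
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

subprocess.run(
    [
        "python",
        "scripts/sampling/simple_video_sample_4d.py",  # assumed script path
        "--input_path", str(INPUT_VIDEO),
        "--output_folder", str(OUTPUT_DIR),
    ],
    check=True,  # raise if the sampling script exits with an error
)
```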

License

SV4D is available under the Stability AI Community License, which permits free use for research and non-commercial purposes, as well as commercial use by individuals or organizations with annual revenue under $1,000,000. Those exceeding this revenue threshold require an Enterprise License, available directly from Stability AI. More details are available in the Community License.
