Cosmos-1.0-Autoregressive-5B-Video2World

nvidia

Introduction

The Cosmos-1.0-Autoregressive-5B-Video2World model by NVIDIA is a world foundation model that generates physics-aware videos and world states for physical AI development. It belongs to the Cosmos Autoregressive series, whose models predict video sequences from video or image inputs. The models are licensed for commercial use under the NVIDIA Open Model License.

Architecture

The model is an autoregressive transformer built from interleaved self-attention, cross-attention, and feedforward layers. The cross-attention layers let the model condition on input text during decoding. Inputs are text descriptions, images, or videos; the output is a video clip depicting the described scene.
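To make the layer ordering concrete, here is a minimal pure-Python sketch of one such block: causal self-attention over the video tokens, cross-attention to the text tokens, then a feedforward step, each with a residual connection. It is a toy illustration, not NVIDIA's implementation. The function names (attend, block), the single attention head, the absence of learned projections and normalization, and the ReLU standing in for a learned MLP are all assumptions made for brevity.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values, causal=False):
    """Single-head scaled dot-product attention over lists of vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    out = []
    for i, q in enumerate(queries):
        # With causal=True, position i may only look at positions 0..i.
        limit = i + 1 if causal else len(keys)
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys[:limit]]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values[:limit]))
                    for d in range(dim)])
    return out

def add(xs, ys):
    """Residual connection: element-wise sum of two token sequences."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]

def feedforward(xs):
    # Toy position-wise nonlinearity standing in for a learned MLP (assumption).
    return [[max(0.0, v) for v in x] for x in xs]

def block(video_tokens, text_tokens):
    """One hypothetical decoder block: self-attn, cross-attn, feedforward."""
    h = add(video_tokens, attend(video_tokens, video_tokens, video_tokens, causal=True))
    h = add(h, attend(h, text_tokens, text_tokens))  # condition on the text
    h = add(h, feedforward(h))
    return h

video = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up token embeddings
text = [[0.5, -0.5], [2.0, 1.0]]              # made-up text embeddings
out = block(video, text)
```

Because the self-attention is causal, changing a later video token leaves the outputs at earlier positions unchanged, which is what allows tokens to be generated one at a time. Changing the text embeddings changes every position, since each one attends to the full text.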

Training

The Cosmos models are pre-trained and optimized to generate video sequences from the given inputs. They have been evaluated in multiple configurations, and failure rates vary by configuration. Inference has been tested in BF16 precision on NVIDIA hardware, specifically the Blackwell, Hopper, and Ampere architectures.
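As a rough illustration of what BF16 precision implies for memory, the sketch below estimates the size of the weights alone. BF16 stores each value in 16 bits, so every parameter takes 2 bytes. The parameter count is an assumption: it is the nominal 5 billion taken from the "5B" in the model name, not an exact figure. The result is a lower bound, since activations, caches, and any auxiliary models need memory on top of it.

```python
BYTES_PER_BF16 = 2  # bfloat16 is a 16-bit floating-point format

def weight_footprint_gib(num_params, bytes_per_param=BYTES_PER_BF16):
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30

# Assumption: nominal count read off the "5B" in the model name.
NOMINAL_PARAMS = 5_000_000_000

print(f"{weight_footprint_gib(NOMINAL_PARAMS):.1f} GiB")  # prints "9.3 GiB"
```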

Guide: Running Locally

  1. Installation: Clone the Cosmos repository from GitHub.
  2. Setup Environment: Install necessary dependencies as outlined in the repository's documentation.
  3. Prepare Data: Input should be prepared in the specified formats (text, image, video).
  4. Run Inference: Use the provided scripts to run the model. Ensure your system meets the GPU memory requirements.
  5. Hardware Recommendations: Use NVIDIA GPUs such as the A100 or H100 for best performance. Cloud GPU instances, such as AWS EC2 instances with NVIDIA GPUs, are an alternative to local hardware.
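Before running inference (step 4), it can help to check which local GPUs have enough memory. The following is a minimal sketch, not part of the Cosmos repository: it calls the nvidia-smi command-line tool, if present, and lists GPUs whose total memory meets a threshold. The helper names are hypothetical, and the threshold must come from the repository's documentation, since this page does not state a figure.

```python
import shutil
import subprocess

def parse_gpu_list(csv_text):
    """Parse 'name, memory_mib' lines into (name, memory_mib) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split on the last comma so commas inside a GPU name are preserved.
        name, _, mem = line.rpartition(",")
        gpus.append((name.strip(), int(mem.strip())))
    return gpus

def query_gpus():
    """Return [(name, total_memory_mib)], or [] when nvidia-smi is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_list(result.stdout)

def gpus_meeting(gpus, required_mib):
    """Names of the GPUs whose total memory is at least required_mib."""
    return [name for name, mem in gpus if mem >= required_mib]

# Usage (required_mib is a placeholder; take the real figure from the repo docs):
#   suitable = gpus_meeting(query_gpus(), required_mib=...)
```

The parsing is kept separate from the subprocess call so that it can be exercised on sample text without a GPU present.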

License

The Cosmos-1.0-Autoregressive-5B-Video2World model is distributed under the NVIDIA Open Model License. The license permits commercial use, the creation of derivative models, and distribution. NVIDIA retains ownership of the model and of any derivatives that NVIDIA creates. The license also requires adherence to NVIDIA's Trustworthy AI terms and compliance with applicable legal and regulatory requirements.
