Cosmos-1.0-Diffusion-14B-Video2World

nvidia

Introduction

Cosmos-1.0-Diffusion-14B-Video2World is part of NVIDIA's Cosmos family of diffusion-based world foundation models. These models generate high-quality, physics-aware videos from text, image, or video inputs. They are intended for physical AI development and are available for commercial use under the NVIDIA Open Model License.

Architecture

Cosmos-1.0-Diffusion-14B-Video2World uses a diffusion transformer to denoise video in latent space. The architecture interleaves self-attention, cross-attention, and feedforward layers. The cross-attention layers condition the model on the input text, and adaptive layer normalization embeds the time information used for denoising. The model incorporates an input image or video by concatenating its latent frames with the generated ones along the temporal dimension; noise is added to the conditioning frames to bridge the gap between training and inference.
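
The temporal concatenation described above can be illustrated with a small NumPy sketch. This is not NVIDIA's implementation: the function name build_conditioned_latents, the latent shape (frames, channels, height, width), the Gaussian form of the added noise, and the noise_std value are all assumptions chosen for illustration only.

```python
import numpy as np


def build_conditioned_latents(cond_latents, num_generated, noise_std=0.1, rng=None):
    """Concatenate conditioning latents with to-be-generated latents along time.

    cond_latents: array of shape (T_cond, C, H, W) holding the latent frames
        of the input image or video (hypothetical layout).
    num_generated: number of latent frames the model should generate.
    noise_std: scale of the noise added to conditioning frames (assumed value).
    """
    rng = np.random.default_rng() if rng is None else rng

    # Perturb the conditioning frames slightly (assumed Gaussian noise).
    noised_cond = cond_latents + noise_std * rng.standard_normal(cond_latents.shape)

    # Frames to be generated start out as pure Gaussian noise.
    gen_shape = (num_generated,) + cond_latents.shape[1:]
    generated = rng.standard_normal(gen_shape)

    # Axis 0 is the temporal dimension in this sketch.
    latents = np.concatenate([noised_cond, generated], axis=0)

    # Boolean mask marking which frames came from the input.
    is_condition = np.arange(latents.shape[0]) < cond_latents.shape[0]
    return latents, is_condition
```

The mask is returned so that a denoiser could, under this sketch's assumptions, treat conditioning frames differently from generated ones.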

Training

The model is trained to generate videos conditioned on text, image, or video inputs. Supported inputs are text prompts (up to 300 words), images (1280x704 resolution), and videos (1280x704 resolution, 9 frames). The output is a 5-second video clip at 1280x704 pixels and 24 fps, with adjustable aspect ratio and frame rate.
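
As a rough illustration, the input limits listed above could be checked before running inference. The sketch below uses only the Python standard library; the VisualInput type and the validate_inputs function are hypothetical helpers, not part of the Cosmos repository, and the caller is assumed to have already read the image or video dimensions by other means.

```python
from dataclasses import dataclass

# Limits taken from the input description above.
MAX_PROMPT_WORDS = 300
EXPECTED_SIZE = (1280, 704)  # (width, height) in pixels
EXPECTED_VIDEO_FRAMES = 9


@dataclass
class VisualInput:
    """Hypothetical description of an image or video input."""
    width: int
    height: int
    num_frames: int  # 1 for a still image


def validate_inputs(prompt, visual):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    word_count = len(prompt.split())
    if word_count > MAX_PROMPT_WORDS:
        problems.append(f"prompt has {word_count} words, limit is {MAX_PROMPT_WORDS}")

    if (visual.width, visual.height) != EXPECTED_SIZE:
        problems.append(
            f"resolution is {visual.width}x{visual.height}, "
            f"expected {EXPECTED_SIZE[0]}x{EXPECTED_SIZE[1]}"
        )

    # A single frame is treated as an image; otherwise 9 frames are expected.
    if visual.num_frames not in (1, EXPECTED_VIDEO_FRAMES):
        problems.append(
            f"video has {visual.num_frames} frames, expected {EXPECTED_VIDEO_FRAMES}"
        )

    return problems
```

For example, validate_inputs("a robot arm stacking boxes", VisualInput(1280, 704, 9)) returns an empty list, while a 640x480 input would be reported as a resolution mismatch.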

Guide: Running Locally

  1. Prerequisites:

    • Use a Linux operating system.
    • Use a supported NVIDIA GPU (e.g., Blackwell, Hopper, or Ampere microarchitecture).
  2. Installation:

    • Clone the Cosmos repository from GitHub: git clone https://github.com/NVIDIA/Cosmos
    • Install necessary dependencies as listed in the repository's documentation.
  3. Running the Model:

    • Prepare your input data in the required formats (text, image, video).
    • Follow the inference scripts and commands provided in the Cosmos repository.
  4. Hardware Recommendations:

    • For optimal performance, use NVIDIA GPUs such as the H100.
    • Consider cloud GPUs for resource-intensive tasks if local hardware is insufficient.
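
The prerequisites above can be checked with a short script before cloning the repository. This is a minimal sketch using only the Python standard library and the nvidia-smi tool that ships with the NVIDIA driver; the function name check_environment is hypothetical, and the script does not verify the GPU microarchitecture or any Cosmos-specific dependency.

```python
import platform
import shutil
import subprocess


def check_environment():
    """Report whether the host looks suitable: Linux plus a visible NVIDIA GPU."""
    report = {
        "linux": platform.system() == "Linux",
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
        "gpus": [],
    }

    if report["nvidia_smi_found"]:
        try:
            # Ask the driver for one GPU name per line, without a CSV header.
            result = subprocess.run(
                ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
                capture_output=True,
                text=True,
                check=True,
                timeout=30,
            )
            report["gpus"] = [
                line.strip() for line in result.stdout.splitlines() if line.strip()
            ]
        except (subprocess.SubprocessError, OSError):
            # Leave the GPU list empty if the driver cannot be queried.
            pass

    return report


if __name__ == "__main__":
    print(check_environment())
```

An empty "gpus" list suggests that no NVIDIA GPU is visible to the driver, in which case a cloud GPU instance may be the more practical option.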

License

Cosmos-1.0-Diffusion-14B-Video2World is distributed under the NVIDIA Open Model License. The license allows commercial use as well as the creation and distribution of derivative models. However, bypassing or disabling any technical limitations may result in termination of rights under the license. For custom licensing needs, contact cosmos-license@nvidia.com.
