Cosmos-1.0-Diffusion-7B-Video2World

nvidia

Introduction

Cosmos-1.0-Diffusion-7B-Video2World is part of NVIDIA's Cosmos diffusion models, designed to generate high-quality videos from text, image, or video inputs. These models are optimized for physical AI development, generating physics-aware videos and world states.

Architecture

The model is a diffusion transformer that denoises video in latent space. Its network interleaves self-attention, cross-attention, and feedforward layers. The cross-attention layers condition the denoising process on the input text, and adaptive layer normalization embeds the time information for denoising. For image or video conditioning, the conditional latent frames are concatenated with the generated frames along the temporal dimension. Augment noise is added to the conditional latent frames to bridge the gap between training and inference.
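The two conditioning mechanisms described above, temporal concatenation with augment noise and adaptive layer normalization, can be illustrated with a minimal pure-Python sketch. This is not the Cosmos implementation: each latent frame is reduced to a flat list of floats, and the function names, the noise level `sigma`, and the `(1 + scale)` modulation convention are assumptions chosen for illustration.

```python
import random


def add_augment_noise(frames, sigma, rng):
    """Return a copy of `frames` with Gaussian noise (std `sigma`) added.

    Assumption: augment noise is plain additive Gaussian noise.
    """
    return [[v + rng.gauss(0.0, sigma) for v in frame] for frame in frames]


def concat_temporal(cond_frames, gen_frames):
    """Concatenate along the time axis, conditioning frames first."""
    return list(cond_frames) + list(gen_frames)


def adaptive_layer_norm(x, scale, shift, eps=1e-6):
    """Normalize one feature vector, then modulate it.

    In a real model `scale` and `shift` would be predicted from the
    time embedding; here they are hypothetical constants.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    std = (var + eps) ** 0.5
    return [(v - mean) / std * (1.0 + scale) + shift for v in x]


rng = random.Random(0)
# Two conditioning latent frames and three frames being generated,
# each with four features (toy sizes).
cond = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
gen = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]

noisy_cond = add_augment_noise(cond, sigma=0.1, rng=rng)
sequence = concat_temporal(noisy_cond, gen)
normed = [adaptive_layer_norm(f, scale=0.5, shift=0.1) for f in sequence]
```

The resulting `sequence` holds the conditioning frames followed by the generated frames, so attention layers operating over the time axis would see both.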

Training

The model accepts text, image, and video inputs, each with specific format and resolution requirements. It outputs video as short scenes with adjustable aspect ratio and frame rate. It runs on NVIDIA hardware and is supported primarily on Linux.
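As a rough illustration of handling the three input types, the sketch below routes an input string to a modality by file extension. The extension sets and the `classify_input` helper are assumptions made for this example, not values taken from this document; check the official model card for the formats and resolutions the model actually accepts.

```python
from pathlib import Path

# Assumed extension sets, for illustration only.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}
VIDEO_EXTS = {".mp4"}


def classify_input(value):
    """Return "image", "video", or "text" for a user-supplied string.

    Anything without a recognized media extension is treated as a
    text prompt.
    """
    suffix = Path(value).suffix.lower()
    if suffix in IMAGE_EXTS:
        return "image"
    if suffix in VIDEO_EXTS:
        return "video"
    return "text"
```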

Guide: Running Locally

  1. Requirements: Ensure you have a compatible NVIDIA GPU (e.g., Blackwell, Hopper, Ampere). The model runs effectively on Linux.
  2. Setup: Clone the Cosmos repository and install required dependencies.
  3. Data Preparation: Prepare input data according to specified formats (text, image, video).
  4. Inference: Use the Cosmos runtime engine to execute the model, generating video outputs from your inputs.
  5. Hardware Recommendation: For large-scale deployments, consider cloud GPUs (e.g., NVIDIA H100) for better performance and easier resource management.
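Before working through the steps above, it can help to confirm that the machine meets the stated requirements. The following is a minimal preflight sketch using only the Python standard library. The helper names are hypothetical, and treating the presence of `nvidia-smi` on the PATH as evidence of an NVIDIA driver is a heuristic assumption; it does not verify the GPU architecture or available memory.

```python
import platform
import shutil


def preflight(system, has_nvidia_smi):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if system != "Linux":
        problems.append("Expected Linux, found " + system + ".")
    if not has_nvidia_smi:
        problems.append("nvidia-smi not found; is an NVIDIA driver installed?")
    return problems


def detect_environment():
    """Collect the two facts that preflight() checks from this machine."""
    return platform.system(), shutil.which("nvidia-smi") is not None


if __name__ == "__main__":
    issues = preflight(*detect_environment())
    if issues:
        for issue in issues:
            print("WARNING:", issue)
    else:
        print("Environment looks compatible.")
```

Keeping `preflight` separate from `detect_environment` means the check logic can be exercised on any machine, including one without a GPU.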

License

The model is released under the NVIDIA Open Model License, which allows commercial use and the creation of derivative models. NVIDIA does not claim ownership of outputs generated from the models. The license prohibits bypassing any technical limitations or safety mechanisms integrated into the model. For licensing inquiries, contact cosmos-license@nvidia.com.
