Cosmos-1.0-Diffusion-7B-Decoder-DV8x16x16ToCV8x8x8
Introduction
Cosmos-1.0-Diffusion-7B-Decoder is a diffusion decoder model developed by NVIDIA, designed to add finer detail to the outputs of Cosmos-1.0-Autoregressive models. The model is licensed for commercial use and is part of the Cosmos platform for video denoising and generation.
Architecture
The model is a diffusion transformer that denoises video in latent space. It is composed of self-attention, cross-attention, and feedforward layers. The cross-attention layers condition the denoising process on input text, and adaptive layer normalization embeds the denoising time information.
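As a rough illustration of how adaptive layer normalization can carry time information, the sketch below normalizes a feature vector and then applies a scale and shift predicted from a time embedding. The scale-and-shift form, the sinusoidal embedding, and every function name here are assumptions made for illustration; the source does not specify how the model implements this step.

```python
import math


def layer_norm(x, eps=1e-6):
    """Normalize a feature vector to zero mean and unit variance (no learned affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def timestep_embedding(t, dim):
    """Sinusoidal embedding of a scalar diffusion time t (assumed form; dim must be even)."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


def linear(weights, bias, x):
    """Plain matrix-vector product plus bias; weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def adaptive_layer_norm(x, t_emb, w_scale, b_scale, w_shift, b_shift):
    """Normalize x, then modulate it with a scale and shift computed from t_emb."""
    scale = linear(w_scale, b_scale, t_emb)
    shift = linear(w_shift, b_shift, t_emb)
    return [n * (1.0 + s) + h for n, s, h in zip(layer_norm(x), scale, shift)]


# Hypothetical usage: a 4-dimensional feature modulated by the embedding of t = 0.5.
features = [1.0, 2.0, 3.0, 4.0]
t_emb = timestep_embedding(0.5, 4)
identity_like = [[0.1 if i == j else 0.0 for j in range(4)] for i in range(4)]
modulated = adaptive_layer_norm(
    features, t_emb, identity_like, [0.0] * 4, identity_like, [0.0] * 4
)
```

With all projection weights and biases set to zero, the function reduces to ordinary layer normalization, so in this formulation the time signal enters only through the learned scale and shift.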
Inputs, Outputs, and Hardware
The model takes integer tensors as input: tokens produced by Cosmos tokenizers or by Cosmos autoregressive models. It outputs float tensors of continuous-valued feature vectors. It runs on NVIDIA Blackwell, Hopper, and Ampere GPUs under Linux, and inference has been tested in BF16 precision.
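To make the input and output grids concrete, the sketch below computes latent grid sizes from a video's dimensions. It rests on assumptions that the text above does not state: that "DV8x16x16" in the model name denotes a discrete video tokenizer with 8x temporal and 16x16 spatial compression, that "CV8x8x8" denotes a continuous one with 8x8x8 compression, and that the first frame is encoded on its own (a causal tokenizer). The function name and the example resolution are hypothetical.

```python
def latent_shape(frames, height, width, temporal, spatial):
    """Latent grid (T, H, W) under the assumed compression factors.

    Assumes a causal tokenizer: the first frame maps to one latent frame and
    every further group of `temporal` frames maps to one more.
    """
    if (frames - 1) % temporal or height % spatial or width % spatial:
        raise ValueError("video dimensions do not divide evenly by the compression factors")
    return (1 + (frames - 1) // temporal, height // spatial, width // spatial)


# Hypothetical example: a 33-frame clip at 640x1024 pixels.
discrete_grid = latent_shape(33, 640, 1024, temporal=8, spatial=16)   # integer token grid (input)
continuous_grid = latent_shape(33, 640, 1024, temporal=8, spatial=8)  # float latent grid (output)
print(discrete_grid, continuous_grid)  # (5, 40, 64) (5, 80, 128)
```

Under these assumptions the decoder keeps the temporal length and doubles each spatial dimension of the grid, replacing every integer token with continuous feature vectors. The channel count of those vectors is not given in the text and is left out here.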
Guide: Running Locally
- Set up Environment: Ensure Linux OS and compatible NVIDIA hardware (Blackwell, Hopper, or Ampere) are available.
- Install Dependencies: Install the Cosmos runtime engine from the NVIDIA Cosmos GitHub repository.
- Model Setup: Download the model from Hugging Face.
- Run Inference: Use BF16 precision for inference tasks.
- Consider Cloud GPUs: If local hardware is insufficient, cloud-hosted NVIDIA GPUs of a supported architecture are an alternative.
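The environment requirements in the steps above can be checked before installing anything. The following is a minimal sketch, not part of the Cosmos tooling: the function names are hypothetical, and the mapping from CUDA compute capability to architecture name is an assumption based on publicly documented values.

```python
import platform

# Assumption: CUDA compute-capability major versions for the architectures the
# guide lists. 8.9 (Ada Lovelace) is excluded because the guide does not name it.
ARCH_BY_MAJOR = {8: "Ampere", 9: "Hopper", 10: "Blackwell", 12: "Blackwell"}


def gpu_architecture(major, minor):
    """Return the architecture name for a compute capability, or None if unlisted."""
    if (major, minor) == (8, 9):
        return None
    return ARCH_BY_MAJOR.get(major)


def preflight(system, capability):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    if system != "Linux":
        problems.append(f"unsupported OS: {system} (Linux expected)")
    if gpu_architecture(*capability) is None:
        problems.append(
            f"unlisted GPU compute capability: {capability[0]}.{capability[1]}"
        )
    return problems


if __name__ == "__main__":
    # The capability tuple is hard-coded here; with PyTorch installed it could
    # come from torch.cuda.get_device_capability() instead.
    for issue in preflight(platform.system(), (9, 0)):
        print(issue)
```

Passing the operating system and capability as arguments keeps the check testable on machines without a GPU. The actual download and inference commands are defined by the Cosmos repository and are not reproduced here.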
License
The model is released under the NVIDIA Open Model License. Users may use the model commercially and may create and distribute derivative works, and NVIDIA does not claim ownership of generated outputs. Compliance with safety and ethical guidelines is required, and violating the license's technical restrictions may terminate it.