Cosmos-0.1-Tokenizer-CI8x8

nvidia

Introduction

Cosmos Tokenizer is a suite of visual tokenizers designed by NVIDIA for efficiently compressing images and videos while maintaining high reconstruction quality. It serves as a foundational component for both diffusion-based and autoregressive models in image and video generation.

Architecture

The Cosmos Tokenizer employs a lightweight and efficient architecture featuring causal temporal convolution and attention layers, which preserve the temporal order of video frames. The encoder and decoder are symmetrical: the encoder uses a 2-level Haar wavelet transform to down-sample the spatial and temporal dimensions, and the decoder applies the inverse wavelet transform for reconstruction. Continuous tokenizers use a vanilla autoencoder formulation, while discrete tokenizers use Finite Scalar Quantization (FSQ) to quantize the latent space.
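To make the wavelet down-sampling concrete, here is a minimal sketch of a 2-level Haar transform and its inverse on a single-channel 2D grid. It is an illustration under simplifying assumptions, not the tokenizer's actual layers: the function names `haar_level_2d` and `inverse_haar_level_2d` are hypothetical, it uses plain Python lists to stay dependency-free, and it handles only the two spatial dimensions (no temporal axis, no channels, no learned layers).

```python
def haar_level_2d(x):
    """Apply one orthonormal Haar level to a 2D grid (list of rows, even sides).

    Returns four half-resolution sub-bands: (LL, LH, HL, HH).
    """
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, len(x), 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for j in range(0, len(x[0]), 2):
            # The four pixels of one 2x2 block.
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            ll_row.append((a + b + c + d) / 2)  # block average (scaled)
            lh_row.append((a - b + c - d) / 2)  # left-right difference
            hl_row.append((a + b - c - d) / 2)  # top-bottom difference
            hh_row.append((a - b - c + d) / 2)  # diagonal difference
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh


def inverse_haar_level_2d(ll, lh, hl, hh):
    """Invert haar_level_2d, doubling the resolution of each side."""
    rows, cols = len(ll), len(ll[0])
    x = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for i in range(rows):
        for j in range(cols):
            s, h, v, g = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            x[2 * i][2 * j] = (s + h + v + g) / 2
            x[2 * i][2 * j + 1] = (s - h + v - g) / 2
            x[2 * i + 1][2 * j] = (s + h - v - g) / 2
            x[2 * i + 1][2 * j + 1] = (s - h - v + g) / 2
    return x


# Two levels: each level halves both sides, so 16x16 becomes 4x4.
image = [[float(r * 16 + c) for c in range(16)] for r in range(16)]
ll1, lh1, hl1, hh1 = haar_level_2d(image)
ll2, lh2, hl2, hh2 = haar_level_2d(ll1)
print(len(ll2), len(ll2[0]))  # 4 4

# Undo level 2 first, then level 1, to recover the input.
restored = inverse_haar_level_2d(
    inverse_haar_level_2d(ll2, lh2, hl2, hh2), lh1, hl1, hh1
)
```

Because each level halves every side it is applied to, two levels reduce resolution by a factor of 4 per dimension. The transform itself is lossless as long as all sub-bands are kept.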

Training

The Cosmos Tokenizer supports both continuous and discrete tokenization for images and videos at several compression rates. It achieves spatial compression rates of 8x8 or 16x16 and temporal compression factors of up to 8x, which NVIDIA reports as higher compression and faster runtime than prior state-of-the-art tokenizers.
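The compression rates above translate directly into latent grid sizes. The following sketch works through that arithmetic. The helper names `latent_grid` and `total_compression` are hypothetical, and the divisibility check is an assumption made for this example rather than a documented requirement. The count covers grid positions only, not latent channels or bit depth, which this document does not specify.

```python
def latent_grid(height, width, spatial_factor):
    """Return the (height, width) of the latent grid for an input image."""
    if height % spatial_factor or width % spatial_factor:
        # Assumption for this sketch: sides must divide evenly.
        raise ValueError("height and width must be multiples of spatial_factor")
    return height // spatial_factor, width // spatial_factor


def total_compression(spatial_factor, temporal_factor=1):
    """Reduction in the number of grid positions (height x width x time)."""
    return spatial_factor * spatial_factor * temporal_factor


print(latent_grid(512, 512, 8))   # (64, 64)  for an 8x8 tokenizer
print(latent_grid(512, 512, 16))  # (32, 32)  for a 16x16 tokenizer
print(total_compression(8))       # 64   positions merged per latent, image case
print(total_compression(16, 8))   # 2048 for 16x16 spatial with 8x temporal
```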

Guide: Running Locally

Basic Steps

  1. Clone the Repository:

    git clone https://github.com/NVIDIA/Cosmos-Tokenizer.git
    cd Cosmos-Tokenizer
    
  2. Install Dependencies:

    pip3 install -r requirements.txt
    apt-get install -y ffmpeg
    
  3. Build Docker Image (Optional):

    docker build -t cosmos-docker -f Dockerfile .
    docker run --gpus all -it --rm -v /home/${USER}:/home/${USER} --workdir ${PWD} cosmos-docker /bin/bash
    
  4. Download Pre-trained Checkpoints:
    Use the Hugging Face Hub to download necessary models.

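    from huggingface_hub import login, snapshot_download
    import os

    # Replace the placeholder with your own Hugging Face access token.
    login(token="<YOUR-HF-TOKEN>", add_to_git_credential=True)
    # The original list is truncated ("..."); add further model names as needed.
    model_names = ["Cosmos-Tokenizer-CI8x8", "Cosmos-Tokenizer-CI16x16"]  # ...
    for model_name in model_names:
        local_dir = os.path.join("pretrained_ckpts", model_name)
        os.makedirs(local_dir, exist_ok=True)
        snapshot_download(repo_id="nvidia/" + model_name, local_dir=local_dir)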
    
  5. Run Inference:
    Use provided scripts to encode and decode images or videos.

Suggested Cloud GPUs

  • NVIDIA Ampere (e.g., A100)
  • NVIDIA Hopper (e.g., H100)

License

The Cosmos Tokenizer is distributed under the NVIDIA Open Model License. The license permits commercial use and the distribution of derivative models, and NVIDIA does not claim ownership of outputs generated using the models.