Cosmos-1.0-Autoregressive-4B

nvidia

Introduction

Cosmos-1.0-Autoregressive-4B is part of NVIDIA's Cosmos suite of autoregressive world foundation models. These models generate physics-aware videos and world states to support physical AI development, and they are available for commercial use under the NVIDIA Open Model License.

Architecture

Cosmos-1.0-Autoregressive-4B is an autoregressive transformer designed for world generation. It consists of interleaved self-attention and feedforward layers. The model takes an image or video as input and generates future video frames conditioned on that input.
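To illustrate what "autoregressive" means here, the following is a toy sketch of the generation loop, not NVIDIA's implementation. It assumes the input has already been converted to a sequence of discrete integer tokens, and it replaces the transformer with a hypothetical stand-in function (`toy_next_token`). The names `generate` and `toy_next_token` are illustrative only.

```python
from typing import Callable, List


def generate(
    prompt_tokens: List[int],
    next_token: Callable[[List[int]], int],
    num_new_tokens: int,
) -> List[int]:
    """Extend a token sequence one token at a time.

    Each new token is predicted from everything generated so far,
    which is the defining property of autoregressive generation.
    """
    tokens = list(prompt_tokens)  # copy so the caller's list is untouched
    for _ in range(num_new_tokens):
        tokens.append(next_token(tokens))
    return tokens


def toy_next_token(context: List[int]) -> int:
    # Hypothetical stand-in for the transformer: a fixed rule instead of
    # a learned prediction over a token vocabulary.
    return (context[-1] + 1) % 10


if __name__ == "__main__":
    print(generate([7, 8], toy_next_token, 4))  # [7, 8, 9, 0, 1, 2]
```

In a real model, `next_token` would run the stacked self-attention and feedforward layers over the context and sample from the predicted distribution; the loop structure stays the same.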

Training

The documentation does not provide specific training details. The models are distributed pre-trained and are optimized for generating video sequences from video or image inputs. They are integrated with NVIDIA's Cosmos runtime engine and are compatible with specific NVIDIA hardware architectures.

Guide: Running Locally

  1. Requirements:

    • Compatible hardware: NVIDIA Blackwell, Hopper, or Ampere architectures.
    • Operating system: Linux (other operating systems have not been tested).
  2. Setup:

    • Clone the Cosmos repository from GitHub.
    • Confirm that a compatible GPU is installed; a card such as the NVIDIA H100 is recommended for best performance.
  3. Running the Model:

    • Use the Cosmos runtime engine for inference.
    • Input should be a video in mp4 format, with a resolution of 1024x640 and at least 9 frames.
  4. Cloud GPUs:

    • If local hardware is not sufficient, consider renting GPU instances from a cloud provider such as AWS or Google Cloud.
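
The input constraints listed in step 3 (mp4 format, 1024x640 resolution, at least 9 frames) can be checked before starting inference. The sketch below is a minimal pre-flight check under the assumption that the video's metadata has already been read by some other tool; `VideoInfo` and `find_input_problems` are hypothetical helper names and are not part of the Cosmos repository.

```python
from dataclasses import dataclass
from typing import List

# Constraints taken from the guide above.
REQUIRED_WIDTH = 1024
REQUIRED_HEIGHT = 640
MIN_FRAMES = 9


@dataclass
class VideoInfo:
    """Metadata of a candidate input video (hypothetical container)."""
    path: str
    width: int
    height: int
    num_frames: int


def find_input_problems(info: VideoInfo) -> List[str]:
    """Return a list of human-readable problems; empty means the input passes."""
    problems = []
    if not info.path.lower().endswith(".mp4"):
        problems.append("file extension is not .mp4")
    if (info.width, info.height) != (REQUIRED_WIDTH, REQUIRED_HEIGHT):
        problems.append(
            f"resolution is {info.width}x{info.height}, "
            f"expected {REQUIRED_WIDTH}x{REQUIRED_HEIGHT}"
        )
    if info.num_frames < MIN_FRAMES:
        problems.append(
            f"video has {info.num_frames} frames, need at least {MIN_FRAMES}"
        )
    return problems


if __name__ == "__main__":
    clip = VideoInfo(path="input.mp4", width=1280, height=720, num_frames=5)
    for problem in find_input_problems(clip):
        print(problem)
```

The check only inspects the file extension, not the container format itself, so it is a convenience filter and not a guarantee that the runtime will accept the file. The metadata could be gathered with a tool such as ffprobe or OpenCV.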

License

Cosmos-1.0-Autoregressive-4B is released under the NVIDIA Open Model License, which allows commercial use and the distribution of derivative models. NVIDIA does not claim ownership of outputs generated by these models.
