Cosmos-1.0-Autoregressive-12B

nvidia

Introduction

Cosmos-1.0-Autoregressive-12B is a model developed by NVIDIA that generates physics-aware videos and world states for physical AI development. It is part of the Cosmos World Foundation Models, a suite of pre-trained models that predict and generate video sequences from video or image inputs.

Architecture

Cosmos-1.0-Autoregressive-12B is an autoregressive transformer. It stacks interleaved self-attention and feedforward layers to process a video sequence and generate its continuation. Given a video or image input, the model predicts the frames that follow, supporting up to 33-frame video extensions.
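
The autoregressive loop described above can be sketched in a few lines. This is a toy illustration, not NVIDIA's implementation: `toy_predictor` is a hypothetical stand-in for the transformer, each frame is reduced to a single integer, and treating the 33-frame figure as a cap on total sequence length is an assumption made for the example.

```python
from typing import Callable, List, Sequence

# Assumption: the 33-frame figure from the text is used as a total-length cap.
MAX_FRAMES = 33


def toy_predictor(context: Sequence[int]) -> int:
    """Hypothetical stand-in for the model; NOT the real Cosmos network.

    It looks only at the last three "frames", whereas a transformer would
    attend over the whole context.
    """
    return (sum(context[-3:]) + 1) % 256


def extend(
    frames: Sequence[int],
    num_new: int,
    predict: Callable[[Sequence[int]], int] = toy_predictor,
    max_frames: int = MAX_FRAMES,
) -> List[int]:
    """Append predicted frames one at a time, each conditioned on all prior ones."""
    if not frames:
        raise ValueError("at least one input frame is required")
    out = list(frames)
    for _ in range(num_new):
        if len(out) >= max_frames:
            break  # stop once the length cap is reached
        out.append(predict(out))  # the new frame joins the context for the next step
    return out


if __name__ == "__main__":
    print(extend([1, 2, 3], num_new=2))  # [1, 2, 3, 7, 13]
```

The point of the sketch is the feedback structure: every generated frame is appended to the context before the next prediction, which is what makes the generation autoregressive.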

Training

The Cosmos-1.0-Autoregressive-12B model is trained to predict subsequent video frames from initial input frames. It is compatible with NVIDIA's Blackwell, Hopper, and Ampere GPU microarchitectures and has been tested with BF16 precision on Linux systems.
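
For readers unfamiliar with BF16: the format keeps float32's sign bit and 8-bit exponent but shortens the mantissa to 7 bits, so it covers the same range with less precision. The sketch below emulates that rounding with the standard library only. The helper name `round_to_bf16` is hypothetical and is not part of any Cosmos or NVIDIA API.

```python
import math
import struct


def round_to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even), returned as a float.

    Illustrative helper only. Values outside the float32 range make
    struct.pack raise OverflowError.
    """
    if math.isnan(x) or math.isinf(x):
        return x
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 low bits that BF16 discards.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    print(round_to_bf16(3.14159))  # 3.140625
```

About three significant decimal digits survive the rounding, which is why BF16 is usually paired with higher-precision accumulation in practice.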

Guide: Running Locally

  1. System Requirements: Use a Linux system with an NVIDIA GPU based on the Blackwell, Hopper, or Ampere architecture.
  2. Environment Setup: Install the required software, including the Cosmos runtime engine from GitHub.
  3. Download Model: Download the model weights from Hugging Face.
  4. Run Inference: Provide a video or image as input and generate the future frames.
  5. Cloud GPUs: Consider cloud instances with NVIDIA GPUs when local hardware lacks the performance or resources the model needs.
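
Steps 1 and 3 can be sketched as below, under stated assumptions. The mapping from CUDA compute capability to architecture name reflects commonly documented values (8.x for Ampere except 8.9, 9.x for Hopper, 10.x and 12.x for Blackwell) and should be checked against NVIDIA's current tables. The repository id is inferred from the model name and publisher and is an assumption, as are the helper names and the local directory. The download helper relies on `snapshot_download` from the `huggingface_hub` package, which must be installed separately.

```python
from typing import Optional

# Assumption: repository id inferred from the publisher and model name.
REPO_ID = "nvidia/Cosmos-1.0-Autoregressive-12B"


def supported_architecture(major: int, minor: int) -> Optional[str]:
    """Map a CUDA compute capability to one of the architectures named above.

    Returns None for anything else. The tuple can be obtained with, for
    example, torch.cuda.get_device_capability() when PyTorch is installed.
    """
    if major == 8 and minor != 9:  # 8.9 is Ada Lovelace, not Ampere
        return "Ampere"
    if major == 9:
        return "Hopper"
    if major in (10, 12):
        return "Blackwell"
    return None


def download_model(local_dir: str = "checkpoints/Cosmos-1.0-Autoregressive-12B") -> str:
    """Fetch the model files from Hugging Face and return the local path.

    Requires `pip install huggingface_hub` and network access; the model may
    also require accepting its license on the Hub first.
    """
    from huggingface_hub import snapshot_download  # imported lazily on purpose

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


if __name__ == "__main__":
    arch = supported_architecture(9, 0)
    print(arch)  # Hopper
    # download_model()  # uncomment to download; the checkpoint is large
```

Running inference itself depends on the scripts shipped with the Cosmos runtime on GitHub, so it is left out of this sketch rather than guessed at.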

License

The Cosmos-1.0-Autoregressive-12B model is released under the NVIDIA Open Model License. The license permits commercial use and the creation of derivative models, and NVIDIA does not claim ownership of model outputs. The full terms are set out in the license agreement.
