Mochi 1 Preview

Genmo

Introduction

Mochi 1 is an advanced open-source video generation model developed by Genmo. It features high-fidelity motion and strong prompt adherence, representing a significant improvement in open video generation systems. The model is available under the Apache 2.0 license and can be freely experimented with on Genmo's playground.

Architecture

Mochi 1 is built on the Asymmetric Diffusion Transformer (AsymmDiT) architecture with 10 billion parameters, making it one of the largest openly released video generation models at the time of its release. It ships with an inference harness that provides an efficient context-parallel implementation. For video compression, the model pairs the transformer with AsymmVAE, an asymmetric encoder-decoder variational autoencoder.

AsymmVAE Model Specs

  • Params Count: 362M
  • Enc Base Channels: 64
  • Dec Base Channels: 128
  • Latent Dim: 12
  • Spatial Compression: 8x8
  • Temporal Compression: 6x
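The compression factors above determine the latent tensor a video is encoded into. The sketch below works this out for an example clip; the resolution and frame count (848x480, 163 frames) and the treat-the-first-frame-separately convention are illustrative assumptions, not part of the spec table.

```python
# Sketch: latent-tensor shape implied by the AsymmVAE specs above.
def latent_shape(frames, height, width,
                 spatial=8, temporal=6, channels=12):
    """Map a pixel-space video to its compressed latent shape.
    Causal video VAEs commonly keep the first frame uncompressed
    in time, hence the (frames - 1) term (an assumption here)."""
    t = (frames - 1) // temporal + 1
    return (channels, t, height // spatial, width // spatial)

print(latent_shape(163, 480, 848))  # (12, 28, 60, 106)
```

So an 848x480, 163-frame clip compresses to a 12-channel latent of 28 frames at 106x60, per the 8x8 spatial and 6x temporal factors listed above.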

AsymmDiT Model Specs

  • Params Count: 10B
  • Num Layers: 48
  • Num Heads: 24
  • Visual Dim: 3072
  • Text Dim: 1536
  • Visual Tokens: 44520
  • Text Tokens: 256
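The 44,520 visual-token figure above is consistent with patchifying the AsymmVAE latent. The arithmetic sketch below reproduces it assuming an 848x480, 163-frame clip and 2x2 patchification in the DiT; the clip size and patch size are assumptions, not stated in the spec tables.

```python
# Arithmetic sketch: how 44,520 visual tokens can arise from the
# AsymmVAE compression factors (8x8 spatial, 6x temporal) plus an
# assumed 2x2 patchify step in the transformer.
def visual_tokens(frames, height, width,
                  spatial=8, temporal=6, patch=2):
    lat_t = (frames - 1) // temporal + 1   # latent frames
    lat_h = height // spatial              # latent height
    lat_w = width // spatial               # latent width
    return lat_t * (lat_h // patch) * (lat_w // patch)

print(visual_tokens(163, 480, 848))  # 44520
```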

Training

Mochi 1 encodes prompts with a single T5-XXL language model, unlike many modern diffusion models that combine several pretrained text encoders. The architecture's asymmetric design reduces memory use by giving the text stream a smaller hidden dimension than the visual stream, while joint multi-modal self-attention lets text and visual tokens attend to each other.
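A minimal numpy sketch of that asymmetric joint-attention idea: text and visual tokens keep different hidden widths (1536 vs 3072, per the spec tables above), each modality has its own projections into a shared attention space, and a single softmax attention runs over the concatenated sequence. The head dimension, token counts, and random initialisation are illustrative assumptions, not Mochi's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_vis, d_txt, d_head = 3072, 1536, 128
n_vis, n_txt = 16, 4          # tiny token counts for the sketch

vis = rng.standard_normal((n_vis, d_vis))
txt = rng.standard_normal((n_txt, d_txt))

# Separate QKV projections per modality (the "asymmetric" part).
Wq_v, Wk_v, Wv_v = (rng.standard_normal((d_vis, d_head)) for _ in range(3))
Wq_t, Wk_t, Wv_t = (rng.standard_normal((d_txt, d_head)) for _ in range(3))

# Concatenate both streams into one joint sequence.
q = np.concatenate([vis @ Wq_v, txt @ Wq_t])
k = np.concatenate([vis @ Wk_v, txt @ Wk_t])
v = np.concatenate([vis @ Wv_v, txt @ Wv_t])

# One joint softmax attention: every token attends to both modalities.
scores = q @ k.T / np.sqrt(d_head)
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)
out = attn @ v

print(out.shape)  # (20, 128)
```

The design choice this illustrates: the narrow text stream saves parameters and memory, while the joint attention still gives prompts direct influence over every visual token.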

Guide: Running Locally

To run Mochi 1 locally, follow these steps:

  1. Installation: Clone the repository and set up a virtual environment using uv.

    git clone https://github.com/genmoai/models
    cd models
    pip install uv
    uv venv .venv
    source .venv/bin/activate
    uv pip install setuptools
    uv pip install -e . --no-build-isolation
    
  2. Download Weights: Use download_weights.py to download the model and decoder.

    python3 ./scripts/download_weights.py <path_to_downloaded_directory>
    
  3. Run the Model:

    • Start the Gradio UI:
      python3 ./demos/gradio_ui.py --model_dir "<path_to_downloaded_directory>"
      
    • Or generate videos directly:
      python3 ./demos/cli.py --model_dir "<path_to_downloaded_directory>"
      
  4. Cloud GPUs: For optimal performance, at least one H100 GPU is recommended; running on a single GPU requires roughly 60 GB of VRAM.

License

Mochi 1 is released under the Apache 2.0 license, which is a permissive license allowing users to freely use, modify, and distribute the software.
