mochi 1 preview
genmo
Introduction
Mochi 1 is an open-source video generation model developed by Genmo. It features high-fidelity motion and strong prompt adherence, a significant improvement over earlier open video generation systems. The model is released under the Apache 2.0 license and can be tried on Genmo's playground.
Architecture
Mochi 1 is built on the Asymmetric Diffusion Transformer (AsymmDiT) architecture and has 10 billion parameters, which Genmo describes as the largest video generative model openly released at the time of its launch. The release includes an inference harness with an efficient context-parallel implementation. Video is compressed by AsymmVAE, a VAE with an asymmetric encoder-decoder structure.
AsymmVAE Model Specs
- Params Count: 362M
- Enc Base Channels: 64
- Dec Base Channels: 128
- Latent Dim: 12
- Spatial Compression: 8x8
- Temporal Compression: 6x
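To make the compression figures concrete, the sketch below computes the latent tensor shape for a given clip. The compression factors and latent width come from the list above. The frame-count rule (first frame encoded on its own, then one latent frame per six input frames) and the example clip size are assumptions made for illustration, not values stated in this document.

```python
# Compression factors and latent width are taken from the spec list above.
SPATIAL = 8      # 8x8 spatial compression
TEMPORAL = 6     # 6x temporal compression
LATENT_DIM = 12  # latent channels


def latent_shape(frames: int, height: int, width: int) -> tuple[int, int, int, int]:
    """Return (channels, latent_frames, latent_height, latent_width).

    Assumption: the first frame is encoded on its own and every further
    group of TEMPORAL frames becomes one latent frame (causal-VAE style).
    """
    if height % SPATIAL or width % SPATIAL:
        raise ValueError("height and width must be multiples of 8")
    if frames < 1 or (frames - 1) % TEMPORAL:
        raise ValueError("frames must have the form 6k + 1 under this assumption")
    latent_frames = (frames - 1) // TEMPORAL + 1
    return LATENT_DIM, latent_frames, height // SPATIAL, width // SPATIAL


# Hypothetical clip size, chosen only for illustration.
print(latent_shape(frames=163, height=480, width=848))  # (12, 28, 60, 106)
```

Under these assumptions a 163-frame 480x848 clip becomes a 12-channel latent of 28 x 60 x 106 positions.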
AsymmDiT Model Specs
- Params Count: 10B
- Num Layers: 48
- Num Heads: 24
- Visual Dim: 3072
- Text Dim: 1536
- Visual Tokens: 44520
- Text Tokens: 256
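The figures above can be related to each other with a little arithmetic. The sketch below derives the per-head width and the joint sequence length, and shows one way the visual token count could arise. The 2x2 patch size and the 28 x 60 x 106 latent grid are assumptions for illustration (they are not given in this document), although they do reproduce the listed 44520 tokens.

```python
# Values copied from the AsymmDiT spec list above.
VISUAL_DIM = 3072
NUM_HEADS = 24
VISUAL_TOKENS = 44520
TEXT_TOKENS = 256


def visual_token_count(latent_frames: int, latent_h: int, latent_w: int, patch: int = 2) -> int:
    """Count tokens when each latent frame is cut into patch x patch tiles.

    Assumption: patch=2 and the latent grid below are illustrative values,
    not taken from the spec list.
    """
    if latent_h % patch or latent_w % patch:
        raise ValueError("latent height and width must be divisible by the patch size")
    return latent_frames * (latent_h // patch) * (latent_w // patch)


head_dim = VISUAL_DIM // NUM_HEADS             # 3072 / 24 = 128
tokens = visual_token_count(28, 60, 106)       # 28 * 30 * 53 = 44520
joint_sequence = VISUAL_TOKENS + TEXT_TOKENS   # 44776 tokens attended jointly

print(head_dim, tokens, joint_sequence)
print(tokens == VISUAL_TOKENS)  # True under the assumed grid and patch size
```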
Training
Mochi 1 encodes prompts with a single T5-XXL language model, whereas many modern diffusion models rely on several pretrained language models. The asymmetric design reduces memory use, and multi-modal self-attention lets the model attend jointly to text and visual tokens.
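As a rough illustration of joint attention over two token streams of different widths, the toy sketch below gives each modality its own projection into a shared attention width, concatenates the two sequences, and runs ordinary softmax attention. It is a single-head, pure-Python sketch under assumed toy dimensions. The projection names, sizes, and random weights are hypothetical, and nothing here is taken from the Mochi 1 code.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    """Small random matrix standing in for learned weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def joint_attention(visual, text, proj):
    """Single-head attention over the concatenated visual + text sequence."""
    # Each modality has its own (non-square) projection, so inputs of
    # different widths land in one shared attention width.
    q = matmul(visual, proj["vis_q"]) + matmul(text, proj["txt_q"])
    k = matmul(visual, proj["vis_k"]) + matmul(text, proj["txt_k"])
    v = matmul(visual, proj["vis_v"]) + matmul(text, proj["txt_v"])
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    out = matmul(weights, v)
    # Split the result back into the two streams. Per-modality output
    # projections and MLPs are omitted from this sketch.
    return out[: len(visual)], out[len(visual):], weights


# Toy sizes (hypothetical): the visual stream is twice as wide as the text stream.
rng = random.Random(0)
VIS_DIM, TXT_DIM, ATTN_DIM = 8, 4, 4
visual = rand_matrix(6, VIS_DIM, rng)  # 6 visual tokens
text = rand_matrix(3, TXT_DIM, rng)    # 3 text tokens
proj = {}
for name in ("q", "k", "v"):
    proj[f"vis_{name}"] = rand_matrix(VIS_DIM, ATTN_DIM, rng)
    proj[f"txt_{name}"] = rand_matrix(TXT_DIM, ATTN_DIM, rng)

vis_out, txt_out, weights = joint_attention(visual, text, proj)
print(len(vis_out), len(txt_out), len(weights[0]))  # 6 3 9
```

Every token, visual or text, attends over all nine positions, which is the sense in which the attention is multi-modal.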
Guide: Running Locally
To run Mochi 1 locally, follow these steps:
- Installation: Clone the repository and set up a virtual environment using uv.
  git clone https://github.com/genmoai/models
  cd models
  pip install uv
  uv venv .venv
  source .venv/bin/activate
  uv pip install setuptools
  uv pip install -e . --no-build-isolation
- Download Weights: Use download_weights.py to download the model and decoder.
  python3 ./scripts/download_weights.py <path_to_downloaded_directory>
- Run the Model:
  - Start the Gradio UI:
    python3 ./demos/gradio_ui.py --model_dir "<path_to_downloaded_directory>"
  - Or generate videos directly:
    python3 ./demos/cli.py --model_dir "<path_to_downloaded_directory>"
- Cloud GPUs: For best performance, use at least one H100 GPU; running on a single GPU requires about 60 GB of VRAM.
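Before downloading the weights, it can help to check whether a local GPU meets the roughly 60 GB requirement quoted above. The sketch below is one way to do that, assuming an NVIDIA driver with nvidia-smi on the PATH. The helper names are hypothetical, and reading the quoted figure as decimal gigabytes is an assumption.

```python
import subprocess

REQUIRED_GB = 60  # approximate single-GPU requirement quoted above


def parse_total_mib(smi_output: str) -> list[int]:
    """Parse nvidia-smi output that has one integer MiB value per line."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def gpus_with_enough_memory(totals_mib, required_gb=REQUIRED_GB):
    """Return the indices of GPUs whose total memory meets the requirement."""
    # Assumption: the quoted figure is decimal GB; nvidia-smi reports MiB.
    required_mib = required_gb * 1000**3 / 1024**2
    return [i for i, mib in enumerate(totals_mib) if mib >= required_mib]


def query_nvidia_smi() -> str:
    """Ask nvidia-smi for the total memory of each GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    try:
        totals = parse_total_mib(query_nvidia_smi())
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("nvidia-smi is not available; cannot check GPU memory")
    else:
        ok = gpus_with_enough_memory(totals)
        print(f"GPUs with at least {REQUIRED_GB} GB: {ok or 'none'}")
```

The check only looks at total memory, so it does not account for memory already in use by other processes.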
License
Mochi 1 is released under the Apache 2.0 license, which is a permissive license allowing users to freely use, modify, and distribute the software.