FastMochi Diffusers
Introduction
FastMochi is an accelerated version of the Mochi video generation model, designed to produce high-quality samples with far fewer diffusion steps. It achieves roughly an 8x sampling speedup by using 8 diffusion steps in place of the original model's 64. Developed by Hao AI Lab, FastMochi is distributed under the Apache-2.0 license.
Architecture
FastMochi is a distilled version of the Mochi model that keeps Mochi's pipeline structure while cutting the number of sampling steps. The pipeline is built from a T5ModelFactory for text encoding, a DitModelFactory for the diffusion transformer, and a DecoderModelFactory for video decoding, and it supports multi-GPU setups for better throughput and scalability.
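The sketch below shows how these components might be assembled. Only the factory class names come from this card; the import path, constructor arguments, and build() method are illustrative assumptions, not the verified FastVideo API.

```python
# Illustrative only: the factory names come from the model card, but the
# import path and method signatures are assumptions, not the verified
# FastVideo API.
from fastvideo.models import (  # hypothetical module path
    T5ModelFactory,
    DitModelFactory,
    DecoderModelFactory,
)

text_encoder = T5ModelFactory().build()  # T5 encoder for the text prompt
transformer = DitModelFactory().build()  # DiT backbone that denoises video latents
decoder = DecoderModelFactory().build()  # decodes latents into RGB frames
```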
Training
FastMochi was trained using consistency distillation on the MixKit dataset. Key training parameters included:
- Batch size: 32
- Resolution: 480x848
- Number of frames: 169
- Training steps: 128
- GPUs used: 16
- Learning rate: 1e-6
- Loss function: Huber loss
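For reference, the hyperparameters above can be gathered into a single configuration. The key names below are illustrative; consult the FastVideo repository for the actual training entry point and argument names.

```python
# Consistency-distillation settings from this card, collected into a config.
# Key names are illustrative, not the actual FastVideo training arguments.
distill_config = {
    "dataset": "MixKit",
    "batch_size": 32,
    "resolution": (480, 848),  # height x width in pixels
    "num_frames": 169,
    "train_steps": 128,
    "num_gpus": 16,
    "learning_rate": 1e-6,
    "loss": "huber",
}
```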
Guide: Running Locally
To run FastMochi locally, follow these steps:
- Clone the repository:
  git clone https://github.com/hao-ai-lab/FastVideo.git
- Set up your environment: install the required dependencies and configure multi-GPU usage if applicable.
- Run the inference script: with the setup complete, use the provided Python script together with compatible FastMochi weights from Hugging Face to generate videos from text prompts (a minimal sketch follows this list).
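The following is a minimal single-GPU inference sketch using the diffusers MochiPipeline, assuming the FastMochi weights are published in diffusers format. The repository ID FastVideo/FastMochi-diffusers and the generation settings are assumptions based on this card; adjust them to match the released weights.

```python
# Minimal text-to-video sketch with diffusers. The repo ID and generation
# settings are assumptions; check the FastVideo release for exact values.
import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video

pipe = MochiPipeline.from_pretrained(
    "FastVideo/FastMochi-diffusers",  # assumed Hugging Face repo ID
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # trades some speed for lower VRAM usage

frames = pipe(
    prompt="A timelapse of clouds rolling over a mountain range",
    num_inference_steps=8,  # FastMochi's distilled step count (vs. Mochi's 64)
    num_frames=84,
).frames[0]

export_to_video(frames, "output.mp4", fps=30)
```

For multi-GPU runs, the repository's own inference script is likely the better starting point, since the pipeline's multi-GPU support is configured there.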
For optimal performance, consider cloud GPU services such as AWS, Google Cloud, or Azure, which provide GPUs with enough memory and compute for these demanding video generation workloads.
License
FastMochi is licensed under the Apache-2.0 License, allowing for free use, modification, and distribution of the software, provided that proper attribution is given and any modifications are documented.