Mochi Model Overview
Introduction
The Mochi model is a text-to-video generative model published by Calcuis on the Hugging Face platform. It pairs a GGUF-quantized version of the T5XXL text encoder with Mochi for efficient video generation from textual descriptions. The model is designed to be used with the ComfyUI interface and includes several setup options for ease of deployment.
Architecture
The Mochi model utilizes a GGUF-quantized version of the T5XXL encoder, optimized to work with Mochi. It includes the following components:
- mochi_fp8.safetensors: a ~10 GB diffusion model file.
- t5xxl_fp16-q4_0.gguf: a ~2.9 GB quantized text encoder file.
- mochi_vae_scaled.safetensors: a ~725 MB VAE file.
Training
The base model for Mochi is derived from Genmo's Mochi-1-preview. The architecture is designed to facilitate efficient text-to-video generation by compressing and optimizing the encoder model using GGUF quantization techniques.
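To make the GGUF quantization idea concrete, here is a minimal sketch of symmetric 4-bit block quantization in the spirit of GGUF's q4_0 scheme: weights are split into fixed-size blocks, and each block stores one float scale plus small integer codes. This is an illustration of the general technique only, not the exact q4_0 on-disk layout.

```python
import numpy as np

BLOCK = 32  # q4_0-style block size: one scale per 32 weights

def quantize_block(w):
    """Quantize one block of float weights to (scale, codes in [-8, 7])."""
    scale = float(np.max(np.abs(w))) / 7.0
    if scale == 0.0:
        scale = 1.0  # all-zero block: any scale reconstructs it exactly
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return scale, q

def dequantize_block(scale, q):
    """Recover approximate float weights from a quantized block."""
    return scale * q.astype(np.float32)

rng = np.random.default_rng(0)
w = rng.standard_normal(BLOCK).astype(np.float32)
scale, q = quantize_block(w)
w_hat = dequantize_block(scale, q)
print("max reconstruction error:", float(np.max(np.abs(w - w_hat))))
```

Storing 4-bit codes plus one scale per block is what shrinks the fp16 T5XXL encoder to the ~2.9 GB q4_0 file listed above, at the cost of a bounded per-weight rounding error.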
Guide: Running Locally
Setup
- File placement:
  - Place mochi_fp8.safetensors in ./ComfyUI/models/diffusion_models.
  - Place t5xxl_fp16-q4_0.gguf in ./ComfyUI/models/text_encoders.
  - Place mochi_vae_scaled.safetensors in ./ComfyUI/models/vae.
- Execution:
  - Run the .bat file in the main directory (this assumes the GGUF-Comfy pack is in use).
  - Drag the corresponding workflow JSON file into your browser.
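The file-placement step above can be scripted. The helper below is an illustrative sketch (not part of ComfyUI or the model release): it moves each downloaded file into the target folder named in the guide, creating the folders if needed.

```python
import shutil
from pathlib import Path

# Each downloaded file and its ComfyUI models subfolder, per the guide above.
PLACEMENT = {
    "mochi_fp8.safetensors": "ComfyUI/models/diffusion_models",
    "t5xxl_fp16-q4_0.gguf": "ComfyUI/models/text_encoders",
    "mochi_vae_scaled.safetensors": "ComfyUI/models/vae",
}

def place_files(src_dir=".", root="."):
    """Move each downloaded file into its target folder; report missing files.

    src_dir: directory holding the downloaded files.
    root:    directory containing (or that will contain) the ComfyUI tree.
    """
    for name, target in PLACEMENT.items():
        src = Path(src_dir) / name
        dest = Path(root) / target
        dest.mkdir(parents=True, exist_ok=True)  # create the folder if absent
        if src.is_file():
            shutil.move(str(src), str(dest / name))
            print(f"placed {name} -> {target}")
        else:
            print(f"warning: {name} not found, download it first")
```

Run `place_files()` from the directory that holds both the downloaded files and your ComfyUI checkout; files already in place are simply reported as missing from the source directory.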
Workflows
- GGUF Encoder Workflow: Example workflow
- Safetensors Workflow: Example workflow
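ComfyUI workflows are plain JSON files, so a quick sanity check before dragging one into the browser can catch a truncated download. The helper below is illustrative only (it is not part of ComfyUI), and merely confirms the file is well-formed JSON.

```python
import json
from pathlib import Path

def load_workflow(path):
    """Parse a ComfyUI workflow JSON file and return the parsed object.

    Raises ValueError with a clearer message if the file is not valid JSON.
    (Illustrative helper; not part of ComfyUI itself.)
    """
    try:
        return json.loads(Path(path).read_text())
    except json.JSONDecodeError as exc:
        raise ValueError(f"{path} is not valid workflow JSON: {exc}") from exc
```

For example, `load_workflow("workflow.json")` returns the parsed structure if the file is intact, and raises a descriptive error otherwise.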
Cloud GPUs
For optimal performance, consider using cloud GPU services such as AWS EC2 with GPU instances, Google Cloud GPU, or Azure N-series instances.
License
The Mochi model is released under the Apache 2.0 License, which allows for wide usage and modification with appropriate credit.