Fast Hunyuan

Introduction

FastHunyuan is an accelerated version of the HunyuanVideo model, designed to generate high-quality videos efficiently. It requires only 6 diffusion sampling steps, a roughly 8x speedup over the original HunyuanVideo model, which uses 50 steps.

Architecture

The model is a distilled variant of the HunyuanVideo model, developed by Hao AI Lab. It is optimized for speed and efficiency, maintaining video quality while reducing the computational load.

Training

FastHunyuan was trained using consistency distillation on the MixKit dataset. The training involved the following hyperparameters:

  • Batch size: 16
  • Resolution: 720x1280
  • Number of frames: 125
  • Training steps: 320
  • GPUs used: 32
  • Learning rate: 1e-6
  • Loss function: Huber
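The Huber loss listed above can be sketched in plain Python. This is a minimal illustration of the standard Huber formulation (quadratic for small residuals, linear for large ones); the exact variant and the delta threshold used in FastHunyuan's distillation are not stated in the card, so the delta here is an assumption.

```python
def huber_loss(pred, target, delta=1.0):
    # Standard Huber loss, averaged over elements.
    # delta is an assumed threshold; the model card does not specify it.
    total = 0.0
    for p, t in zip(pred, target):
        r = abs(p - t)
        if r <= delta:
            total += 0.5 * r * r            # quadratic region
        else:
            total += delta * (r - 0.5 * delta)  # linear region
    return total / len(pred)

print(huber_loss([0.0], [0.5]))  # -> 0.125
```

In practice the loss would be computed over latent video tensors between the student's and teacher's consistency targets; the scalar version above only shows the per-element rule.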

Guide: Running Locally

  1. Clone the Repository: Clone the FastVideo repository from GitHub (hao-ai-lab/FastVideo).
  2. Inference Setup: Follow the instructions in the README for setting up inference locally.
  3. Alternative Method: Use the official HunyuanVideo repository, setting the flow shift to 17, sampling steps to 6, resolution to 720x1280x125 (height x width x frames), and a CFG scale greater than 6.
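The settings from step 3 can be collected as keyword arguments for a sampling call. The dict below is a sketch, not the official repository's API; the argument names are assumptions, and the guidance scale value is one arbitrary choice satisfying the "greater than 6" constraint. The speedup arithmetic at the end shows where the advertised 8x figure comes from.

```python
# Hedged sketch: distilled sampling settings from the guide above,
# gathered as hypothetical keyword arguments (names are illustrative).
fast_hunyuan_settings = dict(
    flow_shift=17,          # "shift" in the official HunyuanVideo repo
    num_inference_steps=6,  # distilled step count
    height=720,
    width=1280,
    num_frames=125,
    guidance_scale=6.5,     # CFG scale; the guide requires > 6
)

# The advertised speedup follows from the step-count reduction alone:
speedup = 50 / fast_hunyuan_settings["num_inference_steps"]
print(round(speedup, 1))  # -> 8.3
```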

Cloud GPUs Suggestion

If your local hardware is insufficient, consider using cloud GPU services. FastHunyuan supports inference on a single RTX 4090 GPU with only 20 GB of VRAM using NF4 quantization. LoRA finetuning requires a minimum of 40 GB of GPU memory on each of 2 GPUs, or 30 GB with CPU offload.
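The per-GPU memory figures above can be summarized as a small lookup helper. This is purely illustrative; the function name and task labels are invented here, and the numbers come directly from the requirements stated in this section.

```python
def min_vram_gb(task, cpu_offload=False):
    # Per-GPU VRAM requirements from the model card (hypothetical helper).
    if task == "inference_nf4":
        return 20                      # single RTX 4090 with NF4 quantization
    if task == "lora_finetune":
        return 30 if cpu_offload else 40  # 2 GPUs without offload
    raise ValueError(f"unknown task: {task}")

print(min_vram_gb("lora_finetune", cpu_offload=True))  # -> 30
```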

License

FastHunyuan is distributed under the tencent-hunyuan-community license. For more details, refer to the license file in the model repository.
