Hunyuan Video Prompt Rewrite
tencent
Introduction
HunyuanVideo is an open-source video foundation model whose video generation performance is reported to be comparable to that of leading closed-source models. Its framework covers data curation and joint image-video model training, together with infrastructure for large-scale training and inference. With over 13 billion parameters, it is described as the largest open-source video generation model, and it aims to narrow the gap between closed-source and open-source video models.
Architecture
HunyuanVideo operates in a spatio-temporally compressed latent space produced by a Causal 3D VAE. Text prompts are encoded with a large language model and serve as the condition. Given Gaussian noise and this condition, the model generates output latents, which the 3D VAE decoder turns into images or videos. The architecture follows a "Dual-stream to Single-stream" hybrid design: in the dual-stream phase, Transformer blocks process video tokens and text tokens independently, and in the single-stream phase the two token sequences are combined for multimodal information fusion.
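The data flow of the "Dual-stream to Single-stream" design can be illustrated with a toy Python sketch. It is not the released implementation: `toy_block` is a hypothetical stand-in for a Transformer block (it only scales features), and returning only the video positions at the end is an assumption made for illustration.

```python
from typing import Callable, List

Token = List[float]
Block = Callable[[List[Token]], List[Token]]


def toy_block(weight: float) -> Block:
    """Hypothetical stand-in for a Transformer block: scales every feature by `weight`."""
    def apply(tokens: List[Token]) -> List[Token]:
        return [[weight * x for x in tok] for tok in tokens]
    return apply


def dual_to_single_stream(
    video_tokens: List[Token],
    text_tokens: List[Token],
    video_blocks: List[Block],
    text_blocks: List[Block],
    fused_blocks: List[Block],
) -> List[Token]:
    # Dual-stream phase: each modality passes through its own blocks.
    for video_block, text_block in zip(video_blocks, text_blocks):
        video_tokens = video_block(video_tokens)
        text_tokens = text_block(text_tokens)

    # Single-stream phase: concatenate both sequences and process them jointly.
    fused = video_tokens + text_tokens
    for fused_block in fused_blocks:
        fused = fused_block(fused)

    # Assumption: only the video positions are passed on for decoding.
    return fused[: len(video_tokens)]
```

With three video tokens and two text tokens, the single-stream blocks see a sequence of five tokens, and the function returns the three video tokens after both phases.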
Training
HunyuanVideo uses CausalConv3D layers in its 3D VAE to compress pixel-space videos and images into latents, which significantly reduces the number of tokens the subsequent diffusion model has to process. A Multimodal Large Language Model (MLLM) serves as the text encoder, chosen for its image-text alignment and its ability to describe details. The model also includes a prompt rewrite mechanism that adapts user prompts into the form the model prefers, which improves video generation quality.
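To make the token reduction concrete, the following Python sketch computes the latent shape a causal 3D VAE would produce for a given clip. The compression factors (4x temporal, 8x spatial, 16 latent channels) are assumed values reported in the HunyuanVideo technical report, not stated in the text above, and `latent_shape` is a hypothetical helper rather than part of the released code.

```python
from typing import Tuple


def latent_shape(
    frames: int,
    height: int,
    width: int,
    temporal_factor: int = 4,   # assumed temporal compression ratio
    spatial_factor: int = 8,    # assumed spatial compression ratio
    latent_channels: int = 16,  # assumed number of latent channels
) -> Tuple[int, int, int, int]:
    """Return (latent_frames, channels, latent_height, latent_width).

    A causal VAE encodes the first frame on its own, so a clip of T + 1
    frames maps to T / temporal_factor + 1 latent frames.
    """
    if (frames - 1) % temporal_factor != 0:
        raise ValueError("frames must be of the form temporal_factor * k + 1")
    if height % spatial_factor != 0 or width % spatial_factor != 0:
        raise ValueError("height and width must be divisible by spatial_factor")
    return (
        (frames - 1) // temporal_factor + 1,
        latent_channels,
        height // spatial_factor,
        width // spatial_factor,
    )


def element_count(shape: Tuple[int, ...]) -> int:
    """Multiply all dimensions of a shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total
```

Under these assumptions, a 129-frame clip at 720 x 1280 maps to a latent of shape `(33, 16, 90, 160)`, which holds far fewer values than the `(129, 3, 720, 1280)` pixel tensor.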
Guide: Running Locally
- Clone the Repository: Clone the HunyuanVideo-PromptRewrite repository from Hugging Face.
- Install Dependencies: Ensure all required libraries and dependencies are installed.
- Download Model Weights: Obtain the model weights from the provided Hugging Face link.
- Run Inference: Deploy the model and run inference locally using the original Hunyuan-Large code.
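The download step above could be scripted roughly as follows. This is a minimal sketch, assuming the weights live in the Hugging Face repository `tencent/HunyuanVideo-PromptRewrite` and that the `huggingface_hub` package is installed. The target directory and both helper names are hypothetical. Inference itself is not shown, since it depends on the Hunyuan-Large code.

```python
from typing import Dict

# Assumed repository id on the Hugging Face Hub.
REPO_ID = "tencent/HunyuanVideo-PromptRewrite"


def download_kwargs(local_dir: str = "ckpts/HunyuanVideo-PromptRewrite") -> Dict[str, str]:
    """Build the keyword arguments passed to snapshot_download."""
    return {"repo_id": REPO_ID, "local_dir": local_dir}


def download_weights(local_dir: str = "ckpts/HunyuanVideo-PromptRewrite") -> str:
    """Download every file in the repository and return the local folder path."""
    # Imported lazily so this module can be loaded without the dependency;
    # install it with `pip install huggingface_hub`.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(local_dir))
```

Calling `download_weights()` fetches the full repository, which is large, so it is worth checking free disk space first.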
For best performance, especially during training, cloud accelerators such as AWS EC2 GPU instances, Google Cloud TPUs, or Azure GPU virtual machines are recommended.
License
The HunyuanVideo-PromptRewrite model is released under the tencent-hunyuan-community license, which can be reviewed in the LICENSE file provided in the repository.