Hunyuan-DiT
Tencent-Hunyuan
Introduction
Hunyuan-DiT is a text-to-image diffusion transformer with fine-grained understanding of both English and Chinese. Its transformer structure, text encoders, and positional encoding are designed for this bilingual setting, and the model supports multi-round, multimodal dialogue with the user. In the authors' evaluation, it sets a new standard for Chinese-to-image generation.
Architecture
Hunyuan-DiT runs diffusion in latent space: a pre-trained Variational Autoencoder (VAE) compresses each image into a low-dimensional latent representation, and the diffusion model operates on that representation. The diffusion model is parameterized by a transformer, and text prompts are encoded with a bilingual CLIP encoder and a multilingual T5 encoder. The system supports multi-turn text-to-image generation, so an image can be created and refined iteratively through user dialogue.
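To make the latent-space compression concrete, the sketch below computes the latent tensor shape and the number of transformer tokens for a given image size. The 8x spatial downsampling factor, the 4 latent channels, and the patch size of 2 are illustrative assumptions common in latent diffusion setups. They are not stated in this document, so check the model configuration for the actual values.

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """Shape (C, H, W) of the VAE latent for an image of the given size.

    downsample and latent_channels are assumed defaults, not confirmed values.
    """
    if height % downsample or width % downsample:
        raise ValueError("image size must be divisible by the downsampling factor")
    return (latent_channels, height // downsample, width // downsample)


def token_count(latent, patch_size=2):
    """Number of patch tokens a transformer would see for a (C, H, W) latent.

    patch_size is an assumed value used only for illustration.
    """
    _, h, w = latent
    if h % patch_size or w % patch_size:
        raise ValueError("latent size must be divisible by the patch size")
    return (h // patch_size) * (w // patch_size)


latent = latent_shape(1024, 1024)
print(latent)               # (4, 128, 128)
print(token_count(latent))  # 4096
```

Under these assumptions, a 1024x1024 image becomes a sequence of a few thousand tokens rather than roughly a million pixels, which is what makes a transformer over the latent practical.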
Training
Training uses a Multimodal Large Language Model (MLLM) to refine image captions and to support multi-round dialogue for image generation. The model is assessed with a comprehensive evaluation protocol involving more than 50 professional human evaluators.
Guide: Running Locally
Requirements
- GPU: NVIDIA with CUDA support (V100/A100 recommended).
- GPU memory: Minimum 11GB; 32GB recommended for optimal performance.
- OS: Linux
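A quick way to check a machine against these requirements is to read the total GPU memory from `nvidia-smi`. The following is a minimal sketch, not part of the Hunyuan-DiT repository: the helper names are hypothetical, and it treats the 11GB and 32GB figures as GiB, which is a simplification.

```python
import shutil
import subprocess

MIN_GIB = 11          # minimum from the requirements list
RECOMMENDED_GIB = 32  # recommended from the requirements list


def parse_memory_mib(smi_output):
    """Parse per-GPU total memory (MiB) from nvidia-smi CSV output, one value per line."""
    values = (line.strip() for line in smi_output.splitlines())
    return [int(v) for v in values if v.isdigit()]


def classify(total_mib):
    """Label a GPU relative to the thresholds above."""
    gib = total_mib / 1024
    if gib >= RECOMMENDED_GIB:
        return "recommended"
    if gib >= MIN_GIB:
        return "minimum"
    return "insufficient"


def query_gpus():
    """Return total memory per GPU in MiB, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return parse_memory_mib(result.stdout)


for index, mib in enumerate(query_gpus()):
    print(f"GPU {index}: {mib} MiB -> {classify(mib)}")
```

Reported totals can be slightly below a card's nominal capacity, so treat borderline results as a rough guide rather than a hard verdict.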
Steps
- Clone the Repository:
  git clone https://github.com/tencent/HunyuanDiT
  cd HunyuanDiT
- Set Up Environment: Use Conda to create and activate the environment, then install the pip dependencies:
  conda env create -f environment.yml
  conda activate HunyuanDiT
  python -m pip install -r requirements.txt
- Optional Installation: For acceleration, install Flash Attention v2:
  python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.1.2.post3
- Download Pretrained Models: Install huggingface-cli and download the models:
  python -m pip install "huggingface_hub[cli]"
  mkdir ckpts
  huggingface-cli download Tencent-Hunyuan/HunyuanDiT --local-dir ./ckpts
- Run Inference:
  - Using Gradio:
    python app/hydit_app.py
  - Using Command Line (the example prompt is Chinese, roughly "fishing boats singing at dusk"):
    python sample_t2i.py --prompt "渔舟唱晚"
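To generate images for several prompts in one go, the command-line step above can be wrapped in a short script. This is a sketch under stated assumptions: `build_command` and `run_prompts` are hypothetical helpers, not part of the repository. The script uses only the `--prompt` flag shown above and assumes it is run from the repository root with the HunyuanDiT environment active.

```python
import shlex
import subprocess
import sys


def build_command(prompt, script="sample_t2i.py"):
    """Build the argument list for one generation run (only --prompt is used)."""
    return [sys.executable, script, "--prompt", prompt]


def run_prompts(prompts, dry_run=True):
    """Run one command per prompt; with dry_run=True, only print the commands."""
    commands = [build_command(p) for p in prompts]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True stops the batch on the first failing run
            subprocess.run(cmd, check=True)
    return commands


# Dry run by default, so nothing is executed until dry_run=False is passed.
run_prompts(["渔舟唱晚", "a cat sitting on a windowsill"])
```

Passing the prompt as a separate list element avoids shell quoting problems with spaces or non-ASCII characters in prompts.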
Suggested Cloud GPUs
For enhanced performance, consider using cloud services like AWS EC2 or Google Cloud with V100 or A100 GPUs.
License
Hunyuan-DiT is released under the Tencent Hunyuan Community License. The full terms are in the license text distributed with the model.