Hunyuan-DiT Model (Open LLM List)

Introduction

Hunyuan-DiT is a text-to-image diffusion transformer with fine-grained understanding of both English and Chinese. Its transformer structure, text encoders, and positional encoding were designed with that bilingual goal in mind. The model can also hold multi-round, multimodal dialogue with users, generating and refining images as the conversation proceeds. Its authors report that it sets a new standard in Chinese-to-image generation.

Architecture

Hunyuan-DiT is a diffusion model that operates in latent space: a pre-trained Variational Autoencoder (VAE) compresses images into a low-dimensional latent representation, and a transformer parameterizes the diffusion model that denoises those latents. Text prompts are encoded with a bilingual (English and Chinese) CLIP encoder and a multilingual T5 encoder. The model supports multi-turn text-to-image generation, so an image can be created and revised iteratively over the course of a user dialogue.
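To give a feel for what "compressing images into a low-dimensional latent space" means in numbers, here is a small arithmetic sketch. The 8x spatial downsampling factor and the 4 latent channels are assumptions typical of latent diffusion VAEs; this document does not state the actual values for Hunyuan-DiT, and both function names are hypothetical.

```python
# Illustrative arithmetic only. The defaults (8x downsampling, 4 latent
# channels) are assumptions, not values confirmed by this document.

def latent_shape(height, width, downsample=8, latent_channels=4):
    """Return (channels, height, width) of the latent for an RGB image."""
    if height % downsample or width % downsample:
        raise ValueError("image sides must be divisible by the downsample factor")
    return (latent_channels, height // downsample, width // downsample)


def compression_ratio(height, width, downsample=8, latent_channels=4):
    """Ratio of RGB pixel values to latent values for one image."""
    channels, h, w = latent_shape(height, width, downsample, latent_channels)
    return (3 * height * width) / (channels * h * w)


if __name__ == "__main__":
    print(latent_shape(1024, 1024))       # latent tensor shape
    print(compression_ratio(1024, 1024))  # how many times fewer values
```

Under these assumed defaults, the transformer would work on far fewer values per image than a pixel-space model, which is the main practical reason for running diffusion in latent space.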

Training

Training uses a Multimodal Large Language Model (MLLM) to refine image captions and to support multi-round dialogue for image generation. The model was assessed with a comprehensive evaluation protocol involving more than 50 professional human evaluators.

Guide: Running Locally

Requirements

GPU: NVIDIA with CUDA support (V100/A100 recommended).
GPU memory: 11GB minimum; 32GB recommended for optimal performance.
OS: Linux
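A quick way to check a machine against the GPU memory requirements above is to read the total memory that `nvidia-smi` reports. The sketch below assumes `nvidia-smi` is on the PATH. Because it reports MiB and marketing sizes are rounded, the two thresholds are deliberately approximate assumptions, and the function names are hypothetical.

```python
import subprocess

# Approximate thresholds in MiB for the 11GB minimum and the 32GB
# recommendation. These cut-offs are assumptions chosen for illustration.
MIN_MIB = 11_000
RECOMMENDED_MIB = 32_000


def classify_gpu_memory(total_mib):
    """Classify one GPU's total memory against the stated requirements."""
    if total_mib < MIN_MIB:
        return "insufficient"
    if total_mib < RECOMMENDED_MIB:
        return "minimum"
    return "recommended"


def query_gpu_memory_mib():
    """Return total memory (MiB) per visible NVIDIA GPU, or [] on failure."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    values = [line.strip() for line in result.stdout.splitlines()]
    return [int(v) for v in values if v.isdigit()]


if __name__ == "__main__":
    for index, mib in enumerate(query_gpu_memory_mib()):
        print(f"GPU {index}: {mib} MiB -> {classify_gpu_memory(mib)}")
```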

Steps

Clone the Repository:

git clone https://github.com/tencent/HunyuanDiT
cd HunyuanDiT

Set Up Environment: Use Conda to create and activate the environment, then install the pip dependencies:

conda env create -f environment.yml
conda activate HunyuanDiT
python -m pip install -r requirements.txt

Optional Installation: For acceleration, install Flash Attention v2:

python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.1.2.post3
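Since Flash Attention is optional, it can be useful to confirm whether the package is actually importable in the active environment. The following is a minimal sketch using only the standard library. It assumes the package installs under the import name `flash_attn`, and `is_installed` is a hypothetical helper, not part of the repository.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "flash_attn" is assumed to be the import name of the Flash Attention package.
    if is_installed("flash_attn"):
        print("Flash Attention is available.")
    else:
        print("Flash Attention not found; the optional install was skipped or failed.")
```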

Download Pretrained Models: Install huggingface-cli and download models:

python -m pip install "huggingface_hub[cli]"
mkdir ckpts
huggingface-cli download Tencent-Hunyuan/HunyuanDiT --local-dir ./ckpts
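After the download finishes, a simple sanity check is to confirm that `./ckpts` exists and is not empty. The sketch below only counts files and sums their sizes. It does not know which files the model expects, so it cannot prove the download is complete. The helper name `summarize_dir` is hypothetical.

```python
from pathlib import Path


def summarize_dir(root):
    """Return (file_count, total_bytes) for all files under root."""
    root = Path(root)
    if not root.is_dir():
        return (0, 0)
    files = [p for p in root.rglob("*") if p.is_file()]
    return (len(files), sum(p.stat().st_size for p in files))


if __name__ == "__main__":
    count, size = summarize_dir("./ckpts")
    if count == 0:
        print("./ckpts is missing or empty; the download may have failed.")
    else:
        print(f"./ckpts contains {count} files, {size / 1e9:.2f} GB in total.")
```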

Run Inference:

Using Gradio:
```
python app/hydit_app.py
```

Using Command Line (the example prompt is Chinese, roughly "fishing boats singing at dusk"):

python sample_t2i.py --prompt "渔舟唱晚"
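To generate images for several prompts in a row, the command above can be wrapped in a short script. This sketch relies only on the `--prompt` flag shown above and assumes it is run from the repository root. The helper names and the `dry_run` switch are hypothetical conveniences, not part of the repository.

```python
import shlex
import subprocess
import sys


def build_command(prompt, script="sample_t2i.py"):
    """Build the argument list for one text-to-image run."""
    return [sys.executable, script, "--prompt", prompt]


def run_prompts(prompts, dry_run=True):
    """Run (or, with dry_run=True, only print) one command per prompt."""
    commands = [build_command(p) for p in prompts]
    for command in commands:
        if dry_run:
            print(shlex.join(command))
        else:
            # check=True stops at the first failing run.
            subprocess.run(command, check=True)
    return commands
```

Calling `run_prompts(["渔舟唱晚", "a lighthouse at sunrise"])` prints the two commands without executing them; passing `dry_run=False` runs them one after another.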

Suggested Cloud GPUs

For enhanced performance, consider using cloud services like AWS EC2 or Google Cloud with V100 or A100 GPUs.

License

Hunyuan-DiT is released under the Tencent Hunyuan Community License. See the license text distributed with the repository for the full terms.
