Hunyuan3D-1.0

Introduction
Tencent's Hunyuan3D-1.0 is a unified framework designed for both text-to-3D and image-to-3D generation. It addresses challenges in existing 3D generative models such as slow generation and poor generalization by employing a two-stage approach. The methodology involves a multi-view diffusion model and a feed-forward reconstruction model, which together significantly enhance the speed and quality of 3D asset generation.
Architecture
Hunyuan3D-1.0 consists of two main stages. The first stage uses a multi-view diffusion model to efficiently generate detailed multi-view RGB images, capturing the 3D asset from multiple viewpoints. The second stage is a feed-forward reconstruction model that recovers the 3D asset from these images while handling the noise and cross-view inconsistencies they may contain. The framework incorporates the Hunyuan-DiT text-to-image model, making it versatile for both text- and image-conditioned 3D generation. The standard version of the model has three times as many parameters as the lite version, trading generation speed for quality.
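The two-stage flow can be summarized in a minimal structural sketch; the function names, signatures, and return values below are illustrative stand-ins, not the repository's actual API:

    import numpy as np

    def multi_view_diffusion(condition):
        # Stage 1 stand-in: produce RGB images of the asset from several
        # viewpoints, conditioned on a text prompt or an input image.
        return [np.zeros((512, 512, 3), dtype=np.uint8) for _ in range(4)]

    def feed_forward_reconstruct(views):
        # Stage 2 stand-in: recover a mesh from the (possibly inconsistent) views.
        return {"vertices": np.zeros((0, 3)), "faces": np.zeros((0, 3), dtype=np.int64)}

    def generate_3d(condition):
        views = multi_view_diffusion(condition)  # multi-view RGB images
        return feed_forward_reconstruct(views)   # reconstructed 3D asset

    mesh = generate_3d("a lovely rabbit")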
Training
The training process leverages the multi-view images generated in the first stage to train a reconstruction network that can efficiently recover the 3D structure. The model supports bilingual inference in Chinese and English and offers both a lite and a standard version optimized for different computational resources.
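Conceptually, the stage-2 objective can be sketched as a per-view reconstruction loss on the generated images; everything below (the FeedForwardReconstructor module, the render stub, the L2 loss) is a hypothetical illustration, not the paper's exact formulation:

    import torch
    import torch.nn as nn

    class FeedForwardReconstructor(nn.Module):
        """Stand-in network mapping multi-view images to fused 3D features."""
        def __init__(self):
            super().__init__()
            self.encoder = nn.Conv2d(3, 64, 3, padding=1)  # placeholder backbone

        def forward(self, views):                    # views: (B, N, 3, H, W)
            b, n, c, h, w = views.shape
            feats = self.encoder(views.flatten(0, 1))      # per-view features
            return feats.view(b, n, -1, h, w).mean(dim=1)  # fuse across views

    def render(representation, view_idx):
        # Placeholder differentiable "renderer": project features back to RGB.
        return representation[:, :3]

    model = FeedForwardReconstructor()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)

    views = torch.rand(2, 4, 3, 64, 64)  # random stand-in for stage-1 outputs
    rep = model(views)
    loss = sum(nn.functional.mse_loss(render(rep, i), views[:, i])
               for i in range(views.shape[1])) / views.shape[1]
    loss.backward()
    opt.step()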
Guide: Running Locally
- Clone the Repository:

    git clone https://github.com/tencent/Hunyuan3D-1
    cd Hunyuan3D-1
- Set Up Environment: Create a Conda environment and activate it:

    conda create -n hunyuan3d-1 python=3.9
    conda activate hunyuan3d-1

  Install required packages:

    pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
    bash env_install.sh
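  An optional sanity check that the CUDA build of PyTorch installed correctly:

    import torch
    print(torch.__version__)          # should report a +cu121 build
    print(torch.cuda.is_available())  # True if the GPU stack is usable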
- Download Pretrained Models:

    mkdir weights
    huggingface-cli download tencent/Hunyuan3D-1 --local-dir ./weights
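  The same snapshot can also be fetched from Python via huggingface_hub (the library behind the CLI above), with repo_id and local_dir mirroring the command:

    from huggingface_hub import snapshot_download

    snapshot_download(repo_id="tencent/Hunyuan3D-1", local_dir="./weights")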
- Inference: For text-to-3D generation:

    python3 main.py --text_prompt "a lovely rabbit" --save_folder ./outputs/test/ --max_faces_num 90000 --do_texture_mapping --do_render

  For image-to-3D generation:

    python3 main.py --image_prompt "/path/to/your/image" --save_folder ./outputs/test/ --max_faces_num 90000 --do_texture_mapping --do_render
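  To script several generations, a small helper can wrap the same CLI; this is a sketch that assumes it runs from the repository root, with flags mirroring the commands above:

    import subprocess

    def generate(prompt, is_image=False, out="./outputs/test/"):
        # Invoke main.py with the same flags as the manual commands above.
        flag = "--image_prompt" if is_image else "--text_prompt"
        subprocess.run(
            ["python3", "main.py", flag, prompt,
             "--save_folder", out,
             "--max_faces_num", "90000",
             "--do_texture_mapping", "--do_render"],
            check=True,
        )

    generate("a lovely rabbit")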
- Using Cloud GPUs: The process benefits significantly from high-performance GPUs such as the NVIDIA A100. On an A100, the lite model takes about 10 seconds per generation and the standard model about 25 seconds.
License
Hunyuan3D-1.0 is released under the Tencent Hunyuan Community License. For details, see the license file in the repository.