Hunyuan3D-1.0

Tencent

Introduction

Tencent's Hunyuan3D-1.0 is a unified framework designed for both text-to-3D and image-to-3D generation. It addresses challenges in existing 3D generative models such as slow generation and poor generalization by employing a two-stage approach. The methodology involves a multi-view diffusion model and a feed-forward reconstruction model, which together significantly enhance the speed and quality of 3D asset generation.

Architecture

Hunyuan3D-1.0 consists of two main stages. The first stage uses a multi-view diffusion model to generate detailed multi-view RGB images efficiently, which helps in capturing the 3D asset from various viewpoints. The second stage involves a feed-forward reconstruction model that reconstructs the 3D asset from these images, handling noise and inconsistencies effectively. The framework incorporates the Hunyuan-DiT text-to-image model, making it versatile for both text and image-conditioned 3D generation. The standard version of the model has three times more parameters than the lite version, balancing speed and quality.
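The two-stage flow can be sketched in Python. The function names, view count, and image resolution below are illustrative assumptions for exposition, not the actual Hunyuan3D-1.0 API:

```python
import random

# Conceptual sketch of the Hunyuan3D-1.0 two-stage pipeline.
# All names and shapes here are illustrative stand-ins.

NUM_VIEWS = 6          # assumed number of generated viewpoints
IMAGE_SIZE = (64, 64)  # toy resolution for illustration

def multiview_diffusion(prompt: str, num_views: int = NUM_VIEWS):
    """Stage 1 (stand-in): produce one RGB image per viewpoint."""
    h, w = IMAGE_SIZE
    return [
        [[(random.random(), random.random(), random.random())
          for _ in range(w)] for _ in range(h)]
        for _ in range(num_views)
    ]

def feedforward_reconstruction(views):
    """Stage 2 (stand-in): fuse the multi-view images into a 3D asset.

    A real reconstruction network regresses geometry from the views;
    here we only describe the inputs it would consume.
    """
    return {"num_input_views": len(views), "mesh": "placeholder"}

views = multiview_diffusion("a lovely rabbit")
asset = feedforward_reconstruction(views)
print(asset["num_input_views"])
```

The key design point mirrored here is the separation of concerns: stage 1 handles appearance and viewpoint coverage, while stage 2 turns those (possibly noisy or inconsistent) views into geometry in a single feed-forward pass.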

Training

The training process leverages the multi-view images generated in the first stage to train a reconstruction network that can efficiently recover the 3D structure. The model supports bilingual inference in Chinese and English and offers both a lite and a standard version optimized for different computational resources.

Guide: Running Locally

  1. Clone the Repository:

    git clone https://github.com/tencent/Hunyuan3D-1
    cd Hunyuan3D-1
    
  2. Set Up Environment: Create a Conda environment and activate it:

    conda create -n hunyuan3d-1 python=3.9
    conda activate hunyuan3d-1
    

    Install required packages:

    pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
    bash env_install.sh
    
  3. Download Pretrained Models:

    mkdir weights
    huggingface-cli download tencent/Hunyuan3D-1 --local-dir ./weights
    
  4. Inference: For text-to-3D generation:

    python3 main.py --text_prompt "a lovely rabbit" --save_folder ./outputs/test/ --max_faces_num 90000 --do_texture_mapping --do_render
    

    For image-to-3D generation:

    python3 main.py --image_prompt "/path/to/your/image" --save_folder ./outputs/test/ --max_faces_num 90000 --do_texture_mapping --do_render
    
  5. Using Cloud GPUs: The process benefits significantly from high-performance GPUs such as the NVIDIA A100. On an A100, the lite model generates an asset in about 10 seconds, while the standard model takes about 25 seconds.
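A back-of-envelope throughput estimate follows directly from those per-asset timings (~10 s for lite, ~25 s for standard on an A100); the numbers below are derived from the figures quoted above, not separate benchmarks:

```python
# Rough per-hour throughput from the quoted A100 timings.
LITE_SECONDS = 10      # ~10 s per asset (lite model)
STANDARD_SECONDS = 25  # ~25 s per asset (standard model)

def assets_per_hour(seconds_per_asset: int) -> int:
    """Whole assets generated per hour at a fixed per-asset time."""
    return 3600 // seconds_per_asset

print(assets_per_hour(LITE_SECONDS))      # 360
print(assets_per_hour(STANDARD_SECONDS))  # 144
```

In practice, model loading, texture mapping, and rendering add overhead, so real throughput will be somewhat lower than this ideal figure.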

License

Hunyuan3D-1.0 is released under the Tencent Hunyuan Community License. For details, see the license file in the repository.
