Hunyuan3D-1.0

Tencent

Introduction

Tencent's Hunyuan3D-1.0 is a unified framework designed for both text-to-3D and image-to-3D generation. It addresses challenges in existing 3D generative models such as slow generation and poor generalization by employing a two-stage approach. The methodology involves a multi-view diffusion model and a feed-forward reconstruction model, which together significantly enhance the speed and quality of 3D asset generation.

Architecture

Hunyuan3D-1.0 consists of two main stages. The first stage uses a multi-view diffusion model to generate detailed multi-view RGB images efficiently, which helps in capturing the 3D asset from various viewpoints. The second stage involves a feed-forward reconstruction model that reconstructs the 3D asset from these images, handling noise and inconsistencies effectively. The framework incorporates the Hunyuan-DiT text-to-image model, making it versatile for both text and image-conditioned 3D generation. The standard version of the model has three times more parameters than the lite version, balancing speed and quality.
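The two-stage flow can be sketched in Python. The function names, view count, and image resolution below are illustrative assumptions for exposition, not the actual Hunyuan3D-1.0 API:

```python
import random

# Conceptual sketch of the Hunyuan3D-1.0 two-stage pipeline.
# All names and shapes here are illustrative stand-ins.

NUM_VIEWS = 6          # assumed number of generated viewpoints
IMAGE_SIZE = (64, 64)  # toy resolution for illustration

def multiview_diffusion(prompt: str, num_views: int = NUM_VIEWS):
    """Stage 1 (stand-in): produce one RGB image per viewpoint."""
    h, w = IMAGE_SIZE
    return [
        [[(random.random(), random.random(), random.random())
          for _ in range(w)] for _ in range(h)]
        for _ in range(num_views)
    ]

def feedforward_reconstruction(views):
    """Stage 2 (stand-in): fuse the multi-view images into a 3D asset.

    A real reconstruction network regresses geometry from the views;
    here we only describe the inputs it would consume.
    """
    return {"num_input_views": len(views), "mesh": "placeholder"}

views = multiview_diffusion("a lovely rabbit")
asset = feedforward_reconstruction(views)
print(asset["num_input_views"])
```

The key design point mirrored here is the separation of concerns: stage 1 handles appearance and viewpoint coverage, while stage 2 turns those (possibly noisy or inconsistent) views into geometry in a single feed-forward pass.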

Training

The training process leverages the multi-view images generated in the first stage to train a reconstruction network that can efficiently recover the 3D structure. The model supports bilingual inference in Chinese and English and offers both a lite and a standard version optimized for different computational resources.

Guide: Running Locally

  1. Clone the Repository:

    git clone https://github.com/tencent/Hunyuan3D-1
    cd Hunyuan3D-1
    
  2. Set Up Environment: Create a Conda environment and activate it:

    conda create -n hunyuan3d-1 python=3.9
    conda activate hunyuan3d-1
    

    Install required packages:

    pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
    bash env_install.sh
    
  3. Download Pretrained Models:

    mkdir weights
    huggingface-cli download tencent/Hunyuan3D-1 --local-dir ./weights
    
  4. Inference: For text-to-3D generation:

    python3 main.py --text_prompt "a lovely rabbit" --save_folder ./outputs/test/ --max_faces_num 90000 --do_texture_mapping --do_render
    

    For image-to-3D generation:

    python3 main.py --image_prompt "/path/to/your/image" --save_folder ./outputs/test/ --max_faces_num 90000 --do_texture_mapping --do_render
    
  5. Using Cloud GPUs: The process benefits significantly from high-performance GPUs such as the NVIDIA A100. On an A100, the lite model generates an asset in about 10 seconds, while the standard model takes about 25 seconds.
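A back-of-envelope throughput estimate follows directly from those per-asset timings (~10 s for lite, ~25 s for standard on an A100); the numbers below are derived from the figures quoted above, not separate benchmarks:

```python
# Rough per-hour throughput from the quoted A100 timings.
LITE_SECONDS = 10      # ~10 s per asset (lite model)
STANDARD_SECONDS = 25  # ~25 s per asset (standard model)

def assets_per_hour(seconds_per_asset: int) -> int:
    """Whole assets generated per hour at a fixed per-asset time."""
    return 3600 // seconds_per_asset

print(assets_per_hour(LITE_SECONDS))      # 360
print(assets_per_hour(STANDARD_SECONDS))  # 144
```

In practice, model loading, texture mapping, and rendering add overhead, so real throughput will be somewhat lower than this ideal figure.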

License

Hunyuan3D-1.0 is released under the Tencent Hunyuan Community License. For details, see the license file in the repository.
