SHAP-E

Introduction

Shap-E is a conditional generative model for 3D assets. Instead of producing a single output representation, it generates the parameters of implicit functions that can be rendered as both textured meshes and neural radiance fields. The model is trained in two stages: first an encoder is trained to map 3D assets to implicit function parameters, then a conditional diffusion model is trained on the encoder's outputs. Trained on a large dataset of paired 3D and text data, the resulting models generate complex and diverse 3D assets in a matter of seconds. Compared with Point-E, an explicit generative model over point clouds, the paper reports that Shap-E converges faster and reaches comparable or better sample quality. The model weights, inference code, and samples are publicly available.

Architecture

Shap-E uses a conditional diffusion process to generate 3D assets from text prompts. The diffusion model does not output a mesh or a set of images directly. It outputs the parameters of implicit functions, and those parameters can then be rendered either as a textured mesh or as a neural radiance field. The model is built in two stages, an encoder followed by a diffusion model trained on the encoder's outputs, which lets a single model cover this multi-representation output space.
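
To make the idea of "implicit function parameters" concrete, here is a deliberately tiny sketch. It is not Shap-E's representation: the shape is a sphere described by four numbers, whereas the real parameters belong to a learned network. The names sphere_sdf and occupancy_slice are hypothetical and exist only for this illustration. The point is that the parameters define a function over 3D space, and any rendering is obtained by querying that function.

```python
import math


def sphere_sdf(params, point):
    """Signed distance from `point` to a sphere: negative inside, positive outside."""
    cx, cy, cz, radius = params
    x, y, z = point
    return math.sqrt((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2) - radius


def occupancy_slice(params, resolution=5, extent=1.0):
    """Query the implicit function on a grid in the z=0 plane.

    Returns rows of booleans, True where the grid point lies inside the shape.
    """
    step = 2 * extent / (resolution - 1)
    coords = [-extent + i * step for i in range(resolution)]
    return [[sphere_sdf(params, (x, y, 0.0)) <= 0.0 for x in coords] for y in coords]


# Hypothetical "generated" parameters: centre (0, 0, 0), radius 0.5.
params = (0.0, 0.0, 0.0, 0.5)
for row in occupancy_slice(params):
    print("".join("#" if inside else "." for inside in row))
```

The same parameters could be queried on a finer grid, or along camera rays, without changing them, which is what makes a single set of parameters renderable in more than one format.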

Training

The training process involves two key stages:

  1. Encoder Training: A deterministic encoder maps 3D assets into implicit function parameters.
  2. Diffusion Model Training: A conditional diffusion model is trained on encoder outputs, leveraging a large dataset of paired 3D and text data for effective learning.

Details regarding training procedures can be found in the original Shap-E paper.
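
As a rough illustration of how the two stages fit together, the sketch below uses toy stand-ins written in plain Python. Everything in it is an assumption made for illustration: toy_encoder is a hand-written summary of a point set rather than a trained network, and noised_training_pair applies the generic forward-noising step used by many diffusion models. The actual encoder, noise schedule, and prediction target used by Shap-E are described in the paper.

```python
import math
import random


def toy_encoder(points):
    """Stage 1 stand-in: deterministically map a 3D asset to a latent vector.

    Hypothetical: the "latent" is the per-axis mean and standard deviation
    of the asset's points, giving six numbers.
    """
    n = len(points)
    means = [sum(p[i] for p in points) / n for i in range(3)]
    spreads = [
        math.sqrt(sum((p[i] - means[i]) ** 2 for p in points) / n) for i in range(3)
    ]
    return means + spreads


def noised_training_pair(latent, alpha_bar, rng):
    """Stage 2 stand-in: build one noisy training input from an encoder output.

    Uses x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps, where
    alpha_bar in [0, 1] controls how much of the clean latent survives.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in latent]
    x_t = [
        math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
        for x, e in zip(latent, eps)
    ]
    return x_t, eps


# A four-point "asset"; real assets are far richer than this.
asset = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
latent = toy_encoder(asset)  # stage 1 output
x_t, eps = noised_training_pair(latent, alpha_bar=0.5, rng=random.Random(0))
```

A diffusion model would then be trained to recover the clean latent (or the noise) from x_t, the noise level, and the conditioning text. Because the encoder is deterministic, the same asset always yields the same training target.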

Guide: Running Locally

  1. Install Dependencies:

    pip install transformers accelerate -q
    pip install -U diffusers
    
  2. Run the Model:

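    import torch
    from diffusers import ShapEPipeline
    from diffusers.utils import export_to_gif
    
    # Load the pretrained text-to-3D pipeline and move it to the GPU.
    ckpt_id = "openai/shap-e"
    pipe = ShapEPipeline.from_pretrained(ckpt_id).to("cuda")
    
    guidance_scale = 15.0
    prompt = "a shark"
    # Render the generated asset as a sequence of 256x256 frames.
    images = pipe(
        prompt,
        guidance_scale=guidance_scale,
        num_inference_steps=64,
        frame_size=256,
    ).images
    
    # images[0] holds the rendered frames for the first (and only) prompt.
    gif_path = export_to_gif(images[0], "shark_3d.gif")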
    
  3. Hardware Requirements:

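    • The example above moves the pipeline to "cuda", so it needs an NVIDIA GPU with CUDA support. If none is available locally, a cloud GPU instance (e.g., AWS EC2, Google Cloud) is a practical alternative.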
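
Since Shap-E's outputs can also be rendered as meshes, the steps above can be extended to write a mesh file instead of a GIF. The following is a minimal sketch, not an official recipe. The helper names pick_device and generate_mesh and the default file name are hypothetical. The output_type="mesh" argument, the frame_size argument, and export_to_ply follow the released diffusers documentation for Shap-E and may differ in other versions.

```python
def pick_device(cuda_available: bool) -> str:
    """Return "cuda" when a CUDA GPU is usable, otherwise fall back to "cpu"."""
    return "cuda" if cuda_available else "cpu"


def generate_mesh(prompt: str, out_path: str = "shark_3d.ply") -> str:
    """Generate one mesh for `prompt` and write it to `out_path` as a PLY file."""
    # Imported lazily so this module loads even without the heavy dependencies.
    import torch
    from diffusers import ShapEPipeline
    from diffusers.utils import export_to_ply

    device = pick_device(torch.cuda.is_available())
    pipe = ShapEPipeline.from_pretrained("openai/shap-e").to(device)

    # Ask the pipeline for mesh outputs instead of rendered frames.
    meshes = pipe(
        prompt,
        guidance_scale=15.0,
        num_inference_steps=64,
        frame_size=256,
        output_type="mesh",
    ).images

    # Write the first mesh to disk and return the path it was saved to.
    return export_to_ply(meshes[0], out_path)
```

Calling generate_mesh("a shark") downloads the weights on first use and is expected to be slow on the CPU fallback. The resulting PLY file can be opened in common 3D tools.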

License

Shap-E is released under the MIT license, allowing for broad usage and modification. For more details, refer to the license documentation.
