MeshGPT-preview

MarcusLoren

Introduction

MeshGPT is a text-to-3D model that pairs an autoencoder, which tokenizes 3D models, with a transformer that generates those tokens. Given a text input, the transformer produces a token sequence and the autoencoder decodes it back into a 3D mesh. The model is claimed to be the first published 3D model tokenizer.

Architecture

MeshGPT comprises two main components:

  • Autoencoder (Tokenizer): A 50M-parameter model that tokenizes 3D models.
  • Transformer Model: A 184M-parameter model, based on GPT-2 small, that generates tokens. The codebook size is limited to 2048 because of hardware constraints.
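
To put the two parameter counts in context before running the model locally, the sketch below estimates how much memory the weights alone would occupy. It assumes the 50M and 184M figures are separate and additive, and that each parameter is stored in 4 bytes (fp32) or 2 bytes (fp16). The helper name `weight_memory_mib` is hypothetical and not part of meshgpt-pytorch. Activations and framework overhead are not counted.

```python
# Parameter counts as stated in the architecture list above.
AUTOENCODER_PARAMS = 50_000_000
TRANSFORMER_PARAMS = 184_000_000


def weight_memory_mib(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate weight storage in MiB (weights only, no activations).

    Hypothetical helper for illustration; bytes_per_param=4 assumes fp32.
    """
    return num_params * bytes_per_param / (1024 ** 2)


# Assumption: the two counts do not overlap, so they can simply be added.
total_params = AUTOENCODER_PARAMS + TRANSFORMER_PARAMS

print(f"total parameters: {total_params / 1e6:.0f}M")
print(f"fp32 weights: ~{weight_memory_mib(total_params):.0f} MiB")
print(f"fp16 weights: ~{weight_memory_mib(total_params, 2):.0f} MiB")
```

Under these assumptions the weights take up less than 1 GiB, so available memory during generation is more likely to be set by activations and sequence length than by model size.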

Training

The model was trained on a small dataset of 4,000 models, each with at most 250 triangles, using a free-tier Kaggle GPU. The dataset includes 800 text labels, and the 3D models are sourced from Objaverse, ShapeNet, and ModelNet40.
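
The 250-triangle limit can be illustrated with a small filter. This is a minimal sketch, not the author's actual preprocessing: it assumes each mesh is available as Wavefront OBJ text that is already triangulated, so that every `f` record is one triangle. The names `count_faces` and `within_triangle_budget` are hypothetical.

```python
def count_faces(obj_text: str) -> int:
    """Count face records (lines whose first field is 'f') in OBJ text."""
    count = 0
    for line in obj_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "f":
            count += 1
    return count


def within_triangle_budget(obj_text: str, max_triangles: int = 250) -> bool:
    """Return True if the mesh has at most max_triangles faces.

    Assumes the mesh is already triangulated (one triangle per face record).
    """
    return count_faces(obj_text) <= max_triangles


# A single triangle written as OBJ text, used only as a toy input.
TINY_MESH = "v 0 0 0\nv 1 0 0\nv 0 1 0\nf 1 2 3\n"

print(count_faces(TINY_MESH))
print(within_triangle_budget(TINY_MESH))
```

A mesh with quads or larger polygons would need to be triangulated first for this count to be meaningful.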

Guide: Running Locally

  1. Installation:

    pip install git+https://github.com/MarcusLoppe/meshgpt-pytorch.git
    
  2. Usage:

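    import torch
    from meshgpt_pytorch import MeshAutoencoder, MeshTransformer, mesh_render
    
    # Use the GPU when one is available; otherwise fall back to the CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Load the pretrained transformer and move it to the selected device.
    transformer = MeshTransformer.from_pretrained("MarcusLoren/MeshGPT-preview").to(device)
    
    # Generate meshes for the given text prompts.
    output = transformer.generate(texts=['sofa', 'bed', 'computer screen'], temperature=0.0)
    # Save the generated output to an OBJ file.
    mesh_render.save_rendering('./render.obj', output)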
    
  3. Suggested Cloud GPUs: Consider using a cloud GPU service like AWS EC2, Google Cloud, or Azure for better performance.

License

MeshGPT is licensed under the Apache-2.0 license.
