MeshGPT-preview
MarcusLoren
Introduction
MeshGPT is a text-to-3D model that pairs an autoencoder, which tokenizes 3D meshes into discrete codes, with a transformer that generates those tokens from text input. The autoencoder both encodes meshes into tokens and decodes generated token sequences back into 3D meshes. The author claims it is the first published 3D model tokenizer.
Architecture
MeshGPT comprises two main components (see the sketch after this list):
- Autoencoder (tokenizer): a 50M-parameter model that encodes 3D meshes into discrete tokens and decodes them back into geometry.
- Transformer: based on GPT-2 small, a 184M-parameter model that generates token sequences. Its codebook size is 2048, a limit imposed by hardware constraints.
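For orientation, here is a minimal sketch of how the two components fit together, using the constructor API of the upstream meshgpt-pytorch library that this fork builds on. The argument values num_discrete_coors, dim, and max_seq_len are illustrative assumptions, not the released model's exact configuration; only codebook_size = 2048 is taken from the description above.

import torch
from meshgpt_pytorch import MeshAutoencoder, MeshTransformer

# Stage 1 component: the autoencoder learns a discrete vocabulary for mesh faces.
autoencoder = MeshAutoencoder(
    num_discrete_coors = 128,  # assumed coordinate quantization resolution
    codebook_size = 2048,      # codebook size stated above
)

# Stage 2 component: the transformer models sequences of face tokens,
# conditioned on text so prompts like 'sofa' can steer generation.
transformer = MeshTransformer(
    autoencoder,
    dim = 512,                 # assumed model width
    max_seq_len = 1500,        # assumed cap on token sequence length
    condition_on_text = True,
)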
Training
The model was trained on a limited dataset of 4,000 models, each with at most 250 triangles, using a free-tier GPU from Kaggle. The dataset covers 800 text labels, with 3D models sourced from Objaverse, ShapeNet, and ModelNet40.
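Training runs in two stages: the autoencoder is trained first, and the transformer is then trained on its token sequences. Below is a minimal sketch of one optimization step per stage, following the loss-returning forward passes of the upstream meshgpt-pytorch library; the vertices and faces tensors and the prompt list are placeholders.

# vertices: (batch, num_vertices, 3) float coordinates
# faces:    (batch, num_faces, 3) integer vertex indices

# Stage 1: train the autoencoder to reconstruct meshes from its discrete codes.
loss = autoencoder(vertices = vertices, faces = faces)
loss.backward()

# Stage 2: train the text-conditioned transformer on the tokenized meshes.
loss = transformer(
    vertices = vertices,
    faces = faces,
    texts = ['sofa', 'bed'],  # placeholder prompts drawn from the 800 text labels
)
loss.backward()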
Guide: Running Locally
- Installation:
pip install git+https://github.com/MarcusLoppe/meshgpt-pytorch.git
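To confirm the package installed correctly, a quick one-line import check (it only exercises the same import used in the usage snippet below):

python -c "from meshgpt_pytorch import MeshTransformer; print('meshgpt-pytorch OK')"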
- Usage:
import torch
from meshgpt_pytorch import MeshAutoencoder, MeshTransformer, mesh_render

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the pretrained text-conditioned transformer; it wraps the
# autoencoder (tokenizer), so both components are loaded together.
transformer = MeshTransformer.from_pretrained("MarcusLoren/MeshGPT-preview").to(device)

# Generate one mesh per prompt; temperature 0.0 makes decoding deterministic.
output = transformer.generate(texts = ['sofa', 'bed', 'computer screen'], temperature = 0.0)

# Save the generated meshes to a Wavefront OBJ file.
mesh_render.save_rendering('./render.obj', output)
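The saved OBJ can be opened in any mesh viewer. For a quick programmatic sanity check, assuming the third-party trimesh package is installed (pip install trimesh):

import trimesh

# Load the exported OBJ; trimesh returns a Scene when the file
# contains multiple meshes (one per prompt above).
loaded = trimesh.load('./render.obj')
print(loaded)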
- Suggested Cloud GPUs: Consider using a cloud GPU service like AWS EC2, Google Cloud, or Azure for better performance.
License
MeshGPT is licensed under the Apache-2.0 license.