Introduction

VFusion3D is a scalable 3D generative model that learns from video diffusion models. It is trained on a small amount of 3D data and a large volume of synthetic multi-view data. The work explores scalable 3D generative and reconstruction models as a step toward a 3D foundation model.

Architecture

VFusion3D is a feed-forward model: given a preprocessed input image and its source camera, a single forward pass produces 3D planes. It is loaded through the transformers library and is designed with scalability in mind. Supported operations include plane generation, mesh export, and video rendering.

Training

The model is trained on a combination of real 3D data and synthetic multi-view data, so it sees objects from many viewpoints during training. The training process is described in the accompanying paper, presented at the European Conference on Computer Vision (ECCV) 2024.

Guide: Running Locally

To run VFusion3D locally, follow these steps:

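  1. Install Dependencies: Ensure you have the necessary libraries installed. The snippets below also import torch, transformers, requests, and Pillow. The extra packages here are needed for specific features such as mesh generation. The extras are quoted so that shells such as zsh do not expand the square brackets.

    pip install "imageio[ffmpeg]" PyMCubes trimesh "rembg[gpu,cli]" kiui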
    
  2. Load the Model:

    import torch
    from transformers import AutoModel, AutoProcessor
    
    # trust_remote_code=True lets transformers run the model code shipped in the repository
    model = AutoModel.from_pretrained("jadechoghari/vfusion3d", trust_remote_code=True)
    processor = AutoProcessor.from_pretrained("jadechoghari/vfusion3d")
    
  3. Download and Preprocess an Image:

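    import requests
    from PIL import Image
    from io import BytesIO
    
    # Download the example image and fail early on an HTTP error
    image_url = 'https://sm.ign.com/ign_nordic/cover/a/avatar-gen/avatar-generations_prsz.jpg'
    response = requests.get(image_url, timeout=30)
    response.raise_for_status()
    image = Image.open(BytesIO(response.content))
    
    # Preprocess the image and get the source camera
    image, source_camera = processor(image)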
    
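  4. Generate 3D Output: Run the model to produce 3D planes, and optionally export a mesh or a video. Each call below is a separate forward pass, so keep only the variant you need.

    # Planes only (default output)
    output_planes = model(image, source_camera)
    # Planes plus the path of the exported mesh
    output_planes, mesh_path = model(image, source_camera, export_mesh=True)
    # Planes plus the path of the rendered video
    output_planes, video_path = model(image, source_camera, export_video=True)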
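
Steps 3 and 4 can be folded into one helper. The sketch below is illustrative only: `image_to_3d` and its `export` parameter are hypothetical names, not part of the model's API, and it assumes `processor` and `model` behave exactly as in the calls shown above.

```python
def image_to_3d(model, processor, image, export=None):
    """Preprocess an image and run one forward pass (hypothetical helper).

    export: None for planes only, "mesh" or "video" to also export a file.
    Returns (output_planes, path); path is None when nothing is exported.
    """
    # Same preprocessing call as in step 3.
    image_tensor, source_camera = processor(image)

    if export is None:
        return model(image_tensor, source_camera), None
    if export == "mesh":
        return model(image_tensor, source_camera, export_mesh=True)
    if export == "video":
        return model(image_tensor, source_camera, export_video=True)
    raise ValueError(f"unknown export option: {export!r}")
```

With the model and processor from step 2 and the PIL image from step 3 (before it is preprocessed), a call such as `planes, mesh_path = image_to_3d(model, processor, image, export="mesh")` would stand in for the separate preprocessing and model calls.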
    

For best performance, consider running on a GPU, for example a cloud GPU instance on AWS EC2 or Google Cloud.
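
Before renting a cloud machine, it can help to check whether the current environment already exposes a GPU. The following is a minimal sketch: `pick_device` is a hypothetical helper, and it falls back to the CPU when PyTorch is not installed. Whether this remote-code model and the processor outputs can be moved with `.to(device)` is an assumption that the source does not confirm.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to probe; report CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("Selected device:", pick_device())
```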

License

VFusion3D is primarily licensed under the Creative Commons BY-NC 2.0 license. Parts of the project, such as OpenLRM, are under the Apache License 2.0, and some components may be under NVIDIA's proprietary license. The model weights are also covered by the CC-BY-NC license.
