Introduction

Depth Pro is a foundation model for zero-shot metric monocular depth estimation. It produces sharp, high-resolution depth maps with fine boundary detail, generating a 2.25-megapixel depth map in 0.3 seconds on a standard GPU. Predictions are metric, with absolute scale, and do not require metadata such as camera intrinsics.

Architecture

Depth Pro uses an efficient multi-scale vision transformer for dense prediction. It is trained on a combination of real and synthetic datasets to achieve high metric accuracy and fine boundary tracing. The model also estimates the focal length from a single image, so it does not need camera intrinsics as input.
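
To make the focal length output easier to interpret, the sketch below converts between a horizontal field of view and a focal length in pixels using the standard pinhole camera relation f_px = (W / 2) / tan(FOV / 2). The function names are hypothetical helpers and are not part of the depth_pro package; the image width of 1536 pixels is only an illustrative value.

```python
import math


def focal_length_px_from_fov(image_width_px: float, fov_deg: float) -> float:
    """Pinhole relation: f_px = (W / 2) / tan(FOV / 2), with a horizontal FOV."""
    return 0.5 * image_width_px / math.tan(0.5 * math.radians(fov_deg))


def fov_deg_from_focal_length_px(image_width_px: float, f_px: float) -> float:
    """Inverse relation: FOV = 2 * atan((W / 2) / f_px), returned in degrees."""
    return math.degrees(2.0 * math.atan(0.5 * image_width_px / f_px))


# Illustrative values only: a 1536-pixel-wide image with a 90-degree horizontal FOV.
f_px = focal_length_px_from_fov(1536, 90.0)
print(f"focal length: {f_px:.1f} px")
print(f"recovered FOV: {fov_deg_from_focal_length_px(1536, f_px):.1f} deg")
```

A wider field of view corresponds to a shorter focal length in pixels, which is why an error in the estimated focal length translates directly into an error in metric scale.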

Training

The training protocol combines real and synthetic datasets to optimize for both metric accuracy and boundary precision. Boundary accuracy in the estimated depth maps is assessed with dedicated evaluation metrics.
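
As a rough illustration of what a boundary accuracy metric can look like, the sketch below marks a pair of neighboring pixels as an occluding contour when their depth ratio exceeds 1 + t/100, then compares predicted contours with ground-truth contours. This is a simplified sketch under that assumption, not the evaluation code used for Depth Pro; the function names, the 4-neighborhood, and the single threshold t are all illustrative choices.

```python
def occluding_pairs(depth, t):
    """Return ordered neighbor pairs (i, j) where depth[j] / depth[i] > 1 + t/100."""
    rows, cols = len(depth), len(depth[0])
    pairs = set()
    for r in range(rows):
        for c in range(cols):
            # Assumption: 4-connected neighborhood (right, down, left, up).
            for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    if depth[rr][cc] / depth[r][c] > 1.0 + t / 100.0:
                        pairs.add(((r, c), (rr, cc)))
    return pairs


def boundary_precision_recall(pred, gt, t):
    """Precision and recall of predicted occluding pairs against ground truth."""
    pred_pairs = occluding_pairs(pred, t)
    gt_pairs = occluding_pairs(gt, t)
    hits = len(pred_pairs & gt_pairs)
    precision = hits / len(pred_pairs) if pred_pairs else 0.0
    recall = hits / len(gt_pairs) if gt_pairs else 0.0
    return precision, recall


# Toy depth maps in meters: a sharp ground-truth edge and a blurred prediction.
gt = [[1.0, 1.0, 5.0, 5.0], [1.0, 1.0, 5.0, 5.0]]
blurred = [[1.0, 1.0, 3.0, 5.0], [1.0, 1.0, 3.0, 5.0]]
print(boundary_precision_recall(blurred, gt, t=25))
```

In this toy case the blurred prediction still finds every true contour (recall 1.0) but spreads the edge over an extra pixel, so half of its predicted contours are spurious (precision 0.5). A metric of this kind penalizes smeared boundaries even when the overall depth values are close.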

Guide: Running Locally

  1. Setup Environment: Follow the steps in the code repository to set up your environment.
  2. Download Checkpoint: Use the huggingface-hub CLI to download the model checkpoint.
    pip install huggingface-hub
    huggingface-cli download --local-dir checkpoints apple/DepthPro
    
  3. Command Line Execution: Use the provided script for predictions on a single image.
    depth-pro-run -i ./data/example.jpg
    
  4. Python Execution: Load the model, preprocess an image, and run inference from Python.
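    import depth_pro

    # Path to the input image (the example image from step 3).
    image_path = "./data/example.jpg"

    # Load the model and its preprocessing transform.
    model, transform = depth_pro.create_model_and_transforms()
    model.eval()

    # Load and preprocess the image; f_px is the focal length in pixels, when available.
    image, _, f_px = depth_pro.load_rgb(image_path)
    image = transform(image)

    # Run inference.
    prediction = model.infer(image, f_px=f_px)
    depth = prediction["depth"]  # Depth in meters.
    focallength_px = prediction["focallength_px"]  # Focal length in pixels.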
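
Once depth and focallength_px are available, the two outputs can be combined to recover 3D positions. The sketch below back-projects a single pixel into camera coordinates with the standard pinhole model. It assumes square pixels, a principal point at the image center, and depth measured along the optical axis; backproject_pixel is a hypothetical helper, not a depth_pro function, and the numbers are toy values standing in for one entry of depth and for focallength_px.

```python
def backproject_pixel(u, v, depth_m, f_px, width, height):
    """Back-project pixel (u, v) with metric depth into camera coordinates (meters)."""
    # Assumption: the principal point lies at the image center.
    cx, cy = width / 2.0, height / 2.0
    x = (u - cx) * depth_m / f_px
    y = (v - cy) * depth_m / f_px
    return x, y, depth_m


# Toy values: a pixel at (1000, 500) that is 2 m away, seen with a 1000 px focal length.
point = backproject_pixel(u=1000, v=500, depth_m=2.0, f_px=1000.0, width=1536, height=1024)
print(point)
```

Applying the same relation to every pixel yields a point cloud in meters, which is only meaningful because the predicted depth is metric.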
    

Suggested Cloud GPUs

If no suitable local GPU is available, the model can be run on cloud GPU instances from providers such as AWS EC2, Google Cloud Platform, or Azure.

License

Depth Pro is released under the apple-ascl license. Please review the license terms for more information on usage and distribution rights.
