Depth Pro
apple
Introduction
Depth Pro is a model for zero-shot metric monocular depth estimation. It generates sharp, detailed, high-resolution depth maps, producing a 2.25-megapixel depth map in 0.3 seconds on a standard GPU. The model does not require metadata such as camera intrinsics: it estimates the focal length from the image itself while still achieving high metric accuracy and boundary precision.
Architecture
Depth Pro utilizes an efficient multi-scale vision transformer for dense prediction. The model is trained with a combination of real and synthetic datasets to ensure high metric accuracy and detailed boundary tracing. It also includes state-of-the-art focal length estimation from a single image.
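To make the focal length output concrete, the sketch below shows the standard pinhole-camera relation between a focal length in pixels and the horizontal field of view, f_px = (W / 2) / tan(FOV / 2). It only illustrates the geometry. The function names are hypothetical and are not part of the depth_pro package, and it makes no claim about how Depth Pro computes its estimate internally.

```python
import math


def focal_length_px_from_fov(fov_deg: float, image_width_px: int) -> float:
    """Pinhole relation: f_px = (W / 2) / tan(FOV / 2)."""
    return 0.5 * image_width_px / math.tan(0.5 * math.radians(fov_deg))


def fov_from_focal_length_px(f_px: float, image_width_px: int) -> float:
    """Inverse relation, returning the horizontal field of view in degrees."""
    return math.degrees(2.0 * math.atan(0.5 * image_width_px / f_px))
```

For example, a 1000-pixel-wide image with a 90-degree horizontal field of view corresponds to a focal length of about 500 pixels.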
Training
The training protocol for Depth Pro involves combining real and synthetic datasets to optimize for both metric accuracy and boundary precision. It incorporates dedicated evaluation metrics for boundary accuracy in the estimated depth maps.
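As a rough illustration of what a boundary accuracy metric can look like, the sketch below marks an edge wherever two horizontally adjacent pixels differ in depth by more than a relative threshold, then scores predicted edges against ground-truth edges with an F1 score. This is a simplified stand-in written for this summary, not the metric used to evaluate Depth Pro. The function names and the 5% default threshold are assumptions.

```python
import numpy as np


def occluding_edges(depth: np.ndarray, threshold_pct: float) -> np.ndarray:
    """Mark horizontal neighbor pairs whose depth ratio exceeds 1 + t/100."""
    left, right = depth[:, :-1], depth[:, 1:]
    # Use the larger of the two ratios so the test is symmetric.
    ratio = np.maximum(left / right, right / left)
    return ratio > (1.0 + threshold_pct / 100.0)


def boundary_f1(pred: np.ndarray, target: np.ndarray,
                threshold_pct: float = 5.0) -> float:
    """F1 score between predicted and ground-truth edge masks."""
    pred_edges = occluding_edges(pred, threshold_pct)
    target_edges = occluding_edges(target, threshold_pct)
    true_pos = np.logical_and(pred_edges, target_edges).sum()
    # Guard against division by zero when a mask has no edges.
    precision = true_pos / max(pred_edges.sum(), 1)
    recall = true_pos / max(target_edges.sum(), 1)
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))
```

Both inputs are expected to hold strictly positive depths of the same shape. A prediction that blurs a depth discontinuity across several pixels produces smaller neighbor ratios and therefore misses edges, which lowers recall.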
Guide: Running Locally
- Setup Environment: Follow the steps in the code repository to set up your environment.
- Download Checkpoint: Use the huggingface-hub CLI to download the model checkpoint.
pip install huggingface-hub
huggingface-cli download --local-dir checkpoints apple/DepthPro
- Command Line Execution: Use the provided script for predictions on a single image.
depth-pro-run -i ./data/example.jpg
- Python Execution: Load and preprocess images using Python.
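import depth_pro

# Path to the input image (example path; replace with your own).
image_path = "./data/example.jpg"

# Load the model and its preprocessing transform.
model, transform = depth_pro.create_model_and_transforms()
model.eval()

# Load and preprocess an image.
image, _, f_px = depth_pro.load_rgb(image_path)
image = transform(image)

# Run inference.
prediction = model.infer(image, f_px=f_px)
depth = prediction["depth"]  # Estimated metric depth.
focallength_px = prediction["focallength_px"]  # Focal length in pixels.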
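Once a depth map has been predicted, a common next step is to turn it into an image for inspection. The sketch below converts a metric depth array into an 8-bit inverse-depth image, so that nearby surfaces appear bright. It is a minimal example rather than part of the depth_pro package: the function name and the clipping range are illustrative assumptions, and it expects a NumPy array as input.

```python
import numpy as np


def depth_to_uint8(depth_m: np.ndarray, min_depth_m: float = 0.1,
                   max_depth_m: float = 250.0) -> np.ndarray:
    """Map metric depth to an 8-bit inverse-depth image (near = bright)."""
    # Clip to a plausible range so the inverse stays finite.
    clipped = np.clip(depth_m, min_depth_m, max_depth_m)
    inverse = 1.0 / clipped
    lo, hi = inverse.min(), inverse.max()
    # Avoid dividing by zero when the depth map is constant.
    scale = hi - lo if hi > lo else 1.0
    normalized = (inverse - lo) / scale
    return (normalized * 255.0).round().astype(np.uint8)
```

If `depth` is a PyTorch tensor, it would first need converting, for instance with `depth.detach().cpu().numpy()`. The resulting array can then be written to disk with `PIL.Image.fromarray(...).save(...)`.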
Suggested Cloud GPUs
For optimal performance, consider using cloud GPU services like AWS EC2, Google Cloud Platform, or Azure to run the model.
License
Depth Pro is released under the apple-ascl license. Please review the license terms for more information on usage and distribution rights.