maskformer swin base ade

facebook

Introduction

MaskFormer is a semantic segmentation model that handles instance, semantic, and panoptic segmentation with a single paradigm: it predicts a set of masks with corresponding labels, so all three tasks are treated as if they were instance segmentation. This checkpoint uses a Swin backbone and was trained on the ADE20k dataset.

Architecture

MaskFormer predicts a set of masks, each paired with a class label. Because instance, semantic, and panoptic segmentation can all be expressed as sets of labeled masks, the same model covers all three tasks without task-specific changes.
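
To make the mask-and-label formulation concrete, the following is a minimal NumPy sketch of how per-query class scores and per-query masks could be combined into a semantic map. It follows the general idea of weighting each mask by its class probabilities and is not the library's exact implementation; the function name, array shapes, and the convention that the last class column means "no object" are assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def semantic_map_from_queries(class_logits, mask_logits):
    """Combine per-query predictions into a per-pixel class map.

    class_logits: (Q, C + 1) array; the last column is assumed to be "no object".
    mask_logits:  (Q, H, W) array of per-query mask logits.
    Returns an (H, W) array of class indices in [0, C).
    """
    class_probs = softmax(class_logits, axis=-1)[:, :-1]  # drop "no object" -> (Q, C)
    mask_probs = sigmoid(mask_logits)                      # (Q, H, W)
    # Sum over queries: each mask votes for classes in proportion to its class probability
    per_class = np.einsum("qc,qhw->chw", class_probs, mask_probs)
    return per_class.argmax(axis=0)
```

In the Transformers output object, the corresponding tensors are exposed as class_queries_logits and masks_queries_logits, and the post-processing helper in the guide below performs this kind of combination for you.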


Training

The model was trained for semantic segmentation on the ADE20k dataset, using a Swin Transformer backbone. Training details are given in the original paper, "Per-Pixel Classification is Not All You Need for Semantic Segmentation" (arXiv:2107.06278).

Guide: Running Locally

To use MaskFormer locally, follow these steps:

  1. Install the Transformers Library:
    The example below also needs PyTorch, Pillow, and requests:

    pip install transformers torch pillow requests
    
  2. Load the Model and Feature Extractor:

    from transformers import MaskFormerFeatureExtractor, MaskFormerForInstanceSegmentation
    from PIL import Image
    import requests
    
    # Download a sample ADE20k validation image
    url = "https://huggingface.co/datasets/hf-internal-testing/fixtures_ade20k/resolve/main/ADE_val_00000001.jpg"
    image = Image.open(requests.get(url, stream=True).raw)
    
    # Resize and normalize the image into model inputs
    # (newer Transformers releases provide MaskFormerImageProcessor in place of this class)
    feature_extractor = MaskFormerFeatureExtractor.from_pretrained("facebook/maskformer-swin-base-ade")
    inputs = feature_extractor(images=image, return_tensors="pt")
    
    # Run the model; outputs hold class_queries_logits and masks_queries_logits
    model = MaskFormerForInstanceSegmentation.from_pretrained("facebook/maskformer-swin-base-ade")
    outputs = model(**inputs)
    
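  3. Postprocess the Outputs:

    # image.size is (width, height), so reverse it to get (height, width)
    # The result is a (height, width) tensor of per-pixel class IDs
    predicted_semantic_map = feature_extractor.post_process_semantic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]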
    
  4. Use a GPU if Available:
    The example above runs on CPU. For faster inference, especially over large datasets, move the model and its inputs to a GPU with .to("cuda"). If you have no local GPU, cloud providers such as AWS, GCP, or Azure offer GPU instances.
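
The semantic map produced in step 3 holds class IDs, which are hard to inspect directly. Below is a minimal sketch of one way to turn such a map into a color image. The helper name colorize, the random palette, and the default of 150 classes (the number of ADE20k semantic categories) are assumptions for illustration, not part of the model's API.

```python
import numpy as np

def colorize(seg_map, num_classes=150, seed=0):
    """Map an (H, W) integer array of class IDs to an (H, W, 3) uint8 RGB image.

    The palette is random but reproducible for a given seed; it is a
    hypothetical stand-in for an official ADE20k color palette.
    """
    rng = np.random.default_rng(seed)
    palette = rng.integers(0, 256, size=(num_classes, 3), dtype=np.uint8)
    # Fancy indexing looks up one RGB triple per pixel
    return palette[np.asarray(seg_map)]
```

With the variables from the guide, a call such as Image.fromarray(colorize(predicted_semantic_map.cpu().numpy())) should give a PIL image that can be shown or saved, assuming predicted_semantic_map is a PyTorch tensor of integer class IDs.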

License

The MaskFormer model is released under an "other" license. Refer to the Hugging Face model card for specific licensing terms.
