segformer-b1-finetuned-ade-512-512

nvidia

Introduction

The SegFormer-B1 model is a vision model fine-tuned on the ADE20K dataset for semantic segmentation tasks. It was introduced in the paper "SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers" by Xie et al. The model is available on Hugging Face and was developed by NVIDIA.

Architecture

SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head. The encoder is pre-trained on ImageNet-1k, and the decode head is added for fine-tuning on specific semantic segmentation tasks. This architecture is designed to achieve high performance on benchmarks like ADE20K and Cityscapes.
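
To make the encoder's hierarchy concrete, the sketch below computes the feature-map shapes a four-stage encoder would produce for a 512x512 input. The strides (4, 8, 16, 32) and channel widths (64, 128, 320, 512) are assumptions taken from the MiT-B1 configuration in the SegFormer paper, not values stated in this summary, and `stage_shapes` is a hypothetical helper rather than a library function.

```python
# Illustrative only: strides and channel widths are assumed from the
# SegFormer paper's MiT-B1 configuration, not read from the checkpoint.
STRIDES = (4, 8, 16, 32)
CHANNELS = (64, 128, 320, 512)


def stage_shapes(height, width):
    """Return (channels, h, w) per encoder stage for a hypothetical MiT-B1."""
    return [(c, height // s, width // s) for c, s in zip(CHANNELS, STRIDES)]


for i, shape in enumerate(stage_shapes(512, 512), start=1):
    print(f"stage {i}: {shape}")
```

Under these assumptions, the decode head fuses all four maps at the resolution of the first stage, which is why the model's logits come out at one quarter of the input height and width.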

Training

The SegFormer model was fine-tuned on the ADE20K dataset at a resolution of 512x512. The training process involved pre-training the hierarchical Transformer on ImageNet-1k, followed by fine-tuning with the decode head on the downstream dataset.

Guide: Running Locally

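  1. Setup Environment: Ensure you have Python installed along with the transformers and torch libraries. The example below also uses Pillow and requests to download and open a sample image.

    pip install transformers torch pillow requests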
    
  2. Load the Model and Run a Forward Pass:

    import requests
    import torch
    from PIL import Image
    from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
    
    # SegformerImageProcessor replaces the deprecated SegformerFeatureExtractor
    checkpoint = "nvidia/segformer-b1-finetuned-ade-512-512"
    image_processor = SegformerImageProcessor.from_pretrained(checkpoint)
    model = SegformerForSemanticSegmentation.from_pretrained(checkpoint)
    
    # Download a sample image from the COCO validation set
    url = "http://images.cocodataset.org/val2017/000000039769.jpg"
    image = Image.open(requests.get(url, stream=True).raw)
    
    # Preprocess, then run inference without tracking gradients
    inputs = image_processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits  # shape: (batch_size, num_labels, height/4, width/4)
    
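  3. Inference: The logits are produced at one quarter of the preprocessed input resolution. To obtain a per-pixel label map, upsample them to the original image size and take the argmax over the class dimension.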

  4. Cloud GPUs: For faster inference on large batches or high-resolution images, consider running the model on a GPU through a cloud service such as AWS EC2, Google Cloud Platform, or Azure.
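
The steps above stop at the raw logits. As a minimal sketch of turning them into a label map, the example below upsamples the logits with bilinear interpolation and takes the per-pixel argmax. To stay runnable without downloading the checkpoint, it uses a random tensor as a stand-in for `outputs.logits`; `logits_to_segmentation_map` is a hypothetical helper name, and the 480x640 target size is only an illustrative image size.

```python
import torch
import torch.nn.functional as F


def logits_to_segmentation_map(logits, target_size):
    """Upsample (batch, classes, h, w) logits and take the per-pixel argmax."""
    upsampled = F.interpolate(
        logits, size=target_size, mode="bilinear", align_corners=False
    )
    return upsampled.argmax(dim=1)  # (batch, height, width) of class indices


# Random stand-in for `outputs.logits`: 150 ADE20K classes at 1/4 of 512x512.
dummy_logits = torch.randn(1, 150, 128, 128)

# target_size is (height, width); PIL's image.size is (width, height).
seg_map = logits_to_segmentation_map(dummy_logits, target_size=(480, 640))
print(seg_map.shape)
```

With real model output, pass `outputs.logits` and `image.size[::-1]` instead of the dummy values. Recent versions of transformers also expose a `post_process_semantic_segmentation` method on the image processor that performs a similar resize and argmax; check the documentation for your installed version before relying on it.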

License

The model is released under a custom license. Users should refer to the Hugging Face model card for specific licensing details.
