segformer-b5-finetuned-cityscapes-1024-1024

nvidia

Introduction

The SegFormer B5 model, fine-tuned on the Cityscapes dataset at a resolution of 1024x1024, is a semantic segmentation model: it assigns a class label to every pixel of an input image. It was introduced in the paper "SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers" by Xie et al. and is available on the Hugging Face Hub as nvidia/segformer-b5-finetuned-cityscapes-1024-1024.

Architecture

SegFormer comprises a hierarchical Transformer encoder paired with a lightweight all-MLP decode head. The encoder is first pre-trained on ImageNet-1k. A decode head is then added, and the encoder and head are fine-tuned together on a downstream dataset, in this case Cityscapes. The design is reported to perform well on semantic segmentation benchmarks.
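
To make the idea of an all-MLP decode head more concrete, here is a minimal PyTorch sketch. It is a simplified illustration, not the code used by this checkpoint. The class name `AllMLPHeadSketch`, the four stage widths `(64, 128, 320, 512)`, the embedding width `256`, and the `19` output labels are all assumptions chosen for the example. The sketch projects each encoder stage to a shared width with a linear layer, upsamples every stage to the finest resolution, concatenates them, and classifies each position.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AllMLPHeadSketch(nn.Module):
    """Illustrative all-MLP decode head (hypothetical and simplified)."""

    def __init__(self, in_channels=(64, 128, 320, 512), embed_dim=256, num_labels=19):
        super().__init__()
        # One linear projection per encoder stage, mapping to a shared width.
        self.proj = nn.ModuleList([nn.Linear(c, embed_dim) for c in in_channels])
        # Fuse the concatenated stage features, then classify each position.
        self.fuse = nn.Linear(embed_dim * len(in_channels), embed_dim)
        self.classifier = nn.Linear(embed_dim, num_labels)

    def forward(self, features):
        # features: list of (batch, channels, height, width) maps, finest first.
        target_size = tuple(features[0].shape[-2:])
        projected = []
        for feat, proj in zip(features, self.proj):
            x = proj(feat.permute(0, 2, 3, 1))  # channels-last for nn.Linear
            x = x.permute(0, 3, 1, 2)           # back to channels-first
            x = F.interpolate(x, size=target_size, mode="bilinear", align_corners=False)
            projected.append(x)
        fused = torch.cat(projected, dim=1).permute(0, 2, 3, 1)
        fused = torch.relu(self.fuse(fused))
        # Output: (batch, num_labels, height, width) at the finest stage's resolution.
        return self.classifier(fused).permute(0, 3, 1, 2)
```

Because every layer is a per-position linear map, the head adds few parameters compared with the encoder, which is the sense in which the text calls it lightweight.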

Training

Training proceeds in two stages. The hierarchical Transformer encoder is first pre-trained on the ImageNet-1k dataset. The encoder and the decode head are then fine-tuned together on Cityscapes at 1024x1024 resolution, which adapts the model to segmenting urban street scenes.

Guide: Running Locally

To use the SegFormer model for image segmentation, follow these steps:

  1. Install the required packages:

    pip install transformers pillow requests torch
    
  2. Use the provided code to load the model and process an image:

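    from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
    from PIL import Image
    import requests
    import torch

    checkpoint = "nvidia/segformer-b5-finetuned-cityscapes-1024-1024"
    # SegformerImageProcessor replaces the deprecated SegformerFeatureExtractor
    # in recent transformers releases.
    image_processor = SegformerImageProcessor.from_pretrained(checkpoint)
    model = SegformerForSemanticSegmentation.from_pretrained(checkpoint)

    # Download a sample image and make sure it is in RGB mode.
    url = "http://images.cocodataset.org/val2017/000000039769.jpg"
    image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

    # Resize and normalize the image, then run inference without tracking gradients.
    inputs = image_processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits  # shape: (batch_size, num_labels, height/4, width/4)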
    
  3. For faster inference, especially when processing many images, consider running the model on a GPU, for example through cloud services such as AWS EC2 GPU instances, Google Cloud Platform, or Azure.
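
The logits produced in step 2 are at a lower spatial resolution than the input, so one more step is usually needed to get a per-pixel class map. The following is a minimal sketch of that step, assuming bilinear upsampling followed by an argmax over the label dimension. The helper name `logits_to_label_map` is hypothetical and not part of the transformers library.

```python
import torch
import torch.nn.functional as F


def logits_to_label_map(logits, image_size):
    """Turn (batch, num_labels, h, w) logits into (batch, height, width) class indices.

    image_size is (height, width) of the original image.
    """
    # Upsample the logits to the original image resolution.
    upsampled = F.interpolate(logits, size=image_size, mode="bilinear", align_corners=False)
    # Pick the highest-scoring label at every pixel.
    return upsampled.argmax(dim=1)
```

With the variables from step 2, this could be called as `logits_to_label_map(outputs.logits, image.size[::-1])`. PIL reports `image.size` as (width, height), so it is reversed to (height, width) here.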

License

The SegFormer model is distributed under its own license terms. Review and comply with those terms before using the model.
