segformer-b5-finetuned-cityscapes-1024-1024
Introduction
The SegFormer B5 model, fine-tuned on the Cityscapes dataset at a resolution of 1024x1024, is designed for semantic segmentation. Introduced in the paper "SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers" by Xie et al., it is published by NVIDIA on the Hugging Face Hub as nvidia/segformer-b5-finetuned-cityscapes-1024-1024.
Architecture
SegFormer pairs a hierarchical Transformer encoder with a lightweight all-MLP decode head. The encoder is first pre-trained on ImageNet-1k; the decode head is then added and the whole model is fine-tuned on a downstream dataset such as Cityscapes, achieving strong results on semantic segmentation benchmarks.
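As a quick illustration, the encoder's four-stage hierarchy can be inspected through the checkpoint's configuration in the transformers library. This is a minimal sketch; the exact values printed depend on the checkpoint:

from transformers import SegformerConfig

# Load the configuration of the fine-tuned B5 checkpoint
config = SegformerConfig.from_pretrained("nvidia/segformer-b5-finetuned-cityscapes-1024-1024")

print(config.depths)               # Transformer blocks per encoder stage
print(config.hidden_sizes)         # channel width of each stage
print(config.decoder_hidden_size)  # width of the all-MLP decode head
print(config.num_labels)           # 19 Cityscapes classes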
Training
The hierarchical Transformer encoder is first pre-trained on the ImageNet-1k dataset. The encoder and decode head are then fine-tuned together on the Cityscapes dataset, adapting the model to urban-scene semantic segmentation.
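The same fine-tuning step can be reproduced on custom data because the model returns a loss when per-pixel labels are passed to the forward call. Below is a minimal sketch, assuming a batch of preprocessed images and Cityscapes-style label maps; the random tensors are placeholders, not real data:

import torch
from transformers import SegformerForSemanticSegmentation

model = SegformerForSemanticSegmentation.from_pretrained(
    "nvidia/segformer-b5-finetuned-cityscapes-1024-1024"
)
optimizer = torch.optim.AdamW(model.parameters(), lr=6e-5)

# Placeholder batch: 2 images at 1024x1024 with per-pixel class ids in [0, 18]
pixel_values = torch.randn(2, 3, 1024, 1024)
labels = torch.randint(0, 19, (2, 1024, 1024))

# Passing labels makes the model compute a cross-entropy loss internally
outputs = model(pixel_values=pixel_values, labels=labels)
outputs.loss.backward()
optimizer.step()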
Guide: Running Locally
To use the SegFormer model for image segmentation, follow these steps:
- Install the required packages:

pip install transformers pillow requests torch
- Use the following code to load the model and run it on an image:

from transformers import SegformerFeatureExtractor, SegformerForSemanticSegmentation
from PIL import Image
import requests

# Load the preprocessor and the fine-tuned model from the Hugging Face Hub
feature_extractor = SegformerFeatureExtractor.from_pretrained("nvidia/segformer-b5-finetuned-cityscapes-1024-1024")
model = SegformerForSemanticSegmentation.from_pretrained("nvidia/segformer-b5-finetuned-cityscapes-1024-1024")

# Fetch an example image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Preprocess, run inference, and grab the raw class scores
inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits  # shape (batch_size, num_labels, height/4, width/4)
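Continuing from the snippet above (reusing its logits and image variables), one common way to turn the logits into a per-pixel class map is to upsample them back to the input resolution and take the argmax; a minimal sketch:

import torch

# Logits come out at 1/4 of the input resolution; upsample to the image size
upsampled = torch.nn.functional.interpolate(
    logits,
    size=image.size[::-1],  # PIL gives (width, height); interpolate wants (height, width)
    mode="bilinear",
    align_corners=False,
)

# Per-pixel class ids over the 19 Cityscapes categories
segmentation = upsampled.argmax(dim=1)[0]
print(segmentation.shape)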
- For enhanced performance, especially with large datasets, consider cloud-based GPU services such as AWS EC2 GPU instances, Google Cloud Platform, or Azure; see the sketch after this list for moving inference onto a GPU.
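Whether the GPU is local or cloud-hosted, the inference snippet above only needs the model and inputs moved to the device; a minimal sketch, reusing the model and inputs variables from the loading code:

import torch

# Select a GPU when one is available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
inputs = {k: v.to(device) for k, v in inputs.items()}

# Disable gradient tracking for inference
with torch.no_grad():
    outputs = model(**inputs)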
License
The SegFormer model is distributed under its own license; refer to the license file in the model repository on the Hugging Face Hub. Please review and comply with the license terms when using the model.