segformer-b1-finetuned-cityscapes-1024-1024 (nvidia)

Introduction
The SegFormer-B1 model fine-tuned on the Cityscapes dataset is designed for semantic segmentation tasks. It utilizes a hierarchical Transformer encoder paired with a lightweight MLP decoder. Originally introduced in the paper "SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers" by Xie et al., it has been implemented and fine-tuned to work effectively on datasets like ADE20K and Cityscapes.
Architecture
SegFormer features a hierarchical Transformer encoder, which is initially pre-trained on ImageNet-1k. Following this, a lightweight all-MLP decode head is added, and both components are fine-tuned together on specific datasets like Cityscapes. The architecture is optimized for simplicity and efficiency, yielding competitive results in semantic segmentation benchmarks.
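To illustrate the hierarchical design, the encoder emits feature maps at four progressively coarser resolutions. The sketch below computes those resolutions for a 1024x1024 Cityscapes crop; the per-stage downsampling factors (4, 8, 16, 32) follow the SegFormer paper.

```python
# Sketch: output resolutions of the four encoder stages for a
# 1024x1024 input, assuming the paper's downsampling factors.
input_size = 1024
strides = [4, 8, 16, 32]  # overall downsampling at each stage
stage_resolutions = [input_size // s for s in strides]
print(stage_resolutions)  # [256, 128, 64, 32]
```

The all-MLP decode head then fuses these multi-scale features into a single segmentation map at 1/4 of the input resolution.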
Training
The model was pre-trained on the ImageNet-1k dataset. Subsequently, a decode head was integrated, and the entire network was fine-tuned on the Cityscapes dataset. This fine-tuning process allows the model to adapt to the specific characteristics and requirements of the semantic segmentation tasks within the Cityscapes dataset.
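A minimal sketch of what that fine-tuning step looks like with the `transformers` API: a SegFormer model with a freshly initialized decode head is trained end-to-end on labelled segmentation data. The configuration values and dummy tensors below are illustrative stand-ins, not the actual Cityscapes training setup.

```python
import torch
from transformers import SegformerConfig, SegformerForSemanticSegmentation

# Randomly initialized model with 19 output classes (Cityscapes has 19
# evaluation classes); in the real recipe the encoder weights come from
# ImageNet-1k pre-training.
config = SegformerConfig(num_labels=19)
model = SegformerForSemanticSegmentation(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=6e-5)

# Dummy batch standing in for Cityscapes images and label masks.
pixel_values = torch.randn(2, 3, 128, 128)
labels = torch.randint(0, 19, (2, 128, 128))

# Passing labels makes the model return a cross-entropy loss computed
# against the upsampled logits; encoder and decode head train jointly.
outputs = model(pixel_values=pixel_values, labels=labels)
outputs.loss.backward()
optimizer.step()
```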
Guide: Running Locally
To run SegFormer locally, follow these steps:
- Setup Environment: Ensure you have Python installed, along with the necessary libraries `transformers`, `torch`, and `PIL` (Pillow).
- Install Transformers: `pip install transformers`
- Load the Model:
```python
from transformers import SegformerFeatureExtractor, SegformerForSemanticSegmentation
from PIL import Image
import requests

# Load the feature extractor and the fine-tuned model from the Hub
feature_extractor = SegformerFeatureExtractor.from_pretrained(
    "nvidia/segformer-b1-finetuned-cityscapes-1024-1024"
)
model = SegformerForSemanticSegmentation.from_pretrained(
    "nvidia/segformer-b1-finetuned-cityscapes-1024-1024"
)

# Fetch an example image and run it through the model
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits  # shape: (batch_size, num_labels, height/4, width/4)
```
- Run Inference: The logits are at 1/4 of the input resolution; upsample them to the original image size and take the per-pixel argmax over the class dimension to obtain the segmentation map.
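A hedged sketch of that post-processing step: the dummy tensor below stands in for `outputs.logits` from the snippet above (SegFormer emits logits at 1/4 of the input resolution, with one channel per class).

```python
import torch

# Dummy logits standing in for `outputs.logits`:
# (batch, num_classes, H/4, W/4); Cityscapes has 19 classes.
num_classes = 19
logits = torch.randn(1, num_classes, 256, 256)

# Upsample to the original image resolution, then take the argmax
# over the class dimension to get one class id per pixel.
upsampled = torch.nn.functional.interpolate(
    logits, size=(1024, 1024), mode="bilinear", align_corners=False
)
seg_map = upsampled.argmax(dim=1)[0]  # (1024, 1024) tensor of class ids
print(seg_map.shape)
```

Each value in `seg_map` indexes one of the 19 Cityscapes classes (road, car, person, and so on), which can then be colorized for visualization.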
For optimal performance, consider using cloud GPUs such as AWS EC2 with NVIDIA GPUs or Google Cloud Platform's AI Platform.
License
The model is available under a specific license; users must review and comply with the license terms before utilizing the model.