SegFormer-B1 fine-tuned on ADE20K (512x512)
Introduction
The SegFormer-B1 model is a vision model fine-tuned on the ADE20K dataset for semantic segmentation tasks. It was introduced in the paper "SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers" by Xie et al. The model is available on Hugging Face and was developed by NVIDIA.
Architecture
SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head. The encoder is pre-trained on ImageNet-1k, and the decode head is added for fine-tuning on specific semantic segmentation tasks. This architecture is designed to achieve high performance on benchmarks like ADE20K and Cityscapes.
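The hierarchical design means each encoder stage shrinks the spatial resolution while widening the channels. As a rough sketch (the strides are standard for SegFormer; the B1 channel widths below are an assumption, check the model config for exact values), the feature-map sizes for a 512x512 input work out as:

```python
# Hierarchical encoder sketch for a 512x512 input.
# Strides 4/8/16/32 are the per-stage downsampling factors;
# hidden_sizes are assumed MiT-B1 channel widths (verify in the config).
input_size = 512
stage_strides = [4, 8, 16, 32]
hidden_sizes = [64, 128, 320, 512]

for stride, width in zip(stage_strides, hidden_sizes):
    side = input_size // stride
    print(f"stage stride {stride:>2}: {side}x{side} feature map, {width} channels")
```

The all-MLP decode head then fuses these multi-scale features into a single segmentation prediction.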
Training
The SegFormer model was fine-tuned on the ADE20K dataset at a resolution of 512x512. The training process involved pre-training the hierarchical Transformer on ImageNet-1k, followed by fine-tuning with the decode head on the downstream dataset.
Guide: Running Locally
- Setup Environment: Ensure you have Python installed along with the transformers and torch libraries:

  ```shell
  pip install transformers torch
  ```
- Load the Model:

  ```python
  from transformers import SegformerFeatureExtractor, SegformerForSemanticSegmentation
  from PIL import Image
  import requests

  feature_extractor = SegformerFeatureExtractor.from_pretrained("nvidia/segformer-b1-finetuned-ade-512-512")
  model = SegformerForSemanticSegmentation.from_pretrained("nvidia/segformer-b1-finetuned-ade-512-512")

  url = "http://images.cocodataset.org/val2017/000000039769.jpg"
  image = Image.open(requests.get(url, stream=True).raw)

  inputs = feature_extractor(images=image, return_tensors="pt")
  outputs = model(**inputs)
  logits = outputs.logits
  ```
- Inference: Use the model to process images and obtain segmentation outputs. Note that the returned logits are at 1/4 of the input resolution; upsample them to the original image size and take a per-pixel argmax to obtain the final class map.
- Cloud GPUs: For improved performance, consider using cloud GPU services such as AWS EC2, Google Cloud Platform, or Azure.
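The inference step above can be sketched as follows. The random tensor here is a hypothetical stand-in for `outputs.logits`, which SegFormer emits at a quarter of the input resolution with one channel per ADE20K class:

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-in for `outputs.logits`: for a 512x512 input,
# SegFormer produces logits of shape (batch, 150 classes, 128, 128).
logits = torch.randn(1, 150, 128, 128)

# Upsample back to the input resolution, then take the per-pixel argmax
# to turn class logits into a segmentation map of class indices.
upsampled = F.interpolate(logits, size=(512, 512), mode="bilinear", align_corners=False)
seg_map = upsampled.argmax(dim=1)[0]  # shape (512, 512), values in [0, 149]

print(seg_map.shape)
```

Each value in `seg_map` is an ADE20K class index that can be mapped to a label name or a color palette for visualization.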
License
The model is released under a custom license. Users should refer to the Hugging Face model card for specific licensing details.