shi-labs/oneformer_coco_swin_large
Introduction
OneFormer is a universal image segmentation framework designed to perform semantic, instance, and panoptic segmentation using a single model architecture. Introduced by Jain et al. in the paper "OneFormer: One Transformer to Rule Universal Image Segmentation," this model outperforms specialized models by employing a task token to guide both training and inference.
Architecture
OneFormer utilizes a Swin Transformer backbone (the Swin-Large variant in this checkpoint). The framework is unique in that it integrates a task token to dynamically adjust the model's focus to the segmentation task at hand, allowing it to serve multiple tasks without the need for separate models, making it efficient and versatile.
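A minimal sketch (not part of the original model card) of how the task token surfaces in the Hugging Face API: the paper phrases the task prompt as "the task is {task}", and the processor tokenizes that prompt alongside the image, so switching tasks changes only this input, never the weights. The blank image and printed shapes below are purely illustrative.

```python
from transformers import OneFormerProcessor
from PIL import Image
import numpy as np

processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_coco_swin_large")

# Any image works for this illustration; a blank one keeps the sketch self-contained.
dummy = Image.fromarray(np.zeros((64, 64, 3), dtype=np.uint8))

for task in ["semantic", "instance", "panoptic"]:
    # Only the tokenized task prompt changes between tasks.
    inputs = processor(images=dummy, task_inputs=[task], return_tensors="pt")
    print(task, inputs["task_inputs"].shape)
```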
Training
The OneFormer model is trained once, jointly, on the COCO dataset rather than separately for each task. The training strategy uses task tokens to condition the model on the task being learned, making a single set of weights robust across semantic, instance, and panoptic segmentation.
Guide: Running Locally
To use OneFormer locally, follow these steps:
- Install Dependencies: Ensure you have Python and PyTorch installed, then install the transformers library from Hugging Face (plus pillow and requests, which the example code below imports):

```bash
pip install transformers pillow requests
```
- Load the Model: Use the following Python code to set up the OneFormer model.

```python
from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation
from PIL import Image
import requests

# Fetch a sample COCO image; a /resolve/ URL returns the raw file,
# whereas /blob/ returns the Hugging Face web page for the file.
url = "https://huggingface.co/datasets/shi-labs/oneformer_demo/resolve/main/coco.jpeg"
image = Image.open(requests.get(url, stream=True).raw)

# One processor and one checkpoint are shared by all three segmentation tasks.
processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_coco_swin_large")
model = OneFormerForUniversalSegmentation.from_pretrained("shi-labs/oneformer_coco_swin_large")
```
- Perform Segmentation: Use the model to perform semantic, instance, and panoptic segmentation (a sketch for inspecting and visualizing these outputs follows this list).

```python
# Semantic segmentation: per-pixel class labels
semantic_inputs = processor(images=image, task_inputs=["semantic"], return_tensors="pt")
semantic_outputs = model(**semantic_inputs)
predicted_semantic_map = processor.post_process_semantic_segmentation(
    semantic_outputs, target_sizes=[image.size[::-1]]
)[0]

# Instance segmentation: a map of individual object instances
instance_inputs = processor(images=image, task_inputs=["instance"], return_tensors="pt")
instance_outputs = model(**instance_inputs)
predicted_instance_map = processor.post_process_instance_segmentation(
    instance_outputs, target_sizes=[image.size[::-1]]
)[0]["segmentation"]

# Panoptic segmentation: instances plus amorphous "stuff" regions
panoptic_inputs = processor(images=image, task_inputs=["panoptic"], return_tensors="pt")
panoptic_outputs = model(**panoptic_inputs)
predicted_panoptic_map = processor.post_process_panoptic_segmentation(
    panoptic_outputs, target_sizes=[image.size[::-1]]
)[0]["segmentation"]
```
- Cloud GPUs: For large-scale or intensive tasks, consider using cloud services such as AWS, GCP, or Azure for GPU support (a minimal GPU sketch follows this list).
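The maps returned in the Perform Segmentation step are 2-D tensors of class or segment ids at the original image resolution. The sketch below, assuming matplotlib is installed (it is not a dependency of the model itself), shows one way to view a map and to translate panoptic segment ids into COCO class names via model.config.id2label.

```python
import matplotlib.pyplot as plt

# Visualize the semantic map: each pixel holds a COCO class id.
plt.imshow(predicted_semantic_map.numpy())
plt.axis("off")
plt.title("Semantic segmentation")
plt.savefig("semantic_map.png")

# The panoptic post-processor also returns per-segment metadata;
# re-run it keeping the full result to read the labels.
panoptic_result = processor.post_process_panoptic_segmentation(
    panoptic_outputs, target_sizes=[image.size[::-1]]
)[0]
for segment in panoptic_result["segments_info"]:
    label = model.config.id2label[segment["label_id"]]
    print(f"segment {segment['id']}: {label}")
```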
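Regarding the Cloud GPUs note: the earlier snippets run on CPU by default. A minimal sketch of moving inference onto a CUDA device, assuming one is available:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Move the processor's tensors to the same device before the forward pass.
inputs = processor(images=image, task_inputs=["panoptic"], return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)

# Post-process as before; the target size is (height, width).
result = processor.post_process_panoptic_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0]["segmentation"]
```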
License
The OneFormer model is released under the MIT License, allowing for broad use and modification.