oneformer_coco_swin_large

by shi-labs

Introduction

OneFormer is a universal image segmentation framework that performs semantic, instance, and panoptic segmentation with a single model architecture. Introduced by Jain et al. in the paper "OneFormer: One Transformer to Rule Universal Image Segmentation," it conditions both training and inference on a task token, allowing one jointly trained model to outperform specialized models on each individual task.

Architecture

OneFormer utilizes a Swin Transformer backbone and is trained on the COCO dataset. The framework is unique in that it integrates task tokens to dynamically adjust its focus to different segmentation tasks. This allows it to serve multiple tasks without the need for separate models, making it efficient and versatile.

Training

OneFormer is trained once on the COCO dataset rather than separately per task. During training, task tokens condition the model on the task at hand, making the single model robust across semantic, instance, and panoptic segmentation.

Guide: Running Locally

To use OneFormer locally, follow these steps:

  1. Install Dependencies: Ensure you have Python and PyTorch installed. You also need the transformers library from Hugging Face, plus Pillow and requests for the example below.

    pip install transformers pillow requests
    
  2. Load the Model: Use the following Python code to set up the OneFormer model.

    from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation
    from PIL import Image
    import requests
    
    url = "https://huggingface.co/datasets/shi-labs/oneformer_demo/resolve/main/coco.jpeg"
    image = Image.open(requests.get(url, stream=True).raw)
    
    processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_coco_swin_large")
    model = OneFormerForUniversalSegmentation.from_pretrained("shi-labs/oneformer_coco_swin_large")
    
  3. Perform Segmentation: Use the model to perform semantic, instance, and panoptic segmentation.

    import torch

    # Semantic Segmentation
    semantic_inputs = processor(images=image, task_inputs=["semantic"], return_tensors="pt")
    with torch.no_grad():
        semantic_outputs = model(**semantic_inputs)
    predicted_semantic_map = processor.post_process_semantic_segmentation(semantic_outputs, target_sizes=[image.size[::-1]])[0]

    # Instance Segmentation
    instance_inputs = processor(images=image, task_inputs=["instance"], return_tensors="pt")
    with torch.no_grad():
        instance_outputs = model(**instance_inputs)
    predicted_instance_map = processor.post_process_instance_segmentation(instance_outputs, target_sizes=[image.size[::-1]])[0]["segmentation"]

    # Panoptic Segmentation
    panoptic_inputs = processor(images=image, task_inputs=["panoptic"], return_tensors="pt")
    with torch.no_grad():
        panoptic_outputs = model(**panoptic_inputs)
    predicted_panoptic_map = processor.post_process_panoptic_segmentation(panoptic_outputs, target_sizes=[image.size[::-1]])[0]["segmentation"]
    
  4. Cloud GPUs: For large-scale or intensive tasks, consider using cloud services such as AWS, GCP, or Azure for GPU support.
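The predicted maps returned by the post-processing calls are 2-D arrays of per-pixel ids, which are hard to inspect directly. One way to eyeball a result is to map each id to a color. Below is a minimal sketch; the `colorize_segmentation` helper and the dummy input are illustrative stand-ins, not part of the OneFormer API:

```python
import numpy as np
from PIL import Image

def colorize_segmentation(seg_map, seed=0):
    """Map each integer class id in a 2-D segmentation map to a fixed RGB color."""
    # Accept either a torch tensor (as returned by the processor) or a plain array.
    seg = np.asarray(seg_map.cpu()) if hasattr(seg_map, "cpu") else np.asarray(seg_map)
    rng = np.random.default_rng(seed)
    # One random color per class id, deterministic for a given seed.
    palette = rng.integers(0, 256, size=(int(seg.max()) + 1, 3), dtype=np.uint8)
    return Image.fromarray(palette[seg])

# Dummy 4x4 map of class ids; stands in for predicted_semantic_map above.
dummy = np.array([[0, 0, 1, 1],
                  [0, 2, 2, 1],
                  [3, 3, 2, 1],
                  [3, 3, 3, 1]])
img = colorize_segmentation(dummy)
```

In practice you would pass `predicted_semantic_map` (or the `"segmentation"` tensor from the instance/panoptic outputs) instead of the dummy array, then call `img.save(...)` or `img.show()`.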

License

The OneFormer model is released under the MIT License, allowing for broad use and modification.
