yolos tiny

hustvl

YOLOS (Tiny-Sized) Model

Introduction

YOLOS is a Vision Transformer (ViT) model fine-tuned on the COCO 2017 object detection dataset, introduced in the paper "You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection" by Fang et al. Despite its simplicity, a base-sized YOLOS model achieves 42 AP on COCO validation 2017, comparable to more complex frameworks.

Architecture

The model is trained with a "bipartite matching loss": the predicted classes and bounding boxes are compared with the ground truth annotations, and the Hungarian matching algorithm is used to find an optimal one-to-one mapping between predictions and annotations. Given that mapping, the model parameters are optimized with cross-entropy for the classes and a linear combination of L1 and generalized IoU loss for the bounding boxes.
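
The matching step can be illustrated with a small pure-Python sketch. It is not the YOLOS training code: the function names, the toy predictions, and the weights `w_l1` and `w_giou` are assumptions chosen for illustration, and an exhaustive search over permutations stands in for the Hungarian algorithm (for a handful of boxes both find the same minimum-cost assignment).

```python
from itertools import permutations

def l1_distance(a, b):
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(x - y) for x, y in zip(a, b))

def area(box):
    """Area of an (x0, y0, x1, y1) box; zero if the box is empty."""
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)

def generalized_iou(a, b):
    """Generalized IoU of two non-degenerate (x0, y0, x1, y1) boxes."""
    inter = area((max(a[0], b[0]), max(a[1], b[1]),
                  min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    # Smallest box enclosing both inputs
    hull = area((min(a[0], b[0]), min(a[1], b[1]),
                 max(a[2], b[2]), max(a[3], b[3])))
    return inter / union - (hull - union) / hull

def match(preds, targets, w_l1=5.0, w_giou=2.0):
    """Return a tuple where entry i is the prediction index assigned to target i.

    The weights are illustrative assumptions, not values taken from the source.
    """
    def cost(p, t):
        return (-p["probs"][t["label"]]
                + w_l1 * l1_distance(p["box"], t["box"])
                - w_giou * generalized_iou(p["box"], t["box"]))

    # Brute force over all one-to-one assignments (stand-in for Hungarian matching)
    candidates = permutations(range(len(preds)), len(targets))
    return min(candidates,
               key=lambda perm: sum(cost(preds[j], t) for j, t in zip(perm, targets)))

# Toy data: three predictions, two ground truth boxes, two classes
preds = [
    {"probs": [0.1, 0.9], "box": (0.5, 0.5, 0.9, 0.9)},
    {"probs": [0.8, 0.2], "box": (0.1, 0.1, 0.4, 0.4)},
    {"probs": [0.5, 0.5], "box": (0.0, 0.6, 0.2, 0.8)},
]
targets = [
    {"label": 0, "box": (0.1, 0.1, 0.4, 0.5)},
    {"label": 1, "box": (0.5, 0.5, 1.0, 0.9)},
]

assignment = match(preds, targets)
print(assignment)  # (1, 0): target 0 pairs with prediction 1, target 1 with prediction 0
```

Once the assignment is fixed, the classification and box losses are computed only between matched pairs; unmatched predictions are treated as "no object". In practice an implementation of the Hungarian algorithm replaces the brute-force search, which grows factorially with the number of boxes.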

Training

YOLOS was pre-trained for 300 epochs on ImageNet-1k and fine-tuned for 300 epochs on the COCO dataset, which includes 118k annotated images for training and 5k for validation.

Guide: Running Locally

  1. Install Required Libraries: Ensure transformers, torch, Pillow (imported as PIL), and requests are installed.
  2. Load the Model and Run Inference:
    from transformers import YolosImageProcessor, YolosForObjectDetection
    from PIL import Image
    import torch
    import requests
    
    url = "http://images.cocodataset.org/val2017/000000039769.jpg"
    image = Image.open(requests.get(url, stream=True).raw)
    
    model = YolosForObjectDetection.from_pretrained("hustvl/yolos-tiny")
    image_processor = YolosImageProcessor.from_pretrained("hustvl/yolos-tiny")
    
    inputs = image_processor(images=image, return_tensors="pt")
    with torch.no_grad():  # inference only, so gradient tracking is not needed
        outputs = model(**inputs)
    
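    # Rescale boxes to the original image size (PIL size is (width, height), so reverse it),
    # keep detections scoring above 0.9, and print them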
    target_sizes = torch.tensor([image.size[::-1]])
    results = image_processor.post_process_object_detection(outputs, threshold=0.9, target_sizes=target_sizes)[0]
    for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
        box = [round(i, 2) for i in box.tolist()]
        print(f"Detected {model.config.id2label[label.item()]} with confidence {round(score.item(), 3)} at location {box}")
    
  3. Cloud GPUs: For faster inference, consider running the model on a GPU instance from a cloud provider such as AWS EC2, GCP, or Azure.
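
The post-processing call in step 2 takes `target_sizes` so that boxes can be reported in pixels of the original image. The sketch below shows the kind of conversion involved, assuming the model's raw boxes are normalized (center_x, center_y, width, height) values as in DETR-style detectors. The helper `to_pixel_corners` and the 640x480 image size are hypothetical, and this is not a copy of the library's implementation.

```python
def to_pixel_corners(box, image_height, image_width):
    """Convert a normalized (cx, cy, w, h) box to absolute (x0, y0, x1, y1) pixels."""
    cx, cy, w, h = box
    return [
        (cx - w / 2) * image_width,
        (cy - h / 2) * image_height,
        (cx + w / 2) * image_width,
        (cy + h / 2) * image_height,
    ]

# PIL reports size as (width, height); reversing it gives (height, width),
# which is the order used for target_sizes in the guide above.
pil_size = (640, 480)  # hypothetical image size
height, width = pil_size[::-1]

pixel_box = to_pixel_corners([0.5, 0.5, 0.25, 0.5], height, width)
print([round(v, 2) for v in pixel_box])  # [240.0, 120.0, 400.0, 360.0]
```

Passing the sizes in the wrong order would swap the horizontal and vertical scaling, which is why the guide reverses `image.size` before building the tensor.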

License

The YOLOS model is released under the Apache 2.0 License.
