YOLOS (Tiny-Sized) Model

Introduction

YOLOS is a Vision Transformer (ViT) fine-tuned for object detection on the COCO 2017 dataset. It was introduced in the paper "You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection" by Fang et al. Despite its simplicity, a base-sized YOLOS model achieves 42 AP on the COCO 2017 validation set, comparable to more complex detection frameworks.

Architecture

The model is trained with a "bipartite matching loss" that compares predicted classes and bounding boxes with the ground-truth annotations. The Hungarian matching algorithm first finds an optimal one-to-one mapping between predictions and annotations. Given that mapping, the loss uses cross-entropy for the classes and a linear combination of L1 and generalized IoU losses for the bounding boxes.
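The matching step described above can be sketched in plain Python. This is a minimal illustration, not the training code: a brute-force search over permutations stands in for the Hungarian algorithm (both return a minimum-cost one-to-one assignment, but brute force only scales to a handful of boxes). The function names, the (x0, y0, x1, y1) box format, the cost weights, and the use of the negative class probability as the classification term are all assumptions made for the example.

```python
from itertools import permutations

def box_area(b):
    x0, y0, x1, y1 = b
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)

def generalized_iou(a, b):
    """Generalized IoU of two (x0, y0, x1, y1) boxes with positive area."""
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter
    # Smallest axis-aligned box that encloses both inputs
    hull = box_area((min(a[0], b[0]), min(a[1], b[1]),
                     max(a[2], b[2]), max(a[3], b[3])))
    return inter / union - (hull - union) / hull

def match(pred_probs, pred_boxes, tgt_labels, tgt_boxes,
          w_class=1.0, w_l1=5.0, w_giou=2.0):
    """Return a list where entry t is the prediction index assigned to target t.

    Assumes there are at least as many predictions as targets.
    The weights are illustrative values, not taken from this document.
    """
    def cost(p, t):
        l1 = sum(abs(x - y) for x, y in zip(pred_boxes[p], tgt_boxes[t]))
        giou = generalized_iou(pred_boxes[p], tgt_boxes[t])
        return -w_class * pred_probs[p][tgt_labels[t]] + w_l1 * l1 - w_giou * giou

    n, m = len(pred_boxes), len(tgt_boxes)
    # Brute-force stand-in for the Hungarian algorithm: try every assignment
    best = min(permutations(range(n), m),
               key=lambda perm: sum(cost(p, t) for t, p in enumerate(perm)))
    return list(best)

# Three predictions (two classes) matched against two ground-truth objects
pred_probs = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
pred_boxes = [(0.5, 0.5, 0.9, 0.9), (0.1, 0.1, 0.4, 0.4), (0.0, 0.0, 1.0, 1.0)]
tgt_labels = [0, 1]
tgt_boxes = [(0.1, 0.1, 0.4, 0.4), (0.5, 0.5, 0.9, 0.9)]
print(match(pred_probs, pred_boxes, tgt_labels, tgt_boxes))
```

Predictions left unmatched (index 2 here) would be supervised toward the "no object" class. For realistic numbers of predictions, a polynomial-time solver is the practical choice over this exhaustive search.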

Training

YOLOS was pre-trained for 300 epochs on ImageNet-1k and fine-tuned for 300 epochs on the COCO dataset, which includes 118k annotated images for training and 5k for validation.

Guide: Running Locally

Install Required Libraries: Ensure transformers, torch, Pillow (imported as PIL), and requests are installed, for example with pip install transformers torch pillow requests.

Load the Model and Run Inference:

from transformers import YolosImageProcessor, YolosForObjectDetection
from PIL import Image
import torch
import requests

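# Download a sample image from the COCO 2017 validation set
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Load the pretrained detector and its matching image processor
model = YolosForObjectDetection.from_pretrained("hustvl/yolos-tiny")
image_processor = YolosImageProcessor.from_pretrained("hustvl/yolos-tiny")

# Preprocess the image and run a forward pass; gradients are not needed for inference
inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)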

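# Keep detections scoring above 0.9 and rescale boxes to the original image size.
# PIL reports size as (width, height), while target_sizes expects (height, width).
target_sizes = torch.tensor([image.size[::-1]])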
results = image_processor.post_process_object_detection(outputs, threshold=0.9, target_sizes=target_sizes)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    box = [round(i, 2) for i in box.tolist()]
    print(f"Detected {model.config.id2label[label.item()]} with confidence {round(score.item(), 3)} at location {box}")
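To make the post-processing step in the guide less opaque, here is a rough pure-Python sketch of what it does for a single prediction: softmax over the class logits, drop the trailing "no object" class, apply the score threshold, and convert a normalized (center_x, center_y, width, height) box to pixel corner coordinates. The function name postprocess_one is hypothetical, and the input layout is an assumption about the model's raw outputs rather than a copy of the library's code.

```python
import math

def postprocess_one(logits, box, image_hw, threshold=0.9):
    """Turn one prediction into (score, label, pixel_box), or None if it is filtered out.

    logits: raw class scores; the last entry is assumed to be the "no object" class.
    box: (center_x, center_y, width, height), normalized to [0, 1].
    image_hw: (height, width) of the original image in pixels.
    """
    # Softmax over all classes, shifted by the max for numerical stability
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Best real class, ignoring the trailing "no object" entry
    label = max(range(len(probs) - 1), key=probs.__getitem__)
    score = probs[label]
    if score <= threshold:
        return None

    # Center format to corner format, then scale to pixel coordinates
    cx, cy, w, h = box
    height, width = image_hw
    pixel_box = [
        (cx - w / 2) * width,
        (cy - h / 2) * height,
        (cx + w / 2) * width,
        (cy + h / 2) * height,
    ]
    return score, label, pixel_box

# A confident prediction for class 1 on a 100 x 200 (height x width) image
print(postprocess_one([0.0, 6.0, 0.0], (0.5, 0.5, 0.5, 0.5), (100, 200)))
# An uncertain prediction is dropped
print(postprocess_one([0.0, 0.0, 0.0], (0.5, 0.5, 0.5, 0.5), (100, 200)))
```

The (height, width) argument order mirrors target_sizes in the guide, which is why the guide reverses image.size before passing it in.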

Cloud GPUs: For faster inference, consider running the model on a GPU instance from a cloud provider such as AWS (EC2), Google Cloud, or Azure.
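On a GPU machine, the model and its inputs both have to be placed on the GPU before the forward pass. The sketch below shows one common PyTorch pattern for this; pick_device and move_to_device are hypothetical helper names, and the dummy tensor is only a stand-in for the processor output used in the guide.

```python
import torch

def pick_device():
    """Return a CUDA device when one is visible to PyTorch, otherwise the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")

def move_to_device(batch, device):
    """Move every tensor in a dict-like batch to the given device."""
    return {name: tensor.to(device) for name, tensor in batch.items()}

device = pick_device()

# Stand-in for the processor output: a dummy batch with one 3-channel image
dummy_inputs = {"pixel_values": torch.zeros(1, 3, 32, 32)}
dummy_inputs = move_to_device(dummy_inputs, device)
print(dummy_inputs["pixel_values"].device)
```

In the guide above, this would mean calling model.to(device) once after loading and moving inputs to the same device before calling the model. The code falls back to the CPU when no GPU is available, so it runs unchanged on either kind of machine.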

License

The YOLOS model is released under the Apache 2.0 License.