YOLOS-Small (hustvl)
Introduction
YOLOS-Small is an object detection model fine-tuned on the COCO 2017 dataset. It is a plain Vision Transformer (ViT) trained with the DETR loss, and despite its architectural simplicity it achieves competitive detection performance.
Architecture
The YOLOS model uses a Vision Transformer architecture and is trained with a "bipartite matching loss". The predicted classes and bounding boxes are compared against the ground-truth annotations, with the Hungarian algorithm computing an optimal one-to-one matching between predictions and targets. The matched pairs are then optimized with standard cross-entropy for the classes and a linear combination of the L1 and generalized IoU losses for the bounding boxes.
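To make the matching step concrete, here is a minimal sketch, not the actual YOLOS/DETR implementation; the tensor sizes and cost weights are illustrative assumptions. It pairs predictions with ground-truth boxes via the Hungarian algorithm using scipy:

import torch
from scipy.optimize import linear_sum_assignment

# Hypothetical sizes: N = 10 predictions, M = 2 ground-truth objects.
pred_logits = torch.randn(10, 92)     # per-prediction class logits (91 classes + "no object")
pred_boxes = torch.rand(10, 4)        # normalized (center_x, center_y, width, height)
tgt_labels = torch.tensor([17, 65])   # ground-truth class ids
tgt_boxes = torch.rand(2, 4)          # ground-truth boxes, same format

# Classification cost: the less likely the target class, the higher the cost.
cost_class = -pred_logits.softmax(-1)[:, tgt_labels]   # shape (N, M)

# Box cost: pairwise L1 distance (the generalized IoU term is omitted here).
cost_bbox = torch.cdist(pred_boxes, tgt_boxes, p=1)    # shape (N, M)

# Weighted total cost; these weights are illustrative, not the paper's.
cost = cost_class + 5.0 * cost_bbox

# The Hungarian algorithm returns the optimal one-to-one assignment.
pred_idx, tgt_idx = linear_sum_assignment(cost.numpy())
# The cross-entropy, L1, and generalized IoU losses are then computed
# only on these matched prediction/target pairs.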
Training
YOLOS was first pre-trained on the ImageNet-1k dataset for 200 epochs and subsequently fine-tuned on COCO 2017 object detection for 150 epochs. Fine-tuning used the 118k annotated images of the COCO 2017 training split, with evaluation on the 5k-image validation split, on which the model achieves an average precision (AP) of 36.1.
Guide: Running Locally
To run the YOLOS model locally, you can use the following Python code snippet:
from transformers import YolosFeatureExtractor, YolosForObjectDetection
from PIL import Image
import requests

# Load an example image from the COCO 2017 validation set
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

# Load the feature extractor and the fine-tuned detection model
feature_extractor = YolosFeatureExtractor.from_pretrained('hustvl/yolos-small')
model = YolosForObjectDetection.from_pretrained('hustvl/yolos-small')

# Preprocess the image and run a forward pass
inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)

# Class logits and normalized (center_x, center_y, width, height) boxes
logits = outputs.logits
bboxes = outputs.pred_boxes
This setup requires PyTorch. For optimal performance, especially with large datasets, consider using cloud GPUs such as those provided by AWS, Google Cloud, or Azure.
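The logits and pred_boxes returned above are raw outputs for every detection token. Below is a minimal sketch of decoding them into labeled boxes, assuming the feature extractor inherits the DETR-style post_process_object_detection method; the 0.9 confidence threshold is an arbitrary example value:

import torch

# Rescale the normalized boxes to the original image size and keep
# only confident detections; threshold 0.9 is an arbitrary choice.
target_sizes = torch.tensor([image.size[::-1]])   # (height, width)
results = feature_extractor.post_process_object_detection(
    outputs, threshold=0.9, target_sizes=target_sizes
)[0]

for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    name = model.config.id2label[label.item()]
    print(f"{name}: {score.item():.2f} at {[round(c, 1) for c in box.tolist()]}")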
License
The YOLOS-Small model is released under the Apache 2.0 license, which allows for both personal and commercial use, modification, and distribution.