DETR-ResNet-50 Model (facebook/detr-resnet-50)
Introduction
The DEtection TRansformer (DETR) model with a ResNet-50 backbone is designed for end-to-end object detection. It was trained on the COCO 2017 dataset and introduced in the paper "End-to-End Object Detection with Transformers" by Carion et al. This model card was written by the Hugging Face team because the team that released DETR did not provide one.
Architecture
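DETR pairs a convolutional backbone with an encoder-decoder transformer. Two heads sit on top of the decoder outputs to perform object detection: a linear layer that predicts class labels and a multi-layer perceptron (MLP) that predicts bounding boxes. The decoder takes a fixed set of learned object queries (100 for COCO), each of which looks for one object in the image. During training, the Hungarian matching algorithm finds an optimal one-to-one assignment between the query predictions and the ground-truth annotations, and a bipartite matching loss is computed over the matched pairs (cross-entropy for the classes, plus a combination of L1 and generalized IoU losses for the boxes).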
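The one-to-one matching step can be illustrated with a small, dependency-free sketch. It is not the DETR implementation: it swaps the Hungarian algorithm for an exhaustive search over assignments (only practical for a handful of boxes), uses a simplified cost of negative class probability plus a weighted L1 box distance, and leaves out the generalized IoU term. The function names and the `box_weight` value are illustrative assumptions.

```python
from itertools import permutations


def l1_cost(box_a, box_b):
    """L1 distance between two boxes given as (cx, cy, w, h) tuples."""
    return sum(abs(a - b) for a, b in zip(box_a, box_b))


def match_predictions(pred_boxes, pred_probs, gt_boxes, gt_labels, box_weight=5.0):
    """Return the lowest-cost one-to-one assignment as (pred_index, gt_index) pairs.

    Brute-force stand-in for the Hungarian algorithm; assumes there are at
    least as many predictions as ground-truth objects.
    """
    def cost(p, g):
        # A high probability for the true class lowers the cost;
        # a large box distance raises it.
        class_term = -pred_probs[p][gt_labels[g]]
        return class_term + box_weight * l1_cost(pred_boxes[p], gt_boxes[g])

    best, best_cost = None, float("inf")
    # Try every way of assigning a distinct prediction to each ground-truth box.
    for pred_idx in permutations(range(len(pred_boxes)), len(gt_boxes)):
        total = sum(cost(p, g) for g, p in enumerate(pred_idx))
        if total < best_cost:
            best, best_cost = pred_idx, total
    return [(p, g) for g, p in enumerate(best)]


# Three predictions (object queries) and two ground-truth objects.
pred_boxes = [(0.5, 0.5, 0.2, 0.2), (0.1, 0.1, 0.1, 0.1), (0.8, 0.8, 0.3, 0.3)]
pred_probs = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]  # per-class probabilities
gt_boxes = [(0.1, 0.1, 0.1, 0.1), (0.5, 0.5, 0.2, 0.2)]
gt_labels = [0, 1]

matches = match_predictions(pred_boxes, pred_probs, gt_boxes, gt_labels)
print(matches)
```

Predictions left unmatched (here, the third one) are the ones trained to predict "no object". For realistic numbers of queries, a polynomial-time solver is the practical choice over exhaustive search.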
Training
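The model was trained on COCO 2017 object detection, which provides 118k annotated training images and 5k validation images. Images were resized so that the shorter side is at least 800 pixels and the longer side at most 1333 pixels, then normalized with the ImageNet mean and standard deviation. Training ran for 300 epochs on 16 V100 GPUs with 4 images per GPU, for a total batch size of 64. The model achieves an average precision (AP) of 42.0 on the COCO 2017 validation set.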
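The resize step can be made concrete with a short sketch. It assumes the commonly cited DETR setting of an 800-pixel shorter side with the longer side capped at 1333 pixels. The function name `detr_resize_shape` is hypothetical, and the rounding may differ by a pixel from what the actual image processor does.

```python
def detr_resize_shape(width, height, shortest=800, longest=1333):
    """Compute a target (width, height) that preserves the aspect ratio.

    Scales the shorter side to `shortest`, unless that would push the
    longer side past `longest`, in which case the longer side is capped.
    """
    scale = shortest / min(width, height)
    if max(width, height) * scale > longest:
        scale = longest / max(width, height)
    return round(width * scale), round(height * scale)


# A typical COCO-sized image: the shorter side becomes 800.
print(detr_resize_shape(640, 480))

# A very wide image: the longer side is capped at 1333 instead.
print(detr_resize_shape(1000, 300))

# Effective batch size from the training setup: 16 GPUs x 4 images each.
print(16 * 4)
```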
Guide: Running Locally
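- Install Dependencies: Ensure you have PyTorch and the Transformers library installed, along with Pillow and Requests, which the example script uses to fetch and open the test image.
    pip install torch transformers pillow requests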
- Download and Load Model:
    from transformers import DetrImageProcessor, DetrForObjectDetection
    import torch
    from PIL import Image
    import requests

    url = "http://images.cocodataset.org/val2017/000000039769.jpg"
    image = Image.open(requests.get(url, stream=True).raw)

    # The "no_timm" revision avoids the timm dependency
    processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50", revision="no_timm")
    model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50", revision="no_timm")

    inputs = processor(images=image, return_tensors="pt")
    outputs = model(**inputs)

    # Convert outputs (bounding boxes and class logits) to COCO API format,
    # keeping only detections with a score above 0.9.
    # PIL reports size as (width, height); target_sizes expects (height, width).
    target_sizes = torch.tensor([image.size[::-1]])
    results = processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.9)[0]

    for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
        box = [round(i, 2) for i in box.tolist()]
        print(
            f"Detected {model.config.id2label[label.item()]} with confidence "
            f"{round(score.item(), 3)} at location {box}"
        )
- Cloud GPU Suggestion: For larger workloads or faster processing, consider cloud GPUs such as AWS EC2 P3 instances (which provide V100 GPUs), or other accelerators such as Google Cloud TPUs.
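The post-processing call in the example script turns raw model outputs into pixel-space boxes and drops low-confidence detections. The sketch below shows the idea in plain Python, assuming the model emits boxes as normalized (center x, center y, width, height) values. The helper names are hypothetical and this is not the library's code, which also derives the scores from the class logits.

```python
def cxcywh_to_xyxy(box, image_width, image_height):
    """Convert a normalized (cx, cy, w, h) box to absolute (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * image_width,
        (cy - h / 2) * image_height,
        (cx + w / 2) * image_width,
        (cy + h / 2) * image_height,
    )


def filter_detections(scores, labels, boxes, image_size, threshold=0.9):
    """Keep detections scoring above `threshold`, with boxes in pixels.

    `image_size` is (width, height), matching PIL's `Image.size`.
    """
    width, height = image_size
    return [
        {"score": s, "label": l, "box": cxcywh_to_xyxy(b, width, height)}
        for s, l, b in zip(scores, labels, boxes)
        if s > threshold
    ]


# Two illustrative candidate detections on a 640x480 image (made-up values).
detections = filter_detections(
    scores=[0.95, 0.40],
    labels=["cat", "remote"],
    boxes=[(0.5, 0.5, 0.5, 0.5), (0.2, 0.2, 0.1, 0.1)],
    image_size=(640, 480),
)
print(detections)
```

Lowering `threshold` returns more candidate boxes at the cost of more false positives.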
License
This model is licensed under the Apache-2.0 License, allowing for both commercial and non-commercial use.