Deformable DETR
SenseTime
Introduction
The Deformable DETR model with a ResNet-50 backbone is an end-to-end object detection model. It was introduced in the paper "Deformable DETR: Deformable Transformers for End-to-End Object Detection" by Zhu et al. This checkpoint was trained on the COCO 2017 dataset. Deformable DETR addresses two weaknesses of the original DETR, slow training convergence and weak performance on small objects, by replacing full attention over the feature maps with deformable attention modules that attend only to a small set of key sampling points around a reference point.
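To make "attend only to a small set of sampling points" concrete, here is a simplified single-head, single-scale sketch in NumPy. It is an illustration under stated assumptions, not the model's implementation. In the actual model, each query's sampling offsets and attention weights come from linear projections of the query, and the module runs over several heads and feature-map scales. Here they are passed in explicitly, and the function names are hypothetical.

```python
import numpy as np


def bilinear_sample(value, x, y):
    """Bilinearly sample value[H, W, C] at continuous pixel coords (x, y).

    Neighbours that fall outside the map contribute zeros (zero padding).
    """
    h, w, c = value.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    out = np.zeros(c)
    # Visit the four integer neighbours, weighting each by its proximity.
    for xi in (x0, x0 + 1):
        for yi in (y0, y0 + 1):
            if 0 <= xi < w and 0 <= yi < h:
                weight = (1 - abs(x - xi)) * (1 - abs(y - yi))
                out += weight * value[yi, xi]
    return out


def deformable_attention_single_head(value, ref_point, offsets, logits):
    """Hypothetical single-head, single-scale deformable attention for one query.

    value:     (H, W, C) feature map, already projected to values
    ref_point: (x, y) reference point in pixel coordinates
    offsets:   (K, 2) sampling offsets relative to ref_point
    logits:    (K,) unnormalised attention scores for the K points
    """
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()  # softmax over the K sampling points only
    out = np.zeros(value.shape[-1])
    for (dx, dy), a in zip(offsets, weights):
        out += a * bilinear_sample(value, ref_point[0] + dx, ref_point[1] + dy)
    return out
```

The cost per query grows with K rather than with the H × W size of the feature map. That is the property that lets the model use higher-resolution features, which helps with small objects.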
Architecture
The Deformable DETR model is built on an encoder-decoder transformer architecture with a convolutional (ResNet-50) backbone. Two heads sit on top of the decoder outputs: a linear layer that predicts class labels and a multi-layer perceptron (MLP) that predicts bounding boxes. The model uses object queries to detect objects in an image. Each query's decoder output yields at most one detection, and 300 object queries are used for COCO. Training uses a bipartite matching loss. The Hungarian matching algorithm finds the optimal one-to-one assignment between queries and ground-truth annotations, and the loss is computed on the matched pairs, with unmatched queries treated as "no object".
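The bipartite matching step can be sketched with SciPy's `linear_sum_assignment`, which solves the optimal one-to-one assignment problem (the original DETR matcher relies on the same function). The cost below is a simplification and an assumption. It combines only the negative class probability and the L1 box distance, with hypothetical weights `w_class` and `w_bbox`. The real matcher also includes a generalized IoU term, and Deformable DETR uses a focal-loss-style classification cost.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def hungarian_match(pred_probs, pred_boxes, gt_labels, gt_boxes, w_class=1.0, w_bbox=5.0):
    """Match N predicted queries to M ground-truth objects (N >= M).

    pred_probs: (N, num_classes) class probabilities per query
    pred_boxes: (N, 4) predicted boxes, normalized (cx, cy, w, h)
    gt_labels:  (M,) integer class ids
    gt_boxes:   (M, 4) ground-truth boxes in the same format
    Returns (query_indices, gt_indices) of the minimum-cost assignment.
    """
    # Classification cost: a higher probability for the true class gives a lower cost.
    cost_class = -pred_probs[:, gt_labels]  # (N, M)
    # L1 distance between every predicted box and every target box.
    cost_bbox = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)  # (N, M)
    cost = w_class * cost_class + w_bbox * cost_bbox
    query_idx, gt_idx = linear_sum_assignment(cost)
    return query_idx, gt_idx
```

For example, with three queries and two annotated objects, two queries are paired with the objects and the third receives no match, so it is trained as background.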
Training
The model is trained on the COCO 2017 object detection dataset, which includes about 118,000 annotated images for training and 5,000 for validation. After matching, the predicted classes and bounding boxes are compared with the ground-truth annotations. Classification uses a sigmoid focal loss, and the boxes use a linear combination of the L1 loss and the generalized IoU (GIoU) loss.
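The box part of the loss, computed on one matched prediction/target pair, can be sketched in plain Python. This sketch assumes boxes in normalized (cx, cy, w, h) format, as the model predicts them, and converts them to corner format for GIoU. It also assumes the weights 5 (L1) and 2 (GIoU), the defaults used by DETR. The function names are hypothetical, and the classification term is not shown.

```python
def cxcywh_to_xyxy(box):
    """Convert (center_x, center_y, width, height) to (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def generalized_iou(a, b):
    """Generalized IoU of two boxes given in (x0, y0, x1, y1) format."""
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    # Intersection rectangle (zero width or height if the boxes do not overlap).
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest box enclosing both; GIoU subtracts the fraction of it left empty.
    enclose = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return iou - (enclose - union) / enclose


def box_loss(pred, target, w_l1=5.0, w_giou=2.0):
    """Weighted L1 + (1 - GIoU) loss for one matched pair of (cx, cy, w, h) boxes."""
    l1 = sum(abs(p - t) for p, t in zip(pred, target))
    giou = generalized_iou(cxcywh_to_xyxy(pred), cxcywh_to_xyxy(target))
    return w_l1 * l1 + w_giou * (1.0 - giou)
```

The two terms complement each other. L1 is scale-dependent: the same absolute error matters more for a small box than a large one. GIoU is scale-invariant and still gives a useful signal when the boxes do not overlap at all, where plain IoU would be 0.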
Guide: Running Locally
To run the Deformable DETR model locally, follow these steps:
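- Install Dependencies: Ensure you have Python installed along with the transformers and torch libraries. The example below also uses Pillow and requests to load a test image. The checkpoint's ResNet-50 backbone may additionally require the timm package.
- Load the Model and Run Inference: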
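```python
from transformers import AutoImageProcessor, DeformableDetrForObjectDetection
import torch
from PIL import Image
import requests

# Download a sample image from the COCO 2017 validation set
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Load the image processor and the pretrained model from the Hugging Face Hub
processor = AutoImageProcessor.from_pretrained("SenseTime/deformable-detr")
model = DeformableDetrForObjectDetection.from_pretrained("SenseTime/deformable-detr")

# Preprocess the image and run inference (no gradients are needed)
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Rescale boxes to (x_min, y_min, x_max, y_max) in pixel coordinates.
# PIL's image.size is (width, height); target_sizes expects (height, width).
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(
    outputs, target_sizes=target_sizes, threshold=0.7
)[0]

# Print every detection whose score exceeds the 0.7 threshold
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    box = [round(i, 2) for i in box.tolist()]
    print(
        f"Detected {model.config.id2label[label.item()]} with confidence "
        f"{round(score.item(), 3)} at location {box}"
    )
```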
- Cloud GPU Resources: Inference runs on a CPU but is considerably faster on a GPU. If you do not have one locally, consider a cloud GPU service such as AWS EC2, Google Cloud Platform, or Azure.
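Whether the GPU is local or in the cloud, the inference example in this guide needs only small changes to use it. The model and the processor's output tensors must be on the same device, and gradient tracking can be disabled for inference. The following is a minimal sketch; the helper names `pick_device` and `detect_on_best_device` are hypothetical.

```python
import torch


def pick_device():
    """Use the first visible CUDA GPU if there is one, otherwise the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


def detect_on_best_device(model, inputs):
    """Run a forward pass on the best available device.

    model:  any torch.nn.Module, e.g. DeformableDetrForObjectDetection
    inputs: mapping of name -> tensor, e.g. the processor output (pixel_values, ...)
    """
    device = pick_device()
    model = model.to(device).eval()
    # Move every input tensor to the same device as the model weights.
    inputs = {name: tensor.to(device) for name, tensor in inputs.items()}
    with torch.no_grad():  # inference only: skip building the autograd graph
        return model(**inputs)
```

Usage: `outputs = detect_on_best_device(model, inputs)`. If you then call `post_process_object_detection`, creating `target_sizes` on the same device as the outputs is a safe way to avoid device-mismatch errors.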
License
The Deformable DETR model is licensed under the Apache License 2.0.