Deformable DETR with Box Refine Two-Stage
SenseTime
Introduction
The Deformable DETR with Box Refinement and Two Stage model is a state-of-the-art object detection model developed by SenseTime. It is built upon the Deformable DETR framework, utilizing a ResNet-50 backbone to enhance object detection capabilities. The model is trained using the COCO 2017 dataset and is compatible with PyTorch.
Architecture
The architecture of Deformable DETR consists of an encoder-decoder transformer with a convolutional backbone. It features two heads on top of the decoder outputs: a linear layer for class labels and a multi-layer perceptron (MLP) for bounding boxes. The model employs object queries to detect specific objects within an image, with the number of queries set to 300 for the COCO dataset. This variant adds iterative bounding box refinement, where each decoder layer refines the boxes predicted by the previous one, and a two-stage scheme in which region proposals generated by the encoder are fed as queries to the decoder. A bipartite matching loss is used for training, relying on the Hungarian matching algorithm to find an optimal one-to-one mapping between predictions and ground-truth annotations.
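The bipartite matching step can be illustrated with a toy example. The sketch below brute-forces the minimum-cost one-to-one assignment for three predictions and three ground-truth boxes; the cost values are made up for illustration, and real implementations use the Hungarian algorithm (e.g. `scipy.optimize.linear_sum_assignment`) rather than enumeration.

```python
from itertools import permutations

# cost[i][j]: cost of matching prediction i to ground-truth box j
# (in DETR-style training this is a weighted sum of classification
# and box-regression costs; these numbers are illustrative only)
cost = [
    [0.9, 0.2, 0.7],
    [0.1, 0.8, 0.6],
    [0.5, 0.4, 0.3],
]

# Exhaustively try every one-to-one assignment and keep the cheapest.
best_total, best_match = min(
    (sum(cost[i][j] for i, j in enumerate(perm)), perm)
    for perm in permutations(range(3))
)

# best_match[i] is the ground-truth index assigned to prediction i
print(best_match, round(best_total, 2))  # (1, 0, 2) 0.6
```

Only the matched prediction-target pairs then contribute to the box losses; unmatched predictions are supervised toward the "no object" class.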
Training
The model is trained on the COCO 2017 dataset, which includes 118k annotated images for training and 5k for validation. The training process utilizes standard cross-entropy loss for class prediction and a combination of L1 and generalized IoU loss for bounding box prediction.
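The generalized IoU term in the box loss can be sketched in a few lines for axis-aligned boxes in `(x_min, y_min, x_max, y_max)` format. This is a minimal standalone illustration, not the model's internal implementation (which operates on batched tensors, e.g. via `torchvision.ops.generalized_box_iou`):

```python
def giou(a, b):
    """Generalized IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b

    # Intersection area (zero if the boxes do not overlap)
    ix0, iy0 = max(ax0, bx0), max(ay0, by0)
    ix1, iy1 = min(ax1, bx1), min(ay1, by1)
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)

    # Union area
    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest enclosing box; GIoU penalizes its empty area
    cx0, cy0 = min(ax0, bx0), min(ay0, by0)
    cx1, cy1 = max(ax1, bx1), max(ay1, by1)
    c_area = (cx1 - cx0) * (cy1 - cy0)
    return iou - (c_area - union) / c_area

# Identical boxes give GIoU = 1; the loss used in training is 1 - GIoU.
print(giou((0, 0, 2, 2), (0, 0, 2, 2)))  # 1.0
print(round(giou((0, 0, 2, 2), (1, 1, 3, 3)), 3))  # -0.079
```

Unlike plain IoU, GIoU stays informative (negative) even when boxes do not overlap, which gives the regression a useful gradient early in training.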
Guide: Running Locally
To run the Deformable DETR model locally:
- Install Dependencies: Ensure you have PyTorch and the `transformers` library installed (e.g. `pip install torch transformers`).
- Load the Model:
```python
from transformers import AutoImageProcessor, DeformableDetrForObjectDetection

processor = AutoImageProcessor.from_pretrained("SenseTime/deformable-detr-with-box-refine-two-stage")
model = DeformableDetrForObjectDetection.from_pretrained("SenseTime/deformable-detr-with-box-refine-two-stage")
```
- Prepare Input Image:
```python
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
```
- Process Image and Make Predictions:
```python
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
```
- Post-Process and Display Results:
```python
import torch

target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.7)[0]

for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    box = [round(i, 2) for i in box.tolist()]
    print(f"Detected {model.config.id2label[label.item()]} with confidence {round(score.item(), 3)} at location {box}")
```
To leverage cloud computing resources for enhanced performance, consider using cloud GPUs from providers such as AWS, Google Cloud, or Azure.
License
The Deformable DETR model is released under the Apache 2.0 license, permitting use, modification, and distribution with proper attribution.