DETR ResNet-101 (facebook/detr-resnet-101)
Introduction
The DEtection TRansformer (DETR) model with a ResNet-101 backbone is an end-to-end object detection model. It was introduced by Carion et al. in the paper "End-to-End Object Detection with Transformers". Developed by Facebook AI (now Meta AI), it is trained on the COCO 2017 object detection dataset.
Architecture
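DETR combines a ResNet-101 convolutional backbone with an encoder-decoder transformer. Two prediction heads sit on top of the decoder outputs: a linear layer that predicts class labels and a multi-layer perceptron (MLP) that predicts bounding boxes. The decoder takes a fixed set of 100 learned object queries, each producing one prediction, so the model outputs 100 candidate detections per image. During training, a bipartite matching loss uses the Hungarian algorithm to find the optimal one-to-one assignment between these predictions and the ground-truth annotations; predictions left unmatched are trained to output a special "no object" class.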
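The matching step can be illustrated with SciPy's `linear_sum_assignment`, which solves the same minimum-cost assignment problem as the Hungarian algorithm. This is a simplified sketch, not the DETR training code: the cost uses only a class-probability term and an L1 box distance (DETR's matcher also includes a generalized IoU term), the weights `w_class` and `w_box` are illustrative, and `match_predictions` is a hypothetical helper name.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_predictions(pred_probs, pred_boxes, gt_labels, gt_boxes,
                      w_class=1.0, w_box=5.0):
    """Return (query_indices, gt_indices) of a minimum-cost one-to-one matching.

    pred_probs: (num_queries, num_classes + 1) class probabilities
    pred_boxes: (num_queries, 4) normalized (cx, cy, w, h) boxes
    gt_labels:  (num_gt,) integer class ids
    gt_boxes:   (num_gt, 4) normalized (cx, cy, w, h) boxes
    """
    # Classification cost: higher probability for the GT class means lower cost.
    cost_class = -pred_probs[:, gt_labels]  # shape (Q, G)
    # Box cost: L1 distance between every predicted box and every GT box.
    cost_box = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)  # (Q, G)
    cost = w_class * cost_class + w_box * cost_box
    # Optimal assignment: each GT object is matched to exactly one distinct query;
    # queries that receive no GT object would be supervised as "no object".
    query_idx, gt_idx = linear_sum_assignment(cost)
    return query_idx, gt_idx


# Toy example: 4 queries, 2 ground-truth objects, 3 classes + "no object".
# Uniform probabilities make the class cost a tie, so the boxes decide the match.
pred_probs = np.full((4, 4), 0.25)
pred_boxes = np.array([
    [0.7, 0.7, 0.2, 0.2],  # query 0 sits exactly on GT object 1
    [0.1, 0.9, 0.1, 0.1],
    [0.3, 0.3, 0.2, 0.2],  # query 2 sits exactly on GT object 0
    [0.9, 0.1, 0.1, 0.1],
])
gt_labels = np.array([0, 1])
gt_boxes = np.array([[0.3, 0.3, 0.2, 0.2], [0.7, 0.7, 0.2, 0.2]])

query_idx, gt_idx = match_predictions(pred_probs, pred_boxes, gt_labels, gt_boxes)
print(list(zip(query_idx.tolist(), gt_idx.tolist())))  # prints: [(0, 1), (2, 0)]
```

Because the assignment is one-to-one, duplicate predictions of the same object are penalized during training, which is why DETR does not need non-maximum suppression at inference time.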
Training
The model was trained on COCO 2017, which contains 118k training images. Training ran for 300 epochs on 16 V100 GPUs with 4 images per GPU (a total batch size of 64) and took about three days. The model achieves an average precision (AP) of 43.5 on the COCO 2017 validation set.
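As a rough back-of-the-envelope check on these figures, the snippet below derives the per-GPU batch size, iterations per epoch, total iterations, and average throughput. It assumes exactly 118,000 training images, that each epoch ends with a partial final batch, and exactly 72 hours of wall-clock time; the text gives only "118k" and "about three days", so treat the results as approximate.

```python
import math

# Figures from the training description; 118_000 and 72 hours are rounded assumptions
num_train_images = 118_000
epochs = 300
num_gpus = 16
total_batch_size = 64
wall_clock_hours = 72

images_per_gpu = total_batch_size // num_gpus                     # 64 / 16
iters_per_epoch = math.ceil(num_train_images / total_batch_size)  # last batch may be partial
total_iters = iters_per_epoch * epochs
images_seen = num_train_images * epochs
throughput = images_seen / (wall_clock_hours * 3600)              # images/s across all GPUs

print(f"{images_per_gpu} images/GPU, {iters_per_epoch} iters/epoch, "
      f"{total_iters} iters total, ~{throughput:.0f} images/s")
# prints: 4 images/GPU, 1844 iters/epoch, 553200 iters total, ~137 images/s
```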
Guide: Running Locally
- Setup Environment: Ensure that Python and PyTorch are installed on your system, then install the Hugging Face `transformers` library with `pip install transformers`. The example below also uses the `Pillow` and `requests` packages (`pip install pillow requests`).
- Load the Model: Use the following code to load the model and run object detection on a sample COCO image:
```python
from transformers import DetrImageProcessor, DetrForObjectDetection
import torch
from PIL import Image
import requests

# Download a sample image from the COCO 2017 validation set
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The "no_timm" revision avoids the dependency on the timm library
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-101", revision="no_timm")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-101", revision="no_timm")

# Preprocess the image and run inference; gradients are not needed
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# PIL reports (width, height); post-processing expects (height, width)
target_sizes = torch.tensor([image.size[::-1]])
# Keep detections scoring above 0.9; boxes come back as pixel (x0, y0, x1, y1)
results = processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.9)[0]

for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    box = [round(i, 2) for i in box.tolist()]
    print(
        f"Detected {model.config.id2label[label.item()]} with confidence "
        f"{round(score.item(), 3)} at location {box}"
    )
```
- GPUs for Training: To speed up training and inference, consider cloud GPU services such as AWS EC2 instances with NVIDIA V100 or A100 GPUs.
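To make the `post_process_object_detection` call in the loading example more concrete, here is a simplified single-image sketch of what that step computes, written from our understanding of the library's behavior rather than copied from it: a softmax over the class logits with the last ("no object") column dropped, conversion of normalized (center_x, center_y, width, height) boxes to corner format, scaling by the image's (height, width), and score thresholding. The function name `postprocess` and its signature are hypothetical; the real method also handles batches and other options.

```python
import torch


def postprocess(logits, pred_boxes, target_size, threshold=0.9):
    """Hypothetical single-image version of DETR post-processing.

    logits:      (num_queries, num_classes + 1), last column is "no object"
    pred_boxes:  (num_queries, 4) normalized (cx, cy, w, h) in [0, 1]
    target_size: (height, width) of the original image in pixels
    """
    probs = logits.softmax(-1)
    # Best real class per query, ignoring the trailing "no object" column
    scores, labels = probs[:, :-1].max(-1)
    # Center format -> corner format (x0, y0, x1, y1), still normalized
    cx, cy, w, h = pred_boxes.unbind(-1)
    boxes = torch.stack([cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h], dim=-1)
    # Scale to pixels: x coordinates by width, y coordinates by height
    height, width = target_size
    boxes = boxes * torch.tensor([width, height, width, height], dtype=boxes.dtype)
    keep = scores > threshold
    return {"scores": scores[keep], "labels": labels[keep], "boxes": boxes[keep]}


# Two queries over 3 classes: the first is confidently class 0, the second is "no object"
logits = torch.tensor([[10.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 10.0]])
pred_boxes = torch.tensor([[0.5, 0.5, 0.2, 0.4], [0.5, 0.5, 0.1, 0.1]])
results = postprocess(logits, pred_boxes, target_size=(480, 640))
print(results["labels"].tolist(), results["boxes"].tolist())
```

This also shows why the loading example reverses `image.size`: PIL returns (width, height), while the scaling step expects (height, width).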
License
The DETR ResNet-101 model is licensed under the Apache 2.0 License.