detr resnet 101

facebook

Introduction

DETR (DEtection TRansformer) with a ResNet-101 backbone is an end-to-end object detection model, introduced by Carion et al. in the paper "End-to-End Object Detection with Transformers". It was developed at Facebook AI (now Meta AI) and trained on the COCO 2017 object detection dataset.

Architecture

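DETR combines a ResNet-101 convolutional backbone with an encoder-decoder transformer. Two heads sit on top of the decoder output: a linear layer that predicts class labels and a multi-layer perceptron (MLP) that predicts bounding boxes. The decoder takes 100 object queries, each of which looks for one object in the image, so the model emits 100 predictions per image. Training uses a bipartite matching loss: the Hungarian algorithm matches predictions one-to-one with the ground truth annotations (padded with "no object" entries up to the number of queries), and the matched pairs are then scored with cross-entropy on the classes and a combination of L1 and generalized IoU loss on the boxes.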
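The one-to-one matching step can be illustrated with a toy example. The sketch below is not DETR's implementation: it brute-forces every assignment with itertools.permutations, which is only feasible for a handful of boxes, whereas the paper uses the Hungarian algorithm. The cost function (L1 distance between boxes only, with no class term), the helper names l1_cost and match, and the sample boxes are all assumptions made for illustration.

```python
from itertools import permutations


def l1_cost(pred_box, gt_box):
    """L1 distance between two boxes given as (cx, cy, w, h)."""
    return sum(abs(p - g) for p, g in zip(pred_box, gt_box))


def match(pred_boxes, gt_boxes):
    """Return (assignment, cost) for the cheapest one-to-one matching.

    assignment[j] is the index of the prediction matched to ground truth j.
    Brute force over all assignments; assumes len(pred_boxes) >= len(gt_boxes).
    """
    best_assignment, best_cost = None, float("inf")
    for candidate in permutations(range(len(pred_boxes)), len(gt_boxes)):
        cost = sum(l1_cost(pred_boxes[i], gt_boxes[j]) for j, i in enumerate(candidate))
        if cost < best_cost:
            best_assignment, best_cost = candidate, cost
    return best_assignment, best_cost


# Hypothetical data: 3 predictions (object queries) and 2 ground truth boxes
preds = [(0.50, 0.50, 0.20, 0.20), (0.10, 0.10, 0.10, 0.10), (0.80, 0.30, 0.30, 0.40)]
gts = [(0.82, 0.31, 0.28, 0.41), (0.48, 0.52, 0.22, 0.18)]

assignment, cost = match(preds, gts)
unmatched = [i for i in range(len(preds)) if i not in set(assignment)]
print("ground truth -> prediction:", dict(enumerate(assignment)))
print("predictions treated as 'no object':", unmatched)
```

Every prediction left unmatched is trained toward the "no object" class, which is how a fixed set of 100 queries can handle images with far fewer objects.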

Training

The model was trained on COCO 2017 object detection, which has 118k annotated training images. Training ran for 300 epochs on 16 V100 GPUs and took three days, with 4 images per GPU for a total batch size of 64. The model reaches an average precision (AP) of 43.5 on the COCO 2017 validation set.
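The schedule above implies some simple arithmetic. The sketch below works it out, assuming the rounded figure of 118,000 training images and that a final partial batch still counts as one optimizer step. Both are assumptions, so the step counts are approximate.

```python
import math

num_gpus = 16
batch_size = 64
epochs = 300
train_images = 118_000  # rounded "118k" figure; an assumption, not the exact image count

images_per_gpu = batch_size // num_gpus
steps_per_epoch = math.ceil(train_images / batch_size)  # assumes the last partial batch is kept
total_steps = steps_per_epoch * epochs

print(f"images per GPU: {images_per_gpu}")
print(f"steps per epoch: ~{steps_per_epoch}")
print(f"total optimizer steps: ~{total_steps}")
```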

Guide: Running Locally

  1. Set Up the Environment: Make sure Python and PyTorch are installed, then install the Hugging Face transformers library with pip install transformers. The example in step 2 also imports Pillow (PIL) and requests.
  2. Load the Model and Run Inference: The following code downloads a sample COCO image, runs the model on it, and prints every detection with a confidence above 0.9:
    from transformers import DetrImageProcessor, DetrForObjectDetection
    import torch
    from PIL import Image
    import requests

    # Download a sample image from the COCO 2017 validation set
    url = "http://images.cocodataset.org/val2017/000000039769.jpg"
    image = Image.open(requests.get(url, stream=True).raw)

    # revision="no_timm" selects the checkpoint that does not depend on the timm library
    processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-101", revision="no_timm")
    model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-101", revision="no_timm")

    # Preprocess the image and run a forward pass; gradients are not needed for inference
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Rescale boxes to the original image size (height, width) and keep detections scoring above 0.9
    target_sizes = torch.tensor([image.size[::-1]])
    results = processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.9)[0]

    for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
        box = [round(i, 2) for i in box.tolist()]
        print(
            f"Detected {model.config.id2label[label.item()]} with confidence "
            f"{round(score.item(), 3)} at location {box}"
        )
    
  3. Use a GPU (Optional): The example runs on CPU, but a GPU speeds up both inference and training. If you do not have one locally, cloud GPU instances such as AWS EC2 with NVIDIA V100 or A100 GPUs are an option.
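The boxes printed in step 2 are in (xmin, ymin, xmax, ymax) pixel coordinates, which is what post_process_object_detection returns. If downstream code expects COCO-style (x, y, width, height) boxes instead, a small conversion is enough. The helper name xyxy_to_xywh and the sample box below are hypothetical, chosen only for illustration.

```python
def xyxy_to_xywh(box):
    """Convert (xmin, ymin, xmax, ymax) to (x, y, width, height)."""
    xmin, ymin, xmax, ymax = box
    return [xmin, ymin, xmax - xmin, ymax - ymin]


# Hypothetical box in pixel coordinates, not an actual model output
box = [40.0, 70.0, 175.0, 118.0]
print(xyxy_to_xywh(box))
```

In the loop from step 2, this would be applied to box.tolist() before rounding or printing.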

License

The DETR ResNet-101 model is licensed under the Apache 2.0 License.
