detr resnet 50

facebook

DETR-ResNet-50 Model

Introduction

The DEtection TRansformer (DETR) model with a ResNet-50 backbone performs end-to-end object detection. It was trained on the COCO 2017 dataset and introduced in the paper "End-to-End Object Detection with Transformers" by Carion et al. The Hugging Face team wrote this model card because the team that released DETR did not provide one.

Architecture

DETR pairs an encoder-decoder transformer with a convolutional backbone. Two heads sit on top of the decoder outputs: a linear layer that predicts class labels and a multi-layer perceptron (MLP) that predicts bounding boxes. The decoder receives a fixed set of learned object queries, each of which looks for one object in the image (100 queries for the COCO model). During training, the Hungarian matching algorithm computes a one-to-one (bipartite) assignment between the query predictions and the ground-truth annotations, and the loss is computed on the matched pairs.
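
The matching step described above can be illustrated with a toy example. This is a minimal sketch, not DETR's implementation: it finds the lowest-cost one-to-one assignment by brute force over permutations, which gives the same optimum that the Hungarian algorithm finds in polynomial time. The function names, the `box_weight` value, and the sample predictions are illustrative assumptions, and the cost omits the generalized IoU term that DETR also uses.

```python
from itertools import permutations


def l1(box_a, box_b):
    """L1 distance between two boxes given as (cx, cy, w, h)."""
    return sum(abs(a - b) for a, b in zip(box_a, box_b))


def match_cost(pred, target, box_weight=5.0):
    """Cost of pairing one prediction with one target (lower is better).

    Negative probability of the target class plus a weighted L1 box
    distance. box_weight is an illustrative value.
    """
    return -pred["probs"][target["label"]] + box_weight * l1(pred["box"], target["box"])


def best_assignment(preds, targets):
    """Return (assignment, cost); assignment[j] is the prediction matched to target j."""
    best, best_cost = None, float("inf")
    # Try every way of giving each target its own distinct prediction.
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(match_cost(preds[i], targets[j]) for j, i in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best, best_cost


# Made-up outputs of three object queries (class probabilities over two classes).
preds = [
    {"probs": [0.1, 0.9], "box": [0.2, 0.2, 0.1, 0.1]},
    {"probs": [0.8, 0.2], "box": [0.5, 0.5, 0.4, 0.4]},
    {"probs": [0.5, 0.5], "box": [0.9, 0.9, 0.1, 0.1]},
]
# Two made-up ground-truth objects.
targets = [
    {"label": 0, "box": [0.5, 0.5, 0.4, 0.4]},
    {"label": 1, "box": [0.2, 0.2, 0.1, 0.1]},
]

assignment, cost = best_assignment(preds, targets)
print(assignment)  # (1, 0): target 0 -> prediction 1, target 1 -> prediction 0
```

Prediction 2 is left unmatched here; in DETR, unmatched queries are trained to predict a "no object" class.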

Training

The model was trained on the COCO 2017 training set, which contains 118k annotated images. Images were resized and normalized before being fed to the model. Training ran for 300 epochs on 16 V100 GPUs with a total batch size of 64 (4 images per GPU). The model achieves an average precision (AP) of 42.0 on the COCO 2017 validation set.
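
The resizing and normalizing mentioned above can be sketched in a few lines. The text here does not give the exact parameters; the values below (shortest side 800 pixels, longest side capped at 1333 pixels, ImageNet channel mean and standard deviation) are taken from the upstream DETR model card and should be read as assumptions. The helper names are hypothetical, and the rounding may differ slightly from the real preprocessing code.

```python
# ImageNet channel statistics (assumed, see the note above).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def resized_size(width, height, short_side=800, long_side=1333):
    """Scale so the shorter side reaches short_side, unless that would push
    the longer side past long_side; in that case cap the longer side instead."""
    scale = short_side / min(width, height)
    if max(width, height) * scale > long_side:
        scale = long_side / max(width, height)
    return round(width * scale), round(height * scale)


def normalize_pixel(rgb):
    """Map 0-255 RGB values to [0, 1], then standardize each channel."""
    return tuple(
        (channel / 255.0 - mean) / std
        for channel, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


print(resized_size(640, 480))   # (1067, 800)
print(resized_size(2000, 500))  # (1333, 333)
```

In the Transformers pipeline, `DetrImageProcessor` performs both steps, so this sketch is only meant to show what the preprocessing does to image sizes and pixel values.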

Guide: Running Locally

  1. Install Dependencies: Install PyTorch and the Transformers library, plus Pillow and Requests, which the example below uses to download and open the sample image.
    pip install torch transformers pillow requests
    
  2. Download and Load Model, then Run Inference:
    from transformers import DetrImageProcessor, DetrForObjectDetection
    import torch
    from PIL import Image
    import requests
    
    # Fetch a sample image from the COCO 2017 validation set
    url = "http://images.cocodataset.org/val2017/000000039769.jpg"
    image = Image.open(requests.get(url, stream=True).raw)
    
    # revision="no_timm" selects the checkpoint that does not need the timm library
    processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50", revision="no_timm")
    model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50", revision="no_timm")
    
    # Preprocess the image and run a forward pass without tracking gradients
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    
    # Rescale boxes to the original (height, width) and keep detections scoring above 0.9
    target_sizes = torch.tensor([image.size[::-1]])
    results = processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.9)[0]
    
    for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
        box = [round(i, 2) for i in box.tolist()]
        print(
            f"Detected {model.config.id2label[label.item()]} with confidence "
            f"{round(score.item(), 3)} at location {box}"
        )
    
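  3. Cloud GPU Suggestion: For larger workloads or faster processing, consider cloud GPU instances such as AWS EC2 P3 (NVIDIA V100), and move the model and inputs to the GPU with `.to("cuda")`. Google Cloud TPUs are another option, but they are not GPUs and running PyTorch on them requires PyTorch/XLA.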
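
The `results` dictionary from step 2 holds scores, label ids, and boxes in absolute (x_min, y_min, x_max, y_max) pixel coordinates. A common follow-up is to keep only certain classes and convert the boxes to COCO-style (x, y, width, height). The sketch below works on plain Python lists, such as those obtained by calling `.tolist()` on each tensor. The helper names, the sample values, and the small `id2label` mapping are illustrative assumptions, not real model output.

```python
def xyxy_to_xywh(box):
    """Convert (x_min, y_min, x_max, y_max) to (x, y, width, height)."""
    x_min, y_min, x_max, y_max = box
    return [x_min, y_min, x_max - x_min, y_max - y_min]


def select_detections(scores, labels, boxes, id2label, wanted, min_score=0.9):
    """Keep detections whose class name is in `wanted` and whose score
    is at least min_score, with boxes converted to xywh."""
    selected = []
    for score, label, box in zip(scores, labels, boxes):
        name = id2label[label]
        if name in wanted and score >= min_score:
            selected.append(
                {"label": name, "score": score, "box_xywh": xyxy_to_xywh(box)}
            )
    return selected


# Made-up detections standing in for results["scores"].tolist() and so on.
scores = [0.99, 0.95, 0.91]
labels = [17, 75, 63]
boxes = [
    [10.0, 20.0, 110.0, 220.0],
    [40.0, 70.0, 175.0, 118.0],
    [0.0, 1.0, 640.0, 475.0],
]
id2label = {17: "cat", 63: "couch", 75: "remote"}  # illustrative subset

cats = select_detections(scores, labels, boxes, id2label, wanted={"cat"})
print(cats)  # [{'label': 'cat', 'score': 0.99, 'box_xywh': [10.0, 20.0, 100.0, 200.0]}]
```

With the real model, `model.config.id2label` would take the place of the hand-written mapping.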

License

This model is licensed under the Apache-2.0 License, allowing for both commercial and non-commercial use.
