Deformable DETR

SenseTime

Introduction

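Deformable DETR with a ResNet-50 backbone is an end-to-end object detection model trained on the COCO 2017 dataset. It was introduced in the paper "Deformable DETR: Deformable Transformers for End-to-End Object Detection" by Zhu et al. It targets two weaknesses of the original DETR: slow training convergence and the limited spatial resolution of the feature maps it can attend over. To address them, it replaces standard attention over the whole feature map with deformable attention, in which each query attends only to a small set of sampling points around a reference point. This also makes it practical to attend over multi-scale feature maps.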
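The deformable attention idea can be illustrated with a minimal NumPy sketch of a single-head, single-scale version for one query. Instead of attending to all H×W positions, the query predicts K sampling offsets around its reference point and K attention weights. The output is then the weighted sum of bilinearly sampled values. The function names, the weight matrices (`W_offset`, `W_attn`, `W_value`), and the pixel-coordinate convention are hypothetical choices for illustration. The actual module is multi-head and multi-scale, works in normalized coordinates, applies an output projection, and is implemented in PyTorch.

```python
import numpy as np

def bilinear_sample(feature, x, y):
    """Bilinearly interpolate `feature` (H, W, C) at continuous pixel coordinates (x, y).
    Neighbours that fall outside the map contribute zero (zero padding)."""
    H, W, C = feature.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    out = np.zeros(C)
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= xi < W and 0 <= yi < H:
                weight = (1 - abs(x - xi)) * (1 - abs(y - yi))
                out += weight * feature[yi, xi]
    return out

def deformable_attention(query, ref_point, feature, W_offset, W_attn, W_value):
    """Single-head, single-scale deformable attention for one query (hypothetical sketch).

    query: (D,) query embedding; ref_point: (x, y) reference point in pixels;
    feature: (H, W, D) feature map; W_offset: (D, 2K) predicts K (dx, dy) offsets;
    W_attn: (D, K) predicts K attention logits; W_value: (D, D) value projection.
    """
    K = W_attn.shape[1]
    offsets = (query @ W_offset).reshape(K, 2)   # K sampling offsets predicted from the query
    logits = query @ W_attn
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                     # softmax over the K sampled points only
    values = feature @ W_value                   # project every position to a value vector
    out = np.zeros(W_value.shape[1])
    for k in range(K):
        x = ref_point[0] + offsets[k, 0]
        y = ref_point[1] + offsets[k, 1]
        out += weights[k] * bilinear_sample(values, x, y)
    return out

# Example: 8-dim features on a 16x16 map, 4 sampling points per query
rng = np.random.default_rng(0)
D, K = 8, 4
feature = rng.standard_normal((16, 16, D))
query = rng.standard_normal(D)
out = deformable_attention(query, (7.5, 7.5), feature,
                           0.1 * rng.standard_normal((D, 2 * K)),
                           rng.standard_normal((D, K)),
                           np.eye(D))
print(out.shape)  # (8,)
```

Each query reads only K interpolated positions rather than the full map. This is what lets the real model afford higher-resolution and multi-scale features.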

Architecture

The Deformable DETR model is an encoder-decoder transformer built on top of a convolutional backbone. Two heads sit on the decoder outputs: a linear layer that predicts class labels and a multi-layer perceptron (MLP) that predicts bounding boxes. The decoder takes a fixed set of learned object queries, each of which yields one prediction (an object or background). The COCO configuration uses 300 object queries, whereas the original DETR used 100. Training relies on a bipartite matching loss. The Hungarian algorithm finds the one-to-one assignment between queries and ground-truth objects with the lowest total matching cost, and the loss is computed on those matched pairs.
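The bipartite matching step can be sketched as follows. A cost matrix scores every (query, object) pair, and the one-to-one assignment with the lowest total cost is selected. Several parts of this sketch are assumptions for illustration: the cost terms, their weights (`w_class`, `w_l1`), the normalized (cx, cy, w, h) box format, and the omission of the GIoU term from the cost. The exhaustive search also stands in for the Hungarian algorithm. It finds the same optimum but only scales to a handful of queries; practical code typically calls `scipy.optimize.linear_sum_assignment` instead.

```python
from itertools import permutations
import numpy as np

def match_cost(pred_probs, pred_boxes, gt_labels, gt_boxes, w_class=1.0, w_l1=5.0):
    """Cost C[i, j] of assigning prediction i to ground-truth object j (hypothetical weights).

    pred_probs: (N, num_classes) class probabilities; pred_boxes: (N, 4);
    gt_labels: (M,) class indices; gt_boxes: (M, 4). GIoU term omitted for brevity.
    """
    cost_class = -pred_probs[:, gt_labels]                                   # (N, M)
    cost_l1 = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)  # (N, M)
    return w_class * cost_class + w_l1 * cost_l1

def optimal_assignment(cost):
    """Return (query, object) pairs minimizing total cost, each query used at most once.

    Exhaustive search: same optimum as the Hungarian algorithm, but only feasible
    for a handful of queries.
    """
    n_pred, n_gt = cost.shape
    best_total, best_perm = np.inf, None
    for perm in permutations(range(n_pred), n_gt):   # perm[j] = query assigned to object j
        total = sum(cost[q, j] for j, q in enumerate(perm))
        if total < best_total:
            best_total, best_perm = total, perm
    return [(q, j) for j, q in enumerate(best_perm)]

# Three queries, two ground-truth objects, boxes as normalized (cx, cy, w, h)
pred_probs = np.array([[0.1, 0.8, 0.1],
                       [0.7, 0.2, 0.1],
                       [0.2, 0.2, 0.6]])
pred_boxes = np.array([[0.5, 0.5, 0.2, 0.2],
                       [0.2, 0.3, 0.1, 0.1],
                       [0.8, 0.8, 0.3, 0.3]])
gt_labels = np.array([0, 1])
gt_boxes = np.array([[0.2, 0.3, 0.1, 0.1],
                     [0.5, 0.5, 0.2, 0.2]])
cost = match_cost(pred_probs, pred_boxes, gt_labels, gt_boxes)
print(optimal_assignment(cost))  # [(1, 0), (0, 1)]
```

Query 2 is left unmatched. In training, such queries are pushed toward predicting background, which is how the model learns to avoid duplicate detections without non-maximum suppression.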

Training

The model is trained on the COCO 2017 object detection dataset, which includes about 118,000 annotated images for training and 5,000 for validation. After Hungarian matching pairs queries with ground-truth objects, the loss compares each matched prediction with its annotation. The class predictions use a focal loss, and the bounding boxes use a linear combination of the L1 loss and the generalized IoU (GIoU) loss.
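The box part of the training objective can be illustrated for a single matched pair. The sketch computes the L1 distance on normalized (cx, cy, w, h) boxes and the GIoU on corner coordinates, then combines them linearly. The box format and the weights `w_l1=5.0` and `w_giou=2.0` are assumptions here, chosen to mirror common DETR-family settings. The function names are hypothetical.

```python
def cxcywh_to_xyxy(box):
    """Convert (cx, cy, w, h) to (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def generalized_iou(a, b):
    """GIoU of two boxes in (x0, y0, x1, y1) format; ranges from -1 to 1."""
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area_a + area_b - inter
    # Smallest axis-aligned box enclosing both a and b
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    enclosing = cw * ch
    # IoU minus the fraction of the enclosing box not covered by the union
    return inter / union - (enclosing - union) / enclosing

def box_loss(pred, target, w_l1=5.0, w_giou=2.0):
    """Weighted L1 + (1 - GIoU) loss for one matched pair of (cx, cy, w, h) boxes."""
    l1 = sum(abs(p - t) for p, t in zip(pred, target))
    giou = generalized_iou(cxcywh_to_xyxy(pred), cxcywh_to_xyxy(target))
    return w_l1 * l1 + w_giou * (1.0 - giou)

print(generalized_iou((0, 0, 1, 1), (2, 0, 3, 1)))          # -0.3333333333333333
print(box_loss((0.5, 0.5, 0.2, 0.2), (0.5, 0.5, 0.2, 0.2)))  # 0.0
```

Unlike plain IoU, GIoU stays informative for non-overlapping boxes (the first example has IoU 0 but GIoU -1/3). As a result, the loss still provides a signal that pulls a distant prediction toward its target.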

Guide: Running Locally

To run the Deformable DETR model locally, follow these steps:

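  1. Install Dependencies: Ensure you have Python installed along with the transformers and torch libraries. The example also uses Pillow and requests, and the checkpoint's default configuration builds its ResNet-50 backbone with timm:

    pip install transformers torch timm pillow requests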

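  2. Load the Model and Run Inference:

    from transformers import AutoImageProcessor, DeformableDetrForObjectDetection
    import torch
    from PIL import Image
    import requests
    
    # Download a sample image from the COCO 2017 validation set
    url = "http://images.cocodataset.org/val2017/000000039769.jpg"
    image = Image.open(requests.get(url, stream=True).raw)
    
    # Load the image processor and the pretrained detector from the Hugging Face Hub
    processor = AutoImageProcessor.from_pretrained("SenseTime/deformable-detr")
    model = DeformableDetrForObjectDetection.from_pretrained("SenseTime/deformable-detr")
    
    # Resize and normalize the image, then run inference without tracking gradients
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    
    # PIL reports (width, height); post-processing expects (height, width)
    target_sizes = torch.tensor([image.size[::-1]])
    # Convert raw outputs into scored detections in pixel coordinates,
    # keeping only those with confidence above 0.7
    results = processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.7)[0]
    
    for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
        # Boxes are (x_min, y_min, x_max, y_max) in pixels
        box = [round(i, 2) for i in box.tolist()]
        print(
            f"Detected {model.config.id2label[label.item()]} with confidence "
            f"{round(score.item(), 3)} at location {box}"
        )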
    
  3. Cloud GPU Resources: The example above runs on CPU by default. For faster inference on many images, or for fine-tuning, consider a GPU instance from a cloud provider such as AWS EC2, Google Cloud Platform, or Azure.
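The `post_process_object_detection` call in step 2 hides a few steps. It turns per-query class logits and normalized boxes into scored detections in pixel coordinates. The function below is a hypothetical, simplified stand-in written in NumPy, not the library's implementation; the library version also performs a top-k selection over query/class pairs, for example. It assumes boxes in normalized (cx, cy, w, h) format and sigmoid class scores. It takes the best class per query, converts boxes to corner coordinates, scales them to the image size, and applies the same 0.7 threshold.

```python
import numpy as np

def postprocess(logits, boxes, image_height, image_width, threshold=0.7):
    """Simplified sketch: turn raw per-query outputs into thresholded detections.

    logits: (num_queries, num_classes) raw class scores
    boxes:  (num_queries, 4) normalized (cx, cy, w, h)
    Returns scores, labels and (x_min, y_min, x_max, y_max) boxes in pixels.
    """
    probs = 1.0 / (1.0 + np.exp(-logits))   # independent sigmoid per class
    scores = probs.max(axis=1)              # best class score for each query
    labels = probs.argmax(axis=1)
    cx, cy, w, h = boxes.T
    xyxy = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    xyxy *= np.array([image_width, image_height, image_width, image_height])
    keep = scores > threshold               # drop low-confidence queries
    return scores[keep], labels[keep], xyxy[keep]

# Two queries over two classes: only the first is confident enough to keep
logits = np.array([[4.0, -3.0],
                   [-2.0, -1.0]])
boxes = np.array([[0.5, 0.5, 0.2, 0.4],
                  [0.1, 0.1, 0.1, 0.1]])
scores, labels, xyxy = postprocess(logits, boxes, image_height=480, image_width=640)
print(scores, labels, xyxy)
```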

License

The Deformable DETR model is licensed under the Apache License 2.0.
