Deformable DETR with Box Refinement and Two-Stage

SenseTime

Introduction

Deformable DETR with box refinement and two-stage proposal generation is an object detection model released by SenseTime. It builds on the Deformable DETR framework with a ResNet-50 backbone, was trained on the COCO 2017 dataset, and is distributed as a PyTorch model.

Architecture

Deformable DETR is an encoder-decoder transformer with a convolutional backbone and two heads on top of the decoder outputs: a linear layer for class labels and a multi-layer perceptron (MLP) for bounding boxes. The model uses learned object queries, each of which looks for a particular object in the image; for Deformable DETR on COCO, the number of object queries is set to 300. This checkpoint adds the two enhancements from the Deformable DETR paper that give it its name: iterative bounding box refinement, in which each decoder layer refines the boxes predicted by the previous one, and a two-stage design, in which region proposals generated by the encoder are fed to the decoder as object queries. Training uses a bipartite matching loss: the Hungarian algorithm computes an optimal one-to-one assignment between the queries' predictions and the ground-truth annotations before the losses are applied.
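
As a rough sketch of the two prediction heads described above (a minimal illustration following the common DETR-style design; the class `DetectionHeads`, the layer sizes, and the 3-layer MLP depth are assumptions, not this checkpoint's exact internals):

    from torch import nn

    class DetectionHeads(nn.Module):
        """Illustrative DETR-style heads: a linear classifier and a 3-layer box MLP."""
        def __init__(self, hidden_dim=256, num_classes=91):
            super().__init__()
            # Class head: one linear layer producing class logits per query
            self.class_head = nn.Linear(hidden_dim, num_classes)
            # Box head: 3-layer MLP predicting normalized (cx, cy, w, h) per query
            self.box_head = nn.Sequential(
                nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, 4),
            )

        def forward(self, decoder_output):
            # decoder_output: (batch, num_queries, hidden_dim)
            logits = self.class_head(decoder_output)
            boxes = self.box_head(decoder_output).sigmoid()  # squash to [0, 1]
            return logits, boxes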

Training

The model is trained on the COCO 2017 dataset, which includes 118k annotated images for training and 5k for validation. The training process utilizes standard cross-entropy loss for class prediction and a combination of L1 and generalized IoU loss for bounding box prediction.
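
As a rough sketch of how these loss terms combine (a minimal illustration; `detection_loss` and the DETR-style weights of 1/5/2 are assumptions, and the one-to-one Hungarian matching, e.g. via scipy's linear_sum_assignment, is presumed to have been computed already):

    import torch.nn.functional as F
    from torchvision.ops import generalized_box_iou_loss

    # After matching, predictions and targets are aligned one-to-one.
    # pred_logits: (N, num_classes); target_labels: (N,)
    # pred_boxes, target_boxes: (N, 4) in (x0, y0, x1, y1) format with x1 > x0, y1 > y0
    def detection_loss(pred_logits, target_labels, pred_boxes, target_boxes,
                       ce_weight=1.0, l1_weight=5.0, giou_weight=2.0):
        ce = F.cross_entropy(pred_logits, target_labels)
        l1 = F.l1_loss(pred_boxes, target_boxes)
        giou = generalized_box_iou_loss(pred_boxes, target_boxes, reduction="mean")
        return ce_weight * ce + l1_weight * l1 + giou_weight * giou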

Guide: Running Locally

To run the Deformable DETR model locally:

  1. Install Dependencies: Ensure you have PyTorch and the transformers library installed; a typical setup is shown below (adjust packages and versions for your environment).
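    # Typical install (assumption: Pillow and requests are used by the steps below)
    pip install torch transformers pillow requests
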
  2. Load the Model:
    # Download the image processor and pretrained weights from the Hugging Face Hub
    from transformers import AutoImageProcessor, DeformableDetrForObjectDetection
    processor = AutoImageProcessor.from_pretrained("SenseTime/deformable-detr-with-box-refine-two-stage")
    model = DeformableDetrForObjectDetection.from_pretrained("SenseTime/deformable-detr-with-box-refine-two-stage")
    
  3. Prepare Input Image:
    # Fetch a sample image from the COCO 2017 validation set
    from PIL import Image
    import requests
    url = "http://images.cocodataset.org/val2017/000000039769.jpg"
    image = Image.open(requests.get(url, stream=True).raw)
    
  4. Process Image and Make Predictions:
    # Preprocess the image and run a forward pass
    inputs = processor(images=image, return_tensors="pt")
    outputs = model(**inputs)
    
  5. Post-Process and Display Results:
    import torch
    # target_sizes holds the original (height, width) so boxes are rescaled to the input image
    target_sizes = torch.tensor([image.size[::-1]])
    # Keep only detections with confidence above 0.7
    results = processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.7)[0]
    for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
        box = [round(i, 2) for i in box.tolist()]
        print(f"Detected {model.config.id2label[label.item()]} with confidence {round(score.item(), 3)} at location {box}")
    
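Optionally, to visualize the detections, here is a short sketch using Pillow's ImageDraw (it assumes the variables from the steps above; the output filename is arbitrary):

    from PIL import ImageDraw

    # Draw each detected box and its label onto the image
    draw = ImageDraw.Draw(image)
    for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
        x0, y0, x1, y1 = box.tolist()
        draw.rectangle([x0, y0, x1, y1], outline="red", width=3)
        draw.text((x0, y0), model.config.id2label[label.item()], fill="red")
    image.save("detections.jpg")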

For faster inference or training, consider cloud GPUs from providers such as AWS, Google Cloud, or Azure.
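
If a GPU is available, inference can be moved onto it with standard PyTorch calls (a minimal sketch, assuming the objects from the guide above):

    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():  # disable gradient tracking for inference
        outputs = model(**inputs)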

License

The Deformable DETR model is released under the Apache 2.0 license, permitting use, modification, and distribution with proper attribution.
