detr doc table detection
TahaDouaji
Introduction
The detr-doc-table-detection model detects both bordered and borderless tables in document images. It is based on the facebook/detr-resnet-50 architecture and was developed by Taha Douaji. The model is intended for object detection and runs on PyTorch through the Transformers library.
Architecture
The model is built on the facebook/detr-resnet-50 architecture, which pairs a ResNet-50 convolutional backbone with a transformer encoder-decoder for end-to-end object detection. It takes a document image as input and predicts bounding boxes for the tables it contains, which makes it suitable for document analysis tasks.
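To make the output format concrete: DETR-style models predict each box as normalized (center_x, center_y, width, height) values in the range 0 to 1, and the Transformers post-processing step converts them to pixel corner coordinates. The sketch below reproduces that conversion in plain Python for a single box. The function name `cxcywh_to_xyxy` is a hypothetical helper for illustration, not part of the library.

```python
from typing import List, Sequence


def cxcywh_to_xyxy(box: Sequence[float], image_width: int, image_height: int) -> List[float]:
    """Convert one normalized (cx, cy, w, h) box to pixel (x_min, y_min, x_max, y_max).

    Hypothetical helper: it mirrors the arithmetic of the library's
    post-processing for illustration, and is not the library's own code.
    """
    cx, cy, w, h = box
    x_min = (cx - w / 2) * image_width
    y_min = (cy - h / 2) * image_height
    x_max = (cx + w / 2) * image_width
    y_max = (cy + h / 2) * image_height
    return [x_min, y_min, x_max, y_max]


# A box centered in a 200x100 image, covering half of each dimension
print(cxcywh_to_xyxy([0.5, 0.5, 0.5, 0.5], image_width=200, image_height=100))
```

In practice the processor's post-processing handles this step, so a helper like this is only useful for inspecting raw model outputs.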
Training
The detr-doc-table-detection model was trained on the ICDAR2019 Table Dataset. It focuses on identifying tables in various document formats. Environmental impact, such as carbon emissions, can be estimated using tools like the Machine Learning Impact calculator.
Guide: Running Locally
To run the model locally, follow these steps:
- Install Dependencies: Ensure you have Python and PyTorch installed, then install the Hugging Face Transformers library and Pillow.
    pip install torch transformers pillow
- Load the Model: Load the model and its image processor with the Transformers classes DetrForObjectDetection and DetrImageProcessor.
- Prepare an Image: Load an image containing tables using the PIL library.
- Process the Image: Use the DetrImageProcessor to prepare the image for the model, then run the model to obtain predictions.
- Interpret Results: The model output includes bounding boxes and confidence scores for detected tables. Filter results based on a confidence threshold (e.g., >0.9).
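The steps above can be sketched as a single function. This is a minimal sketch, not the author's reference code: the Hub identifier "TahaDouaji/detr-doc-table-detection" is assumed from the author and model names, and the function name `detect_tables` and its parameters are hypothetical. The heavy imports sit inside the function so the definition itself loads without PyTorch installed.

```python
from typing import Dict, List


def detect_tables(
    image_path: str,
    threshold: float = 0.9,
    model_id: str = "TahaDouaji/detr-doc-table-detection",  # assumed Hub id
) -> List[Dict]:
    """Return the tables detected in one image as a list of dicts."""
    # Imported here so that defining the function does not require the packages
    import torch
    from PIL import Image
    from transformers import DetrForObjectDetection, DetrImageProcessor

    # Load the model and processor (downloads weights on first use)
    processor = DetrImageProcessor.from_pretrained(model_id)
    model = DetrForObjectDetection.from_pretrained(model_id)
    model.eval()

    # Prepare an image
    image = Image.open(image_path).convert("RGB")

    # Process the image and run the model without tracking gradients
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # PIL reports (width, height); post-processing expects (height, width)
    target_sizes = torch.tensor([image.size[::-1]])
    result = processor.post_process_object_detection(
        outputs, target_sizes=target_sizes, threshold=threshold
    )[0]

    # Keep only what is needed to interpret the results
    return [
        {
            "label": model.config.id2label[label.item()],
            "score": round(score.item(), 3),
            "box": [round(v, 2) for v in box.tolist()],  # x_min, y_min, x_max, y_max
        }
        for score, label, box in zip(result["scores"], result["labels"], result["boxes"])
    ]
```

A call such as `detect_tables("page.png")`, where `page.png` is a placeholder path, would return one dict per detection scoring above the threshold. Lowering `threshold` trades precision for recall.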
Suggested Cloud GPUs
For efficient processing, consider using cloud GPU services such as AWS EC2 with GPU instances, Google Cloud’s AI Platform, or Microsoft Azure’s GPU-optimized virtual machines.
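Whichever provider is used, the code has to place the model and inputs on the GPU explicitly. The snippet below is a small sketch of device selection; `pick_device` is a hypothetical helper name, and it falls back to the CPU when PyTorch or a CUDA device is unavailable.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to place on a GPU
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running on: {device}")
```

With a loaded model, `model.to(device)` moves the weights, and each input tensor needs the same treatment before the forward pass.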
License
The detr-doc-table-detection model is released under the Apache-2.0 license, which allows for both personal and commercial use, modification, and distribution.