table detection and extraction
foduucom
Introduction
The YOLOv8s Table Detection model is designed for detecting tables, both bordered and borderless, within images using the YOLO (You Only Look Once) framework. It integrates with Optical Character Recognition (OCR) to extract data from detected tables, making it useful for processing unstructured documents.
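The card does not describe how the OCR step is connected to the detector. One common approach, shown here only as a minimal sketch, is to turn each detected box into a slightly padded crop region and hand that crop to an OCR engine. The function name `padded_crop_box` and the `padding` parameter are hypothetical and not part of the model or of `ultralyticsplus`.

```python
import math


def padded_crop_box(box, image_width, image_height, padding=10):
    """Expand an (x1, y1, x2, y2) box by `padding` pixels, clamped to the image.

    Hypothetical helper: a little margin keeps the outer cell borders of a
    table inside the crop before it is passed to an OCR engine.
    """
    x1, y1, x2, y2 = box
    left = max(0, math.floor(x1) - padding)
    top = max(0, math.floor(y1) - padding)
    right = min(image_width, math.ceil(x2) + padding)
    bottom = min(image_height, math.ceil(y2) + padding)
    return left, top, right, bottom


# Example: a detection near the left edge of a 100 x 200 pixel page
region = padded_crop_box((4.6, 20.2, 95.1, 80.0), image_width=100, image_height=200)
print(region)
```

The returned tuple uses the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so it could be passed there directly if Pillow is the imaging library in use.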
Architecture
The model employs a modified CSPDarknet53 as its backbone, enhanced by self-attention mechanisms and feature pyramid networks. This architecture allows the model to accurately detect and classify tables of varying sizes, designs, and styles.
Training
The model was trained on a diverse dataset of images containing bordered and borderless tables in a range of designs. Training ran for multiple epochs, optimizing the model's weights to minimize detection loss. The reported mAP@0.5 (box) is 0.962 overall, with 0.961 for bordered tables and 0.963 for borderless tables.
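For readers unfamiliar with the metric: the "@0.5" in mAP@0.5 means a predicted box counts as a correct detection only if its intersection over union (IoU) with a ground-truth box is at least 0.5. The sketch below shows that matching criterion only, not the full mAP computation; the function names are illustrative and do not come from the training code.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if any)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches_at_threshold(pred, truth, threshold=0.5):
    """True if the predicted box overlaps the ground truth enough to count."""
    return box_iou(pred, truth) >= threshold


truth = (0, 0, 10, 10)
print(matches_at_threshold((2, 0, 12, 10), truth))  # IoU = 80 / 120, counts as a match
print(matches_at_threshold((5, 0, 15, 10), truth))  # IoU = 50 / 150, does not
```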
Guide: Running Locally
- Install Required Packages:
    pip install ultralyticsplus==0.0.28 ultralytics==8.0.43
- Load Model and Perform Prediction:
    from ultralyticsplus import YOLO, render_result

    # Load model
    model = YOLO('foduucom/table-detection-and-extraction')

    # Set model parameters
    model.overrides['conf'] = 0.25  # NMS confidence threshold
    model.overrides['iou'] = 0.45  # NMS IoU threshold
    model.overrides['agnostic_nms'] = False  # NMS class-agnostic
    model.overrides['max_det'] = 1000  # Maximum number of detections per image

    # Set image path
    image = '/path/to/your/document/images'

    # Perform inference
    results = model.predict(image)

    # Display results
    print(results[0].boxes)
    render = render_result(model=model, image=image, result=results[0])
    render.show()
- Compute Infrastructure:
  - Hardware: NVIDIA GeForce RTX 3060
  - Software: Jupyter Notebook
Cloud GPUs: Consider using cloud providers like AWS, Google Cloud, or Azure for GPU resources if local hardware is insufficient.
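The `conf`, `iou`, `agnostic_nms`, and `max_det` overrides in the prediction step all control non-maximum suppression (NMS), the post-processing that removes duplicate boxes. The following is a simplified pure-Python sketch of greedy NMS to show what each setting does; it is not the library's implementation, and `simple_nms` and the sample detections are made up for illustration.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def simple_nms(detections, conf=0.25, iou=0.45, agnostic_nms=False, max_det=1000):
    """Greedy NMS over a list of (box, score, class_id) tuples."""
    # conf: discard detections whose score does not exceed the threshold
    candidates = [d for d in detections if d[1] > conf]
    # Highest-scoring boxes get the first chance to be kept
    candidates.sort(key=lambda d: d[1], reverse=True)
    kept = []
    for box, score, cls in candidates:
        # iou: a box is suppressed if it overlaps an already kept box too much.
        # agnostic_nms: if False, only boxes of the same class suppress each other.
        suppressed = any(
            (agnostic_nms or cls == kept_cls) and box_iou(box, kept_box) > iou
            for kept_box, _, kept_cls in kept
        )
        if not suppressed:
            kept.append((box, score, cls))
        # max_det: stop once the per-image cap is reached
        if len(kept) >= max_det:
            break
    return kept


# Made-up detections: class 0 = bordered, class 1 = borderless
sample = [
    ((0, 0, 100, 100), 0.90, 0),
    ((5, 5, 105, 105), 0.80, 0),      # near-duplicate of the first box, same class
    ((5, 5, 105, 105), 0.70, 1),      # same location, different class
    ((300, 300, 400, 400), 0.20, 0),  # below the confidence threshold
]
print(len(simple_nms(sample)))                     # class-aware NMS
print(len(simple_nms(sample, agnostic_nms=True)))  # class-agnostic NMS
```

With class-aware NMS the overlapping bordered and borderless boxes can both survive, which is why setting `agnostic_nms` to True may be worth trying if the same table is reported under both classes.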
License
For further inquiries or contributions, contact info@foduu.com.