Table Detection and Extraction

foduucom

Introduction

The YOLOv8s Table Detection model is designed for detecting tables, both bordered and borderless, within images using the YOLO (You Only Look Once) framework. It integrates with Optical Character Recognition (OCR) to extract data from detected tables, making it useful for processing unstructured documents.

Architecture

The model employs a modified CSPDarknet53 as its backbone, enhanced by self-attention mechanisms and feature pyramid networks. This architecture allows the model to accurately detect and classify tables of varying sizes, designs, and styles.

Training

The model was trained on a diverse dataset of images containing bordered and borderless tables in a range of designs. Training ran for multiple epochs, optimizing the model's weights to minimize detection loss. For box detection, the model reaches an mAP@0.5 of 0.962 overall, with 0.961 for bordered tables and 0.963 for borderless tables.
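The 0.5 in mAP@0.5 is an intersection-over-union (IoU) threshold: a predicted box counts as a correct detection only if its IoU with a ground-truth table box is at least 0.5. The sketch below shows that check in plain Python. The function names `box_iou` and `is_true_positive` are illustrative and not part of the model's tooling, and boxes are assumed to be (x1, y1, x2, y2) tuples in pixels. Computing the full mAP also involves ranking predictions by confidence and averaging precision over recall levels, which is not shown here.

```python
def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    # Corners of the overlapping rectangle, if there is one.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width or height goes negative when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, truth, threshold=0.5):
    """True if the predicted box matches the ground truth at the given IoU threshold."""
    return box_iou(pred, truth) >= threshold


if __name__ == "__main__":
    truth = (0, 0, 10, 10)
    print(is_true_positive((2, 0, 12, 10), truth))  # overlap 80 / union 120
    print(is_true_positive((5, 0, 15, 10), truth))  # overlap 50 / union 150
```

A prediction shifted by a fifth of the table width still passes the 0.5 threshold, while one shifted by half the width does not.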

Guide: Running Locally

  1. Install Required Packages:

    pip install ultralyticsplus==0.0.28 ultralytics==8.0.43
    
  2. Load Model and Perform Prediction:

    from ultralyticsplus import YOLO, render_result
    
    # Load model
    model = YOLO('foduucom/table-detection-and-extraction')
    
    # Set model parameters
    model.overrides['conf'] = 0.25  # NMS confidence threshold
    model.overrides['iou'] = 0.45  # NMS IoU threshold
    model.overrides['agnostic_nms'] = False  # NMS class-agnostic
    model.overrides['max_det'] = 1000  # Maximum number of detections per image
    
    # Set image path (a single document image, since render_result below draws on one image)
    image = '/path/to/your/document/images'
    
    # Perform inference
    results = model.predict(image)
    
    # Display results
    print(results[0].boxes)
    render = render_result(model=model, image=image, result=results[0])
    render.show()
    
  3. Compute Infrastructure:

    • Hardware: NVIDIA GeForce RTX 3060
    • Software: Jupyter Notebook

    Cloud GPUs: Consider using cloud providers like AWS, Google Cloud, or Azure for GPU resources if local hardware is insufficient.
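Once the prediction step in the guide above has produced boxes, a common next step before OCR is to turn each detection into a crop rectangle. The sketch below is one possible way to do that and is not taken from the model's own code. `table_crops`, `min_conf`, and `pad` are hypothetical names, and the inputs are assumed to be plain lists of (x1, y1, x2, y2) boxes and confidence scores, such as values read from `results[0].boxes.xyxy` and `results[0].boxes.conf`.

```python
def table_crops(boxes, scores, image_size, min_conf=0.25, pad=10):
    """Return padded, clamped (left, top, right, bottom) crop rectangles.

    boxes      -- iterable of (x1, y1, x2, y2) detections in pixels
    scores     -- confidence score for each box, in the same order
    image_size -- (width, height) of the source image
    min_conf   -- hypothetical cutoff; detections below it are skipped
    pad        -- hypothetical margin in pixels added around each table
    """
    width, height = image_size
    crops = []
    for (x1, y1, x2, y2), score in zip(boxes, scores):
        if score < min_conf:
            continue
        # Pad the box, then clamp it so it stays inside the image.
        left = max(0, int(x1) - pad)
        top = max(0, int(y1) - pad)
        right = min(width, int(x2) + pad)
        bottom = min(height, int(y2) + pad)
        if right > left and bottom > top:
            crops.append((left, top, right, bottom))
    return crops


if __name__ == "__main__":
    boxes = [(5.2, 8.9, 200.0, 300.5), (50, 50, 100, 100)]
    scores = [0.9, 0.1]
    print(table_crops(boxes, scores, image_size=(210, 400)))
```

Each returned tuple uses the (left, upper, right, lower) order that Pillow's `Image.crop` expects, so the cropped table images can be handed to whichever OCR engine is in use. The small margin is a design choice intended to avoid clipping text that sits on a table's border.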

License

For further inquiries or contributions, contact info@foduu.com.
