Web Form UI Field Detection

foduucom

Introduction

The web-form-Detect model is a YOLOv8 object detection model that detects and locates UI form fields in images. It uses the ultralytics library and was fine-tuned on a dataset of annotated UI form images. It suits applications that need automated detection of form fields such as name, number, email, and password inputs, as well as buttons, from images.

Architecture

The model is built on the YOLOv8 architecture and uses the ultralytics library for object detection. It is configured specifically for UI form field detection; the inference parameters it exposes (confidence threshold, IoU threshold, class-agnostic NMS, and maximum detections per image) appear in the local-run guide below.

Training

  • Training Data: Composed of 600 images of web UI forms from diverse sources, annotated with bounding box coordinates.
  • Fine-Tuning Process:
    • Pretrained backbone: Initialized with a pretrained YOLO object detection model.
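    • Loss Function: The source lists Mean Average Precision (mAP), which is an evaluation metric rather than a trainable loss; YOLOv8 itself is normally trained with box-regression, classification, and distribution focal losses.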
    • Optimizer: Adam optimizer with a learning rate of 1e-4.
    • Training Duration: 1 hour on an NVIDIA GeForce RTX 3090 GPU.
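
The fine-tuning setup listed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the checkpoint name `yolov8n.pt`, the dataset file `web_forms.yaml`, the epoch count, and the image size are hypothetical, since the source gives only the optimizer, learning rate, and wall-clock training time. The `optimizer` and `lr0` keys are ultralytics training settings.

```python
# Hedged sketch of a fine-tuning configuration; not the authors' actual script.
TRAIN_ARGS = {
    "data": "web_forms.yaml",  # hypothetical dataset config (600 annotated form images)
    "optimizer": "Adam",       # from the source: Adam optimizer
    "lr0": 1e-4,               # from the source: initial learning rate of 1e-4
    "epochs": 100,             # assumption: the source reports only ~1 hour of training
    "imgsz": 640,              # assumption: image size is not stated in the source
}


def fine_tune(weights="yolov8n.pt", **overrides):
    """Start from a pretrained YOLO checkpoint and fine-tune it.

    `weights` is a hypothetical choice; the source does not name the model size.
    """
    from ultralytics import YOLO  # imported lazily so the config can be inspected without the library

    model = YOLO(weights)
    args = {**TRAIN_ARGS, **overrides}  # caller-supplied values take precedence
    model.train(**args)
    return model
```

Calling `fine_tune()` requires the ultralytics package and a dataset file in YOLO format; the dictionary alone can be inspected without either.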

The model achieved an Average Precision (AP) of 0.51, precision of 0.80, recall of 0.70, and an F1 Score of 0.71 during evaluation.
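
As a quick sanity check on these figures, F1 is the harmonic mean of precision and recall. The short sketch below applies that formula to the reported precision and recall. The result differs slightly from the reported F1 of 0.71, which may mean the reported score was averaged per class or measured at a different confidence threshold; the source does not say which, so treat that explanation as an assumption.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


reported_precision = 0.80  # from the evaluation figures above
reported_recall = 0.70     # from the evaluation figures above

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 3))  # 0.747
```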

Guide: Running Locally

  1. Installation:

    pip install ultralyticsplus==0.0.28 ultralytics==8.0.43
    
  2. Load Model and Perform Prediction:

    from ultralyticsplus import YOLO, render_result
    
    # load model
    model = YOLO('foduucom/web-form-ui-field-detection')
    
    # set model parameters
    model.overrides['conf'] = 0.25  # NMS confidence threshold
    model.overrides['iou'] = 0.45  # NMS IoU threshold
    model.overrides['agnostic_nms'] = False  # NMS class-agnostic
    model.overrides['max_det'] = 1000  # maximum number of detections per image
    
    # set image (path to a single image file; render_result below expects one image)
    image = '/path/to/your/document/images'
    
    # perform inference
    results = model.predict(image)
    
    # observe results
    print(results[0].boxes)
    render = render_result(model=model, image=image, result=results[0])
    render.show()
    
  3. Hardware Recommendation: For best inference performance, a cloud GPU such as the NVIDIA Tesla V100 is recommended.
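
To make the `conf`, `iou`, `agnostic_nms`, and `max_det` overrides from step 2 more concrete, here is a minimal pure-Python sketch of non-maximum suppression. It is a simplified illustration, not the ultralytics implementation: the function names `iou` and `nms` and the plain-list inputs are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], classes: Sequence[int],
        conf: float = 0.25, iou_thres: float = 0.45,
        agnostic: bool = False, max_det: int = 1000) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    # 1. Drop low-confidence boxes, then visit the rest from best to worst.
    order = sorted((i for i, s in enumerate(scores) if s > conf),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # 2. A box is suppressed if it overlaps an already-kept box too much.
        #    Without class-agnostic NMS, only same-class boxes compete.
        suppressed = any(
            (agnostic or classes[i] == classes[j]) and iou(boxes[i], boxes[j]) > iou_thres
            for j in keep
        )
        if not suppressed:
            keep.append(i)
            if len(keep) >= max_det:  # 3. Cap the number of detections.
                break
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.7, 0.1]
print(nms(boxes, scores, classes=[0, 0, 1, 0]))  # [0, 2]
```

In this toy input, the second box overlaps the first with an IoU of about 0.68 and shares its class, so it is suppressed; the last box falls below the confidence threshold. Raising `iou_thres` keeps more overlapping boxes, which can matter for tightly packed form fields.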

License

For questions and contributions, reach out via email at info@foduu.com.
