layoutlm-document-qa

impira

Introduction

layoutlm-document-qa is a fine-tuned version of the LayoutLM model designed for document question answering. It combines textual and visual (layout) modalities to extract information from documents. The model has been fine-tuned on the SQuAD 2.0 and DocVQA datasets, enabling it to understand document images and answer questions about their contents.

Architecture

This model is based on the LayoutLM architecture, a multi-modal transformer that integrates both text and layout information. This makes it well suited to tasks that require an understanding of document structure, such as parsing invoices and contracts.
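LayoutLM receives its layout signal as per-token bounding boxes normalized to a 0–1000 grid. The snippet below is a minimal sketch of that preprocessing step; the helper name and sample coordinates are illustrative, not part of the model's API.

```python
def normalize_box(box, width, height):
    """Scale a pixel-space box (x0, y0, x1, y1) to LayoutLM's 0-1000 grid."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]

# e.g. a word box on an 800x600 page image
print(normalize_box((100, 50, 200, 100), 800, 600))  # [125, 83, 250, 166]
```

Each OCR'd word gets one such normalized box, which is fed to the model alongside the token embeddings.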

Training

The model was fine-tuned using the SQuAD 2.0 and DocVQA datasets. These datasets provide a wide range of document types and questions, allowing the model to generalize well across different document question-answering scenarios.

Guide: Running Locally

To run the layoutlm-document-qa model locally, follow these steps:

  1. Install Dependencies: Ensure you have PyTorch, Pillow, and pytesseract installed. The pipeline uses the Tesseract OCR engine (via pytesseract) to extract text from document images, so Tesseract itself must also be available on your system.

  2. Set Up the Pipeline:

    from transformers import pipeline
    
    nlp = pipeline(
        "document-question-answering",
        model="impira/layoutlm-document-qa",
    )
    
  3. Run the Model: Pass a document image URL and a question to the pipeline.

    nlp(
        "https://templates.invoicehome.com/invoice-template-us-neat-750px.png",
        "What is the invoice number?"
    )
    
  4. Use a Recent Transformers Version: The model requires a recent version of the Transformers library. The commit below is pinned for reproducibility:

    pip install git+https://github.com/huggingface/transformers.git@2ef774211733f0acf8d3415f9284c49ef219e991
    
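If you already have OCR output, the document-question-answering pipeline can accept pre-extracted words and boxes through its `word_boxes` argument instead of running Tesseract itself. Below is a minimal sketch of building that structure; the words and coordinates are invented for illustration.

```python
# Pre-OCR'd words with boxes already normalized to a 0-1000 grid (sample data)
words = ["Invoice", "No.", "INV-1234"]
boxes = [[50, 40, 180, 60], [190, 40, 240, 60], [250, 40, 400, 60]]

# The pipeline expects a list of (word, box) pairs
word_boxes = list(zip(words, boxes))
print(word_boxes[0])  # ('Invoice', [50, 40, 180, 60])

# Passing word_boxes skips the pipeline's built-in OCR step:
# nlp(image, "What is the invoice number?", word_boxes=word_boxes)
```

This is useful when your documents have already been processed by a separate OCR system and you want consistent, reusable OCR results across runs.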

For optimal performance, especially with large datasets or batch processing, consider using cloud GPUs, such as those available on AWS, Google Cloud, or Azure.

License

This model is provided under the MIT License, allowing for flexible use and modification.
