layoutlm-document-qa

impira

Introduction

layoutlm-document-qa is a fine-tuned version of the LayoutLM model designed for document question answering. It combines textual and visual (layout) modalities to extract information from documents. The model has been fine-tuned on the SQuAD 2.0 and DocVQA datasets, enabling it to understand document images and answer questions about their contents.

Architecture

This model is based on the LayoutLM architecture, a multi-modal transformer that integrates both text and layout information. This makes it well suited to tasks that require an understanding of document structure, such as parsing invoices and contracts.
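LayoutLM receives its layout signal as per-token bounding boxes normalized to a 0–1000 grid. The snippet below is a minimal sketch of that preprocessing step; the helper name and sample coordinates are illustrative, not part of the model's API.

```python
def normalize_box(box, width, height):
    """Scale a pixel-space box (x0, y0, x1, y1) to LayoutLM's 0-1000 grid."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]

# e.g. a word box on an 800x600 page image
print(normalize_box((100, 50, 200, 100), 800, 600))  # [125, 83, 250, 166]
```

Each OCR'd word gets one such normalized box, which is fed to the model alongside the token embeddings.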

Training

The model was fine-tuned using the SQuAD 2.0 and DocVQA datasets. These datasets provide a wide range of document types and questions, allowing the model to generalize well across different document question-answering scenarios.

Guide: Running Locally

To run the layoutlm-document-qa model locally, follow these steps:

  1. Install Dependencies: Ensure you have PyTorch, Pillow, and pytesseract installed. The pipeline uses the Tesseract OCR engine (via pytesseract) to extract text from document images, so Tesseract itself must also be available on your system.

  2. Set Up the Pipeline:

    from transformers import pipeline
    
    nlp = pipeline(
        "document-question-answering",
        model="impira/layoutlm-document-qa",
    )
    
  3. Run the Model: Pass a document image URL and a question to the pipeline.

    nlp(
        "https://templates.invoicehome.com/invoice-template-us-neat-750px.png",
        "What is the invoice number?"
    )
    
  4. Use a Recent Transformers Version: The model requires a recent version of the Transformers library. The commit below is pinned for reproducibility:

    pip install git+https://github.com/huggingface/transformers.git@2ef774211733f0acf8d3415f9284c49ef219e991
    
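If you already have OCR output, the document-question-answering pipeline can accept pre-extracted words and boxes through its `word_boxes` argument instead of running Tesseract itself. Below is a minimal sketch of building that structure; the words and coordinates are invented for illustration.

```python
# Pre-OCR'd words with boxes already normalized to a 0-1000 grid (sample data)
words = ["Invoice", "No.", "INV-1234"]
boxes = [[50, 40, 180, 60], [190, 40, 240, 60], [250, 40, 400, 60]]

# The pipeline expects a list of (word, box) pairs
word_boxes = list(zip(words, boxes))
print(word_boxes[0])  # ('Invoice', [50, 40, 180, 60])

# Passing word_boxes skips the pipeline's built-in OCR step:
# nlp(image, "What is the invoice number?", word_boxes=word_boxes)
```

This is useful when your documents have already been processed by a separate OCR system and you want consistent, reusable OCR results across runs.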

For optimal performance, especially with large datasets or batch processing, consider using cloud GPUs, such as those available on AWS, Google Cloud, or Azure.

License

This model is provided under the MIT License, allowing for flexible use and modification.
