# layoutlm-document-qa

## Introduction

The `layoutlm-document-qa` model from Impira is a fine-tuned version of the LayoutLM model, designed for document question answering. It leverages both visual and textual modalities to extract information from documents. The model was fine-tuned on the SQuAD 2.0 and DocVQA datasets, enabling it to understand and answer questions about document images.
## Architecture
This model is based on the LayoutLM architecture, which is a multi-modal transformer model. It is designed to integrate both text and layout information, making it suitable for tasks requiring understanding of document structures, such as invoices and contracts.
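LayoutLM's layout modality comes from per-token bounding boxes, which the model expects to be normalized onto a 0–1000 coordinate grid relative to the page size. A minimal sketch of that normalization (the helper name is our own, not part of the library):

```python
def normalize_bbox(bbox, page_width, page_height):
    """Scale a pixel-space box (x0, y0, x1, y1) to LayoutLM's 0-1000 grid."""
    x0, y0, x1, y1 = bbox
    return (
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    )

# A word occupying the left half of an 800x600 page:
print(normalize_bbox((0, 0, 400, 300), 800, 600))  # (0, 0, 500, 500)
```

Normalizing this way makes the coordinates resolution-independent, so the same model handles scans and renders of any size.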
## Training
The model was fine-tuned using the SQuAD 2.0 and DocVQA datasets. These datasets provide a wide range of document types and questions, allowing the model to generalize well across different document question-answering scenarios.
## Guide: Running Locally

To run the `layoutlm-document-qa` model locally, follow these steps:
- **Install Dependencies**: Ensure you have PyTorch, Pillow, and pytesseract installed (the pipeline runs Tesseract OCR over the document image to extract words and their bounding boxes).
- **Set Up the Pipeline**:

  ```python
  from transformers import pipeline

  nlp = pipeline(
      "document-question-answering",
      model="impira/layoutlm-document-qa",
  )
  ```
- **Run the Model**: Pass a document image URL and a question to the pipeline.

  ```python
  nlp(
      "https://templates.invoicehome.com/invoice-template-us-neat-750px.png",
      "What is the invoice number?",
  )
  ```
- **Use a Recent Transformers Version**: The model requires a recent version of the Transformers library. Install it with:

  ```shell
  pip install git+https://github.com/huggingface/transformers.git@2ef774211733f0acf8d3415f9284c49ef219e991
  ```
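The pipeline returns a list of candidate answers, each a dict with `score`, `answer`, `start`, and `end` keys. A minimal sketch of picking a confident answer from those candidates (the sample results below are illustrative, not real model output, and the helper is our own):

```python
def best_answer(candidates, min_score=0.5):
    """Return the highest-scoring answer at or above min_score, or None."""
    viable = [c for c in candidates if c["score"] >= min_score]
    return max(viable, key=lambda c: c["score"])["answer"] if viable else None

# Illustrative candidate list mimicking the pipeline's output shape.
candidates = [
    {"score": 0.92, "answer": "us-001", "start": 15, "end": 15},
    {"score": 0.04, "answer": "Invoice", "start": 4, "end": 4},
]
print(best_answer(candidates))  # us-001
```

Thresholding on `score` is useful in practice, since the model always returns its best guesses even when the document does not contain an answer.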
For optimal performance, especially with large datasets or batch processing, consider using cloud GPUs, such as those available on AWS, Google Cloud, or Azure.
## License
This model is provided under the MIT License, allowing for flexible use and modification.