LayoutLM-Base, Uncased

Microsoft

Introduction

LayoutLM is a pre-training method developed by Microsoft that jointly models text and layout information (optionally augmented with image features) to improve document image understanding and information extraction. It achieves state-of-the-art results on several document AI benchmarks and is particularly effective for tasks such as form understanding and receipt understanding.

Architecture

LayoutLM is available in two configurations:

  • LayoutLM-Base, Uncased: Consists of 12 layers, 768 hidden units, and 12 heads, totaling 113 million parameters.
  • LayoutLM-Large, Uncased: Comprises 24 layers, 1024 hidden units, and 16 heads, with 343 million parameters.
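
The dimensions listed above can be checked directly from the published checkpoint. Below is a minimal sketch using the Hugging Face Transformers library; `LayoutLMConfig` and the attribute names shown are standard Transformers config fields.

```python
from transformers import LayoutLMConfig

# Load the configuration of the base checkpoint and confirm its dimensions.
config = LayoutLMConfig.from_pretrained("microsoft/layoutlm-base-uncased")
print(config.num_hidden_layers)    # 12
print(config.hidden_size)          # 768
print(config.num_attention_heads)  # 12
```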

Training

The model is pre-trained for two epochs on the IIT-CDIP Test Collection 1.0, a dataset containing over 11 million scanned document images. This extensive corpus helps the model learn the intricate relationships between text and layout in document images.

Guide: Running Locally

  1. Install Dependencies: Ensure that Python and PyTorch are installed, then install the Hugging Face Transformers library (for example, via pip).
  2. Clone Repository (optional): Use Git to clone the LayoutLM repository from GitHub if you want the reference fine-tuning scripts.
  3. Load Pre-trained Model: Load the model and tokenizer through the Transformers library, as shown in the sketch after this list.
  4. Prepare Data: Format your documents to match the model's expected inputs: token IDs plus word-level bounding boxes normalized to a 0-1000 coordinate scale.
  5. Run Inference: Execute the model on your data to extract and understand document information.
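
Below is a minimal end-to-end sketch of steps 3-5 using the Hugging Face Transformers API. The words and bounding boxes here are placeholder values; in a real pipeline they would come from an OCR engine such as Tesseract, with boxes normalized to the 0-1000 scale the model expects.

```python
import torch
from transformers import LayoutLMTokenizer, LayoutLMModel

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMModel.from_pretrained("microsoft/layoutlm-base-uncased")

# Placeholder words and word-level bounding boxes (normalized to 0-1000);
# in practice these come from an OCR step.
words = ["Hello", "world"]
normalized_word_boxes = [[637, 773, 693, 782], [698, 773, 733, 782]]

# Expand each word's box so every wordpiece token gets its word's box.
token_boxes = []
for word, box in zip(words, normalized_word_boxes):
    word_tokens = tokenizer.tokenize(word)
    token_boxes.extend([box] * len(word_tokens))
# Add boxes for the special [CLS] and [SEP] tokens.
token_boxes = [[0, 0, 0, 0]] + token_boxes + [[1000, 1000, 1000, 1000]]

encoding = tokenizer(" ".join(words), return_tensors="pt")
bbox = torch.tensor([token_boxes])

outputs = model(
    input_ids=encoding["input_ids"],
    bbox=bbox,
    attention_mask=encoding["attention_mask"],
    token_type_ids=encoding["token_type_ids"],
)
last_hidden_state = outputs.last_hidden_state  # shape: (1, seq_len, 768)
```

For downstream tasks such as form or receipt understanding, `LayoutLMForTokenClassification` can be loaded the same way and fine-tuned on labeled data.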

For enhanced performance, consider using cloud GPUs such as those provided by AWS, Google Cloud, or Azure.
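
If a CUDA GPU is available, the model and inputs can be moved to it before inference. This snippet assumes the `model`, `encoding`, and `bbox` variables from the sketch above.

```python
import torch

# Select a GPU if one is available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Move all input tensors to the same device as the model.
bbox = bbox.to(device)
encoding = {k: v.to(device) for k, v in encoding.items()}

outputs = model(
    input_ids=encoding["input_ids"],
    bbox=bbox,
    attention_mask=encoding["attention_mask"],
    token_type_ids=encoding["token_type_ids"],
)
```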

License

The LayoutLM model is licensed under the MIT License, allowing for extensive freedom in usage, modification, and distribution.
