invoice-and-receipts_donut_v1

mychen76

Introduction

invoice-and-receipts_donut_v1 is a fine-tuned model that converts invoice or receipt images into structured JSON or XML data. It is based on the Donut architecture, which reads the document image directly instead of relying on a separate OCR engine. This simplifies the conversion pipeline and reduces resource usage.

Architecture

The model is built on the Donut architecture, a vision-encoder-decoder framework in which an image encoder reads the document and a text decoder generates the structured output. It is an image-text-to-text model implemented in PyTorch and loaded through the Transformers library. Because the image is converted directly into a structured format with no intermediate OCR step, the pipeline has fewer stages and fewer dependencies.

Training

The model has been fine-tuned specifically for the task of transforming invoice or receipt images into JSON or XML formats. Training involved adapting the Donut model to accurately extract and structure data from diverse invoice formats, focusing on efficiency and accuracy.
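To make the two target formats concrete, the following minimal sketch renders one nested extraction result as both JSON and XML using only the Python standard library. The field names (`header`, `items`, `summary`, and so on) and the `dict_to_xml` helper are hypothetical illustrations, not the model's documented output schema.

```python
import json
import xml.etree.ElementTree as ET


def dict_to_xml(tag, value):
    """Recursively convert nested dicts, lists, and scalars into an XML element."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(dict_to_xml(key, child))
    elif isinstance(value, list):
        # List entries have no key of their own, so wrap each one in <item>.
        for entry in value:
            elem.append(dict_to_xml("item", entry))
    else:
        elem.text = str(value)
    return elem


# Hypothetical extraction result; the field names are illustrative only.
parsed = {
    "header": {"invoice_no": "INV-001", "invoice_date": "2024-01-15"},
    "items": [{"description": "Widget", "quantity": "2", "price": "9.99"}],
    "summary": {"total": "19.98"},
}

json_text = json.dumps(parsed, indent=2)
xml_text = ET.tostring(dict_to_xml("receipt", parsed), encoding="unicode")

print(json_text)
print(xml_text)
```

Both strings carry the same information; JSON keeps lists as arrays, while the XML form in this sketch wraps each list entry in an `item` element.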

Guide: Running Locally

To run the invoice-and-receipts_donut_v1 model locally, follow these steps:

  1. Install Required Libraries:

    • Ensure you have Python and PyTorch installed.
    • Install the Transformers library:
      pip install transformers
      
    • If you load weights in Safetensors format, install that library as well:
      pip install safetensors
      
  2. Download the Model:

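    • Obtain the model files from the Hugging Face Model Hub. Transformers downloads and caches them automatically the first time the model is loaded by its repository ID.

  3. Run Inference: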

    • Load the model using the Transformers library.
    • Pass invoice or receipt images to the model to get JSON or XML outputs.

  4. Hardware Recommendations:

    • Inference works on a CPU but is typically much faster on a GPU. If no local GPU is available, consider cloud GPUs such as those offered by AWS, GCP, or Azure.
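The steps above can be sketched roughly as follows, using the usual Donut inference pattern from the Transformers library. Several details are assumptions rather than facts from this page: the repository ID `mychen76/invoice-and-receipts_donut_v1` is pieced together from the author and model names, the task prompt `<s_receipt>` should be checked against the model card, and Pillow (plus possibly `sentencepiece` for the tokenizer) must be installed in addition to the libraries listed above. The function names are hypothetical.

```python
import re

# Assumed Hub repository ID, pieced together from the author and model names.
MODEL_ID = "mychen76/invoice-and-receipts_donut_v1"
# Assumed task start token; check the model card for the actual prompt.
TASK_PROMPT = "<s_receipt>"


def clean_sequence(sequence, eos_token, pad_token):
    """Strip EOS/PAD tokens and the leading task-start token from decoded text."""
    sequence = sequence.replace(eos_token, "").replace(pad_token, "")
    # Remove only the first tag, which is the task prompt.
    return re.sub(r"<.*?>", "", sequence, count=1).strip()


def extract_receipt(image_path, model_id=MODEL_ID, task_prompt=TASK_PROMPT):
    """Run one image through the model and return the parsed fields as a dict."""
    # Heavy imports are deferred so the helper above stays usable on its own.
    import torch
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    # Use a GPU when one is available, otherwise fall back to the CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    processor = DonutProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id).to(device)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)
    decoder_input_ids = processor.tokenizer(
        task_prompt, add_special_tokens=False, return_tensors="pt"
    ).input_ids.to(device)

    outputs = model.generate(
        pixel_values,
        decoder_input_ids=decoder_input_ids,
        max_length=model.decoder.config.max_position_embeddings,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        return_dict_in_generate=True,
    )

    sequence = processor.batch_decode(outputs.sequences)[0]
    sequence = clean_sequence(
        sequence, processor.tokenizer.eos_token, processor.tokenizer.pad_token
    )
    # token2json turns the tagged sequence into a nested Python dict.
    return processor.token2json(sequence)
```

Calling `extract_receipt("receipt.jpg")` on a local image (the file name is a placeholder) would download the model on first use and return a dictionary that can be serialized with `json.dumps`. The cleaned tagged sequence produced by `clean_sequence` is itself XML-like, which is one plausible route to the XML output.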

License

The invoice-and-receipts_donut_v1 model is licensed under the Apache 2.0 License, allowing for both commercial and non-commercial use with minimal restrictions.
