TrOCR-Small-Printed (microsoft/trocr-small-printed)

Introduction

TrOCR-Small-Printed is the small-sized TrOCR (Transformer-based Optical Character Recognition) model fine-tuned on the SROIE dataset for OCR on printed text. TrOCR was introduced by Li et al. in the paper "TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models."

Architecture

The TrOCR model utilizes an encoder-decoder architecture. The encoder is an image Transformer initialized from the DeiT weights, while the decoder is a text Transformer initialized from the UniLM weights. Images are processed as sequences of fixed-size patches (16x16 resolution), which are linearly embedded, and absolute position embeddings are added before input to the encoder. The decoder then generates text tokens autoregressively.
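
As a rough illustration of how an image becomes an encoder input sequence, the sketch below computes the number of patch embeddings, assuming the 384x384 input resolution reported in the TrOCR paper (the resolution is an assumption here; in practice the processor resizes the image for you):

    # Illustrative only: how many 16x16 patches the encoder receives,
    # assuming a 384x384 input resolution (an assumption based on the paper).
    image_size = 384
    patch_size = 16
    patches_per_side = image_size // patch_size   # 24
    num_patches = patches_per_side ** 2           # 576 patch embeddings per image
    print(num_patches)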

Training

The model was fine-tuned on the SROIE dataset, enabling it to perform OCR on images containing printed text. It is intended for single text-line images; multi-line documents must be split into individual line crops before recognition, as sketched below.
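
A minimal sketch of running the model over several pre-cropped lines, assuming line_images is a list of PIL images you have already produced (line segmentation itself is outside the scope of this model) and that processor and model are loaded as in the guide below:

    # line_images: a list of PIL.Image objects, one cropped text line each (assumed to exist)
    for line in line_images:
        pixel_values = processor(images=line, return_tensors="pt").pixel_values
        generated_ids = model.generate(pixel_values)
        print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])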

Guide: Running Locally

To use the model locally in PyTorch, follow these steps:

  1. Install Dependencies: Ensure you have PyTorch and the Transformers library installed.
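    A typical installation via pip (Pillow and requests are needed for the image-loading step below; pin versions as required by your environment):
    pip install torch transformers pillow requests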
  2. Import Libraries:
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel
    from PIL import Image
    import requests
    
  3. Load Image: Use an image URL containing printed text and load it using PIL.
    # example image: a text line from the IAM database (the model targets printed text,
    # but this sample demonstrates the API)
    url = 'https://fki.tic.heia-fr.ch/static/img/a01-122-02-00.jpg'
    image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
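    (To run on a local file instead: image = Image.open('line.png').convert("RGB"), where 'line.png' is a placeholder path to a single-line image.)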
    
  4. Preprocess and Load Model:
    processor = TrOCRProcessor.from_pretrained('microsoft/trocr-small-printed')
    model = VisionEncoderDecoderModel.from_pretrained('microsoft/trocr-small-printed')
    
  5. Generate Text:
    # preprocess the image into pixel values, generate token IDs autoregressively,
    # then decode them back into a string
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    generated_ids = model.generate(pixel_values)
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print(generated_text)
    

For faster inference, consider running the model on a GPU, for example a cloud GPU instance from AWS, Google Cloud, or Azure; a sketch of GPU usage follows.
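
A minimal sketch of moving inference to a GPU, assuming a CUDA-capable device and the processor, model, and image created in the guide above:

    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    pixel_values = processor(images=image, return_tensors="pt").pixel_values.to(device)
    generated_ids = model.generate(pixel_values)
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]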

License

The model and its components are released under the MIT License, allowing for wide usage and modification.
