TrOCR Small Printed
microsoft/trocr-small-printed
Introduction
TrOCR (Transformer-based Optical Character Recognition) is an encoder-decoder model for OCR; this checkpoint is the small-sized variant fine-tuned on the SROIE dataset for printed-text recognition. It was introduced by Li et al. in the paper "TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models."
Architecture
The TrOCR model utilizes an encoder-decoder architecture. The encoder is an image Transformer initialized from the DeiT weights, while the decoder is a text Transformer initialized from the UniLM weights. Images are processed as sequences of fixed-size patches (16x16 resolution), which are linearly embedded, and absolute position embeddings are added before input to the encoder. The decoder then generates text tokens autoregressively.
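As a quick sanity check, these architectural details can be read off the published checkpoint's configuration. The snippet below is a minimal sketch assuming the standard Hugging Face configuration fields (model_type, image_size, patch_size); field names not present in a given config fall back to "n/a".
from transformers import VisionEncoderDecoderModel

# Load the published checkpoint and inspect its encoder/decoder configuration.
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-small-printed")

encoder_cfg = model.config.encoder  # image Transformer (DeiT-style) settings
decoder_cfg = model.config.decoder  # text Transformer (TrOCR decoder) settings

print("Encoder:", encoder_cfg.model_type)
print("Image size:", getattr(encoder_cfg, "image_size", "n/a"))  # square input resolution
print("Patch size:", getattr(encoder_cfg, "patch_size", "n/a"))  # 16x16 patches per the paper
print("Decoder:", decoder_cfg.model_type)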
Training
The model was fine-tuned on the SROIE dataset, enabling it to recognize printed text. It expects single text-line images as input.
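Because the model works on one text line at a time, a full page or receipt scan typically has to be segmented into line images first. The following is a minimal sketch of that pre-processing step; the file name and crop coordinates are placeholders, and a real pipeline would obtain the bounding boxes from a text-detection model.
from PIL import Image

# Hypothetical example: crop one text line out of a larger scanned document.
# The bounding box (left, upper, right, lower) is a placeholder and would
# normally come from a text-detection step.
page = Image.open("scanned_receipt.png").convert("RGB")
line_image = page.crop((50, 120, 600, 160))
line_image.save("line_01.png")  # feed this single-line crop to TrOCR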
Guide: Running Locally
To use the model locally in PyTorch, follow these steps:
- Install Dependencies: Ensure you have PyTorch and the Transformers library installed.
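For example, in a standard pip-based environment (exact package names may vary with your setup):
pip install torch transformers pillow requests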
- Import Libraries:
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
import requests
- Load Image: Use an image URL containing printed text and load it using PIL.
url = 'https://fki.tic.heia-fr.ch/static/img/a01-122-02-00.jpg'
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
- Preprocess and Load Model:
processor = TrOCRProcessor.from_pretrained('microsoft/trocr-small-printed')
model = VisionEncoderDecoderModel.from_pretrained('microsoft/trocr-small-printed')
- Generate Text:
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
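The processor also accepts a list of images, so several line crops can be recognized in one batch. This is a sketch reusing the single image loaded above as a placeholder batch:
# Sketch: recognize several single-line images in one forward pass.
# line_images would normally be a list of distinct PIL line crops.
line_images = [image, image]  # placeholder batch
pixel_values = processor(images=line_images, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
for text in processor.batch_decode(generated_ids, skip_special_tokens=True):
    print(text)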
For enhanced performance, consider using cloud-based GPUs from providers like AWS, Google Cloud, or Azure.
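If a GPU is available, inference can be accelerated by moving both the model and the inputs to the device. This is a minimal sketch using standard PyTorch device handling with the objects defined above:
import torch

# Run inference on a GPU when available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

pixel_values = processor(images=image, return_tensors="pt").pixel_values.to(device)
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]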
License
The model and its components are released under the MIT License, allowing for wide usage and modification.