trocr-large-printed (Microsoft)

Introduction
TrOCR large-printed is a large-sized TrOCR model fine-tuned on the SROIE dataset for Optical Character Recognition (OCR). It was introduced in the paper "TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models" by Li et al. and is intended for OCR on single text-line images.
Architecture
TrOCR is an encoder-decoder model. The encoder is an image Transformer initialized from BEiT weights, while the decoder is a text Transformer initialized from RoBERTa weights. Images are processed as sequences of 16x16 patches, which are linearly embedded with additional absolute position embeddings before passing through the Transformer encoder. The text decoder generates tokens autoregressively.
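As a quick sanity check on the patch arithmetic described above, the sketch below computes the encoder's input sequence length. The 384x384 input resolution is an assumption based on the large TrOCR checkpoints; it is not stated in this card.

```python
# Sketch: how an image becomes a sequence of patch tokens for the encoder.
# Assumption: 384x384 input resolution (typical for the large checkpoints).
image_size = 384
patch_size = 16  # 16x16 patches, per the architecture description

patches_per_side = image_size // patch_size   # 24 patches along each axis
num_patches = patches_per_side ** 2           # total patch tokens fed to the encoder

print(num_patches)
```

Each of these patches is linearly embedded and combined with an absolute position embedding before entering the Transformer encoder.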
Training
The TrOCR model has been fine-tuned on the SROIE dataset to enhance its OCR capabilities. The initial weights for the encoder and decoder components were sourced from pre-trained BEiT and RoBERTa models, respectively.
Guide: Running Locally
To run the model locally using PyTorch:
- Install Dependencies: Ensure you have the `transformers` library installed, along with `PIL` and `requests` for handling images and HTTP requests:

  ```
  pip install transformers pillow requests
  ```
- Load and Process Image: Use the following code to load an image and run recognition on it:

  ```python
  from transformers import TrOCRProcessor, VisionEncoderDecoderModel
  from PIL import Image
  import requests

  # Load a sample text-line image from the IAM database
  url = 'https://fki.tic.heia-fr.ch/static/img/a01-122-02-00.jpg'
  image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

  processor = TrOCRProcessor.from_pretrained('microsoft/trocr-large-printed')
  model = VisionEncoderDecoderModel.from_pretrained('microsoft/trocr-large-printed')

  pixel_values = processor(images=image, return_tensors="pt").pixel_values
  generated_ids = model.generate(pixel_values)
  generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
  ```
- Output: The `generated_text` variable will contain the recognized text from the image.
For better performance, consider using cloud GPUs such as those offered by AWS, Google Cloud, or Azure.
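If a GPU is available (locally or in the cloud), the snippet above can target it with the standard PyTorch device pattern. This is a minimal sketch of the device selection; the `.to(device)` moves are shown as comments so it composes with the loading code above.

```python
import torch

# Pick a GPU when one is available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# With the loading snippet above, move the model and inputs over:
#   model = model.to(device)
#   pixel_values = pixel_values.to(device)
# generate() then runs on the selected device.
print(device)
```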
License
The model and its components are released by Microsoft and made available through Hugging Face. Users should refer to the licensing terms provided within the Hugging Face platform and Microsoft's repository for specific usage rights and restrictions.