Introduction

TrOCR is a Transformer-based model, optimized for mobile deployment, that delivers state-of-the-art Optical Character Recognition (OCR) of both printed and handwritten text. It performs end-to-end text recognition by pairing a pre-trained image Transformer with a pre-trained text Transformer.

Architecture

  • Model Type: Image to text
  • Model Checkpoint: trocr-small-stage1
  • Input Resolution: 320x320
  • TrOCREncoder:
    • Parameters: 23.0M
    • Size: 87.8 MB
  • TrOCRDecoder:
    • Parameters: 38.3M
    • Size: 146 MB
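The listed checkpoint sizes follow directly from the parameter counts. A minimal sanity check, assuming FP32 weights (4 bytes per parameter) and sizes reported in MiB:

```python
# Derive checkpoint size from parameter count, assuming
# FP32 storage (4 bytes/parameter) and MiB units.
def fp32_size_mib(params_millions: float) -> float:
    return params_millions * 1e6 * 4 / (1024 ** 2)

print(round(fp32_size_mib(23.0), 1))  # encoder: 87.7 (listed as 87.8 MB)
print(round(fp32_size_mib(38.3), 1))  # decoder: 146.1 (listed as 146 MB)
```

The computed values match the table above to within rounding, which suggests the published sizes simply reflect uncompressed FP32 weights.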

The model supports several deployment formats, including TFLite, QNN, and ONNX, tailored for different Snapdragon® devices.

Training

The TrOCR model employs a pre-trained image Transformer for image understanding and a pre-trained text Transformer for wordpiece-level text generation. Supported devices include the Samsung Galaxy S23, Galaxy S24, and Snapdragon 8 Elite, with inference time and memory usage optimized for each target.
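The encoder-decoder flow described above can be sketched as a greedy decoding loop. This is a toy illustration only: the encoder, decoder, and token ids below are simplified stand-ins, not the real TrOCR modules or vocabulary.

```python
# Hedged sketch of TrOCR-style inference: an image encoder produces a
# feature "memory"; a text decoder generates wordpiece ids one at a time.
BOS, EOS = 0, 1  # hypothetical special token ids

def encode_image(pixels):
    """Stand-in for the image Transformer: pixels -> feature memory."""
    return sum(pixels) % 97  # toy feature

def decode_step(memory, tokens):
    """Stand-in for the text Transformer: predicts the next wordpiece id."""
    # Toy rule: emit a few tokens derived from the memory, then stop.
    return EOS if len(tokens) > 3 else (memory + len(tokens)) % 50 + 2

def recognize(pixels, max_len=20):
    memory = encode_image(pixels)       # image understanding (encoder)
    tokens = [BOS]
    for _ in range(max_len):            # wordpiece-level generation (decoder)
        nxt = decode_step(memory, tokens)
        tokens.append(nxt)
        if nxt == EOS:
            break
    return tokens

print(recognize([3, 14, 15, 92]))  # → [0, 30, 31, 32, 1]
```

The real model replaces both stand-ins with Transformer networks, but the control flow (encode once, decode autoregressively until an end token) is the same.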

Guide: Running Locally

Installation

Install the model via pip:

pip install "qai-hub-models[trocr]"

Running a Demo

Run the end-to-end demo using:

python -m qai_hub_models.models.trocr.demo

For Jupyter Notebook or Google Colab, use:

%run -m qai_hub_models.models.trocr.demo

Cloud GPUs

To run the model on a cloud-hosted Qualcomm® device, configure your client with an API token from Qualcomm® AI Hub:

qai-hub configure --api_token API_TOKEN

Run the model export script to profile performance on a hosted device and download the compiled assets:

python -m qai_hub_models.models.trocr.export

License

  • The original implementation of TrOCR is released under the MIT License.
  • Compiled assets for on-device deployment are covered by the Qualcomm AI Hub Proprietary License.
