TrOCR
Introduction
TrOCR is a Transformer-based model optimized for mobile deployment, designed for state-of-the-art optical character recognition (OCR) of both printed and handwritten text. It performs end-to-end text recognition by combining a pre-trained image Transformer with a pre-trained text Transformer.
Architecture
- Model Type: Image to text
- Model Checkpoint: trocr-small-stage1
- Input Resolution: 320x320
- TrOCREncoder:
  - Parameters: 23.0M
  - Size: 87.8 MB
- TrOCRDecoder:
  - Parameters: 38.3M
  - Size: 146 MB
The model supports multiple deployment runtimes, including TFLite, QNN, and ONNX, with assets tailored to different Snapdragon® devices.
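The parameter counts above can be reproduced from the underlying checkpoint. Below is a minimal sketch using the Hugging Face transformers library (separate from qai-hub-models); it assumes the public microsoft/trocr-small-stage1 checkpoint corresponds to the trocr-small-stage1 checkpoint listed above.

# Sketch: inspect encoder/decoder parameter counts of the underlying checkpoint.
# Uses Hugging Face transformers, not the compiled qai-hub-models assets.
from transformers import VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-small-stage1")

encoder_params = sum(p.numel() for p in model.encoder.parameters())
decoder_params = sum(p.numel() for p in model.decoder.parameters())
print(f"Encoder parameters: {encoder_params / 1e6:.1f}M")  # roughly 23M
print(f"Decoder parameters: {decoder_params / 1e6:.1f}M")  # roughly 38M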
Training
The TrOCR model pairs a pre-trained image Transformer for image understanding with a pre-trained text Transformer for wordpiece-level text generation. It is supported on devices such as the Samsung Galaxy S23 and S24 and on Snapdragon 8 Elite, with optimized inference time and memory usage.
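For illustration, the encoder-decoder interplay can be sketched against the same public checkpoint with Hugging Face transformers: the image transformer encodes the line image once, and the text transformer predicts wordpiece tokens step by step. The image path below is a hypothetical placeholder, and this sketch does not use the compiled on-device assets.

# Sketch: one decoding step of the image-to-text pipeline.
import torch
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-small-stage1")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-small-stage1")

image = Image.open("line_of_text.png").convert("RGB")  # hypothetical input image
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# The decoder starts from its start token; full recognition repeats this
# step autoregressively, feeding back the predicted wordpiece each time.
decoder_input_ids = torch.tensor([[model.config.decoder.decoder_start_token_id]])
outputs = model(pixel_values=pixel_values, decoder_input_ids=decoder_input_ids)
print(outputs.logits.shape)  # (batch, sequence length, wordpiece vocabulary)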
Guide: Running Locally
Installation
Install the model via pip:
pip install "qai-hub-models[trocr]"
Running a Demo
Run the end-to-end demo using:
python -m qai_hub_models.models.trocr.demo
For Jupyter Notebook or Google Colab, use:
%run -m qai_hub_models.models.trocr.demo
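The model can also be loaded from Python. Below is a minimal sketch, assuming the standard qai-hub-models interface (Model.from_pretrained()); see the package's demo source for the full preprocessing and decoding pipeline.

# Sketch: load the TrOCR encoder/decoder pair programmatically.
# Assumes the standard qai-hub-models model interface.
from qai_hub_models.models.trocr import Model

trocr = Model.from_pretrained()  # fetches the trocr-small-stage1 based weights
print(trocr)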
Cloud-Hosted Devices
To run the model on a cloud-hosted Qualcomm® device, configure your client with an API token from Qualcomm® AI Hub:
qai-hub configure --api_token API_TOKEN
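Once the token is configured, a quick way to confirm the client can reach Qualcomm® AI Hub is to list the available cloud-hosted devices. A minimal sketch using the qai_hub Python client:

# Sketch: verify the API token by listing cloud-hosted devices.
import qai_hub as hub

for device in hub.get_devices():
    print(device.name)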
Run the model export script to check on-device performance and download the compiled assets:
python -m qai_hub_models.models.trocr.export
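The export script compiles the model, profiles it on a cloud-hosted device, and downloads the resulting assets. To see the available options (for example, the target device and runtime), consult the script's help output:

python -m qai_hub_models.models.trocr.export --help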