TrOCR
Introduction
TrOCR is a Transformer-based model optimized for mobile deployment, designed for state-of-the-art optical character recognition (OCR) of both printed and handwritten text. It performs end-to-end text recognition by combining a pre-trained image Transformer with a pre-trained text Transformer.
Architecture
- Model Type: Image to text
- Model Checkpoint: trocr-small-stage1
- Input Resolution: 320x320
- TrOCREncoder:
  - Parameters: 23.0M
  - Size: 87.8 MB
- TrOCRDecoder:
  - Parameters: 38.3M
  - Size: 146 MB
The model supports multiple deployment runtimes, including TFLite, QNN, and ONNX, with assets tailored to different Snapdragon® devices.
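The parameter counts above can be reproduced from the underlying checkpoint. Below is a minimal sketch using the Hugging Face transformers library (separate from qai-hub-models); it assumes the public microsoft/trocr-small-stage1 checkpoint corresponds to the trocr-small-stage1 checkpoint listed above.

# Sketch: inspect encoder/decoder parameter counts of the underlying checkpoint.
# Uses Hugging Face transformers, not the compiled qai-hub-models assets.
from transformers import VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-small-stage1")

encoder_params = sum(p.numel() for p in model.encoder.parameters())
decoder_params = sum(p.numel() for p in model.decoder.parameters())
print(f"Encoder parameters: {encoder_params / 1e6:.1f}M")  # roughly 23M
print(f"Decoder parameters: {decoder_params / 1e6:.1f}M")  # roughly 38M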
Training
The TrOCR model pairs a pre-trained image Transformer for image understanding with a pre-trained text Transformer for wordpiece-level text generation. It is supported on devices such as the Samsung Galaxy S23 and S24 and on Snapdragon 8 Elite, with optimized inference time and memory usage.
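For illustration, the encoder-decoder interplay can be sketched against the same public checkpoint with Hugging Face transformers: the image transformer encodes the line image once, and the text transformer predicts wordpiece tokens step by step. The image path below is a hypothetical placeholder, and this sketch does not use the compiled on-device assets.

# Sketch: one decoding step of the image-to-text pipeline.
import torch
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-small-stage1")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-small-stage1")

image = Image.open("line_of_text.png").convert("RGB")  # hypothetical input image
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# The decoder starts from its start token; full recognition repeats this
# step autoregressively, feeding back the predicted wordpiece each time.
decoder_input_ids = torch.tensor([[model.config.decoder.decoder_start_token_id]])
outputs = model(pixel_values=pixel_values, decoder_input_ids=decoder_input_ids)
print(outputs.logits.shape)  # (batch, sequence length, wordpiece vocabulary)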
Guide: Running Locally
Installation
Install the model via pip:
pip install "qai-hub-models[trocr]"
Running a Demo
Run the end-to-end demo using:
python -m qai_hub_models.models.trocr.demo
For Jupyter Notebook or Google Colab, use:
%run -m qai_hub_models.models.trocr.demo
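The model can also be loaded from Python. Below is a minimal sketch, assuming the standard qai-hub-models interface (Model.from_pretrained()); see the package's demo source for the full preprocessing and decoding pipeline.

# Sketch: load the TrOCR encoder/decoder pair programmatically.
# Assumes the standard qai-hub-models model interface.
from qai_hub_models.models.trocr import Model

trocr = Model.from_pretrained()  # fetches the trocr-small-stage1 based weights
print(trocr)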
Cloud-Hosted Devices
To run the model on a cloud-hosted Qualcomm® device, configure your client with an API token from Qualcomm® AI Hub:
qai-hub configure --api_token API_TOKEN
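Once the token is configured, a quick way to confirm the client can reach Qualcomm® AI Hub is to list the available cloud-hosted devices. A minimal sketch using the qai_hub Python client:

# Sketch: verify the API token by listing cloud-hosted devices.
import qai_hub as hub

for device in hub.get_devices():
    print(device.name)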
Run the model export script to check on-device performance and download the compiled assets:
python -m qai_hub_models.models.trocr.export
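The export script compiles the model, profiles it on a cloud-hosted device, and downloads the resulting assets. To see the available options (for example, the target device and runtime), consult the script's help output:

python -m qai_hub_models.models.trocr.export --help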