GOT-OCR2_0

stepfun-ai

Introduction

GOT-OCR2.0 is an Optical Character Recognition (OCR) model that recognizes text in images with a unified end-to-end approach. It is loaded through the Transformers library and handles multilingual input across several OCR tasks, including plain text extraction, formatted text recognition, and fine-grained OCR.

Architecture

The model is built for the Transformers framework, and its weights can be loaded from safetensors files, a format that avoids executing pickled code when weights are read. It supports several OCR types (plain text, formatted text, and fine-grained recognition) and is designed to run on NVIDIA GPUs.
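The three OCR types map onto arguments of the model's `chat` method. The sketch below is one way to keep that mapping in a single place. Only `ocr_type='ocr'` appears in this document; the value `'format'` and the `ocr_box` and `ocr_color` parameters are assumptions taken from the upstream model card and should be checked against the repository before use. `build_chat_kwargs` is a hypothetical helper, not part of the model's API.

```python
from typing import Optional


def build_chat_kwargs(mode: str, box: Optional[str] = None,
                      color: Optional[str] = None) -> dict:
    """Hypothetical helper: turn a task name into keyword arguments for model.chat."""
    if mode == "plain":
        kwargs = {"ocr_type": "ocr"}      # value used in this document
    elif mode == "formatted":
        kwargs = {"ocr_type": "format"}   # assumption: from the upstream model card
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Fine-grained OCR limits recognition to a region, given as a box or a color.
    # Both parameter names are assumptions from the upstream model card.
    if box is not None:
        kwargs["ocr_box"] = box
    if color is not None:
        kwargs["ocr_color"] = color
    return kwargs
```

With a loaded model, the result would be passed along as `model.chat(tokenizer, image_file, **build_chat_kwargs("formatted"))`.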

Training

Training details for GOT-OCR2.0 are available in the project's GitHub repository. The model was tested with Python 3.10 and the following dependencies: Torch 2.0.1, Torchvision 0.15.2, Transformers 4.37.2, and supporting libraries (tiktoken 0.6.0, verovio 4.3.1, accelerate 0.28.0). The authors describe the model's development and results in research published on arXiv.
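Because the model was tested against specific versions, it can help to compare the local environment with them before debugging anything else. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper names (`base_version`, `find_mismatches`, `installed_versions`) are illustrative, and stripping the local build suffix (for example `+cu117`) is an assumption about how Torch builds are usually labeled.

```python
from importlib import metadata

# Versions this document reports as tested.
TESTED_VERSIONS = {
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "transformers": "4.37.2",
}


def base_version(version):
    """Drop a local build suffix such as '+cu117'; pass None through."""
    return None if version is None else version.split("+", 1)[0]


def find_mismatches(installed: dict, expected: dict = TESTED_VERSIONS) -> dict:
    """Return {package: (installed, expected)} for every package that differs."""
    return {
        name: (installed.get(name), wanted)
        for name, wanted in expected.items()
        if base_version(installed.get(name)) != wanted
    }


def installed_versions(names) -> dict:
    """Look up installed versions; packages that are not installed map to None."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    report = find_mismatches(installed_versions(TESTED_VERSIONS))
    for name, (have, want) in report.items():
        print(f"{name}: installed {have}, tested with {want}")
```

A mismatch does not necessarily mean the model will fail; it only flags where the environment departs from the tested configuration.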

Guide: Running Locally

  1. Install Dependencies: Ensure Python 3.10 is installed. Install the necessary packages using pip:

    pip install torch==2.0.1 torchvision==0.15.2 transformers==4.37.2 tiktoken==0.6.0 verovio==4.3.1 accelerate==0.28.0
    
  2. Load the Model: Use the following Python code to load the model and tokenizer:

    from transformers import AutoModel, AutoTokenizer
    
    # trust_remote_code=True runs the model code shipped in the repository.
    tokenizer = AutoTokenizer.from_pretrained('ucaslcl/GOT-OCR2_0', trust_remote_code=True)
    model = AutoModel.from_pretrained('ucaslcl/GOT-OCR2_0', trust_remote_code=True, low_cpu_mem_usage=True, device_map='cuda', use_safetensors=True, pad_token_id=tokenizer.eos_token_id)
    # Switch to inference mode and make sure the weights are on the GPU.
    model = model.eval().cuda()
    
  3. Run Inference: Replace 'xxx.jpg' with your test image and execute the OCR task:

    image_file = 'xxx.jpg'
    res = model.chat(tokenizer, image_file, ocr_type='ocr')
    print(res)
    
  4. Consider Cloud GPUs: For optimal performance, especially with large datasets or high-resolution images, consider using cloud GPUs such as those offered by AWS, Google Cloud, or Azure.
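For larger collections of images, the single call in step 3 can be wrapped in a loop. The sketch below assumes `model` and `tokenizer` were loaded as in step 2 and that `model.chat` accepts a file path, as shown in step 3. The function name `ocr_directory` and the set of file extensions are illustrative choices, not part of the model's API.

```python
from pathlib import Path

# Extensions treated as images; an illustrative choice, adjust as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def ocr_directory(model, tokenizer, directory, ocr_type="ocr"):
    """Run model.chat on every image in a directory and return {filename: text}."""
    results = {}
    # Sort so that the output order is stable between runs.
    for path in sorted(Path(directory).iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip anything that is not an image file
        results[path.name] = model.chat(tokenizer, str(path), ocr_type=ocr_type)
    return results
```

Images are processed one at a time, so the loop keeps GPU memory use roughly the same as a single call while total run time grows with the number of files.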

License

GOT-OCR2.0 is licensed under the Apache License 2.0, allowing users to freely use, modify, and distribute the software with minimal restrictions.
