manga ocr base

kha-white

Introduction

Manga OCR is an optical character recognition (OCR) tool for Japanese text, with a primary focus on manga. It uses the Vision Encoder Decoder framework and is designed to handle the text-recognition challenges specific to manga.

Architecture

The model is built on the Vision Encoder Decoder framework from the Transformers library. In this architecture, an image encoder processes a picture containing Japanese text and a text decoder generates the recognized characters as a string.
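
The encoder-decoder flow described above might look roughly like the following sketch. It is not the project's own inference code. The model ID `kha-white/manga-ocr-base` is inferred from the title and author of this page, `max_length=300` is an arbitrary cap chosen for illustration, and `recognize` and `clean_decoded` are hypothetical helper names. The tokenizer for this model may also need extra Japanese tokenization packages installed.

```python
def clean_decoded(text):
    """Remove the whitespace a tokenizer may leave between decoded tokens."""
    return "".join(text.split())


def recognize(image_path, model_id="kha-white/manga-ocr-base"):
    """Run one image through the encoder-decoder and return the decoded text.

    model_id is an assumption inferred from this page's title and author.
    """
    # Imported lazily so clean_decoded works without these packages installed.
    from PIL import Image
    from transformers import (
        AutoTokenizer,
        ViTImageProcessor,
        VisionEncoderDecoderModel,
    )

    processor = ViTImageProcessor.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)

    # Encoder input: the image converted to a batch of pixel tensors.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values

    # Decoder output: token IDs generated one step at a time.
    token_ids = model.generate(pixel_values, max_length=300)[0]
    return clean_decoded(tokenizer.decode(token_ids, skip_special_tokens=True))
```

Calling `recognize("page_crop.png")` on a cropped speech bubble would return the recognized string. Loading the model inside the function keeps the sketch short; for more than one image, load it once and reuse it.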

Training

Manga OCR was trained to address the unique challenges found in manga, such as:

  • Vertical and horizontal text orientation.
  • Text with furigana (small Japanese characters written next to kanji).
  • Text overlaid on images.
  • A wide variety of fonts and styles.
  • Low-quality images.

The model was trained on the manga109s dataset.

Guide: Running Locally

To run Manga OCR locally, follow these steps:

  1. Clone the repository from GitHub:

    git clone https://github.com/kha-white/manga_ocr.git
    
  2. Set up the environment and install dependencies:

    cd manga_ocr
    pip install -r requirements.txt
    
  3. Run the OCR process on your images using the provided scripts.
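
As one possible way to carry out step 3, the sketch below runs OCR over every image in a folder. It assumes the `manga_ocr` package is importable (for example, when run from the root of the cloned repository) and uses its `MangaOcr` class. `find_images`, `ocr_folder`, and the list of file suffixes are hypothetical names and choices made for this example.

```python
from pathlib import Path

# Suffixes treated as images; an assumption for this sketch.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def find_images(folder):
    """Return the image files directly inside folder, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def ocr_folder(folder):
    """Map each image file name in folder to its recognized text."""
    # Imported lazily; requires the manga_ocr package and its dependencies.
    from manga_ocr import MangaOcr

    mocr = MangaOcr()  # loads the model once; weights may be downloaded on first use
    return {p.name: mocr(str(p)) for p in find_images(folder)}


if __name__ == "__main__":
    import sys

    # Usage: python ocr_folder.py /path/to/images
    for name, text in ocr_folder(sys.argv[1]).items():
        print(f"{name}\t{text}")
```

The model is created once and reused for every file, since loading it is far more expensive than recognizing a single image.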

For better throughput, especially with large sets of images, consider running on a GPU, for example through a cloud service such as AWS, Google Cloud, or Azure.

License

Manga OCR is licensed under the Apache-2.0 License, which permits use, modification, and distribution under the terms of that license.
