MAGI: The Manga Whisperer
Introduction
MAGI, also known as "The Manga Whisperer," is a model designed by Ragav Sachdeva and Andrew Zisserman from the University of Oxford. It automatically generates transcriptions for comics by combining object detection with optical character recognition (OCR).
Architecture
MAGI is implemented in PyTorch and distributed through the Hugging Face Transformers library as a custom model. It supports feature extraction and handles manga-specific object detection, OCR, character clustering, and dialogue diarization within a single model.
Training
The MAGI model is trained to process page images, detect the relevant objects (panels, text blocks, and characters), and associate each detected text block with the character who speaks it; combining these associations with the OCR output yields an accurate transcription of the comic.
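The diarization step described above can be illustrated with a small sketch: given per-box OCR strings and text-to-character associations, a transcript is assembled by pairing each utterance with its attributed speaker. The data layout below is purely illustrative and is not MAGI's actual output format.

```python
# Hypothetical sketch of transcript assembly from diarization output.
# Field names and structures are illustrative, not MAGI's API.

def assemble_transcript(ocr_texts, text_to_character, character_names):
    """Build "speaker: utterance" lines from per-box OCR and associations.

    ocr_texts:          list of OCR strings, one per detected text box
    text_to_character:  dict mapping text-box index -> character index (or absent)
    character_names:    list of display names, one per character cluster
    """
    lines = []
    for i, text in enumerate(ocr_texts):
        char_idx = text_to_character.get(i)
        # Text boxes with no associated character (e.g. narration) get a placeholder.
        speaker = character_names[char_idx] if char_idx is not None else "<unknown>"
        lines.append(f"{speaker}: {text}")
    return "\n".join(lines)
```

For example, `assemble_transcript(["Hello!", "Who's there?"], {0: 0, 1: 1}, ["A", "B"])` produces one "speaker: utterance" line per text box, in reading order.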
Guide: Running Locally
To run MAGI locally, follow these steps:
- Install Requirements: Ensure you have Python and the necessary libraries installed: transformers, torch, numpy, and Pillow (PIL).
- Load Images: Prepare your manga images as JPEG or PNG files.
- Model Setup:

```python
from transformers import AutoModel
import torch

# trust_remote_code=True is required because MAGI ships custom model code with the checkpoint.
model = AutoModel.from_pretrained("ragavsachdeva/magi", trust_remote_code=True).cuda()
```
- Process Images: Use the model to predict detections and OCR results.
- Visualization and Transcription: Generate visual outputs and text transcriptions for each image.
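Putting the steps together, an end-to-end run might look like the sketch below. The prediction and output-writing method names (`predict_detections_and_associations`, `predict_ocr`, and the visualisation/transcription helpers) are assumptions based on the checkpoint's custom code; check the model repository for the exact signatures.

```python
import numpy as np
from PIL import Image

def read_image_as_np_array(image_path):
    # Pages are loaded as greyscale and converted back to 3-channel RGB,
    # matching the preprocessing shown in the model's example usage.
    with open(image_path, "rb") as f:
        image = Image.open(f).convert("L").convert("RGB")
    return np.array(image)

def transcribe_pages(image_paths):
    # Heavy imports are deferred so the helper above stays usable without a GPU.
    import torch
    from transformers import AutoModel

    model = AutoModel.from_pretrained("ragavsachdeva/magi", trust_remote_code=True).cuda()
    images = [read_image_as_np_array(p) for p in image_paths]

    with torch.no_grad():
        # Detect panels, text blocks, characters, and text-speaker associations,
        # then run OCR on the detected text boxes. Method names are assumed.
        results = model.predict_detections_and_associations(images)
        text_bboxes = [r["texts"] for r in results]
        ocr_results = model.predict_ocr(images, text_bboxes)

    for i in range(len(images)):
        model.visualise_single_image_prediction(images[i], results[i], filename=f"image_{i}.png")
        model.generate_transcript_for_single_image(results[i], ocr_results[i], filename=f"transcript_{i}.txt")
```

Calling `transcribe_pages(["page_01.png", "page_02.png"])` would then write an annotated image and a text transcript per page, assuming the method names above match the shipped code.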
For improved performance, it's recommended to use cloud GPUs provided by platforms like AWS, Google Cloud, or Azure.
License
The MAGI model is available for personal, research, non-commercial, and not-for-profit use. For commercial usage, please contact Ragav Sachdeva for a tailored licensing agreement. More details can be found on his website.