MATCHA Chart2Text-Pew

Maintained by: Google (google/matcha-chart2text-pew)

Introduction

The MATCHA model is fine-tuned on the Chart2Text-Pew dataset, which targets chart summarization. MATCHA enhances visual language models by adding math-reasoning and chart-derendering pretraining objectives, and it shows significant improvements on benchmarks such as PlotQA and ChartQA.

Architecture

MATCHA builds upon Pix2Struct, a visual language model that converts images to text. It introduces several pretraining tasks focused on plot deconstruction and numerical reasoning, both key to visual language modeling. The model aims to close the gap in understanding visual language data such as charts and infographics.

Training

MATCHA pretraining starts from Pix2Struct and focuses on jointly modeling charts/plots and language data. The pretraining tasks are designed to improve plot deconstruction and numerical reasoning. This approach has demonstrated improvements across domains, including screenshots, textbook diagrams, and document figures.

Guide: Running Locally

To run the MATCHA model locally, follow these steps:

  1. Install Dependencies: Ensure you have the Hugging Face Transformers library installed, along with Pillow and Requests, which the image-loading example below uses.
    pip install transformers pillow requests
    
  2. Load the Model:
    from transformers import Pix2StructProcessor, Pix2StructForConditionalGeneration
    processor = Pix2StructProcessor.from_pretrained('google/matcha-chart2text-pew')
    model = Pix2StructForConditionalGeneration.from_pretrained('google/matcha-chart2text-pew')
    
  3. Prepare Input:
    import requests
    from PIL import Image
    url = "https://raw.githubusercontent.com/vis-nlp/ChartQA/main/ChartQA%20Dataset/val/png/20294671002019.png"
    image = Image.open(requests.get(url, stream=True).raw)
    inputs = processor(images=image, return_tensors="pt")
    
  4. Generate Predictions:
    predictions = model.generate(**inputs, max_new_tokens=512)
    print(processor.decode(predictions[0], skip_special_tokens=True))
    
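The four steps above can be combined into a single script. This is a minimal sketch using the checkpoint and example image URL from the steps; the exact summary text printed will depend on the downloaded model weights:

```python
import requests
from PIL import Image
from transformers import Pix2StructProcessor, Pix2StructForConditionalGeneration

# Load the chart-summarization checkpoint (downloads weights on first run)
processor = Pix2StructProcessor.from_pretrained("google/matcha-chart2text-pew")
model = Pix2StructForConditionalGeneration.from_pretrained("google/matcha-chart2text-pew")

# Fetch an example chart image from the ChartQA repository
url = "https://raw.githubusercontent.com/vis-nlp/ChartQA/main/ChartQA%20Dataset/val/png/20294671002019.png"
image = Image.open(requests.get(url, stream=True).raw)

# Preprocess the image and generate a textual summary of the chart
inputs = processor(images=image, return_tensors="pt")
predictions = model.generate(**inputs, max_new_tokens=512)
summary = processor.decode(predictions[0], skip_special_tokens=True)
print(summary)
```

Note that the first run downloads the model weights, so it may take a while on a slow connection.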

For optimal performance, consider using cloud GPUs from providers like AWS, Google Cloud, or Azure.
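On a machine with a GPU, inference is much faster if the model and inputs are moved to the device. A small sketch of device selection, assuming PyTorch is installed (the `pick_device` helper name is ours, not part of the Transformers API):

```python
import torch

def pick_device() -> str:
    """Return 'cuda' when a GPU is visible to PyTorch, else 'cpu'."""
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()
# Typical usage with the model and inputs from the guide above:
#   model.to(device)
#   inputs = {k: v.to(device) for k, v in inputs.items()}
print(device)
```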

License

The MATCHA model is released under the Apache 2.0 license. This allows for both commercial and non-commercial use, distribution, and modification with appropriate attribution.
