deplot
google
Introduction
DePlot is a model for visual language reasoning that translates images of plots and charts into linearized tables. Because its output is plain text, it can be paired with a large language model (LLM) that does the reasoning. This combination achieves a significant improvement over previous state-of-the-art models while needing only minimal training examples (one-shot prompting of the LLM).
Architecture
DePlot-based reasoning works in two steps: first, the plot is translated into text (a linearized table); second, an LLM reasons over that text. DePlot itself is the modality conversion module in the first step, converting images into a form an LLM can consume. Because the reasoning is delegated to a pretrained LLM, the approach needs fewer training examples than traditional end-to-end models.
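The two-step split can be sketched as a small pipeline in which the plot-to-table converter and the LLM are passed in as plain callables. This is a minimal sketch, not DePlot's own code: the names `answer_chart_question` and `build_reasoning_prompt`, the prompt wording, and the example table values are all hypothetical.

```python
from typing import Callable, TypeVar

ImageT = TypeVar("ImageT")


def build_reasoning_prompt(table: str, question: str) -> str:
    # Hypothetical prompt template; the wording used with DePlot in practice may differ.
    return (
        "Read the table below and answer the question.\n\n"
        f"{table}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer_chart_question(
    image: ImageT,
    question: str,
    plot_to_table: Callable[[ImageT], str],
    llm: Callable[[str], str],
) -> str:
    # Step 1: modality conversion, image -> linearized text table (DePlot's role).
    table = plot_to_table(image)
    # Step 2: reasoning over the text, delegated to a language model.
    prompt = build_reasoning_prompt(table, question)
    return llm(prompt)


# Stand-ins for both models, with made-up table values, so the flow runs offline.
fake_table = "Year | Sales\n2020 | 10\n2021 | 15"
answer = answer_chart_question(
    image=None,
    question="Which year had higher sales?",
    plot_to_table=lambda img: fake_table,
    llm=lambda prompt: "2021" if "2021 | 15" in prompt else "unknown",
)
print(answer)  # 2021
```

Injecting both stages as callables keeps the converter and the reasoner independent, which mirrors the point of the architecture: the LLM can be swapped without retraining the plot-to-table model.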
Training
The plot-to-table task is first standardized by establishing unified task formats and evaluation metrics. DePlot is then trained end-to-end on this standardized task, learning to translate visual data into text that an LLM can process.
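To illustrate why a shared table format makes evaluation possible, the sketch below scores a predicted table against a gold table by exact-match F1 over (row, column, value) cells. This is a simplified stand-in for illustration only, not the metric defined for DePlot; the function names and table values are hypothetical, and the first row and first column are assumed to hold the headers.

```python
Rows = list[list[str]]


def table_cells(rows: Rows) -> dict[tuple[str, str], str]:
    # Key every data cell by (row header, column header); row 0 holds column headers.
    header, *body = rows
    cells = {}
    for row in body:
        for col_name, value in zip(header[1:], row[1:]):
            cells[(row[0], col_name)] = value
    return cells


def cell_f1(predicted: Rows, gold: Rows) -> float:
    # Exact-match F1 over ((row, column), value) pairs; any numeric error counts as a miss.
    pred = set(table_cells(predicted).items())
    ref = set(table_cells(gold).items())
    matched = len(pred & ref)
    if matched == 0:
        return 0.0
    precision = matched / len(pred)
    recall = matched / len(ref)
    return 2 * precision * recall / (precision + recall)


gold = [["Year", "Sales"], ["2020", "10"], ["2021", "15"]]
predicted = [["Year", "Sales"], ["2020", "10"], ["2021", "14"]]
print(cell_f1(predicted, gold))  # 0.5
```

Exact matching is deliberately crude: a metric designed for chart data would need to be more forgiving of small numeric and header differences.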
Guide: Running Locally
To run DePlot locally, follow these steps:
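- Install Necessary Libraries: Ensure you have the `transformers`, `requests`, and `PIL` libraries installed, plus PyTorch (`torch`), which the model class and the `return_tensors="pt"` option rely on. PIL is provided by the Pillow package, so a typical install is `pip install transformers torch requests pillow`.
- Load the Model and Processor:
```python
from transformers import Pix2StructProcessor, Pix2StructForConditionalGeneration

# Download the DePlot processor (image preprocessing and tokenizer) and model weights
processor = Pix2StructProcessor.from_pretrained('google/deplot')
model = Pix2StructForConditionalGeneration.from_pretrained('google/deplot')
```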
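- Prepare the Input Image:
```python
import requests
from PIL import Image

# Example chart from the ChartQA validation set
url = "https://raw.githubusercontent.com/vis-nlp/ChartQA/main/ChartQA%20Dataset/val/png/5090.png"
response = requests.get(url, stream=True)
response.raise_for_status()  # fail early if the download did not succeed
image = Image.open(response.raw)
```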
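- Generate Predictions:
```python
# The text prompt asks DePlot to output the chart's underlying data table
inputs = processor(images=image, text="Generate underlying data table of the figure below:", return_tensors="pt")
predictions = model.generate(**inputs, max_new_tokens=512)
# Decode the generated token IDs into the linearized table string
print(processor.decode(predictions[0], skip_special_tokens=True))
```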
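The decoded prediction is a single linearized string. To use it programmatically, it can be split back into rows and cells. The sketch below assumes rows are separated by a newline or the literal token `<0x0A>` and cells by `|`; check this against the output you actually get. The helper names and the sample string (with made-up values) are hypothetical.

```python
def parse_linearized_table(text: str) -> list[list[str]]:
    # Assumption: "<0x0A>" or a real newline separates rows, and "|" separates cells.
    normalized = text.replace("<0x0A>", "\n")
    rows = []
    for line in normalized.split("\n"):
        line = line.strip()
        if line:  # skip empty lines
            rows.append([cell.strip() for cell in line.split("|")])
    return rows


def to_records(rows: list[list[str]]) -> list[dict[str, str]]:
    # Treat the first row as the header and pair it with each data row.
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical decoded output in the assumed format, with made-up values.
decoded = "Year | Sales <0x0A> 2020 | 10 <0x0A> 2021 | 15"
records = to_records(parse_linearized_table(decoded))
print(records)  # [{'Year': '2020', 'Sales': '10'}, {'Year': '2021', 'Sales': '15'}]
```

If the output begins with a title line rather than column headers, drop that row before calling `to_records`.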
As written, the steps above run on the CPU. For faster inference, a GPU is recommended, either locally or through a cloud instance such as AWS EC2 or Google Cloud GPU instances.
License
DePlot is released under the Apache-2.0 license, a permissive license that allows use, modification, and redistribution, provided the license text and copyright notices are retained.