Qwen2-VL-Math-Prase-2B-Instruct

prithivMLmods

Introduction

The Qwen2-VL-Math-Prase-2B-Instruct model is a fine-tuned version of Qwen/Qwen2-VL-2B-Instruct, optimized for Optical Character Recognition (OCR), image-to-text conversion, and solving math problems using LaTeX formatting. This model excels in multi-modal tasks, integrating conversational approaches with visual and textual understanding.

Architecture

  • Vision-Language Integration: Combines image understanding with natural language processing to convert images into text.
  • Optical Character Recognition (OCR): Extracts and processes textual information from images with high accuracy.
  • Math and LaTeX Support: Solves mathematical problems and outputs equations in LaTeX format.
  • Conversational Capabilities: Handles multi-turn interactions for context-aware responses.
  • Image-Text-to-Text Generation: Generates descriptive or problem-solving text from images and text inputs.
  • Secure Weight Format: Utilizes Safetensors for secure and efficient model weight loading.

Training

  • Base Model: Qwen/Qwen2-VL-2B-Instruct
  • Model Size: 2.21 billion parameters, stored in BF16 for efficient inference.
  • Specializations: Tailored for OCR tasks in images and mathematical reasoning with LaTeX output.

Guide: Running Locally

  1. Install Required Packages:

    pip install transformers accelerate qwen-vl-utils
    
  2. Load the Model:

    from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
    model = Qwen2VLForConditionalGeneration.from_pretrained(
        "prithivMLmods/Qwen2-VL-Math-Prase-2B-Instruct", torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained("prithivMLmods/Qwen2-VL-Math-Prase-2B-Instruct")
    
  3. Prepare Input:

    messages = [{"role": "user", "content": [{"type": "image", "image": "image_url"}, {"type": "text", "text": "Describe this image."}]}]
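A sketch of how this payload can be built programmatically, using a hypothetical helper `build_message` (not part of the model's API). The `"image"` field accepts a URL, a local file path, or a PIL image; it is resolved later by `process_vision_info` from qwen-vl-utils:

```python
# Hypothetical helper (for illustration only): builds the Qwen2-VL chat
# message structure for one image plus one text question.
def build_message(image, question):
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},  # URL, local path, or PIL.Image
                {"type": "text", "text": question},
            ],
        }
    ]

# Example: ask for a LaTeX-formatted solution of an equation in an image.
messages = build_message(
    "image_url",  # placeholder, as in the snippet above
    "Solve the equation in this image and answer in LaTeX.",
)
```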
    
  4. Inference:

    from qwen_vl_utils import process_vision_info

    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt").to("cuda")
    generated_ids = model.generate(**inputs, max_new_tokens=128)
    # Strip the prompt tokens so only the newly generated answer is decoded.
    generated_ids_trimmed = [out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)]
    output_text = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)
    print(output_text)
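Note on `generated_ids_trimmed`: `model.generate` returns the prompt tokens followed by the new tokens, so each output sequence must be sliced past the corresponding input length before decoding. The logic, sketched with plain lists instead of tensors:

```python
# Two prompts of different lengths (token IDs are made up for illustration).
input_ids = [[101, 7592, 102], [101, 2054, 2003, 102]]
# generate() echoes each prompt, then appends the new answer tokens.
generated_ids = [[101, 7592, 102, 9, 8], [101, 2054, 2003, 102, 7]]
# Slice off the prompt so only the answer tokens remain.
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(input_ids, generated_ids)
]
print(generated_ids_trimmed)  # [[9, 8], [7]]
```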
    
  5. Cloud GPUs: For optimal performance, consider using cloud GPU services such as AWS EC2, Google Cloud, or Azure.

License

The model is released under the Apache-2.0 license.
