Qwen2-VL-Math-Prase-2B-Instruct
prithivMLmods

Introduction
The Qwen2-VL-Math-Prase-2B-Instruct model is a fine-tuned version of Qwen/Qwen2-VL-2B-Instruct, optimized for Optical Character Recognition (OCR), image-to-text conversion, and solving math problems with LaTeX-formatted output. It handles multi-modal tasks, combining conversational ability with visual and textual understanding.
Architecture
- Vision-Language Integration: Combines image understanding with natural language processing to convert images into text.
- Optical Character Recognition (OCR): Extracts and processes textual information from images with high accuracy.
- Math and LaTeX Support: Solves mathematical problems and outputs equations in LaTeX format (illustrated just after this list).
- Conversational Capabilities: Handles multi-turn interactions for context-aware responses.
- Image-Text-to-Text Generation: Generates descriptive or problem-solving text from images and text inputs.
- Secure Weight Format: Utilizes Safetensors for secure and efficient model weight loading.
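To make the LaTeX output format concrete, here is a hand-written illustration of the kind of markup a math answer is expected to use. This is illustrative only, not captured model output:

```latex
% Illustrative only: hand-written LaTeX in the style a math answer
% from this model targets, not actual model output.
\[
x^2 - 5x + 6 = 0
\quad\Longrightarrow\quad
(x - 2)(x - 3) = 0
\quad\Longrightarrow\quad
x \in \{2,\, 3\}
\]
```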
Training
- Base Model: Qwen/Qwen2-VL-2B-Instruct
- Model Size: 2.21 billion parameters, stored in the BF16 tensor type for efficient inference (a quick sanity check is sketched after this list).
- Specializations: Tailored for OCR tasks in images and mathematical reasoning with LaTeX output.
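As a minimal sanity check of these numbers, the snippet below counts parameters and inspects the dtype. It assumes `model` has already been loaded as shown in the guide that follows:

```python
# Minimal sanity check; assumes `model` was loaded as in the guide below.
n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params / 1e9:.2f}B")         # reported as ~2.21B
print(f"dtype: {next(model.parameters()).dtype}")   # torch.bfloat16 when loaded in BF16
```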
Guide: Running Locally
- Install Required Packages:

```bash
# qwen-vl-utils provides process_vision_info (used in the inference step);
# accelerate is required for device_map="auto".
pip install transformers qwen-vl-utils accelerate
```
- Load the Model:

```python
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/Qwen2-VL-Math-Prase-2B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("prithivMLmods/Qwen2-VL-Math-Prase-2B-Instruct")
```
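If you want to pin the precision rather than rely on `torch_dtype="auto"`, a BF16 variant looks like the sketch below; `attn_implementation="flash_attention_2"` is an optional speed-up that additionally requires the `flash-attn` package:

```python
import torch
from transformers import Qwen2VLForConditionalGeneration

# Optional variant: explicit BF16 precision; flash_attention_2 needs flash-attn installed.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/Qwen2-VL-Math-Prase-2B-Instruct",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
```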
- Prepare Input:

```python
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "image_url"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
```
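For the math use case this card highlights, the same message shape works with a solving instruction. A sketch, where the file path and prompt text are placeholders:

```python
# Hypothetical math prompt: the image path below is a placeholder for
# a local picture of an equation or worked problem.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "file:///path/to/math_problem.png"},
            {"type": "text", "text": "Solve the problem in this image and show the steps in LaTeX."},
        ],
    }
]
```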
- Inference:

```python
from qwen_vl_utils import process_vision_info

# Build the chat prompt and gather the visual inputs referenced in `messages`.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

generated_ids = model.generate(**inputs, max_new_tokens=128)
# Trim the prompt tokens so only newly generated text is decoded.
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```
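Note that `batch_decode` returns one string per batch element, so the answer for a single image is `output_text[0]`; for math prompts, the model's specialization means this string should contain LaTeX-formatted equations.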
- Cloud GPUs: For optimal performance, consider using cloud GPU services such as AWS EC2, Google Cloud, or Azure.
License
The model is released under the Apache-2.0 license.