EraX-VL-7B-V1.5
Introduction
EraX-VL-7B-V1.5 is a robust multimodal model built for Optical Character Recognition (OCR) and Visual Question Answering (VQA), with multilingual support focused on Vietnamese. It is designed to handle a variety of document types, such as medical forms, invoices, and legal documents, making it useful for sectors like healthcare and insurance. The model is based on Qwen/Qwen2-VL-7B-Instruct and has over 7 billion parameters. It is part of the LànhGPT collection and was developed by a team at EraX, funded by Bamboo Capital Group.
Architecture
EraX-VL-7B-V1.5 is a Multimodal Transformer model, fine-tuned from Qwen/Qwen2-VL-7B-Instruct. It supports multiple languages, primarily Vietnamese, and has been enhanced for precise document recognition and multi-turn Q&A with robust reasoning capabilities.
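For multi-turn Q&A, conversation history is supplied to the model as a list of chat messages. Below is a minimal sketch of that structure in the standard Qwen2-VL message format; all content shown is placeholder, not actual model output:

```python
# Multi-turn VQA history in the Qwen2-VL chat format (placeholder content).
# Each turn is a dict with a "role" and a "content" list; the image is
# attached only to the turn that introduces it.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "file:///path/to/document.jpg"},
            {"type": "text", "text": "What is the invoice number?"},
        ],
    },
    {"role": "assistant", "content": [{"type": "text", "text": "INV-0001"}]},
    {"role": "user", "content": [{"type": "text", "text": "And the total amount due?"}]},
]
```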
Training
The model was fine-tuned on a diverse dataset, strengthening its OCR and VQA capabilities. It has not yet been trained on medical or car-accident datasets; that update is expected by early 2025. The benchmarks used to measure its performance are open-source, so the reported results can be independently re-evaluated.
Guide: Running Locally
To run EraX-VL-7B-V1.5 locally:
- Install the necessary packages:

```bash
python -m pip install git+https://github.com/huggingface/transformers accelerate
python -m pip install qwen-vl-utils
pip install flash-attn --no-build-isolation
```
- Load the model in Python:

```python
import torch
from transformers import Qwen2VLForConditionalGeneration, AutoTokenizer, AutoProcessor

model_path = "erax-ai/EraX-VL-7B-V1.5"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    attn_implementation="eager",  # use "flash_attention_2" on Ampere or newer GPUs
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
processor = AutoProcessor.from_pretrained(model_path)
```
- Prepare your images and text prompts, then run inference (a minimal sketch is shown after this list).
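The snippet below is a sketch of that inference step, following the standard Qwen2-VL chat-template pattern from the qwen-vl-utils package; the image path and prompt are placeholders rather than part of the original guide.

```python
from qwen_vl_utils import process_vision_info

# Placeholder document and question; substitute your own.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "file:///path/to/invoice.jpg"},
            {"type": "text", "text": "Extract the key fields from this invoice."},
        ],
    }
]

# Render the chat template and gather the vision inputs.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)

# Generate, then strip the prompt tokens before decoding.
generated_ids = model.generate(**inputs, max_new_tokens=512)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```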
Consider using cloud GPUs, such as those offered by AWS or Google Cloud, for better performance, especially when handling large datasets or complex tasks.
License
This model is released under the Apache 2.0 License, allowing for free use and distribution with proper attribution.