Qwen2-VL-2B-Instruct-GGUF
by bartowski
Introduction
The Qwen2-VL-2B-Instruct-GGUF is a multimodal model designed for image-to-text tasks, utilizing the llama.cpp framework for quantization. The model supports English and is distributed under the Apache 2.0 license.
Architecture
The Qwen2-VL-2B-Instruct-GGUF model is based on the original Qwen/Qwen2-VL-2B-Instruct model. It is quantized with llama.cpp's imatrix (importance matrix) method, which uses a calibration dataset to decide which weights must be preserved most precisely, and is offered in a range of quantization types that trade file size against output quality.
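To check which quantization type a downloaded file actually uses, the gguf Python package (published from llama.cpp's gguf-py) ships a gguf-dump tool that prints a file's metadata; the file name below is illustrative:
# Install the gguf utilities, then dump the file's metadata,
# which includes general.file_type (the quantization type).
pip install gguf
gguf-dump Qwen2-VL-2B-Instruct-Q4_K_M.gguf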
Training
The model was quantized using llama.cpp release b4327, with all quantizations produced using the imatrix option and a community-provided calibration dataset.
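For context, the general imatrix workflow in llama.cpp looks like the following sketch; the calibration file and model names here are illustrative, not the exact inputs used for this release:
# Compute an importance matrix from a calibration text file.
./llama-imatrix -m Qwen2-VL-2B-Instruct-f16.gguf -f calibration.txt -o imatrix.dat
# Quantize the f16 model to Q4_K_M, guided by the importance matrix.
./llama-quantize --imatrix imatrix.dat Qwen2-VL-2B-Instruct-f16.gguf Qwen2-VL-2B-Instruct-Q4_K_M.gguf Q4_K_M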
Guide: Running Locally
To run the model locally, follow these steps:
- Build llama.cpp Locally: Build llama.cpp from source on your machine, for example as shown below.
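A minimal CMake build sketch, assuming a checkout at or after release b4327 (binaries, including llama-qwen2vl-cli, land in build/bin):
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release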
- Execution Command: Use the command below to run the model; both the quantized language model (-m) and the vision projector file (--mmproj) are required:
./llama-qwen2vl-cli -m /models/Qwen2-VL-2B-Instruct-Q4_0.gguf --mmproj /models/mmproj-Qwen2-VL-2B-Instruct-f32.gguf -p 'Describe this image.' --image '/models/test_image.jpg'
- Prompt Format:
<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
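As a concrete instance, a fully rendered prompt with a generic system message (the system text here is illustrative) looks like this:
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Describe this image.<|im_end|>
<|im_start|>assistant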
- Download Specific Files: Use the huggingface-cli to download specific quantized files:
pip install -U "huggingface_hub[cli]"
huggingface-cli download bartowski/Qwen2-VL-2B-Instruct-GGUF --include "Qwen2-VL-2B-Instruct-Q4_K_M.gguf" --local-dir ./
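The vision projector referenced in the run command above is a separate file in the same repository, so it needs to be downloaded as well (file name taken from that command):
huggingface-cli download bartowski/Qwen2-VL-2B-Instruct-GGUF --include "mmproj-Qwen2-VL-2B-Instruct-f32.gguf" --local-dir ./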
For optimal performance, deploying the model on cloud GPUs, such as those from AWS, GCP, or Azure, is recommended, especially when working with the larger quantization files.
License
The Qwen2-VL-2B-Instruct-GGUF model is released under the Apache 2.0 license, a permissive license that allows use, modification, and redistribution with minimal restrictions, covering both academic and commercial use.