Qwen2-VL-7B-Instruct-GGUF
bartowski

Introduction
Qwen2-VL-7B-Instruct-GGUF is a collection of GGUF quantizations of the multimodal image-text-to-text model Qwen2-VL-7B-Instruct, produced with the llama.cpp framework. It operates in English and is available under the Apache 2.0 license.
Architecture
The model is derived from the original Qwen/Qwen2-VL-7B-Instruct and is offered at multiple quantization levels to suit different hardware configurations. Some variants keep the embedding and output tensors at the higher-precision Q8_0 format for better quality, while others are packed for efficient inference on ARM and AVX CPUs.
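To check which format each tensor actually uses in a given file, the gguf-dump utility from the gguf Python package (maintained in the llama.cpp repository) can print per-tensor quantization types. The file name below is one of the quantizations in this repository; the tool's exact output format may vary by version:

# Install the gguf Python package
pip install gguf
# Print metadata and per-tensor types; in some variants the token-embedding
# and output tensors appear as Q8_0 even when most of the model is Q4_K
gguf-dump Qwen2-VL-7B-Instruct-Q4_K_M.gguf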
Quantization
No additional training was performed; the base model was trained by the Qwen team. Quantization was carried out using the llama.cpp framework, specifically the b4327 release, with the imatrix option and a calibration dataset so that the quantized weights preserve as much of the original model's quality as possible.
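For context, an imatrix-based quantization with llama.cpp follows the sketch below. The calibration file and f16 source names are placeholders, not the exact inputs used for this release:

# Build an importance matrix from a calibration text file (hypothetical paths)
./llama-imatrix -m Qwen2-VL-7B-Instruct-f16.gguf -f calibration.txt -o imatrix.dat
# Quantize to Q4_K_M, weighting tensors by the importance matrix
./llama-quantize --imatrix imatrix.dat Qwen2-VL-7B-Instruct-f16.gguf Qwen2-VL-7B-Instruct-Q4_K_M.gguf Q4_K_M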
Guide: Running Locally
- Setup: Ensure you have built llama.cpp locally (a minimal build sketch is shown after this list).
- Command: Execute the following command to run the model; the --mmproj argument points to the vision projector file that accompanies the main model:
./llama-qwen2vl-cli -m /models/Qwen2-VL-7B-Instruct-Q4_0.gguf --mmproj /models/mmproj-Qwen2-VL-7B-Instruct-f32.gguf -p 'Describe this image.' --image '/models/test_image.jpg'
- Download Files: Use the huggingface-cli to download specific model files (a companion command for the vision projector follows this list). For example:
pip install -U "huggingface_hub[cli]"
huggingface-cli download bartowski/Qwen2-VL-7B-Instruct-GGUF --include "Qwen2-VL-7B-Instruct-Q4_K_M.gguf" --local-dir ./
- Hardware Suggestions: To optimize performance, consider cloud GPUs that support cuBLAS (Nvidia) or rocBLAS (AMD); an example of offloading layers to the GPU follows.
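As referenced in the Setup step, a typical CMake build of llama.cpp looks like the following sketch; the GGML_CUDA flag is only needed for Nvidia GPU builds and may differ on other backends:

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON    # omit the flag for a CPU-only build
cmake --build build --config Release
# Binaries, including llama-qwen2vl-cli, are placed under build/bin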
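The vision projector referenced by --mmproj is distributed in the same repository and can be fetched the same way; the file name below is taken from the run command above:

huggingface-cli download bartowski/Qwen2-VL-7B-Instruct-GGUF --include "mmproj-Qwen2-VL-7B-Instruct-f32.gguf" --local-dir ./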
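On a GPU-enabled build, model layers can be offloaded with the standard -ngl flag; the value 99 below simply means "offload all layers" and is an illustrative choice, not a tuned setting:

./llama-qwen2vl-cli -m /models/Qwen2-VL-7B-Instruct-Q4_0.gguf --mmproj /models/mmproj-Qwen2-VL-7B-Instruct-f32.gguf -ngl 99 -p 'Describe this image.' --image '/models/test_image.jpg'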
License
The model and associated files are released under the Apache 2.0 license, permitting use, modification, and distribution under the specified terms.