Qwen2-VL-7B-Instruct-GGUF
bartowski

Introduction
Qwen2-VL-7B-Instruct-GGUF is a collection of GGUF quantizations of the multimodal image-text-to-text model Qwen2-VL-7B-Instruct, produced with the llama.cpp framework. It operates in English and is available under the Apache 2.0 license.
Architecture
The model is derived from the original Qwen/Qwen2-VL-7B-Instruct and is offered at multiple quantization levels to suit different hardware configurations. Some variants keep the embedding and output tensors at the higher-precision Q8_0 format for better quality, while others are packed for efficient inference on ARM and AVX CPUs.
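To check which format each tensor actually uses in a given file, the gguf-dump utility from the gguf Python package (maintained in the llama.cpp repository) can print per-tensor quantization types. The file name below is one of the quantizations in this repository; the tool's exact output format may vary by version:

# Install the gguf Python package
pip install gguf
# Print metadata and per-tensor types; in some variants the token-embedding
# and output tensors appear as Q8_0 even when most of the model is Q4_K
gguf-dump Qwen2-VL-7B-Instruct-Q4_K_M.gguf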
Quantization
No additional training was performed; the base model was trained by the Qwen team. Quantization was carried out using the llama.cpp framework, specifically the b4327 release, with the imatrix option and a calibration dataset so that the quantized weights preserve as much of the original model's quality as possible.
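For context, an imatrix-based quantization with llama.cpp follows the sketch below. The calibration file and f16 source names are placeholders, not the exact inputs used for this release:

# Build an importance matrix from a calibration text file (hypothetical paths)
./llama-imatrix -m Qwen2-VL-7B-Instruct-f16.gguf -f calibration.txt -o imatrix.dat
# Quantize to Q4_K_M, weighting tensors by the importance matrix
./llama-quantize --imatrix imatrix.dat Qwen2-VL-7B-Instruct-f16.gguf Qwen2-VL-7B-Instruct-Q4_K_M.gguf Q4_K_M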
Guide: Running Locally
- Setup: Ensure you have built llama.cpp locally (a minimal build sketch is shown after this list).
- Command: Execute the following command to run the model; the --mmproj argument points to the vision projector file that accompanies the main model:
./llama-qwen2vl-cli -m /models/Qwen2-VL-7B-Instruct-Q4_0.gguf --mmproj /models/mmproj-Qwen2-VL-7B-Instruct-f32.gguf -p 'Describe this image.' --image '/models/test_image.jpg'
- Download Files: Use the huggingface-cli to download specific model files (a companion command for the vision projector follows this list). For example:
pip install -U "huggingface_hub[cli]"
huggingface-cli download bartowski/Qwen2-VL-7B-Instruct-GGUF --include "Qwen2-VL-7B-Instruct-Q4_K_M.gguf" --local-dir ./
- Hardware Suggestions: To optimize performance, consider cloud GPUs that support cuBLAS (Nvidia) or rocBLAS (AMD); an example of offloading layers to the GPU follows.
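As referenced in the Setup step, a typical CMake build of llama.cpp looks like the following sketch; the GGML_CUDA flag is only needed for Nvidia GPU builds and may differ on other backends:

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON    # omit the flag for a CPU-only build
cmake --build build --config Release
# Binaries, including llama-qwen2vl-cli, are placed under build/bin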
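The vision projector referenced by --mmproj is distributed in the same repository and can be fetched the same way; the file name below is taken from the run command above:

huggingface-cli download bartowski/Qwen2-VL-7B-Instruct-GGUF --include "mmproj-Qwen2-VL-7B-Instruct-f32.gguf" --local-dir ./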
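On a GPU-enabled build, model layers can be offloaded with the standard -ngl flag; the value 99 below simply means "offload all layers" and is an illustrative choice, not a tuned setting:

./llama-qwen2vl-cli -m /models/Qwen2-VL-7B-Instruct-Q4_0.gguf --mmproj /models/mmproj-Qwen2-VL-7B-Instruct-f32.gguf -ngl 99 -p 'Describe this image.' --image '/models/test_image.jpg'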
License
The model and associated files are released under the Apache 2.0 license, permitting use, modification, and distribution under the specified terms.