QVQ-72B-Preview-AWQ
Introduction
QVQ-72B-Preview is an experimental research model developed by the Qwen team to enhance visual reasoning capabilities. This release is a 4-bit AWQ-quantized version of the model, prepared so that it scales efficiently across multiple GPUs: weights are zero-padded where needed to satisfy the divisibility constraints of tensor parallelism, and because the padding is all zeros, its impact on computation is negligible.
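The padding mechanics are not spelled out here, but the idea is simple. Below is a minimal sketch, assuming a hypothetical helper zero_pad_weight and Qwen2-style dimensions, of how a linear layer's output dimension can be padded so that each GPU shard holds a whole number of quantization groups:

import torch

def zero_pad_weight(weight: torch.Tensor, multiple: int) -> torch.Tensor:
    # Zero-pad the output dimension of a linear weight so it is divisible by
    # `multiple` (e.g. tensor-parallel degree x quantization group size).
    # Padded rows are zeros, so they contribute nothing to the matmul output.
    out_features, in_features = weight.shape
    pad = (-out_features) % multiple
    if pad == 0:
        return weight
    padding = torch.zeros(pad, in_features, dtype=weight.dtype)
    return torch.cat([weight, padding], dim=0)

# Hypothetical example: an intermediate size of 29568 padded so that each of
# 8 GPU shards holds a whole number of 128-wide AWQ quantization groups.
w = torch.randn(29568, 8192)
w_padded = zero_pad_weight(w, 8 * 128)
print(w_padded.shape[0])  # 29696, divisible by 8 * 128 = 1024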
Architecture
This release applies 4-bit AWQ (Activation-aware Weight Quantization) to the weights, a scheme chosen here for its compatibility with multi-GPU tensor parallelism. The underlying model targets multidisciplinary understanding, with a particular emphasis on visual reasoning tasks.
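To get a feel for why 4-bit quantization matters at this scale, here is a rough back-of-the-envelope weight-memory estimate (weights only; activations, KV cache, and the small per-group scale/zero-point overhead are ignored):

# Rough weight-memory estimate for a ~72B-parameter model.
params = 72e9

fp16_gb = params * 2 / 1e9    # FP16: 2 bytes per weight   -> ~144 GB
awq4_gb = params * 0.5 / 1e9  # AWQ 4-bit: 0.5 bytes/weight -> ~36 GB

print(f"FP16 weights:  ~{fp16_gb:.0f} GB")
print(f"4-bit weights: ~{awq4_gb:.0f} GB")

At roughly 36 GB of weights, the quantized model can plausibly fit on a single 48 GB GPU with modest context, or be sharded across several smaller cards, which is where the tensor-parallel padding above comes in.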
Training
The model scores 70.3% on the Massive Multi-discipline Multimodal Understanding (MMMU) benchmark, demonstrating strong multidisciplinary understanding, and it also shows improvements in mathematical reasoning on the MathVision benchmark. However, limitations include potential language mixing, recursive reasoning loops, and the need for robust safety measures.
Guide: Running Locally
- Install Dependencies: Ensure you have Python and the transformers library installed (pip install transformers). Loading an AWQ checkpoint in transformers additionally requires the autoawq package (pip install autoawq), which supplies the 4-bit kernels.
- Download Model: Access the QVQ-72B-Preview model from Hugging Face's model hub.
- Run Inference: Use the model for generation. A minimal text-only example (an image-and-text sketch follows below):

from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# QVQ is a Qwen2-VL-style vision-language model, so it loads through the
# conditional-generation class rather than AutoModelForCausalLM.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/QVQ-72B-Preview", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/QVQ-72B-Preview")

prompt = "Your input text here"
inputs = processor(text=prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(outputs[0], skip_special_tokens=True))
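Since QVQ is built for visual reasoning, typical use pairs an image with a question via the chat template, following the upstream Qwen2-VL usage pattern. The sketch below assumes the optional qwen-vl-utils helper package (pip install qwen-vl-utils); the image path and question are placeholders:

from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/QVQ-72B-Preview", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/QVQ-72B-Preview")

# Build a chat turn with one image and one question (placeholder values).
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/your_image.png"},
            {"type": "text", "text": "Describe the reasoning needed to solve this problem."},
        ],
    }
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
# Strip the prompt tokens before decoding, so only the model's answer prints.
answer = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(answer)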
Cloud GPUs
For improved performance, consider using cloud GPU services like AWS, Google Cloud, or Azure to handle the computational demands of running the model.
License
The model is released under the Qwen license. For more details, visit the license page.