QVQ-72B-Preview-4bit
mlx-community

Introduction
The QVQ-72B-Preview-4bit model is a variant of the Qwen/QVQ-72B-Preview model, converted into the MLX format using mlx-vlm version 0.1.6. It is designed for image-text-to-text processing and supports conversational and chat functionalities. The model leverages the transformers
library and is optimized for use with the MLX framework.
Architecture
The QVQ-72B-Preview-4bit model is based on the Qwen/Qwen2-VL-72B architecture. It is structured to handle image-text-to-text tasks, and its 4-bit quantization reduces the memory footprint relative to the full-precision model, making it better suited to local inference.
Training
Details regarding the specific training methodologies for the QVQ-72B-Preview-4bit model are not provided in the summary. Users are encouraged to refer to the original Qwen/QVQ-72B-Preview model card for in-depth training information.
Guide: Running Locally
1. Install MLX-VLM: Ensure you have the latest version of the mlx-vlm package installed:
   pip install -U mlx-vlm
2. Generate Text: Use the following command to generate text with the model:
   python -m mlx_vlm.generate --model mlx-community/QVQ-72B-Preview-4bit --max-tokens 100 --temp 0.0
3. Cloud GPUs: For optimal performance, especially with large models, consider cloud GPU services such as AWS EC2 with NVIDIA GPUs, Google Cloud's Compute Engine, or Azure's GPU offerings.
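The CLI invocation above can also be scripted. The sketch below is a minimal example that builds the command line for a given prompt and image and hands it to subprocess; the `--prompt` and `--image` flags are assumed to match mlx-vlm's CLI, and `build_generate_cmd` and `photo.jpg` are hypothetical names used for illustration.

```python
# Hypothetical helper: assemble the mlx_vlm.generate CLI call shown in the
# guide above. The --prompt/--image flags are assumptions about the CLI,
# not confirmed options; check `python -m mlx_vlm.generate --help`.
def build_generate_cmd(prompt, image_path, max_tokens=100, temp=0.0):
    return [
        "python", "-m", "mlx_vlm.generate",
        "--model", "mlx-community/QVQ-72B-Preview-4bit",
        "--max-tokens", str(max_tokens),
        "--temp", str(temp),
        "--prompt", prompt,
        "--image", image_path,
    ]

cmd = build_generate_cmd("Describe this image.", "photo.jpg")
print(" ".join(cmd))
# To actually run generation (requires the model weights and an Apple
# Silicon machine with enough memory):
# import subprocess; subprocess.run(cmd, check=True)
```

Running the command itself downloads roughly 40 GB of 4-bit weights on first use, so the actual `subprocess.run` call is left commented out here.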
License
The QVQ-72B-Preview-4bit model is distributed under the Qwen license. For the full licensing terms, refer to the license document in the original Qwen/QVQ-72B-Preview repository.