QVQ-72B-Preview-3bit

mlx-community

Introduction

The QVQ-72B-Preview-3bit model, hosted by the MLX Community, is a 3-bit quantized variant of the original Qwen/QVQ-72B-Preview model, converted to the MLX format for efficient inference on Apple silicon. It is designed for image-text-to-text tasks: given an image and a text prompt, it generates a text response.

Architecture

The model is based on the Qwen2-VL-72B architecture, utilizing the MLX library and the Transformers library for efficient text generation. It supports image-text-to-text pipelines and is optimized for conversational applications.

Training

The MLX weights were converted using mlx-vlm version 0.1.6, with the original Qwen/QVQ-72B-Preview weights quantized to 3 bits. Specific training details are not provided here; this release is a conversion for efficient local inference rather than a new training run.
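
For context, conversions of this kind are produced with mlx-vlm's conversion entry point. The command below is an illustrative sketch only; the exact flag names (such as --q-bits) may differ between mlx-vlm releases, so check python -m mlx_vlm.convert --help for your installed version.

    # Illustrative sketch: quantize the original weights to 3 bits while converting to MLX
    python -m mlx_vlm.convert --hf-path Qwen/QVQ-72B-Preview \
        --mlx-path QVQ-72B-Preview-3bit -q --q-bits 3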

Guide: Running Locally

To run the QVQ-72B-Preview-3bit model locally, follow these steps:

  1. Install MLX-VLM:

    pip install -U mlx-vlm
    
  2. Generate Text:
    Run the model using the following command (a fuller invocation that supplies a prompt and an image is shown after this list):

    python -m mlx_vlm.generate --model mlx-community/QVQ-72B-Preview-3bit --max-tokens 100 --temp 0.0
    
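To caption a specific image, the same entry point can take a text prompt and an image path. The command below is a sketch that assumes the --prompt and --image flags available in recent mlx-vlm releases; replace path/to/image.jpg with an actual image file.

    # Assumed flags: --prompt and --image (verify with python -m mlx_vlm.generate --help)
    python -m mlx_vlm.generate --model mlx-community/QVQ-72B-Preview-3bit \
        --max-tokens 100 --temp 0.0 \
        --prompt "Describe this image." --image path/to/image.jpg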

Cloud GPU Recommendation

For optimal performance, especially when processing large datasets, consider using cloud-based GPUs. Platforms such as AWS, Google Cloud Platform, or Azure provide scalable GPU resources suitable for intensive machine learning tasks.

License

The model is distributed under the Qwen license. For full licensing terms, refer to the license provided with the original Qwen/QVQ-72B-Preview release.
