Pixtral Large Instruct 2411

mistralai

Introduction

Pixtral-Large-Instruct-2411 is a 124 billion parameter multimodal model built on the Mistral Large series. It excels in understanding images, including documents, charts, and natural images while maintaining high performance in text-only tasks. The model supports 10 languages and uses the vLLM library for inference.

Architecture

The model features a 123 billion parameter multimodal decoder and a 1 billion parameter vision encoder, with a 128K context window capable of handling at least 30 high-resolution images. This setup ensures advanced performance in various tasks, including MathVista, DocVQA, and VQAv2.

Training

Pixtral-Large-Instruct-2411 is an extension of Mistral Large 2, designed to offer improved multimodal capabilities without compromising text performance. The model uses a new instruction template to handle system prompts effectively and is evaluated across several benchmarks to ensure state-of-the-art performance.

Guide: Running Locally

Basic Steps

  1. Install vLLM Library
    Ensure vLLM version 0.6.4.post1 or later is installed:

    pip install --upgrade vllm
    
  2. Install Mistral Common Library
    Ensure mistral_common version 1.5.0 or later is installed:

    pip install --upgrade mistral_common
    
  3. Run the Model Server
    Start a server using the following command:

    vllm serve mistralai/Pixtral-Large-Instruct-2411 --config-format mistral --load-format mistral --tokenizer_mode mistral --limit_mm_per_prompt 'image=10' --tensor-parallel-size 8
    
  4. Client Interaction
    Use the client code provided to interact with the server. This involves sending messages and receiving responses using an HTTP request setup.

Suggest Cloud GPUs

Running Pixtral-Large-Instruct-2411 requires over 300 GB of GPU RAM. Consider using cloud-based GPUs like those offered by AWS, Google Cloud, or Azure to facilitate efficient processing.

License

Pixtral-Large-Instruct-2411 is covered under the Mistral AI Research License (MRL). The license grants a non-exclusive, royalty-free right to use, modify, and distribute the model for non-commercial research purposes. For commercial use, a separate agreement with Mistral AI is required. More details on the license can be found here.

More Related APIs in Image Text To Text