Pixtral Large Instruct 2411
mistralaiIntroduction
Pixtral-Large-Instruct-2411 is a 124 billion parameter multimodal model built on the Mistral Large series. It excels in understanding images, including documents, charts, and natural images while maintaining high performance in text-only tasks. The model supports 10 languages and uses the vLLM library for inference.
Architecture
The model features a 123 billion parameter multimodal decoder and a 1 billion parameter vision encoder, with a 128K context window capable of handling at least 30 high-resolution images. This setup ensures advanced performance in various tasks, including MathVista, DocVQA, and VQAv2.
Training
Pixtral-Large-Instruct-2411 is an extension of Mistral Large 2, designed to offer improved multimodal capabilities without compromising text performance. The model uses a new instruction template to handle system prompts effectively and is evaluated across several benchmarks to ensure state-of-the-art performance.
Guide: Running Locally
Basic Steps
-
Install vLLM Library
Ensure vLLM version 0.6.4.post1 or later is installed:pip install --upgrade vllm
-
Install Mistral Common Library
Ensure mistral_common version 1.5.0 or later is installed:pip install --upgrade mistral_common
-
Run the Model Server
Start a server using the following command:vllm serve mistralai/Pixtral-Large-Instruct-2411 --config-format mistral --load-format mistral --tokenizer_mode mistral --limit_mm_per_prompt 'image=10' --tensor-parallel-size 8
-
Client Interaction
Use the client code provided to interact with the server. This involves sending messages and receiving responses using an HTTP request setup.
Suggest Cloud GPUs
Running Pixtral-Large-Instruct-2411 requires over 300 GB of GPU RAM. Consider using cloud-based GPUs like those offered by AWS, Google Cloud, or Azure to facilitate efficient processing.
License
Pixtral-Large-Instruct-2411 is covered under the Mistral AI Research License (MRL). The license grants a non-exclusive, royalty-free right to use, modify, and distribute the model for non-commercial research purposes. For commercial use, a separate agreement with Mistral AI is required. More details on the license can be found here.