InternVL2_5-38B-MPO
OpenGVLab
Introduction
InternVL2.5-MPO is an advanced multimodal large language model (MLLM) series that builds upon InternVL2.5 and Mixed Preference Optimization (MPO), offering superior performance across various tasks.
Architecture
InternVL2.5-MPO retains the architecture of previous versions, following the "ViT-MLP-LLM" paradigm. It integrates a newly pre-trained InternViT with various pre-trained large language models (LLMs), such as InternLM 2.5 and Qwen 2.5, using a randomly initialized MLP projector. The model supports multi-image and video data, using a dynamic resolution strategy together with a pixel unshuffle operation that reduces the number of visual tokens.
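To make the token reduction concrete, the following is a minimal pure-Python sketch of a pixel unshuffle (space-to-depth) step on a toy grid of visual tokens. The function name `pixel_unshuffle`, the factor `r=2`, and the order in which channels are concatenated are assumptions for illustration; the actual model code may order channels differently.

```python
def pixel_unshuffle(grid, r=2):
    """Fold each r x r block of token vectors into one longer vector.

    grid: H x W nested list, each cell a list of C floats.
    Returns an (H // r) x (W // r) grid whose cells have C * r * r floats.
    """
    h, w = len(grid), len(grid[0])
    if h % r or w % r:
        raise ValueError("grid height and width must be divisible by r")
    out = []
    for i in range(0, h, r):
        row = []
        for j in range(0, w, r):
            merged = []
            # Concatenate the r*r neighbouring tokens (row-major order, an assumption).
            for di in range(r):
                for dj in range(r):
                    merged.extend(grid[i + di][j + dj])
            row.append(merged)
        out.append(row)
    return out


# Toy 4 x 4 grid with one channel per token, values 0..15.
grid = [[[float(4 * i + j)] for j in range(4)] for i in range(4)]
merged = pixel_unshuffle(grid, r=2)

tokens_before = len(grid) * len(grid[0])
tokens_after = len(merged) * len(merged[0])
```

With `r=2` the number of tokens drops by a factor of four while each remaining token carries four times as many channels, so no information is discarded before the MLP projector.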
Training
The training process involves a large-scale Multi-Modal Preference Dataset (MMPR) with about 3 million samples. The Mixed Preference Optimization (MPO) method combines preference loss, quality loss, and generation loss to enable the model to learn the relative preference and absolute quality of responses, as well as the process for generating preferred responses.
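The way the three terms combine can be sketched for a single preference pair. This is an illustrative sketch only: it assumes a DPO-style preference term, a BCO-style quality term, and a length-normalized negative log-likelihood as the generation term, and the names `mpo_loss`, `beta`, `delta`, and the weights `w_p`, `w_q`, `w_g` are hypothetical rather than taken from the released training code.

```python
import math


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def mpo_loss(logp_c, logp_r, ref_c, ref_r, n_tokens_c,
             beta=0.1, delta=0.0, w_p=1.0, w_q=1.0, w_g=1.0):
    """Combine preference, quality, and generation losses for one pair.

    logp_c / logp_r: policy log-probabilities of the chosen / rejected response.
    ref_c / ref_r:   reference-model log-probabilities of the same responses.
    n_tokens_c:      number of tokens in the chosen response.
    """
    ratio_c = logp_c - ref_c
    ratio_r = logp_r - ref_r

    # Relative preference: chosen should beat rejected (DPO-style, assumed).
    preference = -log_sigmoid(beta * (ratio_c - ratio_r))
    # Absolute quality: each response judged on its own (BCO-style, assumed).
    quality = (-log_sigmoid(beta * ratio_c - delta)
               - log_sigmoid(-(beta * ratio_r - delta)))
    # Generation: per-token negative log-likelihood of the chosen response.
    generation = -logp_c / n_tokens_c

    total = w_p * preference + w_q * quality + w_g * generation
    return total, preference, quality, generation


# Policy equal to the reference: preference term sits at log(2).
total, pref, qual, gen = mpo_loss(-10.0, -10.0, -10.0, -10.0, n_tokens_c=5)
# Raising the chosen log-probability lowers the preference term.
_, pref_better, _, _ = mpo_loss(-8.0, -10.0, -10.0, -10.0, n_tokens_c=5)
```

The preference term only sees the gap between the two responses, the quality term scores each response independently, and the generation term keeps the model able to produce the chosen response token by token.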
Guide: Running Locally
To run the InternVL2_5-38B-MPO model locally:
- Environment Setup:
  - Ensure transformers>=4.37.2 is installed.
  - Use at least two 80GB GPUs if not using 8-bit quantization.
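- Model Loading:
  - Load the model and tokenizer with PyTorch, using bfloat16 weights and low CPU memory usage.

        import torch
        from transformers import AutoTokenizer, AutoModel

        path = "OpenGVLab/InternVL2_5-38B-MPO"

        # trust_remote_code=True is required because the model class
        # is defined in the repository rather than in transformers itself.
        # .cuda() places the whole model on a single GPU; see Multi-GPU Setup
        # for distributing layers across devices.
        model = AutoModel.from_pretrained(
            path,
            torch_dtype=torch.bfloat16,
            low_cpu_mem_usage=True,
            use_flash_attn=True,
            trust_remote_code=True).eval().cuda()
        tokenizer = AutoTokenizer.from_pretrained(
            path, trust_remote_code=True, use_fast=False)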
- Multi-GPU Setup:
  - Define a device map to distribute model layers across available GPUs.
- Inference:
  - Use the provided scripts to run inference on images or videos.
- Cloud GPUs:
  - For enhanced performance, consider using cloud GPU services such as AWS, Google Cloud, or Azure, which offer powerful GPUs suitable for running large models.
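The device map mentioned in the Multi-GPU Setup step can be sketched as a plain dictionary from module names to GPU indices. This is a minimal sketch under stated assumptions: the helper name `build_device_map` is hypothetical, the module names (`vision_model`, `mlp1`, `language_model.model.layers.N`, and so on) and the layer count of 64 are assumptions about the checkpoint and should be checked against the loaded model, and GPU 0 is treated as half a GPU because it is assumed to also host the vision encoder.

```python
import math


def build_device_map(num_layers, num_gpus):
    """Spread LLM decoder layers across GPUs, keeping GPU 0 lightly loaded."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")

    # GPU 0 also holds the vision encoder, so count it as half a GPU.
    per_gpu = math.ceil(num_layers / (num_gpus - 0.5))
    layers_per_gpu = [per_gpu] * num_gpus
    layers_per_gpu[0] = math.ceil(per_gpu * 0.5)

    device_map = {}
    layer = 0
    for gpu, count in enumerate(layers_per_gpu):
        for _ in range(count):
            if layer >= num_layers:
                break  # do not create entries for layers that do not exist
            device_map[f"language_model.model.layers.{layer}"] = gpu
            layer += 1

    # Non-layer modules stay on GPU 0 (module names are assumptions).
    for name in ("vision_model", "mlp1",
                 "language_model.model.embed_tokens",
                 "language_model.model.norm",
                 "language_model.lm_head"):
        device_map[name] = 0
    # Keep the last decoder layer next to the final norm and output head.
    device_map[f"language_model.model.layers.{num_layers - 1}"] = 0
    return device_map


# Assumed layer count for the 38B model, split over two GPUs.
device_map = build_device_map(num_layers=64, num_gpus=2)
```

A dictionary like this could then be passed as `device_map=device_map` to `AutoModel.from_pretrained` in place of the trailing `.cuda()` call, so that the weights are placed on their target devices at load time.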
License
The InternVL2.5-MPO project is released under the MIT License. It includes the Qwen2.5-32B-Instruct component licensed under the Apache License 2.0.