Mini C P M V 2_6
openbmbIntroduction
MiniCPM-V 2.6 is an advanced model in the MiniCPM-V series, boasting 8 billion parameters. It is designed for image, multi-image, and video understanding, achieving top-tier performance across various benchmarks. The model is built on SigLip-400M and Qwen2-7B architectures, and outperforms many proprietary models in its class.
Architecture
MiniCPM-V 2.6 includes several enhancements:
- Leading Performance: Achieves high scores on OpenCompass evaluation.
- Multi Image Understanding: Excels in reasoning across multiple images.
- Video Understanding: Capable of processing video inputs with superior performance.
- OCR and Multilingual Capabilities: Handles images of up to 1.8 million pixels and supports multiple languages.
- Efficiency: Optimized for high token density and resource efficiency, suitable for real-time applications on devices like iPads.
Training
The model was trained using the RLAIF-V dataset and leverages VisCPM techniques for improved reliability and reduced hallucination rates. It supports multilingual contexts, enhancing its robustness across different languages.
Guide: Running Locally
Basic Steps:
- Install Required Packages:
pip install Pillow torch torchvision transformers sentencepiece decord
- Download the Model:
from transformers import AutoModel, AutoTokenizer model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True) tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)
- Run Inference:
- For images:
from PIL import Image image = Image.open('path_to_image.jpg').convert('RGB') # Define question and use model.chat for inference
- For videos:
from decord import VideoReader # Load and process video frames for model.chat
- For images:
Suggested Cloud GPUs:
Consider using cloud services like AWS, Google Cloud, or Azure for GPU access if local resources are insufficient.
License
- Code License: Apache-2.0 License.
- Model License: Usage of model weights follows the MiniCPM Model License. Free for academic research; commercial use requires registration.
- Disclaimer: The model's output does not represent the views of the developers, and they are not liable for its misuse.