Mini C P M V 2_6 LLM Model

Introduction

MiniCPM-V 2.6 is an advanced model in the MiniCPM-V series, boasting 8 billion parameters. It is designed for image, multi-image, and video understanding, achieving top-tier performance across various benchmarks. The model is built on SigLip-400M and Qwen2-7B architectures, and outperforms many proprietary models in its class.

Architecture

MiniCPM-V 2.6 includes several enhancements:

Leading Performance: Achieves high scores on OpenCompass evaluation.
Multi Image Understanding: Excels in reasoning across multiple images.
Video Understanding: Capable of processing video inputs with superior performance.
OCR and Multilingual Capabilities: Handles images of up to 1.8 million pixels and supports multiple languages.
Efficiency: Optimized for high token density and resource efficiency, suitable for real-time applications on devices like iPads.

Training

The model was trained using the RLAIF-V dataset and leverages VisCPM techniques for improved reliability and reduced hallucination rates. It supports multilingual contexts, enhancing its robustness across different languages.

Guide: Running Locally

Basic Steps:

Install Required Packages:

pip install Pillow torch torchvision transformers sentencepiece decord

Download the Model:

from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)

Run Inference:

For images:

from PIL import Image
image = Image.open('path_to_image.jpg').convert('RGB')
# Define question and use model.chat for inference

For videos:

from decord import VideoReader
# Load and process video frames for model.chat

Suggested Cloud GPUs:

Consider using cloud services like AWS, Google Cloud, or Azure for GPU access if local resources are insufficient.

License

Code License: Apache-2.0 License.
Model License: Usage of model weights follows the MiniCPM Model License. Free for academic research; commercial use requires registration.
Disclaimer: The model's output does not represent the views of the developers, and they are not liable for its misuse.

More Related APIs in Image Text To Text