Mini C P M V 2_6

openbmb

Introduction

MiniCPM-V 2.6 is an advanced model in the MiniCPM-V series, boasting 8 billion parameters. It is designed for image, multi-image, and video understanding, achieving top-tier performance across various benchmarks. The model is built on SigLip-400M and Qwen2-7B architectures, and outperforms many proprietary models in its class.

Architecture

MiniCPM-V 2.6 includes several enhancements:

  • Leading Performance: Achieves high scores on OpenCompass evaluation.
  • Multi Image Understanding: Excels in reasoning across multiple images.
  • Video Understanding: Capable of processing video inputs with superior performance.
  • OCR and Multilingual Capabilities: Handles images of up to 1.8 million pixels and supports multiple languages.
  • Efficiency: Optimized for high token density and resource efficiency, suitable for real-time applications on devices like iPads.

Training

The model was trained using the RLAIF-V dataset and leverages VisCPM techniques for improved reliability and reduced hallucination rates. It supports multilingual contexts, enhancing its robustness across different languages.

Guide: Running Locally

Basic Steps:

  1. Install Required Packages:
    pip install Pillow torch torchvision transformers sentencepiece decord
    
  2. Download the Model:
    from transformers import AutoModel, AutoTokenizer
    model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)
    
  3. Run Inference:
    • For images:
      from PIL import Image
      image = Image.open('path_to_image.jpg').convert('RGB')
      # Define question and use model.chat for inference
      
    • For videos:
      from decord import VideoReader
      # Load and process video frames for model.chat
      

Suggested Cloud GPUs:

Consider using cloud services like AWS, Google Cloud, or Azure for GPU access if local resources are insufficient.

License

  • Code License: Apache-2.0 License.
  • Model License: Usage of model weights follows the MiniCPM Model License. Free for academic research; commercial use requires registration.
  • Disclaimer: The model's output does not represent the views of the developers, and they are not liable for its misuse.

More Related APIs in Image Text To Text