MiniCPM-Llama3-V 2.5
Introduction
MiniCPM-Llama3-V 2.5 is a cutting-edge, multilingual, multimodal language model that operates at a GPT-4V level. It is designed for diverse applications, including OCR and multilingual processing, and can run efficiently on mobile devices. The model is built on the SigLip-400M and Llama3-8B-Instruct architectures with 8 billion parameters.
Architecture
MiniCPM-Llama3-V 2.5 combines the strengths of SigLip-400M and Llama3-8B-Instruct. It supports over 30 languages and offers advanced OCR capabilities, processing images with up to 1.8 million pixels. It includes optimizations for efficient deployment on edge devices, utilizing model quantization and NPU acceleration.
Training
The model is trained using extensive datasets, including the openbmb/RLAIF-V-Dataset, and employs the latest RLAIF-V method for improved trustworthiness and reduced hallucination rates. It supports LoRA fine-tuning with minimal GPU resources, enhancing its adaptability for various applications.
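As a rough sketch of what LoRA fine-tuning can look like here (not the official recipe), an adapter can be attached with the peft library. The target module names below are assumptions for Llama3-style attention layers and may need adjusting to the actual MiniCPM-Llama3-V 2.5 code:

```python
# Illustrative LoRA sketch using the peft library. The target_modules names
# are assumptions for Llama3-style attention projections and may differ in
# the model's remote code.
import torch
from transformers import AutoModel
from peft import LoraConfig, get_peft_model

model = AutoModel.from_pretrained(
    "openbmb/MiniCPM-Llama3-V-2_5",
    trust_remote_code=True,
    torch_dtype=torch.float16,
)

lora_config = LoraConfig(
    r=16,            # low-rank dimension of the adapter matrices
    lora_alpha=32,   # scaling factor applied to the adapter output
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed Llama3 attention projections
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trained
```

Because only the low-rank adapter matrices receive gradients, this keeps GPU memory requirements far below full fine-tuning of all 8 billion parameters.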
Guide: Running Locally
Requirements:
- Python 3.10
- Install the following packages: Pillow==10.1.0, torch==2.1.2, torchvision==0.16.2, transformers==4.40.0, sentencepiece==0.1.99
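For example, with pip:

```bash
pip install Pillow==10.1.0 torch==2.1.2 torchvision==0.16.2 transformers==4.40.0 sentencepiece==0.1.99
```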
Setup:
- Use the transformers library for model inference on NVIDIA GPUs.
- An example script for running inference with image input and text output is shown below.
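A minimal sketch of such a script, assuming the Hugging Face repo id openbmb/MiniCPM-Llama3-V-2_5 and the chat() interface loaded via trust_remote_code; the exact method signature may differ between releases:

```python
# Minimal inference sketch for an NVIDIA GPU. Assumes the model's
# remote code exposes a chat() method, as described on the model card;
# the exact signature may vary.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    "openbmb/MiniCPM-Llama3-V-2_5",
    trust_remote_code=True,
    torch_dtype=torch.float16,
).to("cuda").eval()

tokenizer = AutoTokenizer.from_pretrained(
    "openbmb/MiniCPM-Llama3-V-2_5", trust_remote_code=True
)

image = Image.open("example.jpg").convert("RGB")  # hypothetical input image
msgs = [{"role": "user", "content": "What is in this image?"}]

# Image input in, text output back.
answer = model.chat(
    image=image,
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    temperature=0.7,
)
print(answer)
```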
Deployment Options:
- Run with llama.cpp for CPU inference.
- Use the INT4 quantized version for lower GPU memory usage (see the sketch after this list).
- For cloud GPUs, consider services like AWS or Google Cloud with NVIDIA V100 or similar GPUs.
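For the INT4 option, loading the quantized checkpoint mirrors the full-precision case. A sketch, assuming the quantized weights are published as openbmb/MiniCPM-Llama3-V-2_5-int4:

```python
# Sketch: load the INT4 quantized variant to reduce GPU memory usage.
# Assumes the quantized weights live at openbmb/MiniCPM-Llama3-V-2_5-int4;
# no torch_dtype is passed because the checkpoint already stores
# quantized weights.
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    "openbmb/MiniCPM-Llama3-V-2_5-int4", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "openbmb/MiniCPM-Llama3-V-2_5-int4", trust_remote_code=True
)
model.eval()
```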
License
The code is released under the Apache-2.0 License. The model weights are free for academic research and commercial use after registration. Users must adhere to the MiniCPM Model License terms. The developers are not liable for any misuse of the model.