MiniCPM-V 2.6 GGUF
openbmb

Introduction
MiniCPM-V 2.6 is a multimodal (vision-language) model that can be downloaded, prepared, and run locally using llama.cpp tooling. It supports conversion from the original PyTorch checkpoint to the GGUF format, with optional quantization to reduce its footprint.
Architecture
MiniCPM-V 2.6 distributes its weights in the GGUF file format, which llama.cpp uses for efficient processing and inference. The model can be run in different precision modes, including f16 and quantized int4 (Q4_K_M), allowing a trade-off between output quality, memory use, and speed based on the available computational resources.
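As an optional aside (not part of the official guide), a converted file's metadata and tensor list can be inspected with the gguf Python package that ships alongside llama.cpp; this sketch assumes the package is installed via pip and uses the f16 output path from the conversion steps below:

```sh
# Optional: inspect a converted file's metadata (architecture, quantization
# type, tensor shapes). Assumes the `gguf` package provides the gguf-dump tool.
pip install gguf
gguf-dump ../MiniCPM-V-2_6/model/ggml-model-f16.gguf | head -n 40
```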
Training
Training details are not explicitly provided in the documentation, but the model supports conversion and quantization, suggesting it can be adapted to different tasks and optimized for performance. The conversion process relies on dedicated scripts that restructure the model's weights and metadata into the target format.
Guide: Running Locally
- Prepare Models and Code:
  - Download the MiniCPM-V-2_6 PyTorch model from Hugging Face (a download sketch follows this step).
  - Clone the llama.cpp repository and check out the minicpmv-main branch:

    ```sh
    git clone git@github.com:OpenBMB/llama.cpp.git
    cd llama.cpp
    git checkout minicpmv-main
    ```
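The conversion commands below assume the checkpoint sits next to the llama.cpp checkout at ../MiniCPM-V-2_6. As a minimal download sketch, assuming the huggingface_hub CLI is installed (any method that produces that directory works equally well):

```sh
# Fetch the PyTorch checkpoint into ../MiniCPM-V-2_6.
# Assumes: pip install -U huggingface_hub (and `huggingface-cli login`
# first if the repository requires authentication).
huggingface-cli download openbmb/MiniCPM-V-2_6 --local-dir ../MiniCPM-V-2_6
```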
- Convert PyTorch Model to GGUF:
  - Use the provided scripts to convert the model:

    ```sh
    python ./examples/llava/minicpmv-surgery.py -m ../MiniCPM-V-2_6
    python ./examples/llava/minicpmv-convert-image-encoder-to-gguf.py \
        -m ../MiniCPM-V-2_6 \
        --minicpmv-projector ../MiniCPM-V-2_6/minicpmv.projector \
        --output-dir ../MiniCPM-V-2_6/ \
        --image-mean 0.5 0.5 0.5 --image-std 0.5 0.5 0.5 \
        --minicpmv_version 3
    python ./convert_hf_to_gguf.py ../MiniCPM-V-2_6/model
    ```

  - Quantize to an int4 version if needed (a sanity check of the outputs follows this step):

    ```sh
    ./llama-quantize ../MiniCPM-V-2_6/model/ggml-model-f16.gguf ../MiniCPM-V-2_6/model/ggml-model-Q4_K_M.gguf Q4_K_M
    ```
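Assuming the default output names used above, these steps leave a vision projector GGUF at the model root and the language-model GGUF(s) under model/. A quick listing is a useful sanity check before building:

```sh
# Verify the conversion outputs. mmproj-model-f16.gguf is the image
# encoder/projector; ggml-model-f16.gguf is the language model, and
# ggml-model-Q4_K_M.gguf exists only if you ran the quantization step.
ls -lh ../MiniCPM-V-2_6/mmproj-model-f16.gguf \
       ../MiniCPM-V-2_6/model/ggml-model-f16.gguf \
       ../MiniCPM-V-2_6/model/ggml-model-Q4_K_M.gguf
```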
- Build and Run:
  - For Linux or Mac, build the CLI tool:

    ```sh
    make
    make llama-minicpmv-cli
    ```

  - Run inference with the f16 model (the quantized int4 invocation is shown in the sketch after this step):

    ```sh
    ./llama-minicpmv-cli \
      -m ../MiniCPM-V-2_6/model/ggml-model-f16.gguf \
      --mmproj ../MiniCPM-V-2_6/mmproj-model-f16.gguf \
      -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 \
      --image xx.jpg -p "What is in the image?"
    ```
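For the quantized int4 version, the invocation is the same with the Q4_K_M weights from the quantization step swapped in; the mmproj (vision projector) file stays in f16:

```sh
# Same inference call, but loading the int4 (Q4_K_M) language model.
./llama-minicpmv-cli \
  -m ../MiniCPM-V-2_6/model/ggml-model-Q4_K_M.gguf \
  --mmproj ../MiniCPM-V-2_6/mmproj-model-f16.gguf \
  -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 \
  --image xx.jpg -p "What is in the image?"
```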
- Cloud GPUs:
  - If local resources are insufficient, consider running the model on cloud GPU services such as AWS, Google Cloud, or Azure, which provide powerful GPUs that can accelerate model inference.
License
License information is not included in this summary. Verify the licensing terms on the official Hugging Face model page or in the repository to ensure compliance with usage guidelines.