Monstral-123B-v2-GGUF
Introduction
Monstral-123B-v2-GGUF is a quantized version of the Monstral-123B-v2 model, optimized for text generation. Created by Bartowski using the llama.cpp framework, it offers a range of quantization options for diverse computational needs.
Architecture
The model is based on MarsupialAI's Monstral-123B-v2 and was quantized with llama.cpp. Multiple quantization types (Q8_0, Q6_K, etc.) are provided, trading output quality against file size and memory requirements, so a suitable variant can be chosen for different hardware configurations; a rough size estimate is sketched below.
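As a back-of-the-envelope check (an approximation, not a figure from the original card): a quant's file size is roughly the parameter count times the average bits per weight, divided by 8. Q4_K_M averages about 4.8 bits per weight, so for a 123B-parameter model:

```sh
# params (billions) * bits-per-weight / 8 = approximate file size in GB
echo "123 * 4.8 / 8" | bc -l   # ~73.8 GB; budget RAM/VRAM above this for context
```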
Training
Quantizations were performed using llama.cpp's imatrix option together with a calibration dataset (linked from the original model card). Some of the provided formats are additionally optimized for inference on ARM CPUs and on certain AVX2/AVX512-capable CPUs. A typical imatrix workflow is outlined below.
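The workflow below uses llama.cpp's own tools; the binary names (llama-imatrix, llama-quantize) assume a recent llama.cpp build, the file names are illustrative, and this is not necessarily Bartowski's exact pipeline:

```sh
# Compute an importance matrix over a calibration text file.
./llama-imatrix -m Monstral-123B-v2-f16.gguf -f calibration.txt -o imatrix.dat

# Quantize to Q4_K_M, using the importance matrix to decide which
# weights tolerate lower precision with the least quality loss.
./llama-quantize --imatrix imatrix.dat Monstral-123B-v2-f16.gguf Monstral-123B-v2-Q4_K_M.gguf Q4_K_M
```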
Guide: Running Locally
- Installation: Ensure huggingface_hub is installed by running:

  ```sh
  pip install -U "huggingface_hub[cli]"
  ```

- Download Model: Use huggingface-cli to download the desired quantized file:

  ```sh
  huggingface-cli download bartowski/Monstral-123B-v2-GGUF --include "Monstral-123B-v2-Q4_K_M.gguf" --local-dir ./
  ```

  Quantizations larger than 50 GB are split into multiple files; see the download sketch after this list.

- Run Inference: Load the downloaded file in LM Studio or another llama.cpp-based framework (a command-line example follows this list). If your local hardware is limited, cloud GPUs from providers like AWS, GCP, or Azure can host the larger quantizations.

- Considerations: Choose a quantization according to your hardware capabilities and performance needs. Use K-quants for general use, or I-quants for specific GPU-acceleration scenarios.
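Because the larger quantizations of a 123B model exceed 50 GB, they are stored split across multiple files in a per-quant subdirectory. A wildcard --include pattern downloads every part; the Q8_0 path below follows the repository's usual naming convention but should be checked against the actual file listing:

```sh
# Download all parts of a split quantization into a local folder.
huggingface-cli download bartowski/Monstral-123B-v2-GGUF --include "Monstral-123B-v2-Q8_0/*" --local-dir ./
```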
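For command-line inference, a downloaded GGUF file can be run directly with llama.cpp. This is a minimal sketch: the binary name llama-cli assumes a recent llama.cpp build, the prompt is arbitrary, and -ngl (the number of layers offloaded to the GPU) should be tuned to your available VRAM:

```sh
# Generate 256 tokens from a simple prompt, offloading 40 layers to the GPU.
./llama-cli -m ./Monstral-123B-v2-Q4_K_M.gguf \
  -p "Write a short story about a dragon." \
  -n 256 \
  -ngl 40
```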
License
The model is licensed under the MRL (Mistral AI Research License), which dictates the terms of use, distribution, and modification.