Monstral-123B-v2-GGUF

bartowski

Introduction
Monstral-123B-v2-GGUF is a set of GGUF quantizations of the Monstral-123B-v2 model for text generation. Produced by bartowski with the llama.cpp framework, it offers a range of quantization options to suit different computational budgets.

Architecture
The model is based on MarsupialAI's Monstral-123B-v2 and was quantized with llama.cpp. Multiple quantization types (Q8_0, Q6_K, Q4_K_M, and others) are provided to balance output quality against memory and compute requirements, making the release usable across a wide range of hardware configurations.
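Each GGUF file records its quantization type in its header metadata. As a quick check of what you have downloaded, a minimal sketch assuming the gguf Python package is installed (pip install gguf) and a Q4_K_M file is present locally:

    # Dump GGUF metadata, including the file type (quantization) and tensor layout
    gguf-dump Monstral-123B-v2-Q4_K_M.gguf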

Training
Quantizations were performed with llama.cpp's imatrix option, using a calibration dataset linked from the original model card. Certain quantization variants are additionally optimized for performance on ARM CPUs and on some AVX2/AVX512 CPUs.
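The exact commands and calibration data are not reproduced here, but the general imatrix workflow in llama.cpp follows this sketch (file names are illustrative, not those used for this release):

    # Compute an importance matrix from a calibration text file
    ./llama-imatrix -m Monstral-123B-v2-f16.gguf -f calibration.txt -o imatrix.dat

    # Quantize, using the importance matrix to better preserve the most sensitive weights
    ./llama-quantize --imatrix imatrix.dat Monstral-123B-v2-f16.gguf Monstral-123B-v2-Q4_K_M.gguf Q4_K_M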

Guide: Running Locally

  1. Installation: Ensure huggingface_hub is installed by running:

    pip install -U "huggingface_hub[cli]"
    
  2. Download Model: Use the huggingface-cli to download the desired quantized file:

    huggingface-cli download bartowski/Monstral-123B-v2-GGUF --include "Monstral-123B-v2-Q4_K_M.gguf" --local-dir ./
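
     If a chosen quantization is larger than 50 GB, it is split into multiple files; include the whole directory so that every part is fetched (the Q8_0 path below follows the repo's naming convention and is illustrative):

    huggingface-cli download bartowski/Monstral-123B-v2-GGUF --include "Monstral-123B-v2-Q8_0/*" --local-dir ./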
    
  3. Run Inference: Load the downloaded GGUF file in LM Studio or another llama.cpp-compatible framework. If your local hardware is limited, consider cloud GPUs from providers such as AWS, GCP, or Azure.
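
     As a lighter-weight alternative, a minimal sketch using llama.cpp's llama-cli, assuming a local llama.cpp build and the file downloaded in step 2 (-ngl 99 offloads all layers to the GPU; lower it if VRAM is tight):

    ./llama-cli -m ./Monstral-123B-v2-Q4_K_M.gguf -p "Once upon a time" -n 256 -ngl 99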

  4. Considerations: Choose a quantization according to your hardware capabilities and performance needs. K-quants (e.g., Q4_K_M) are a safe default for general use; I-quants (e.g., IQ4_XS) offer better quality for their size but are aimed at GPU-accelerated (cuBLAS/ROCm) setups and can be slower on CPU.
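
     As a rule of thumb, pick a quant whose file is 1 to 2 GB smaller than your GPU's total VRAM for full offload (or than combined VRAM and system RAM for partial offload). A quick comparison, assuming an NVIDIA GPU with nvidia-smi available:

    # Compare downloaded quant sizes against total GPU memory
    ls -lh ./*.gguf
    nvidia-smi --query-gpu=memory.total --format=csv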

License
The model is licensed under the MRL (Mistral AI Research License), which dictates the terms of use, distribution, and modification.
