Mistral-Nemo-Instruct-2407-GGUF
Introduction
The Mistral-Nemo-Instruct-2407-GGUF model is a text generation model available on Hugging Face. It supports nine languages: English, French, German, Spanish, Italian, Portuguese, Russian, Chinese, and Japanese. The model is distributed under the Apache 2.0 license and is quantized by bartowski for optimized performance.
Architecture
This model is a quantized version of Mistral-Nemo-Instruct-2407, produced with the llama.cpp framework. The quantization uses the imatrix (importance matrix) option, and some variants handle the embedding and output weights separately in an attempt to improve quality.
Training
The quantization was performed with llama.cpp using the imatrix method. Various quantization levels, such as Q8_0, Q6_K, and Q4_0, are provided to balance quality against file size and performance, depending on user requirements and system capabilities.
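As a rough guide to what those quantization levels mean in practice, the sketch below estimates GGUF file sizes from approximate bits-per-weight figures. The bits-per-weight values and the ~12.2B parameter count for Mistral-Nemo are approximations for illustration, not exact figures from the repository.

```python
# Rough GGUF file-size estimates for common quantization levels.
# Bits-per-weight values are approximate averages (assumption, not exact),
# as is the ~12.2B parameter count for Mistral-Nemo.
APPROX_BITS_PER_WEIGHT = {
    "Q8_0": 8.5,
    "Q6_K": 6.6,
    "Q4_0": 4.5,
}

def estimated_size_gb(n_params_billion: float, quant: str) -> float:
    """Estimate GGUF file size in GB for a given quantization level."""
    total_bits = APPROX_BITS_PER_WEIGHT[quant] * n_params_billion * 1e9
    return total_bits / 8 / 1e9  # bits -> bytes -> GB

for quant in APPROX_BITS_PER_WEIGHT:
    print(f"{quant}: ~{estimated_size_gb(12.2, quant):.1f} GB")
```

This makes the trade-off concrete: halving the bits per weight roughly halves the download and memory footprint, at some cost in output quality.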
Guide: Running Locally
To run the model locally:
- Installation: Ensure huggingface-cli is installed by running:
  pip install -U "huggingface_hub[cli]"
- Download Model: Use huggingface-cli to download the desired quantized file. For example:
  huggingface-cli download bartowski/Mistral-Nemo-Instruct-2407-GGUF --include "Mistral-Nemo-Instruct-2407-Q4_K_M.gguf" --local-dir ./
- Choose File: Select a file based on your available RAM and VRAM. For maximum quality, choose a file 1-2 GB smaller than the total of your RAM and VRAM.
- Cloud GPUs: Consider running the model on a cloud GPU for better performance. Platforms such as AWS, GCP, and Azure offer suitable GPU instances.
License
The model is released under the Apache 2.0 license, allowing for free use, modification, and distribution.