granite-3.1-8b-instruct-GGUF
bartowski

Introduction
The GRANITE-3.1-8B-INSTRUCT-GGUF model by bartowski provides quantized versions of IBM's Granite-3.1-8B-Instruct model for text generation tasks. The quantizations were produced with the llama.cpp framework, and the model is licensed under Apache 2.0.
Architecture
GRANITE-3.1-8B-INSTRUCT-GGUF uses the llama.cpp framework for quantization, employing the imatrix (importance matrix) option to improve quantization quality. The model is tailored for conversational language tasks and is published at multiple quantization levels, letting users trade output quality against available RAM and VRAM.
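To see exactly which quantization levels are published, the repository's file list can be inspected programmatically. The snippet below is a minimal sketch using the huggingface_hub Python client; the repo id matches the download command in the guide below, while the printed file names depend on what has actually been uploaded.

```python
from huggingface_hub import list_repo_files

# List every file in the quantized repo; each .gguf file name encodes
# its quantization level (e.g. Q4_K_M, Q6_K, Q8_0).
files = list_repo_files("bartowski/granite-3.1-8b-instruct-GGUF")
for name in files:
    if name.endswith(".gguf"):
        print(name)
```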
Training
The quantized variants are derived from the IBM Granite-3.1-8B-Instruct model, with the quantizations created using a dataset designed specifically for imatrix calibration. Calibration guides which weights the quantizer preserves most accurately, which helps the lower-bit variants retain quality and enables efficient model performance across various hardware configurations.
Guide: Running Locally
- Installation: Ensure you have the huggingface-cli tool by running:
pip install -U "huggingface_hub[cli]"
- Download: Use the CLI to download the desired quantized model file. For example:
huggingface-cli download bartowski/granite-3.1-8b-instruct-GGUF --include "granite-3.1-8b-instruct-Q4_K_M.gguf" --local-dir ./
- Choose Quantization: Select a quantization level based on your system's RAM and VRAM. For the best speed, pick a file slightly smaller than your GPU's VRAM; for maximum quality, pick one slightly smaller than your combined RAM and VRAM.
- Execution: Run the model using your preferred llama.cpp-compatible environment, ensuring compatibility with your hardware (e.g., GPU, CPU); see the Python sketch after this list for a scripted version of the download and execution steps.
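For readers who prefer to script these steps, the sketch below chains the download and execution steps in Python. It assumes the huggingface_hub and llama-cpp-python packages are installed (pip install huggingface_hub llama-cpp-python); the Q4_K_M file name mirrors the CLI example above, and settings such as n_ctx and n_gpu_layers are illustrative defaults rather than recommendations from the model card.

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download one quantized file (Q4_K_M, as in the CLI example above).
model_path = hf_hub_download(
    repo_id="bartowski/granite-3.1-8b-instruct-GGUF",
    filename="granite-3.1-8b-instruct-Q4_K_M.gguf",
    local_dir="./",
)

# Load and run the model. n_gpu_layers=-1 offloads every layer to the GPU
# when one is available; use 0 for CPU-only execution.
llm = Llama(model_path=model_path, n_ctx=4096, n_gpu_layers=-1)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain GGUF quantization in one paragraph."}],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```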
Cloud GPUs
Consider using cloud GPU services such as AWS, Google Cloud, or Azure for enhanced computational power, especially if local resources are limited.
License
The GRANITE-3.1-8B-INSTRUCT-GGUF model is distributed under the Apache 2.0 license, allowing for wide usage and modification with proper attribution.