granite-3.1-8b-instruct-GGUF

bartowski

Introduction

The granite-3.1-8b-instruct-GGUF model by bartowski provides quantized versions of IBM's Granite-3.1-8B-Instruct model for text generation tasks. The quantizations were produced with the llama.cpp framework, and the model is licensed under Apache 2.0.

Architecture

granite-3.1-8b-instruct-GGUF is quantized with the llama.cpp framework using the imatrix (importance matrix) option, which calibrates the quantization to reduce quality loss at lower bit widths. The model is tailored for conversational language tasks and is offered at multiple quantization levels, allowing users to trade file size against output quality depending on available system resources.

Training

The quantized model is derived from the IBM Granite-3.1-8B-Instruct model, with the quantizations created using a calibration dataset specifically chosen for the imatrix option. This calibration preserves model quality at smaller sizes, enabling efficient inference across a range of hardware configurations.

Guide: Running Locally

  1. Installation: Ensure you have the huggingface-cli tool by running:
    pip install -U "huggingface_hub[cli]"
    
  2. Download: Use the CLI to download the desired quantized model file. For example:
    huggingface-cli download bartowski/granite-3.1-8b-instruct-GGUF --include "granite-3.1-8b-instruct-Q4_K_M.gguf" --local-dir ./
    
  3. Choose Quantization: Select a quantization level based on your system's RAM and VRAM. For maximum speed, pick a file slightly smaller than your GPU's VRAM so the entire model fits on the GPU; if quality matters more than speed, aim for a file slightly smaller than your combined RAM and VRAM. A sketch for listing the available files and their sizes follows this list.
  4. Execution: Run the model with a GGUF-compatible runtime such as llama.cpp, ensuring the build matches your hardware (GPU offload or CPU-only). A minimal Python example is shown after this list.
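
To help with step 3, the sketch below lists the quantized files in the repository along with their sizes, using the huggingface_hub library installed in step 1. Only the repository name comes from the instructions above; the rest is a minimal illustration.

    from huggingface_hub import HfApi

    # Fetch repository metadata, including per-file sizes.
    api = HfApi()
    info = api.model_info("bartowski/granite-3.1-8b-instruct-GGUF", files_metadata=True)

    # Print each GGUF file with its size in GB, smallest first.
    for f in sorted(info.siblings, key=lambda s: s.size or 0):
        if f.rfilename.endswith(".gguf"):
            print(f"{(f.size or 0) / 1e9:5.1f} GB  {f.rfilename}")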

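For step 4, here is a minimal sketch using the llama-cpp-python bindings (pip install llama-cpp-python); the context size, GPU-offload setting, and prompt are illustrative assumptions rather than settings from the original card.

    from llama_cpp import Llama

    # Load the quantized model; n_gpu_layers=-1 offloads all layers to the GPU
    # (set it to 0 for CPU-only inference).
    llm = Llama(
        model_path="./granite-3.1-8b-instruct-Q4_K_M.gguf",
        n_ctx=4096,
        n_gpu_layers=-1,
    )

    # Chat-style generation using the chat template embedded in the GGUF file.
    response = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Summarize what GGUF quantization is."}]
    )
    print(response["choices"][0]["message"]["content"])
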
Cloud GPUs

Consider using cloud GPU services such as AWS, Google Cloud, or Azure for enhanced computational power, especially if local resources are limited.

License

The granite-3.1-8b-instruct-GGUF model is distributed under the Apache 2.0 license, allowing for wide usage and modification with proper attribution.
