gemma-2-9b-it-GGUF (bartowski)
Introduction
gemma-2-9b-it-GGUF is a conversational text generation model quantized with the llama.cpp framework. It is based on Google's gemma-2-9b-it and made available through Hugging Face. The repository provides quantizations for a wide range of hardware, each offering a different quality and performance trade-off.
Architecture
The quantized files are produced with llama.cpp's imatrix option, which reduces model size while preserving as much quality as possible. The model targets conversational text generation and uses a prompt format that distinguishes user turns from model turns; it does not use a system prompt. Quantizations are provided in several formats, each tailored to specific hardware configurations and performance requirements.
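For reference, the turn structure follows the standard Gemma template; a minimal sketch of the format (with {prompt} standing in for the user message) looks like this:

<start_of_turn>user
{prompt}<end_of_turn>
<start_of_turn>model

The model's reply is closed by another <end_of_turn> token; the exact template string shipped in the GGUF metadata is listed on the repository's model card.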
Training
The gemma-2-9b-it model was originally developed by Google; the GGUF files in this repository were quantized by Hugging Face user bartowski with llama.cpp, using the imatrix option and a calibration dataset. The quantization produces a range of model files that cater to different hardware, from high-VRAM GPUs to low-RAM, CPU-only environments.
Guide: Running Locally
- Install the Hugging Face CLI: Ensure the Hugging Face CLI is installed:
  pip install -U "huggingface_hub[cli]"
- Download the Model File: Use the CLI to download the quantized file suited to your system's RAM/VRAM (a sample run command follows this list):
  huggingface-cli download bartowski/gemma-2-9b-it-GGUF --include "gemma-2-9b-it-Q4_K_M.gguf" --local-dir ./
- Choose the Appropriate File:
  - Assess your GPU's VRAM and system RAM to select a model file that fits within your available memory.
  - For maximum speed, choose a file 1-2 GB smaller than your GPU's VRAM; for maximum quality, choose one 1-2 GB smaller than your combined system RAM and VRAM.
  - K-quants (e.g. Q4_K_M) are a good general-purpose default; I-quants offer better quality for their size, particularly below Q4, on cuBLAS (Nvidia) or rocBLAS (AMD) builds, but run more slowly on CPU and Apple Metal.
- Cloud GPU Recommendation: For enhanced performance, consider cloud GPU services that provide more VRAM, such as AWS EC2 instances with NVIDIA GPUs.
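Once a file is downloaded, it can be run with any llama.cpp-based runtime. The following is a minimal sketch, assuming a llama.cpp build that provides the llama-cli binary and the Q4_K_M file from the download step in the current directory (roughly 6 GB, so it fits in 8 GB of VRAM with headroom):

# -ngl 99 offloads all layers to the GPU; reduce it if VRAM runs out.
# -cnv starts an interactive chat that applies the chat template stored in the GGUF metadata.
./llama-cli -m ./gemma-2-9b-it-Q4_K_M.gguf -ngl 99 -c 4096 -cnv

On a CPU-only machine, drop -ngl (or set it to 0) and pick a smaller quantization that fits in system RAM.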
License
The model is distributed under the Gemma license. To access the files on Hugging Face, users must log in and acknowledge the license terms before downloading.
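A minimal sketch of the gated-access flow, assuming the Hugging Face CLI installed in the guide above and a user access token created in your Hugging Face account settings:

# authenticate once; paste the access token when prompted
huggingface-cli login
# after the Gemma license has been accepted on the model page,
# the huggingface-cli download command from the guide will succeed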