Gemma-2-2B-IT-GGUF

MaziyarPanahi

Introduction

Gemma-2-2B-IT-GGUF is a quantized version of Google's Gemma-2-2B-IT, converted to the GGUF format by Maziyar Panahi. It is intended for text generation tasks and is available at several quantization levels, from 2-bit to 8-bit precision.

Architecture

The model preserves the Gemma-2-2B-IT architecture from Google; only the weights are repackaged in the GGUF format, which improves compatibility and performance across platforms. GGUF supersedes the older GGML format and is supported by a range of clients and libraries.

Training

Rather than performing further training, Maziyar Panahi quantized the original instruction-tuned weights to target efficient text generation. Quantization reduces the model's memory footprint, allowing it to run on more constrained hardware while retaining reasonable accuracy.
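
As a rough illustration of why the bit width matters, a model's weight memory scales with the number of parameters times the bits per weight. The sketch below assumes an approximate parameter count of 2.6 billion for Gemma-2-2B; actual GGUF files are somewhat larger because of metadata and mixed-precision tensors.

```python
# Back-of-the-envelope weight-memory estimate for different quantization
# levels. This is a sketch: real GGUF files add metadata and keep some
# tensors at higher precision.
N_PARAMS = 2.6e9  # approximate parameter count of Gemma-2-2B (assumption)

for bits in (2, 4, 8, 16):
    gib = N_PARAMS * bits / 8 / 2**30  # bytes -> GiB
    print(f"{bits}-bit: ~{gib:.1f} GiB")
```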

Guide: Running Locally

  1. Download the model files from the Hugging Face repository, or clone it with Git LFS (a download sketch follows this list).
  2. Install a GGUF-supported client or library such as:
    • llama.cpp for command-line or server options.
    • llama-cpp-python for Python integration with GPU acceleration.
    • LM Studio for a GUI on Windows or macOS.
    • text-generation-webui for a web-based interface with extensions.
  3. Optionally, run on a cloud GPU provider such as AWS, Google Cloud, or Azure for better performance, especially with larger models or datasets.
  4. Configure the client or library according to its documentation, selecting the quantization level (bit precision) that fits your hardware (see the inference sketch after this list).
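
For step 1, a minimal download sketch using the huggingface_hub library is shown below. The filename is an assumption; check the repository's file list for the exact name of the quantization you want.

```python
# Minimal download sketch via huggingface_hub.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="MaziyarPanahi/gemma-2-2b-it-GGUF",
    filename="gemma-2-2b-it.Q4_K_M.gguf",  # assumed 4-bit variant name
)
print(model_path)  # local path to the downloaded GGUF file
```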
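
For steps 2 and 4, here is a minimal inference sketch with llama-cpp-python (installable via pip install llama-cpp-python). The context size and GPU-offload settings are illustrative, not required values.

```python
# Minimal chat-completion sketch with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path=model_path,  # path returned by hf_hub_download above
    n_ctx=4096,             # context window size (illustrative)
    n_gpu_layers=-1,        # offload all layers to GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain GGUF in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```

Picking a different quantization is just a matter of downloading that file and pointing model_path at it; lower-bit variants trade accuracy for a smaller memory footprint.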

License

The model and its associated files are subject to the licensing terms set by Google for the original model, and those terms carry over to Maziyar Panahi's GGUF adaptations. Ensure compliance with them when using or distributing the model.
