Gemma 2 27B IT

Google

Introduction

Gemma is a family of lightweight, state-of-the-art open models developed by Google, designed for various text generation tasks such as question answering and summarization. These models are available in English and come in pre-trained and instruction-tuned variants. They are optimized for deployment in environments with limited resources, thus democratizing access to advanced AI models.

Architecture

Gemma models are text-to-text, decoder-only large language models. Built from the same research and technology used to create the Gemini models, they are designed to handle a wide range of linguistic styles, topics, and vocabulary by drawing on diverse data sources.

Training

Gemma models are trained on a vast dataset, including web documents, code, and mathematical texts. The 27B model was trained with 13 trillion tokens, while the 9B model used 8 trillion tokens. Training was performed using TPUs and JAX, allowing efficient scaling and resource management. Various data filtering techniques were applied to ensure content quality and safety.

Guide: Running Locally

  1. Install Dependencies:

    pip install -U transformers accelerate bitsandbytes
    
  2. Basic Usage:

    • Use the Transformers library to load the model and tokenizer.
    • Run the model on a single GPU or across multiple GPUs, and explore precision settings such as bfloat16, float32, int8, and int4 to trade memory for speed and accuracy.
    • Run the model via a command line interface (CLI) using the local-gemma repository.
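The basic-usage steps above can be sketched as follows. This is a minimal, hedged example: the prompt-formatting helper reproduces Gemma's instruction-tuned turn format by hand for illustration, though in practice `tokenizer.apply_chat_template` builds it for you; the repo id and generation settings are assumptions to adapt to your setup.

```python
def gemma_prompt(user_text: str) -> str:
    """Build a single-turn prompt in Gemma's chat-turn format.

    These control tokens are the ones used by Gemma's instruction-tuned
    variants; normally tokenizer.apply_chat_template produces this string.
    """
    return (
        "<start_of_turn>user\n"
        f"{user_text}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )


def generate_reply(user_text: str, model_id: str = "google/gemma-2-27b-it") -> str:
    """Load the model and tokenizer, then generate a reply (downloads weights)."""
    # Heavy imports are kept local so the helper above stays usable
    # without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",           # shard across available GPUs
        torch_dtype=torch.bfloat16,  # or torch.float32 for full precision
    )
    inputs = tokenizer(gemma_prompt(user_text), return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


# generate_reply("What is Gemma?")  # uncomment to run; downloads the full weights
```

Accepting the model license on Hugging Face and authenticating (e.g. via `huggingface-cli login`) is required before the weights will download.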
  3. Cloud GPUs: For enhanced performance, consider using cloud GPU services such as Google Cloud's TPUs or AWS EC2 instances with GPU support.
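The int8/int4 options mentioned in step 2 rely on bitsandbytes quantization. A minimal sketch, assuming the `google/gemma-2-27b-it` repo id; the memory helper is a hypothetical back-of-the-envelope estimate covering weights only (activations and KV cache add more):

```python
def approx_weight_gib(n_params: float, bits_per_weight: int) -> float:
    """Rough memory needed just for the model weights, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


def load_4bit(model_id: str = "google/gemma-2-27b-it"):
    """Load the model with 4-bit bitsandbytes quantization via Transformers."""
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(load_in_4bit=True)
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",
    )


# A 27B model's weights drop from roughly 50 GiB in bfloat16 to
# roughly 13 GiB in int4:
# approx_weight_gib(27e9, 16) -> ~50.3
# approx_weight_gib(27e9, 4)  -> ~12.6
```

This is what makes a single high-memory GPU (or a modest cloud instance) viable for the 27B variant.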

License

The Gemma models are distributed under the Gemma license. Accessing the models on Hugging Face requires agreement to Google's usage license. Please review the license terms before use.
