Introduction

Gemma is a family of lightweight, state-of-the-art open models from Google. They are text-to-text, decoder-only large language models, well suited to tasks such as question answering, summarization, and reasoning. Gemma is released with open weights in both pre-trained and instruction-tuned variants, offering a flexible option for applications in resource-constrained environments.
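
As a quick illustration of the instruction-tuned variant, the sketch below prompts it through the transformers chat template. The model ID google/gemma-2b-it and the generation settings here are illustrative assumptions, not requirements:

    from transformers import AutoTokenizer, AutoModelForCausalLM

    # "google/gemma-2b-it" is the instruction-tuned 2B variant (assumed ID).
    tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
    model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it")

    # The chat template wraps the message in Gemma's expected turn markers.
    messages = [{"role": "user", "content": "Summarize what a decoder-only model is."}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    outputs = model.generate(input_ids, max_new_tokens=100)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))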

Architecture

Gemma models use a context length of 8192 tokens and were trained using JAX and ML Pathways, leveraging Tensor Processing Units (TPUs) for efficient training. They are built from the same research and technology used to create the Gemini models and are intended to generalize across many tasks. Their small size allows deployment in a wide range of settings, helping democratize access to state-of-the-art AI.
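
Because the context length and layer dimensions live in the model configuration, they can be inspected directly. A minimal sketch using the transformers AutoConfig API (field names follow the Gemma configuration class; downloading the gated config requires the login described in the License section):

    from transformers import AutoConfig

    config = AutoConfig.from_pretrained("google/gemma-2b")
    # max_position_embeddings reflects the 8192-token context window.
    print(config.max_position_embeddings)
    print(config.num_hidden_layers, config.hidden_size)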

Training

The training dataset consists of 6 trillion tokens drawn from diverse text sources, including web documents, code, and mathematics, predominantly in English. The data underwent rigorous preprocessing, including CSAM and sensitive-data filtering, to ensure safety and quality. Training was performed on the latest generation of TPU hardware, which provides strong performance, large memory, scalability, and cost-effectiveness.

Guide: Running Locally

  1. Install Dependencies:

    • Ensure you have the transformers library by running pip install -U transformers.
    • For GPU usage, also install accelerate with pip install accelerate.
    • For quantized versions, install bitsandbytes (see the quantized-loading sketch after these steps).
  2. Load Model and Tokenizer:

    from transformers import AutoTokenizer, AutoModelForCausalLM
    
    tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
    model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")
    
  3. Generate Text:

    input_text = "Write me a poem about Machine Learning."
    # tokenizer(...) returns a dict with input_ids and attention_mask
    inputs = tokenizer(input_text, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=50)  # cap the reply length
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    
  4. Run on GPU:
    For GPU usage, the model must be placed on the GPU as well as the inputs; with accelerate installed, device_map="auto" places the weights automatically (a combined quantized example follows these steps):

    model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", device_map="auto")
    inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
    
  5. Cloud GPUs:
    Consider cloud services such as Google Cloud's Vertex AI, which provide managed access to GPUs and TPUs for faster training and inference.
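
Putting steps 1–4 together, the sketch below loads a 4-bit quantized model on a GPU. The quantization settings (NF4, bfloat16 compute) are illustrative choices rather than requirements, and a CUDA device plus the bitsandbytes and accelerate packages are assumed:

    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

    # Illustrative 4-bit settings; requires bitsandbytes and a CUDA GPU.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
    model = AutoModelForCausalLM.from_pretrained(
        "google/gemma-2b",
        quantization_config=quant_config,
        device_map="auto",  # place weights on the available GPU(s)
    )

    inputs = tokenizer("Write me a poem about Machine Learning.", return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=100)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))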

License

Access to the Gemma models requires reviewing and agreeing to Google’s usage license, available through Hugging Face. Ensure you are logged in to your Hugging Face account to review and acknowledge the terms.
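
Because the repository is gated, downloads must also be authenticated. A minimal sketch using the huggingface_hub login helper (the access token comes from your Hugging Face account settings):

    from huggingface_hub import login

    # Prompts for a Hugging Face access token; required for gated repositories.
    login()

Alternatively, run huggingface-cli login from the shell before downloading the model.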
