Gemma 2 9B

Google

Introduction

Gemma is a family of lightweight, state-of-the-art large language models from Google, designed for text-to-text generation tasks such as question answering, summarization, and reasoning. Their relatively small size makes them resource-efficient, enabling deployment in environments with limited compute while maintaining high performance.

Architecture

Gemma models are text-to-text, decoder-only large language models with open weights, designed to be lightweight and efficient. They were trained on TPU hardware using JAX and Google's ML Pathways, which provide the scalability and training efficiency needed for models of this size.
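
As a quick illustration, the architecture details can be inspected through the model's configuration with the Transformers library. This is a minimal sketch: the attribute names are the standard fields exposed by the Hugging Face Gemma 2 configuration class rather than values stated above, and downloading the config requires having accepted the Gemma license on Hugging Face.

    from transformers import AutoConfig

    # Fetch only the configuration JSON (not the weights) for the
    # decoder-only Gemma 2 9B checkpoint; the gated repo may require
    # `huggingface-cli login` after accepting the license.
    config = AutoConfig.from_pretrained("google/gemma-2-9b")

    # Standard Gemma2Config fields in recent Transformers releases.
    print(config.model_type)           # "gemma2"
    print(config.num_hidden_layers)    # number of decoder layers
    print(config.hidden_size)          # hidden/embedding dimension
    print(config.num_attention_heads)  # attention heads per layer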

Training

Gemma models are trained on a diverse dataset that includes web documents, code, and mathematical text. The 9B model was trained on 8 trillion tokens, with rigorous data filtering applied to ensure quality and safety. Training was conducted on TPUs, providing high performance and scalability.

Guide: Running Locally

  1. Install Transformers Library: Use the following command to install the necessary library:

    pip install -U transformers
    
  2. Running the Model with Pipeline API:

    import torch
    from transformers import pipeline
    
    pipe = pipeline(
        "text-generation",
        model="google/gemma-2-9b",
        device="cuda"  # Use "mps" for Mac devices
    )
    
    text = "Once upon a time,"
    outputs = pipe(text, max_new_tokens=256)
    response = outputs[0]["generated_text"]
    print(response)
    
  3. Running on Single/Multi GPU:

    • Install accelerate:
      pip install accelerate
      
    • Example code for running on GPU (a lower-memory, quantized variant is sketched just after this list):
      from transformers import AutoTokenizer, AutoModelForCausalLM
      import torch
      
      tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b")
      model = AutoModelForCausalLM.from_pretrained("google/gemma-2-9b", device_map="auto")
      
      input_text = "Write me a poem about Machine Learning."
      input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
      outputs = model.generate(**input_ids, max_new_tokens=32)
      print(tokenizer.decode(outputs[0]))
      
  4. Using CLI:

    • Follow the installation instructions from the local-gemma repository and use the command:
      local-gemma --model "google/gemma-2-9b" --prompt "What is the capital of Mexico?"
      
  5. Utilizing Cloud GPUs: For efficient processing, consider using cloud GPU services such as Google Cloud or AWS.
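
If GPU memory is the bottleneck when following step 3, the same checkpoint can also be loaded in 4-bit precision with bitsandbytes. The snippet below is a minimal sketch: it assumes the bitsandbytes package is installed (pip install bitsandbytes) and uses the standard BitsAndBytesConfig from Transformers.

    from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

    # Load the weights in 4-bit to roughly quarter the GPU memory footprint.
    quantization_config = BitsAndBytesConfig(load_in_4bit=True)

    tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b")
    model = AutoModelForCausalLM.from_pretrained(
        "google/gemma-2-9b",
        quantization_config=quantization_config,
        device_map="auto",
    )

    input_text = "Write me a poem about Machine Learning."
    input_ids = tokenizer(input_text, return_tensors="pt").to(model.device)
    outputs = model.generate(**input_ids, max_new_tokens=32)
    print(tokenizer.decode(outputs[0]))

Generation quality can degrade somewhat at 4-bit precision, so treat this as a memory-saving option rather than a drop-in replacement for the full-precision weights.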

License

Gemma models are released under Google's Gemma Terms of Use. Users must review and agree to the license terms before accessing the models on Hugging Face.
