Introduction

Gemma is a family of lightweight, state-of-the-art open models from Google, built on the same research and technology as the Gemini models. They are text-to-text, decoder-only large language models available in English, with open weights and both pre-trained and instruction-tuned variants. Gemma models are suitable for various text generation tasks and can be deployed in environments with limited resources.

Architecture

Gemma models are trained with a context length of 8192 tokens, making them adept at handling long text inputs. As decoder-only transformers, they generate text autoregressively from an input prompt. They are trained on diverse data, including web documents, code, and mathematics, to improve their versatility.
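
The practical effect of the 8192-token window is that longer inputs must be truncated or split before inference. A minimal sketch of the splitting idea, using placeholder tokens rather than Gemma's actual SentencePiece tokenizer output:

```python
# Illustrative sketch: splitting a long token sequence into pieces that
# each fit Gemma's 8192-token context window. The placeholder tokens
# below stand in for real tokenizer output, not Gemma's SentencePiece.
CONTEXT_LENGTH = 8192

def chunk_tokens(tokens, max_len=CONTEXT_LENGTH):
    """Split a token list into consecutive chunks of at most max_len tokens."""
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]

# A 20,000-token input splits into three windows: 8192 + 8192 + 3616.
long_input = [f"tok{i}" for i in range(20000)]
chunks = chunk_tokens(long_input)
print([len(c) for c in chunks])  # [8192, 8192, 3616]
```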

Training

Gemma models were trained on a dataset of 6 trillion tokens drawn from diverse sources. The training process included rigorous data cleaning and filtering to ensure quality and safety. The models were trained on Tensor Processing Unit (TPU) hardware, leveraging JAX and ML Pathways for efficient training and orchestration.
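
To make the cleaning-and-filtering step concrete, here is a toy sketch of the kind of document-level filter such a pipeline might apply. The specific rules (a minimum length and a blocklist) are hypothetical illustrations, not Gemma's actual filtering criteria:

```python
# Hypothetical example of corpus filtering: drop documents that are too
# short or that contain blocked terms. Gemma's real pipeline is far more
# sophisticated; this only illustrates the general shape of such a filter.
BLOCKLIST = {"secret_api_key"}  # hypothetical unsafe-content markers

def keep_document(text, min_tokens=5):
    """Return True if a document passes the (toy) quality and safety filters."""
    tokens = text.split()
    if len(tokens) < min_tokens:
        return False          # too short to be useful training data
    if any(tok in BLOCKLIST for tok in tokens):
        return False          # fails the safety filter
    return True

corpus = [
    "a tiny doc",
    "this document is long enough to keep for training",
    "leaked secret_api_key should be filtered out of the data",
]
cleaned = [d for d in corpus if keep_document(d)]
print(len(cleaned))  # 1: only the second document survives
```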

Guide: Running Locally

Basic Steps

  1. Installation: Ensure you have Python and pip installed. Run pip install -U transformers accelerate bitsandbytes.
  2. Model Loading: Use AutoTokenizer and AutoModelForCausalLM to load the Gemma model.
  3. Inference: Prepare input text, tokenize it, and generate outputs using the model.
  4. Example Code:
    # Load the tokenizer and model (requires accepting the Gemma license on Hugging Face)
    from transformers import AutoTokenizer, AutoModelForCausalLM
    
    tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b")
    model = AutoModelForCausalLM.from_pretrained("google/gemma-7b")
    
    # Tokenize the prompt; the result holds both input_ids and attention_mask
    input_text = "Write me a poem about Machine Learning."
    inputs = tokenizer(input_text, return_tensors="pt")
    
    # Generate a completion (max_new_tokens bounds the output length)
    outputs = model.generate(**inputs, max_new_tokens=100)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    
  5. Utilizing GPUs: For GPU support, pass device_map="auto" when loading the model and move the tokenized inputs to the CUDA device. Consider cloud GPUs such as AWS EC2 with NVIDIA GPUs, Google Cloud's GPU offerings, or Azure's GPU instances for enhanced performance.
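
Putting step 5 together, a sketch of the GPU variant of the example above. It assumes a CUDA-capable machine and that you have accepted the Gemma license to download the gated google/gemma-7b weights; max_new_tokens=100 and float16 precision are arbitrary illustrative choices:

```python
# GPU variant of the basic example. Assumes a CUDA-capable GPU and access
# to the gated google/gemma-7b weights on Hugging Face.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b")
# device_map="auto" places the model's weights on the available GPU(s);
# half precision (float16) roughly halves the memory footprint.
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b", device_map="auto", torch_dtype=torch.float16
)

input_text = "Write me a poem about Machine Learning."
# Move the tokenized inputs to the same device as the model
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```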

License

Gemma models are released under Google's own usage terms rather than a standard open-source license. Users must review and agree to these terms in order to access the model weights via Hugging Face, and must adhere to them for compliant usage.
