Gemma 2 2B

Google

Introduction

Gemma is a family of lightweight, open, text-to-text, decoder-only large language models developed by Google. The models are well suited to a variety of text generation tasks, including question answering, summarization, and reasoning, and their relatively small size makes them deployable in resource-constrained environments such as laptops, desktops, or private cloud infrastructure.

Architecture

Gemma models are built from the same research and technology used to create Google's Gemini models. Open weights are released for both the pre-trained checkpoint (google/gemma-2-2b) and the instruction-tuned variant (google/gemma-2-2b-it), and the decoder-only architecture is sized for efficient deployment and use across a range of platforms.
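
The concrete hyperparameters behind this architecture (number of decoder layers, hidden size, attention heads) can be inspected from the model configuration alone, without downloading the weights. A minimal sketch using the standard transformers AutoConfig API (access requires having accepted the Gemma license on Hugging Face):

    from transformers import AutoConfig

    # Fetches only the small config file, not the model weights.
    config = AutoConfig.from_pretrained("google/gemma-2-2b")

    print(config.model_type)           # model family identifier ("gemma2")
    print(config.num_hidden_layers)    # number of decoder layers
    print(config.hidden_size)          # hidden/embedding dimension
    print(config.num_attention_heads)  # attention heads per layer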

Training

The Gemma models were trained on large datasets of web documents, code, and mathematical text, chosen to cover a broad range of topics and linguistic styles. The 2B model was trained on 2 trillion tokens. Training was executed on Tensor Processing Units (TPUs) with JAX and ML Pathways, which enable efficient, scalable model development.
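
To get a feel for what a token is in this context, the Gemma tokenizer can be run on a short piece of text. A small illustrative sketch (exact counts depend on the text):

    from transformers import AutoTokenizer

    # Gemma uses a SentencePiece-based subword tokenizer.
    tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b")

    text = "Gemma models are trained on web documents, code, and mathematics."
    token_ids = tokenizer.encode(text)

    # English prose usually comes out near one token per short word.
    print(len(token_ids))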

Guide: Running Locally

  1. Install Dependencies: Begin by installing the necessary libraries; accelerate is used for the device_map="auto" loading in step 3.

    pip install -U transformers
    pip install accelerate
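    # Gemma 2 support landed in recent Transformers releases (the 4.42 series),
    # so it is worth confirming the upgrade took effect:
    python -c "import transformers; print(transformers.__version__)"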
    
  2. Running with the Pipeline API:

    from transformers import pipeline

    # Build a text-generation pipeline; drop device="cuda" (or pass "cpu")
    # if no GPU is available.
    pipe = pipeline("text-generation", model="google/gemma-2-2b", device="cuda")

    text = "Once upon a time,"
    outputs = pipe(text, max_new_tokens=256)
    print(outputs[0]["generated_text"])
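
The base google/gemma-2-2b checkpoint is a pre-trained model, so it continues text rather than following instructions. For chat-style prompts, the instruction-tuned variant is the better fit; a minimal sketch, assuming the google/gemma-2-2b-it checkpoint and a transformers version whose pipeline accepts chat messages directly:

    from transformers import pipeline

    # The "-it" suffix denotes the instruction-tuned variant of Gemma 2 2B.
    pipe = pipeline("text-generation", model="google/gemma-2-2b-it", device="cuda")

    # Chat messages are converted with the model's chat template under the hood.
    # Note: Gemma's template supports "user" and "assistant" roles only.
    messages = [{"role": "user", "content": "Explain overfitting in one sentence."}]
    outputs = pipe(messages, max_new_tokens=64)

    # The reply is appended as the last message of the returned conversation.
    print(outputs[0]["generated_text"][-1]["content"])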
    
  3. Using a Single/Multi GPU Setup:

    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b")

    # device_map="auto" places the model across the available GPU(s);
    # this is what the accelerate dependency is used for.
    model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b", device_map="auto")

    input_ids = tokenizer("Write me a poem about Machine Learning.", return_tensors="pt").to("cuda")
    outputs = model.generate(**input_ids, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
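
Loading the weights in bfloat16 rather than the default float32 roughly halves GPU memory use, which matters on smaller cards. A sketch of the same setup with a reduced-precision dtype, using the standard torch_dtype argument:

    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b")

    # bfloat16 halves weight memory relative to float32, with minimal
    # quality loss on GPUs/TPUs that support the format natively.
    model = AutoModelForCausalLM.from_pretrained(
        "google/gemma-2-2b",
        device_map="auto",
        torch_dtype=torch.bfloat16,
    )

    inputs = tokenizer("Write me a poem about Machine Learning.", return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))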
    
  4. Command Line Interface: The huggingface/local-gemma repository provides a CLI for running Gemma 2 locally. Install it first (see the repository README for installation options and supported flags), then pass a prompt directly, for example:

    local-gemma --model "google/gemma-2-2b" --prompt "What is the capital of Mexico?"
    
  5. Cloud GPUs: If local hardware is limited, cloud GPUs from providers such as AWS, Google Cloud, or Azure offer faster processing and easier scaling for larger workloads.

License

The Gemma models are distributed under Google's Gemma Terms of Use rather than a standard open-source license. To access the weights on Hugging Face, users must review and agree to these terms; refer to the license text for the full conditions.
