gemma 7b it

google

Introduction

Gemma is a series of lightweight, state-of-the-art open models developed by Google, based on the same research and technology as the Gemini models. These are text-to-text, decoder-only large language models, optimized for English and suitable for various text generation tasks. Gemma models are available with open weights and are designed for deployment on limited-resource environments, such as personal devices or cloud infrastructures.

Architecture

Gemma models are built using Google's latest AI architecture, leveraging JAX and ML Pathways for efficient training on Tensor Processing Units (TPUs). These models are designed to handle a wide range of linguistic tasks and formats, making them versatile for diverse applications.

Training

The Gemma models were trained on a vast dataset comprising 6 trillion tokens, sourced from diverse web documents, code, and mathematical texts. The training process involved significant data preprocessing to filter out sensitive and harmful content. These models utilize Google's TPU hardware for efficient training, which offers advantages in performance, memory, scalability, and cost-effectiveness.

Guide: Running Locally

To run the Gemma model locally, follow these steps:

  1. Install Required Libraries:

    pip install transformers accelerate bitsandbytes
    
  2. Load the Model:

    from transformers import AutoTokenizer, AutoModelForCausalLM
    import torch
    
    tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
    model = AutoModelForCausalLM.from_pretrained(
        "google/gemma-7b-it",
        device_map="auto",
        torch_dtype=torch.bfloat16
    )
    
  3. Run Inference:

    input_text = "Write me a poem about Machine Learning."
    input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
    outputs = model.generate(**input_ids)
    print(tokenizer.decode(outputs[0]))
    
  4. Precision Options: Utilize different precisions such as torch.bfloat16, torch.float16, or torch.float32 depending on your hardware capabilities.

Cloud GPU Recommendation: For enhanced performance, consider using cloud services offering NVIDIA GPUs, such as AWS, Google Cloud, or Azure.

License

The Gemma models are available under the Gemma license, which requires reviewing and agreeing to Google's usage terms. Access to the models is contingent upon acknowledgment of these terms on the Hugging Face platform.

More Related APIs in Text Generation