Gemma 2B IT

Google

Introduction

Gemma is a family of lightweight, state-of-the-art open models from Google, designed for text generation tasks such as question answering, summarization, and reasoning. These are text-to-text, decoder-only large language models, available in English, with open weights in both pre-trained and instruction-tuned variants. Their relatively small size makes them deployable in resource-limited environments such as laptops or personal cloud infrastructure, democratizing access to advanced AI models.

Architecture

Gemma models are built using the same research and technology as Google's Gemini models. They utilize a decoder-only architecture for text-to-text tasks and are trained to handle a variety of text generation applications. The models support different precisions, including bfloat16, float16, float32, and quantized versions with int8 and 4-bit precision for optimized performance on specific hardware.
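
As one illustration of the quantized path, the sketch below loads the model in 4-bit precision via bitsandbytes. This is a minimal example, not part of the official card: it assumes the bitsandbytes package is installed alongside transformers and that a CUDA-capable GPU is available, and the quantization settings shown are only one reasonable choice.

    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    # 4-bit quantization config (requires the bitsandbytes package and a CUDA GPU).
    quantization_config = BitsAndBytesConfig(load_in_4bit=True)

    tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
    model = AutoModelForCausalLM.from_pretrained(
        "google/gemma-2b-it",
        quantization_config=quantization_config,
    )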

Training

Gemma models were trained on a comprehensive dataset comprising 6 trillion tokens from diverse sources like web documents, code, and mathematical texts. This diverse data exposure is crucial for handling varied tasks and text formats. Training employed Google's Tensor Processing Units (TPUs) and utilized JAX and ML Pathways for efficient computation and orchestration.

Guide: Running Locally

To run the Gemma model locally, follow these steps:

  1. Installation: Install the necessary libraries:

    pip install transformers accelerate
    
  2. Running on CPU (a chat-template variant for the instruction-tuned model is sketched after this list):

    from transformers import AutoTokenizer, AutoModelForCausalLM
    import torch
    
    # Load the tokenizer and model; bfloat16 halves memory use compared to float32.
    tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
    model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", torch_dtype=torch.bfloat16)
    
    input_text = "Write me a poem about Machine Learning."
    input_ids = tokenizer(input_text, return_tensors="pt")
    
    # Generate a completion; max_new_tokens caps its length.
    outputs = model.generate(**input_ids, max_new_tokens=256)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    
  3. Running on GPU:

    from transformers import AutoTokenizer, AutoModelForCausalLM
    import torch
    
    # device_map="auto" places the weights on the available GPU(s).
    tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
    model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", device_map="auto", torch_dtype=torch.bfloat16)
    
    input_text = "Write me a poem about Machine Learning."
    # Move the tokenized inputs to the same device as the model.
    input_ids = tokenizer(input_text, return_tensors="pt").to(model.device)
    
    outputs = model.generate(**input_ids, max_new_tokens=256)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    
  4. Cloud GPUs: For enhanced performance, consider using cloud-based GPU services like Google Cloud Platform or AWS, which provide scalable and cost-effective options for running large models.
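
Because gemma-2b-it is the instruction-tuned variant, prompts are usually wrapped in the Gemma chat format rather than passed as raw text. The sketch below applies the tokenizer's chat template on top of the GPU setup from step 3; the prompt and generation length are illustrative.

    from transformers import AutoTokenizer, AutoModelForCausalLM
    import torch

    tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
    model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", device_map="auto", torch_dtype=torch.bfloat16)

    # Wrap the user turn in the Gemma chat format via the tokenizer's chat template.
    messages = [{"role": "user", "content": "Write me a poem about Machine Learning."}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    outputs = model.generate(input_ids, max_new_tokens=256)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))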

License

Gemma is released under Google's own usage terms (the Gemma Terms of Use) rather than a standard open-source license. Users must review and agree to these terms to access the model on Hugging Face.
