Gemma 2 9B
Google
Introduction
Gemma is a family of lightweight, state-of-the-art large language models from Google, designed for text-to-text generation. The models are well suited to tasks such as question answering, summarization, and reasoning, and their relatively small size allows deployment in resource-constrained environments while maintaining strong performance.
Architecture
Gemma models are text-to-text, decoder-only language models with open weights, designed to be lightweight and efficient. They were trained on TPU hardware using JAX and Google's ML Pathways, which orchestrate large-scale training efficiently across accelerators.
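To see the decoder-only configuration for yourself, you can inspect the model configuration without downloading the full weights. This is a minimal sketch assuming the standard Transformers `AutoConfig` interface and that you have accepted the model license on Hugging Face, since the repository is gated:

```python
from transformers import AutoConfig

# Fetch only the configuration of the gated "google/gemma-2-9b" checkpoint
# (requires accepting the license and authenticating with a Hugging Face token).
config = AutoConfig.from_pretrained("google/gemma-2-9b")

# Attribute names follow the generic Transformers configuration interface.
print(config.model_type)           # architecture family identifier
print(config.num_hidden_layers)    # number of decoder layers
print(config.hidden_size)          # model width
print(config.num_attention_heads)  # attention heads per layer
```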
Training
Gemma models are trained on a diverse dataset that includes web documents, code, and mathematical text. The 9B model was trained on 8 trillion tokens, with rigorous data filtering applied to ensure quality and safety. Training was conducted on TPUs, providing high performance and scalability.
Guide: Running Locally
- Install the Transformers library: use the following command to install the necessary library:

  ```bash
  pip install -U transformers
  ```
- Running the model with the Pipeline API:

  ```python
  import torch
  from transformers import pipeline

  pipe = pipeline(
      "text-generation",
      model="google/gemma-2-9b",
      device="cuda",  # Use "mps" for Mac devices
  )

  text = "Once upon a time,"
  outputs = pipe(text, max_new_tokens=256)
  response = outputs[0]["generated_text"]
  print(response)
  ```
- Running on single/multi GPU:
  - Install `accelerate`:

    ```bash
    pip install accelerate
    ```

  - Example code for running on GPU (if GPU memory is limited, see the quantized-loading sketch after this list):

    ```python
    from transformers import AutoTokenizer, AutoModelForCausalLM
    import torch

    tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b")
    model = AutoModelForCausalLM.from_pretrained("google/gemma-2-9b", device_map="auto")

    input_text = "Write me a poem about Machine Learning."
    input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

    outputs = model.generate(**input_ids, max_new_tokens=32)
    print(tokenizer.decode(outputs[0]))
    ```
- Using the CLI:
  - Follow the installation instructions from the `local-gemma` repository, then run:

    ```bash
    local-gemma --model "google/gemma-2-9b" --prompt "What is the capital of Mexico?"
    ```
- Utilizing cloud GPUs: for efficient processing, consider using cloud GPU services such as Google Cloud or AWS.
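If the full-precision 9B weights do not fit in your GPU memory, 4-bit quantized loading is one option. This is a minimal sketch, assuming the `bitsandbytes` package is installed alongside `accelerate`; it otherwise mirrors the GPU example above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumes `pip install bitsandbytes accelerate` has been run.
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b",
    quantization_config=quantization_config,
)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```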
License
Gemma models are available under Google's usage license. Users must review and agree to the terms before accessing the models on Hugging Face.