Gemma 2B
Google

Introduction
Gemma is a family of lightweight, state-of-the-art open models from Google. These are text-to-text, decoder-only large language models, well-suited for tasks such as question answering, summarization, and reasoning. They are released with open weights in both pre-trained and instruction-tuned variants, making them a flexible choice for applications in environments with limited resources.
Architecture
Gemma models support a context length of 8192 tokens. They were trained using JAX and ML Pathways on Tensor Processing Units (TPUs), and are built from the same research and technology used to create the Gemini models, with the aim of generalizing across many tasks. Their small footprint allows deployment in a variety of settings, helping democratize access to AI technology.
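As a quick, hedged way to confirm the 8192-token context window from the published checkpoint (assuming you have accepted the license and can reach the repo; the attribute name below is the standard transformers config field, not something stated in this card):

```python
from transformers import AutoConfig

# The context length is exposed as max_position_embeddings in the model config.
config = AutoConfig.from_pretrained("google/gemma-2b")
print(config.max_position_embeddings)  # expected: 8192
```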
Training
The training dataset consists of 6 trillion tokens drawn from diverse text sources, including web documents, code, and mathematics, predominantly in English. The data underwent rigorous preprocessing, including CSAM filtering and sensitive-data filtering, to ensure safety and quality. Training was performed on recent-generation TPU hardware, benefiting from its performance, memory capacity, scalability, and cost-effectiveness.
Guide: Running Locally
1. Install dependencies:
   - Ensure you have the transformers library: pip install -U transformers
   - For GPU usage, also install accelerate: pip install accelerate
   - For quantized versions, install bitsandbytes: pip install bitsandbytes (see the quantized-loading sketch after this list)
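A minimal sketch of the quantized path, assuming the bitsandbytes integration available in recent transformers releases; the 4-bit setting here is an illustrative choice, not a recommendation from the model card:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative assumption: 4-bit quantization via bitsandbytes;
# 8-bit (load_in_8bit=True) is the other common option.
quant_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b",
    quantization_config=quant_config,
)
```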
2. Load the model and tokenizer:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")
```
3. Generate text:

```python
input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
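By default, generate uses the model's built-in generation settings. A short variant with explicit length and sampling controls; the parameter values below are illustrative assumptions, not recommendations from the model card:

```python
# Illustrative settings: cap the number of new tokens and enable sampling.
outputs = model.generate(
    **input_ids,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```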
4. Run on GPU:

For GPU usage, the model itself must also be placed on the GPU, not only the input tensors:

```python
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", device_map="auto")
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
```
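Putting the pieces together, a self-contained GPU sketch; the bfloat16 dtype is an assumption made here to reduce memory use, not a requirement from the guide:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
# Assumption: bfloat16 roughly halves memory versus float32; requires a GPU
# with bfloat16 support.
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

input_ids = tokenizer("Write me a poem about Machine Learning.", return_tensors="pt").to(model.device)
outputs = model.generate(**input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))
```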
5. Cloud GPUs:

Consider using cloud services such as Google Cloud's Vertex AI for access to TPUs and GPUs for better performance.
License
Access to the Gemma models requires reviewing and agreeing to Google’s usage license, available through Hugging Face. Ensure you are logged in to review and acknowledge the terms.
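For scripted access after accepting the terms, a minimal authentication sketch using the huggingface_hub library (the interactive huggingface-cli login command is the usual alternative):

```python
from huggingface_hub import login

# Prompts for a Hugging Face access token; needed before downloading
# the gated Gemma weights from a script or notebook.
login()
```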