Gemma 7B IT
Introduction
Gemma is a family of lightweight, state-of-the-art open models from Google, built on the same research and technology as the Gemini models. These are text-to-text, decoder-only large language models, optimized for English and suited to a variety of text generation tasks. Gemma models are released with open weights and are designed for deployment in resource-limited environments such as personal devices or cloud infrastructure.
Architecture
Gemma models are decoder-only transformers built on Google's latest model architecture, trained with JAX and ML Pathways on Tensor Processing Units (TPUs). The design handles a wide range of linguistic tasks and formats, making the models versatile across diverse applications.
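As a rough illustration of the decoder-only design (a generic sketch, not Gemma's actual implementation), each position attends only to itself and earlier positions, enforced by a causal mask:

```python
import torch

# Generic sketch of causal (decoder-only) attention masking:
# position i may attend to positions 0..i, never to future tokens.
seq_len = 6
scores = torch.randn(seq_len, seq_len)  # raw attention scores
future = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(future, float("-inf"))  # hide future positions
weights = torch.softmax(scores, dim=-1)  # each row sums to 1 over the past
print(weights)
```

This left-to-right conditioning is what lets the model generate text one token at a time.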
Training
The Gemma models were trained on a dataset of roughly 6 trillion tokens drawn from diverse web documents, code, and mathematical text. The training data underwent substantial preprocessing to filter out sensitive and harmful content. Training ran on Google's TPU hardware, which offers advantages in performance, memory, scalability, and cost-effectiveness.
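Google has not published the exact preprocessing pipeline; purely as a hypothetical sketch of the kind of document-level filtering described above (the blocklist terms are placeholders):

```python
# Hypothetical sketch only; the real filtering pipeline is not public.
BLOCKLIST = {"placeholder-sensitive-term", "placeholder-harmful-term"}

def keep_document(text: str) -> bool:
    """Toy filter: drop any document containing a blocklisted term."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKLIST)

corpus = ["a clean web document", "text with placeholder-sensitive-term"]
filtered = [doc for doc in corpus if keep_document(doc)]
print(filtered)  # -> ['a clean web document']
```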
Guide: Running Locally
To run the Gemma model locally, follow these steps:
- Install Required Libraries:

  ```bash
  pip install transformers accelerate bitsandbytes
  ```
- Load the Model:

  ```python
  from transformers import AutoTokenizer, AutoModelForCausalLM
  import torch

  tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
  model = AutoModelForCausalLM.from_pretrained(
      "google/gemma-7b-it",
      device_map="auto",          # spread layers across available devices
      torch_dtype=torch.bfloat16,
  )
  ```
- Run Inference:

  ```python
  input_text = "Write me a poem about Machine Learning."
  input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")  # assumes a CUDA GPU
  outputs = model.generate(**input_ids)
  print(tokenizer.decode(outputs[0]))
  ```

  For instruction-style prompts, see the chat-template sketch after this list.
- Precision Options: load the model with `torch.bfloat16`, `torch.float16`, or `torch.float32`, depending on your hardware capabilities; a dtype-selection sketch follows this list.
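As a minimal sketch of choosing a precision at load time, assuming a standard PyTorch setup (`torch.cuda.is_bf16_supported` is the stock PyTorch capability check; the rest mirrors the loading step above):

```python
import torch
from transformers import AutoModelForCausalLM

# Sketch: pick the widest precision the local hardware supports well.
# bfloat16 needs recent GPUs (e.g., Ampere); float32 is the safe CPU fallback.
if torch.cuda.is_available():
    dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
else:
    dtype = torch.float32

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b-it",
    device_map="auto",
    torch_dtype=dtype,
)
```

bfloat16 keeps float32's exponent range at half the memory, which is why it is the usual choice on hardware that supports it.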
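Because gemma-7b-it is the instruction-tuned variant, prompts are typically wrapped in the model's chat format rather than passed as raw text. A minimal sketch using the standard `transformers` `apply_chat_template` API, reusing the `tokenizer` and `model` loaded above (the `max_new_tokens=256` cap is an arbitrary choice for illustration):

```python
# Sketch: format the request with the tokenizer's built-in chat template.
chat = [{"role": "user", "content": "Write me a poem about Machine Learning."}]
input_ids = tokenizer.apply_chat_template(
    chat, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```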
Cloud GPU Recommendation: For enhanced performance, consider using cloud services offering NVIDIA GPUs, such as AWS, Google Cloud, or Azure.
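If a large GPU is not available, the bitsandbytes package installed in the first step can instead load the weights 4-bit quantized, trading some output quality for a much smaller memory footprint. A minimal sketch using the standard `transformers` `BitsAndBytesConfig` API:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Sketch: load gemma-7b-it with 4-bit quantized weights via bitsandbytes.
quant_config = BitsAndBytesConfig(load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b-it",
    device_map="auto",
    quantization_config=quant_config,
)
```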
License
The Gemma models are released under the Gemma license, which requires reviewing and agreeing to Google's usage terms. On the Hugging Face platform, access to the model weights is granted only after acknowledging these terms.