gemma-2-9b-it-SimPO
princeton-nlp
Introduction
gemma-2-9b-it-SimPO is a fine-tuned version of Google's gemma-2-9b-it model, trained with offline preference optimization. It uses the Simple Preference Optimization (SimPO) algorithm, whose implicit reward is aligned with generation likelihood, improving performance without the need for a reference model.
Architecture
The model is based on Google's gemma-2-9b-it architecture. It is trained with the SimPO objective, which optimizes language models directly on preference datasets. As detailed in the SimPO paper, the objective uses the length-normalized log-likelihood of a response as an implicit reward, so no separate reference model is required.
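For illustration, the implicit reward is the length-normalized log-likelihood, r(x, y) = (beta / |y|) * log pi_theta(y | x), and the loss for a chosen/rejected pair is -log sigmoid(r(x, y_w) - r(x, y_l) - gamma), where gamma is a target reward margin. Below is a minimal PyTorch sketch of this loss; the function name and the beta/gamma values are illustrative defaults, not the released training hyperparameters.

  import torch.nn.functional as F

  def simpo_loss(chosen_logps, rejected_logps, chosen_lens, rejected_lens,
                 beta=10.0, gamma=5.0):
      # chosen_logps / rejected_logps: summed token log-probabilities of each
      # response under the policy model; *_lens: response lengths in tokens.
      # Implicit rewards: length-normalized log-likelihood scaled by beta.
      chosen_reward = beta * chosen_logps / chosen_lens
      rejected_reward = beta * rejected_logps / rejected_lens
      # Bradley-Terry style loss with a target reward margin gamma;
      # no reference model is involved.
      return -F.logsigmoid(chosen_reward - rejected_reward - gamma).mean()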
Training
The model was fine-tuned on the princeton-nlp/gemma2-ultrafeedback-armorm dataset. Training took approximately 100 minutes on 8x H100 GPUs. The hyperparameters are specified in the training script in the SimPO GitHub repository. The resulting model shows improved performance on various evaluation metrics compared to the base gemma-2-9b-it model.
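To get a feel for the preference data before training, the dataset can be loaded with the datasets library. This is a minimal sketch; the split name and the printed columns are assumptions about the dataset layout rather than confirmed fields.

  from datasets import load_dataset

  # Load the preference dataset used for SimPO fine-tuning.
  ds = load_dataset("princeton-nlp/gemma2-ultrafeedback-armorm", split="train")

  # Inspect the available columns and one example record.
  print(ds.column_names)
  print(ds[0])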
Guide: Running Locally
To run the GEMMA-2-9B-IT-SimPO model locally, follow these steps:
- Install the Transformers Library: Ensure you have the transformers library installed:

  pip install transformers
- Load the Model: Use the following Python code to load and execute the model.

  import torch
  from transformers import pipeline

  model_id = "princeton-nlp/gemma-2-9b-it-SimPO"

  generator = pipeline(
      "text-generation",
      model=model_id,
      model_kwargs={"torch_dtype": torch.bfloat16},
      device="cuda",
  )

  outputs = generator(
      [{"role": "user", "content": "What's the difference between llamas and alpacas?"}],
      do_sample=False,
      eos_token_id=[
          generator.tokenizer.convert_tokens_to_ids("<end_of_turn>"),
          generator.tokenizer.eos_token_id,
      ],
      max_new_tokens=200,
  )
  print(outputs[0]["generated_text"])
- Cloud GPUs: It is recommended to use cloud services such as AWS, GCP, or Azure with GPU support for efficient model execution, especially if you lack local GPU resources; a lower-memory local option is sketched below.
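If local GPU memory is too small for the bfloat16 weights of the 9B model, one possible workaround is quantized loading. The sketch below is an optional alternative to the pipeline example above; it assumes the bitsandbytes package is installed and that the quality loss from 4-bit quantization is acceptable.

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

  model_id = "princeton-nlp/gemma-2-9b-it-SimPO"

  # 4-bit quantization to reduce memory footprint (requires bitsandbytes).
  quant_config = BitsAndBytesConfig(
      load_in_4bit=True,
      bnb_4bit_compute_dtype=torch.bfloat16,
  )

  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(
      model_id,
      quantization_config=quant_config,
      device_map="auto",
  )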
License
The gemma-2-9b-it-SimPO model is released under the MIT License, which allows broad usage and modification.