Skywork-Reward-Gemma-2-27B-v0.2
Introduction
Skywork-Reward-Gemma-2-27B-v0.2 is an advanced reward model based on the gemma-2-27b-it architecture. It is designed to handle complex preferences in various domains such as mathematics, coding, and safety. The model is trained on the Skywork Reward Data Collection, which comprises 80K high-quality preference pairs from publicly available data.
Architecture
The model is built on the gemma-2-27b-it architecture and utilizes the Skywork Reward Data Collection, which includes curated samples from multiple public data sources. The focus is on maintaining a balance across different domains while improving performance through careful data selection and scoring techniques.
Training
Skywork-Reward-Gemma-2-27B-v0.2 was trained on a decontaminated dataset version, Skywork-Reward-Preference-80K-v0.2, which contains no overlap with evaluation prompts from RewardBench. The dataset consists of 80K samples curated from sources such as HelpSteer2, OffsetBias, and WildGuard. The training process involved selecting top samples and scoring responses to enhance performance across various domains.
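Skywork has not published its exact decontamination procedure beyond the dataset card, but the idea of removing training prompts that overlap with evaluation prompts can be sketched as follows. The normalization and exact-match criterion here are illustrative assumptions, not Skywork's actual method:

```python
def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting
    # differences don't hide an overlapping prompt.
    return " ".join(text.lower().split())

def decontaminate(train_pairs, eval_prompts):
    # Drop any training pair whose prompt exactly matches an
    # evaluation prompt after normalization.
    banned = {normalize(p) for p in eval_prompts}
    return [pair for pair in train_pairs
            if normalize(pair["prompt"]) not in banned]

train = [
    {"prompt": "What is 2+2?", "chosen": "4", "rejected": "5"},
    {"prompt": "Explain recursion.", "chosen": "...", "rejected": "..."},
]
evals = ["what is  2+2?"]
clean = decontaminate(train, evals)
print(len(clean))  # 1
```

Real decontamination pipelines typically also use fuzzy or n-gram overlap rather than exact matching, since near-duplicates leak information just as readily.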
Guide: Running Locally
To run Skywork-Reward-Gemma-2-27B-v0.2 locally, follow these steps:
- Environment Setup: Ensure you have Python and PyTorch installed, then install the Transformers library (FlashAttention 2, used in the code below, additionally requires the flash-attn package):
pip install transformers
- Load the Model: Use the following code to load the model and tokenizer:
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

device = "cuda:0"
model_name = "Skywork/Skywork-Reward-Gemma-2-27B-v0.2"
rm = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map=device,
    attn_implementation="flash_attention_2",
    num_labels=1,
)
rm_tokenizer = AutoTokenizer.from_pretrained(model_name)
- Perform Inference: Format and tokenize your input as a chat conversation, then obtain the reward score:
prompt = "Your prompt here"
response = "Your response here"
conv = [
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": response},
]
conv_tokenized = rm_tokenizer.apply_chat_template(
    conv, tokenize=True, return_tensors="pt"
).to(device)
with torch.no_grad():
    score = rm(conv_tokenized).logits[0][0].item()
print(f"Score for response: {score}")
- Cloud GPUs: For optimal performance, especially with a 27B-parameter model, consider using cloud GPUs from providers such as AWS EC2, Google Cloud, or Azure.
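A common use of the per-response score from the steps above is ranking several candidate responses to the same prompt and keeping the highest-scoring one (best-of-n sampling). A minimal sketch follows; `score_response` is a hypothetical stand-in for the reward-model forward pass shown earlier, not a real API:

```python
def score_response(prompt: str, response: str) -> float:
    # Hypothetical placeholder for the reward-model call above;
    # here it simply rewards longer answers for illustration.
    return float(len(response))

def best_response(prompt: str, candidates: list[str]) -> str:
    # Score every candidate and return the highest-scoring one.
    return max(candidates, key=lambda r: score_response(prompt, r))

candidates = ["4", "The answer is 4.", "idk"]
print(best_response("What is 2+2?", candidates))  # The answer is 4.
```

In practice you would batch the candidates through the model in one forward pass rather than scoring them one at a time.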
License
The Skywork model is governed by the Skywork Community License, which allows for commercial use under specific terms. Users must comply with the license terms available at the Skywork GitHub repository. The model should not be used for unlawful activities or without proper security assessments.