Skywork-Reward-Gemma-2-27B-v0.2
Introduction
Skywork-Reward-Gemma-2-27B-v0.2 is an advanced reward model based on the gemma-2-27b-it architecture. It is designed to handle complex preferences in various domains such as mathematics, coding, and safety. The model is trained on the Skywork Reward Data Collection, which comprises 80K high-quality preference pairs from publicly available data.
Architecture
The model is built on the gemma-2-27b-it architecture and utilizes the Skywork Reward Data Collection, which includes curated samples from multiple public data sources. The focus is on maintaining a balance across different domains while improving performance through careful data selection and scoring techniques.
Training
Skywork-Reward-Gemma-2-27B-v0.2 was trained on a decontaminated dataset version, Skywork-Reward-Preference-80K-v0.2, which contains no overlap with evaluation prompts from RewardBench. The dataset consists of 80K samples curated from sources such as HelpSteer2, OffsetBias, and WildGuard. The training process involved selecting top samples and scoring responses to enhance performance across various domains.
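Skywork has not published its exact decontamination procedure beyond the dataset card, but the idea of removing training prompts that overlap with evaluation prompts can be sketched as follows. The normalization and exact-match criterion here are illustrative assumptions, not Skywork's actual method:

```python
def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting
    # differences don't hide an overlapping prompt.
    return " ".join(text.lower().split())

def decontaminate(train_pairs, eval_prompts):
    # Drop any training pair whose prompt exactly matches an
    # evaluation prompt after normalization.
    banned = {normalize(p) for p in eval_prompts}
    return [pair for pair in train_pairs
            if normalize(pair["prompt"]) not in banned]

train = [
    {"prompt": "What is 2+2?", "chosen": "4", "rejected": "5"},
    {"prompt": "Explain recursion.", "chosen": "...", "rejected": "..."},
]
evals = ["what is  2+2?"]
clean = decontaminate(train, evals)
print(len(clean))  # 1
```

Real decontamination pipelines typically also use fuzzy or n-gram overlap rather than exact matching, since near-duplicates leak information just as readily.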
Guide: Running Locally
To run Skywork-Reward-Gemma-2-27B-v0.2 locally, follow these steps:
- Environment Setup: Ensure you have Python and PyTorch installed, then install the Transformers library (FlashAttention 2, used in the code below, additionally requires the flash-attn package):
pip install transformers
- Load the Model: Use the following code to load the model and tokenizer:
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

device = "cuda:0"
model_name = "Skywork/Skywork-Reward-Gemma-2-27B-v0.2"
rm = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map=device,
    attn_implementation="flash_attention_2",
    num_labels=1,
)
rm_tokenizer = AutoTokenizer.from_pretrained(model_name)
- Perform Inference: Format and tokenize your input as a chat conversation, then obtain the reward score:
prompt = "Your prompt here"
response = "Your response here"
conv = [
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": response},
]
conv_tokenized = rm_tokenizer.apply_chat_template(
    conv, tokenize=True, return_tensors="pt"
).to(device)
with torch.no_grad():
    score = rm(conv_tokenized).logits[0][0].item()
print(f"Score for response: {score}")
- Cloud GPUs: For optimal performance, especially with a 27B-parameter model, consider using cloud GPUs from providers such as AWS EC2, Google Cloud, or Azure.
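A common use of the per-response score from the steps above is ranking several candidate responses to the same prompt and keeping the highest-scoring one (best-of-n sampling). A minimal sketch follows; `score_response` is a hypothetical stand-in for the reward-model forward pass shown earlier, not a real API:

```python
def score_response(prompt: str, response: str) -> float:
    # Hypothetical placeholder for the reward-model call above;
    # here it simply rewards longer answers for illustration.
    return float(len(response))

def best_response(prompt: str, candidates: list[str]) -> str:
    # Score every candidate and return the highest-scoring one.
    return max(candidates, key=lambda r: score_response(prompt, r))

candidates = ["4", "The answer is 4.", "idk"]
print(best_response("What is 2+2?", candidates))  # The answer is 4.
```

In practice you would batch the candidates through the model in one forward pass rather than scoring them one at a time.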
License
The Skywork model is governed by the Skywork Community License, which allows for commercial use under specific terms. Users must comply with the license terms available at the Skywork GitHub repository. The model should not be used for unlawful activities or without proper security assessments.