Skywork Reward Llama 3.1 8 B v0.2
SkyworkIntroduction
Skywork-Reward-Llama-3.1-8B-v0.2 is an advanced reward model built on the Llama-3.1-8B-Instruct architecture. It is trained using the Skywork Reward Data Collection, which consists of 80K high-quality preference pairs sourced from publicly available data. The model is designed to handle complex preference scenarios across domains such as mathematics, coding, and safety.
Architecture
The model is based on the Llama-3.1-8B-Instruct architecture. It is a sequence classifier that excels at handling preferences in various domains. The model uses a decontaminated dataset, ensuring it provides reliable results by removing contaminated data pairs.
Training
The training dataset, Skywork Reward Data Collection, is carefully curated to include 80K preference pairs. It includes data from sources such as HelpSteer2, OffsetBias, and WildGuard. The dataset aims to balance performance across different domains like math and coding. The model ranks high on the RewardBench leaderboard, demonstrating its effectiveness across multiple evaluation metrics.
Guide: Running Locally
- Environment Setup: Ensure you have Python and PyTorch installed. Set up a virtual environment to manage dependencies.
- Install Transformers: Use
pip install transformers
to install the Hugging Face Transformers library. - Load the Model:
from transformers import AutoModelForSequenceClassification, AutoTokenizer model_name = "Skywork/Skywork-Reward-Llama-3.1-8B-v0.2" model = AutoModelForSequenceClassification.from_pretrained(model_name, torch_dtype=torch.bfloat16) tokenizer = AutoTokenizer.from_pretrained(model_name)
- Inference: Use a GPU for optimal performance. Cloud GPUs such as those offered by AWS, Google Cloud, or Azure are recommended for handling large models efficiently.
- Run Example: Follow the demo code provided in the documentation to score conversational responses.
License
The Skywork model is released under the Skywork Community License, which supports commercial use. Users must adhere to the terms and conditions specified in the license. The model should not be used for unlawful activities or without necessary security reviews. For commercial use, compliance with the Skywork Community License is mandatory.