GRM-Llama3.2-3B-rewardmodel-ft
Ray2333
Introduction
GRM-Llama3.2-3B-rewardmodel-ft is a reward model fine-tuned from GRM-llama3.2-3B-sftreg on the decontaminated Skywork preference dataset v0.2. It scores 90.9 on RewardBench, surpassing some 8B reward models and outperforming models such as GPT-4 and Gemini on specific tasks.
Architecture
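The model is built on the LLaMA3 architecture and is set up for text classification: it is loaded as a sequence classification model and returns a single scalar reward for a prompt-response pair. Training on the Skywork Reward Preference dataset is intended to improve its performance and generalizability.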
Training
The model is fine-tuned from the Ray2333/GRM-llama3.2-3B-sftreg base model using the Skywork/Skywork-Reward-Preference-80K-v0.2 dataset. This process enables the model to achieve state-of-the-art performance, especially among models under 7B parameters.
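The exact training objective is not stated here. Reward models fine-tuned on preference pairs commonly use a Bradley-Terry style pairwise loss, which pushes the reward of the chosen response above that of the rejected one. The sketch below illustrates that idea under this assumption; the function name `pairwise_preference_loss` is hypothetical and it is not taken from the model's training code.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry style loss: -log(sigmoid(reward_chosen - reward_rejected)).

    Assumption: this is a common objective for preference data, not
    necessarily the one used to train this model.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow in exp().
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# The loss shrinks as the chosen response is scored further above the rejected one.
print(pairwise_preference_loss(2.0, 0.0))
print(pairwise_preference_loss(0.0, 2.0))
```

When both rewards are equal the loss is log(2), and it approaches zero as the margin between chosen and rejected rewards grows.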
Guide: Running Locally
To run this model locally, follow these steps:
- Install Required Libraries: Ensure you have torch and transformers installed.
    pip install torch transformers
- Load Model and Tokenizer:
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    device = 'cuda:0'
    tokenizer = AutoTokenizer.from_pretrained('Ray2333/GRM-Llama3.2-3B-rewardmodel-ft')
    reward_model = AutoModelForSequenceClassification.from_pretrained(
        'Ray2333/GRM-Llama3.2-3B-rewardmodel-ft',
        torch_dtype=torch.float16,
        device_map=device)
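- Prepare Input: Use the tokenizer to apply the chat template to your conversation and tokenize the result.
    message = [
        {'role': 'user', 'content': "Your input text here."},
        {'role': 'assistant', 'content': "Expected response."}
    ]
    # Render the conversation as a single string using the model's chat template.
    message_template = tokenizer.apply_chat_template(message, tokenize=False)
    # 'longest' pads only up to the longest sequence in the batch; 'max_length'
    # without an explicit max_length would pad to the tokenizer's model_max_length.
    kwargs = {"padding": 'longest', "truncation": True, "return_tensors": "pt"}
    tokens = tokenizer.encode_plus(message_template, **kwargs)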
- Generate Reward Score:
    with torch.no_grad():
        reward_tensor = reward_model(
            tokens["input_ids"][0].view(1, -1).to(device),
            attention_mask=tokens["attention_mask"][0].view(1, -1).to(device))[0]
        reward = reward_tensor.cpu().detach().item()
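Once a response can be scored, a typical use of a reward model is best-of-n selection: score several candidate responses to the same prompt and keep the highest-scoring one. The sketch below shows only the selection logic. The names `best_of_n` and `score_fn` are hypothetical; `score_fn` is assumed to wrap the tokenize-and-score steps above and return the scalar reward for a (prompt, response) pair.

```python
from typing import Callable, Sequence, Tuple


def best_of_n(
    prompt: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],
) -> Tuple[str, float]:
    """Return the candidate with the highest reward, together with that reward.

    score_fn is a hypothetical callable: it should build the chat messages for
    (prompt, response), run the reward model, and return a float.
    """
    if not candidates:
        raise ValueError("candidates must not be empty")
    # Score every candidate against the same prompt.
    scored = [(score_fn(prompt, response), response) for response in candidates]
    # Pick the pair with the largest reward.
    best_score, best_response = max(scored, key=lambda pair: pair[0])
    return best_response, best_score


if __name__ == "__main__":
    # Stand-in scorer for illustration only: longer responses score higher.
    def toy_score(prompt: str, response: str) -> float:
        return float(len(response))

    print(best_of_n("What is 2 + 2?", ["4", "The answer is 4."], toy_score))
```

Keeping the scorer as a parameter means the selection logic can be checked with a stand-in function before loading the 3B model.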
Cloud GPUs: For best performance, consider running the model on GPUs from a cloud provider such as AWS, Google Cloud, or Azure.
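When choosing a GPU, a rough estimate of the memory needed for the weights alone can help. The sketch below assumes a nominal 3 billion parameters (taken from the "3B" in the model name, not an exact count) stored in float16 at 2 bytes each. The helper `weight_memory_gib` is hypothetical, and the estimate excludes activations and framework overhead.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights in GiB.

    Assumption: every parameter is stored at bytes_per_param bytes
    (2 for float16); activations and runtime overhead are not included.
    """
    return num_params * bytes_per_param / 2**30


# Nominal 3e9 parameters in float16.
print(f"{weight_memory_gib(3e9):.1f} GiB")  # prints "5.6 GiB"
```

Actual usage will be higher than this figure, so treat it as a lower bound when sizing a GPU.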
License
The model is released under the Apache 2.0 license, allowing for both personal and commercial use with proper attribution.