Eurus-RM-7b
openbmb
Introduction
Eurus-RM-7B is a reward model trained on a mixture of the UltraInteract, UltraFeedback, and UltraSafety datasets, using a reward modeling objective designed to strengthen reasoning. It is reported to be the strongest among 7-billion-parameter reward models, matching or exceeding considerably larger models and outperforming GPT-4 on specific tasks. The training objective is most effective on complex reasoning tasks, while mixing the three datasets keeps the model's reward modeling capabilities balanced. Used to rerank candidate responses, Eurus-RM-7B significantly boosts reasoning performance.
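The reranking mentioned above can be sketched as best-of-n selection: score every candidate response with the reward model and keep the one with the highest reward. In this sketch, `rerank`, `best_of_n`, and `score_fn` are hypothetical names, and `score_fn` stands in for any callable that returns a scalar reward for a formatted prompt-plus-response string. The toy scorer exists only to make the example runnable and says nothing about how Eurus-RM-7B actually scores text.

```python
from typing import Callable, List, Tuple


def rerank(candidates: List[str], score_fn: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Return (candidate, reward) pairs sorted from highest to lowest reward."""
    scored = [(text, float(score_fn(text))) for text in candidates]
    # sorted() is stable, so candidates with equal rewards keep their input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def best_of_n(candidates: List[str], score_fn: Callable[[str], float]) -> str:
    """Pick the single candidate that the reward function prefers."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    return rerank(candidates, score_fn)[0][0]


def toy_score(text: str) -> float:
    """Toy stand-in for a reward model (assumption): longer text scores higher."""
    return float(len(text))


if __name__ == "__main__":
    samples = ["4", "2 + 2 = 4", "2 + 2 = 4 because 2 + 1 = 3 and 3 + 1 = 4"]
    print(best_of_n(samples, toy_score))
```

In practice `score_fn` would wrap a call to the loaded reward model, so the same selection logic applies unchanged to real candidates sampled from a policy model.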
Architecture
Eurus-RM-7B is built as a reward model aimed at improving reasoning: it takes a prompt and response as input and returns a single scalar reward. Its ability to handle complex reasoning tasks comes from the combination of training datasets and a dedicated training objective.
Training
Eurus-RM-7B is trained using a blend of UltraInteract, UltraFeedback, and UltraSafety datasets. The training process focuses on a reward modeling objective that enhances reasoning capabilities, especially in challenging scenarios. This approach helps the model achieve significant performance improvements over other models, including larger ones like GPT-4, in certain tasks.
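The text above does not spell out the training objective. As background, reward models are commonly trained with a pairwise (Bradley-Terry) loss that pushes the reward of the chosen response above the reward of the rejected one. The sketch below shows only that common baseline, in plain Python; `log_sigmoid` and `pairwise_loss` are hypothetical helper names, and whether or how Eurus-RM-7B extends this loss is not stated here.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    # For x >= 0 use -log1p(exp(-x)); for x < 0 use x - log1p(exp(x)).
    # Both branches avoid calling exp() on a large positive number.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def pairwise_loss(chosen_reward: float, rejected_reward: float) -> float:
    """Bradley-Terry loss: -log(sigmoid(r_chosen - r_rejected))."""
    return -log_sigmoid(chosen_reward - rejected_reward)


if __name__ == "__main__":
    # The loss shrinks as the chosen reward moves further above the rejected one.
    for margin in (-2.0, 0.0, 2.0):
        print(margin, pairwise_loss(margin, 0.0))
```

Because the loss depends only on the reward difference, it constrains the ranking of the two responses rather than their absolute reward values.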
Guide: Running Locally
- Setup Environment: Ensure Python and PyTorch are installed, then install the transformers library from Hugging Face.
    pip install transformers torch
- Download Model: Use the following code to load the model and tokenizer.
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("openbmb/Eurus-RM-7b")
    model = AutoModel.from_pretrained("openbmb/Eurus-RM-7b", trust_remote_code=True)
- Inference: Score a chosen and a rejected response to the same prompt and compare the two rewards. A positive difference means the model prefers the chosen response.
    import torch
    from transformers import AutoTokenizer, AutoModel

    def test(model_path):
        # Each example pairs a preferred ("chosen") and a dispreferred ("rejected") response.
        dataset = [
            {
                "chosen": "[INST] Sural relates to which part of the body? [/INST] The sural region...",
                "rejected": "[INST] Sural relates to which part of the body? [/INST] The Sural nerve...",
            }
        ]

        tokenizer = AutoTokenizer.from_pretrained(model_path)
        model = AutoModel.from_pretrained(model_path, trust_remote_code=True)

        # Inference only, so gradient tracking is disabled.
        with torch.no_grad():
            for example in dataset:
                inputs = tokenizer(example["chosen"], return_tensors="pt")
                chosen_reward = model(**inputs).item()
                inputs = tokenizer(example["rejected"], return_tensors="pt")
                rejected_reward = model(**inputs).item()
                # Reward margin between the chosen and the rejected response.
                print(chosen_reward - rejected_reward)

    test("openbmb/Eurus-RM-7b")
- Cloud GPUs: For faster inference, consider running the model on a cloud GPU from a provider such as AWS, Google Cloud, or Azure.
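The inference step compares one chosen/rejected pair. For a larger set of pairs, the same comparison can be summarized as a pairwise accuracy, as in the minimal sketch below. `format_example`, `pairwise_accuracy`, and `reward_fn` are hypothetical names; the `[INST] ... [/INST]` layout is copied from the example strings in the guide, and `reward_fn` stands in for any callable that returns a scalar reward, such as a wrapper around the tokenizer and `model(**inputs).item()`.

```python
from typing import Callable, Dict, List


def format_example(prompt: str, response: str) -> str:
    """Wrap a prompt and a response in the [INST] ... [/INST] layout shown in the guide."""
    return f"[INST] {prompt} [/INST] {response}"


def pairwise_accuracy(pairs: List[Dict[str, str]], reward_fn: Callable[[str], float]) -> float:
    """Fraction of pairs where the chosen text scores strictly higher than the rejected text."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    wins = sum(1 for p in pairs if reward_fn(p["chosen"]) > reward_fn(p["rejected"]))
    return wins / len(pairs)


if __name__ == "__main__":
    # Toy reward (assumption for illustration): count occurrences of the word "because".
    def toy_reward(text: str) -> float:
        return float(text.count("because"))

    pairs = [
        {
            "chosen": format_example("Is 7 prime?", "Yes, because it has no divisors other than 1 and 7."),
            "rejected": format_example("Is 7 prime?", "No."),
        }
    ]
    print(pairwise_accuracy(pairs, toy_reward))
```

Ties count as misses here, which is a deliberate choice: a reward model that cannot separate the two responses has not expressed the intended preference.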
License
Eurus-RM-7B is released under the Apache-2.0 License. This license permits use, distribution, and modification, provided that proper attribution is maintained.