GRM-Llama3.2-3B-rewardmodel-ft

Ray2333

Introduction

The GRM-Llama3.2-3B-rewardmodel-ft is a reward model fine-tuned from Ray2333/GRM-llama3.2-3B-sftreg on the decontaminated Skywork preference dataset v0.2. It scores 90.9 on RewardBench, surpassing a number of 8B reward models and even outperforming models such as GPT-4 and Gemini on specific tasks.

Architecture

The model is built on the Llama 3.2 backbone with a sequence-classification head that outputs a single scalar reward per input. Fine-tuning on the Skywork Reward Preference dataset improves its accuracy and generalization.
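As a quick sanity check (a minimal sketch, not part of the original card), the configuration can be inspected to confirm the Llama backbone and the single-logit classification head assumed by the usage code below:

    from transformers import AutoConfig

    config = AutoConfig.from_pretrained('Ray2333/GRM-Llama3.2-3B-rewardmodel-ft')
    print(config.model_type)  # expected: 'llama'
    print(config.num_labels)  # expected: 1 -- one scalar reward per sequence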

Training

The model is fine-tuned from the Ray2333/GRM-llama3.2-3B-sftreg base model on the Skywork/Skywork-Reward-Preference-80K-v0.2 dataset. This fine-tuning yields state-of-the-art performance among reward models under 7B parameters.
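Preference fine-tuning of this kind typically minimizes a pairwise Bradley-Terry loss, which pushes the reward of the chosen response above that of the rejected one. The sketch below illustrates the objective only; it is not the author's training code, and the tensor arguments are assumed to be pre-tokenized (chosen, rejected) pairs:

    import torch.nn.functional as F

    def pairwise_loss(reward_model, chosen_ids, chosen_mask, rejected_ids, rejected_mask):
        # Each forward pass returns a single logit per sequence (the reward).
        r_chosen = reward_model(chosen_ids, attention_mask=chosen_mask)[0]
        r_rejected = reward_model(rejected_ids, attention_mask=rejected_mask)[0]
        # Bradley-Terry objective: -log sigmoid(r_chosen - r_rejected)
        return -F.logsigmoid(r_chosen - r_rejected).mean()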

Guide: Running Locally

To run this model locally, follow these steps:

  1. Install Required Libraries: Ensure you have torch and transformers installed.

    pip install torch transformers
    
  2. Load Model and Tokenizer:

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    
    device = 'cuda:0'
    tokenizer = AutoTokenizer.from_pretrained('Ray2333/GRM-Llama3.2-3B-rewardmodel-ft')
    # Load in float16 to reduce memory use; device_map places the model on the GPU.
    reward_model = AutoModelForSequenceClassification.from_pretrained(
        'Ray2333/GRM-Llama3.2-3B-rewardmodel-ft', torch_dtype=torch.float16, device_map=device)
    
  3. Prepare Input: Use the tokenizer to prepare your input message.

    message = [
      {'role': 'user', 'content': "Your input text here."},
      {'role': 'assistant', 'content': "Expected response."}
    ]
    # Render the conversation with the model's chat template, then tokenize.
    # A single sequence needs no padding; truncation guards against overlong inputs.
    message_template = tokenizer.apply_chat_template(message, tokenize=False)
    tokens = tokenizer(message_template, truncation=True, return_tensors="pt")
    
  4. Generate Reward Score:

    with torch.no_grad():
        # The classification head emits one logit per sequence; use it as the reward.
        reward_tensor = reward_model(
            tokens["input_ids"].to(device),
            attention_mask=tokens["attention_mask"].to(device),
        )[0]
        reward = reward_tensor.item()

Cloud GPUs: For best performance, consider running the model on a GPU instance from a cloud provider such as AWS, Google Cloud, or Azure.
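In practice, the scalar score is most useful for comparing candidate answers to the same prompt: the higher-scoring response is preferred. The helper below (function name and example strings are illustrative) wraps steps 3 and 4 above into a single function:

    def score(question, answer):
        msg = [{'role': 'user', 'content': question},
               {'role': 'assistant', 'content': answer}]
        text = tokenizer.apply_chat_template(msg, tokenize=False)
        tokens = tokenizer(text, truncation=True, return_tensors="pt").to(device)
        with torch.no_grad():
            return reward_model(**tokens)[0].item()

    question = "What is the capital of France?"
    print(score(question, "The capital of France is Paris."))  # should score higher
    print(score(question, "I'm not sure."))                    # should score lower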

License

The model is released under the Apache 2.0 license, allowing for both personal and commercial use with proper attribution.
