INF-ORM-Llama3.1-70B

infly

Introduction

INF-ORM-Llama3.1-70B is a reward model based on the Llama-3.1-70B-Instruct architecture and trained on the INF-ORM-Preference-Magnitude-80K dataset. Its performance gains come from improved data pre-processing, a modified score head, and model merging.

Architecture

The model architecture is based on the Llama-3.1-70B-Instruct framework. In place of the original single linear scoring layer, it uses a modified score head: a small sequential network with a ReLU activation, which improves performance over plain linear scoring.
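A minimal sketch of such a head is shown below. Only the Linear-ReLU-Linear structure follows from the description above; the intermediate width is an assumption, as the card does not specify exact dimensions.

```python
import torch.nn as nn

hidden_size = 8192  # hidden size of Llama-3.1-70B

# Linear -> ReLU -> Linear head replacing the usual single
# nn.Linear(hidden_size, 1) scoring layer; widths are assumptions.
score_head = nn.Sequential(
    nn.Linear(hidden_size, hidden_size),
    nn.ReLU(),
    nn.Linear(hidden_size, 1),  # one scalar reward per sequence
)
```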

Training

The model was trained on the INF-ORM-Preference-Magnitude-80K dataset, which is derived from the decontaminated Skywork/Skywork-Reward-Preference-80k-v0.2 set. A 'Magnitude' column was added that scores answer quality on a scale from 1 to 3. Training then used the scaled Bradley-Terry (BT) loss, a variant of the pairwise cross-entropy objective. Finally, two trained versions of the model were merged with equal weights, which yielded significant improvements in reasoning and safety scores.
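A minimal sketch of this objective, under the assumption that "scaled" means weighting each pair's standard BT log-sigmoid loss by its Magnitude value:

```python
import torch
import torch.nn.functional as F

def scaled_bt_loss(chosen_rewards: torch.Tensor,
                   rejected_rewards: torch.Tensor,
                   magnitude: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss -log sigmoid(r_chosen - r_rejected),
    scaled per pair by the dataset's Magnitude value (1-3); the exact
    scaling scheme is an assumption here."""
    return -(magnitude * F.logsigmoid(chosen_rewards - rejected_rewards)).mean()
```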

Guide: Running Locally

  1. Install required libraries: ensure PyTorch and the Hugging Face Transformers library are installed (e.g. pip install torch transformers).
  2. Load model and tokenizer: use INFORMForSequenceClassification.from_pretrained for the model and PreTrainedTokenizerFast.from_pretrained for the tokenizer.
  3. Prepare input data: tokenize input conversations with the tokenizer's chat template.
  4. Inference: run the model on the tokenized conversation to obtain a reward score. Steps 2-4 are illustrated in the sketch after this list.
  5. Environment: a GPU is recommended for running the model locally; cloud providers such as AWS or Google Cloud offer access to sufficiently powerful GPUs.
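A minimal end-to-end sketch, under these assumptions: the model ID is infly/INF-ORM-Llama3.1-70B, and INFORMForSequenceClassification is the custom class provided in the model repository (the import path below is a placeholder).

```python
import torch
from transformers import PreTrainedTokenizerFast

# INFORMForSequenceClassification is the custom model class defined in the
# model repository; this import path is a placeholder assumption.
from inf_orm import INFORMForSequenceClassification

model_name = "infly/INF-ORM-Llama3.1-70B"  # assumed Hugging Face model ID

# Step 2: load model and tokenizer.
model = INFORMForSequenceClassification.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # half precision to reduce memory use
    device_map="auto",           # shard the 70B weights across available GPUs
    num_labels=1,                # single scalar reward output
)
tokenizer = PreTrainedTokenizerFast.from_pretrained(model_name)

# Step 3: format and tokenize a conversation with the chat template.
conversation = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]
input_ids = tokenizer.apply_chat_template(
    conversation, tokenize=True, return_tensors="pt"
).to(model.device)

# Step 4: run inference; the score head emits one reward per sequence.
with torch.no_grad():
    reward = model(input_ids).logits[0][0].item()
print(f"Reward score: {reward:.4f}")
```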

License

The INF-ORM-Llama3.1-70B model supports commercial applications under a permissive license. For more details, refer to the license agreement.
