INF-ORM-Llama3.1-70B
Introduction
INF-ORM-Llama3.1-70B is a reward model based on the Llama-3.1-70B-Instruct architecture, trained on the INF-ORM-Preference-Magnitude-80K dataset. Enhancements in data pre-processing, a modified score head, and model merging were implemented to boost model performance.
Architecture
The model architecture is based on the Llama-3.1-70B-Instruct framework. A modified score head is used: a small sequential network with ReLU activation, which improves performance over the original single linear scoring layer.
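The card does not give exact layer sizes; below is a minimal PyTorch sketch of such a head, assuming it maps the backbone's final hidden state to a scalar reward (the two-layer shape is an assumption, not the authors' exact configuration):

```python
import torch.nn as nn

hidden_size = 8192  # final hidden dimension of Llama-3.1-70B

# Modified score head: a small MLP with ReLU, replacing the usual
# single linear layer that maps the hidden state to a scalar reward.
score_head = nn.Sequential(
    nn.Linear(hidden_size, hidden_size),
    nn.ReLU(),
    nn.Linear(hidden_size, 1),  # one reward score per sequence
)
```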
Training
The model was trained on the INF-ORM-Preference-Magnitude-80K dataset, derived from the decontaminated Skywork/Skywork-Reward-Preference-80k-v0.2 set. A 'Magnitude' column was added, grading answer quality with values from 1 to 3. The model was then trained with a scaled Bradley-Terry (BT) loss, a variant of cross-entropy loss (a sketch follows). Finally, two versions of the model were merged with equal weights, yielding significant improvements in reasoning and safety scores.
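The exact scaling is not spelled out here; the following is a minimal sketch, assuming the per-pair magnitude simply weights the standard Bradley-Terry log-likelihood (the function name and weighting scheme are illustrative, not the authors' exact formulation):

```python
import torch
import torch.nn.functional as F

def scaled_bt_loss(chosen_rewards: torch.Tensor,
                   rejected_rewards: torch.Tensor,
                   magnitude: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss with a per-pair scale factor.

    chosen_rewards, rejected_rewards: (batch,) reward scores.
    magnitude: (batch,) quality values (1-3) from the dataset,
    used here as loss weights (an assumption).
    """
    # -log sigmoid(r_chosen - r_rejected) == softplus(r_rejected - r_chosen);
    # scaling by magnitude makes higher-quality pairs contribute more.
    return (magnitude * F.softplus(rejected_rewards - chosen_rewards)).mean()

# Example: one pair where the chosen answer scores higher, magnitude 2.
loss = scaled_bt_loss(torch.tensor([1.2]), torch.tensor([0.3]), torch.tensor([2.0]))
```

Equal-weight merging of the two trained versions then amounts to averaging the corresponding parameters of the two checkpoints.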
Guide: Running Locally
- Install Required Libraries:
  - Ensure you have PyTorch and the Hugging Face Transformers library installed.
- Load Model and Tokenizer:
  - Use INFORMForSequenceClassification.from_pretrained to load the model.
  - Use PreTrainedTokenizerFast.from_pretrained for the tokenizer.
- Prepare Input Data:
  - Tokenize input conversations using the tokenizer's chat template.
- Inference:
  - Perform inference with the model to get reward scores for the input (see the sketch after this list).
- Environment:
  - A GPU is recommended for running the model locally. Consider cloud providers such as AWS or Google Cloud for access to powerful GPUs.
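Putting the steps together, here is a minimal end-to-end sketch. It assumes the custom INFORMForSequenceClassification class is importable from the model repository's modeling code (the module name below is hypothetical), that the repository id is infly/INF-ORM-Llama3.1-70B, and that the forward pass exposes one score per sequence via .logits; the sample conversation is illustrative:

```python
import torch
from transformers import PreTrainedTokenizerFast
# Hypothetical module name; the class ships with the model repository.
from modeling_inform import INFORMForSequenceClassification

model_name = "infly/INF-ORM-Llama3.1-70B"

# Load model and tokenizer.
model = INFORMForSequenceClassification.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # spread the 70B weights across available GPUs
)
tokenizer = PreTrainedTokenizerFast.from_pretrained(model_name)

# Prepare input data: format a conversation with the chat template.
conversation = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]
input_ids = tokenizer.apply_chat_template(
    conversation, tokenize=True, return_tensors="pt"
).to(model.device)

# Inference: run the model to get the scalar reward score.
with torch.no_grad():
    score = model(input_ids).logits[0][0].item()
print(f"Reward score: {score:.4f}")
```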
License
The INF-ORM-Llama3.1-70B model supports commercial applications under a permissive license. For more details, refer to the license agreement.