Llama3.1-8B-PRM-Deepseek-Data-GGUF

QuantFactory

Introduction

The Llama3.1-8B-PRM-Deepseek-Data-GGUF is a quantized version of a process-supervised reward model originally developed by RLHFlow. It comes from a project on process-supervised reward modeling and is trained on the RLHFlow/Deepseek-PRM-Data dataset. The model is built on the Llama-3.1-8B-Instruct architecture and tuned on this dataset for improved step-level scoring of reasoning traces.

Architecture

This model is based on the Llama-3.1-8B architecture and serves as a process-supervised reward model (PRM) trained on the RLHFlow/Deepseek-PRM-Data dataset. Quantization was performed with the llama.cpp framework, which allows for efficient deployment and inference in the GGUF format.
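As a rough illustration of how a GGUF quantization like this one is typically produced with llama.cpp, the sketch below converts a Hugging Face checkpoint to GGUF and then quantizes it. All paths, filenames, and the Q4_K_M quantization type are assumptions for illustration, not QuantFactory's actual build pipeline.

```python
# Hypothetical sketch of a typical llama.cpp GGUF build; paths, filenames,
# and the Q4_K_M quantization type are illustrative assumptions.
import subprocess

HF_MODEL_DIR = "RLHFlow/Llama3.1-8B-PRM-Deepseek-Data"  # local checkout (assumed)
F16_GGUF = "llama3.1-8b-prm-deepseek-data-f16.gguf"
QUANT_GGUF = "llama3.1-8b-prm-deepseek-data-Q4_K_M.gguf"

# 1. Convert the Hugging Face checkpoint to a full-precision GGUF file.
subprocess.run(
    ["python", "convert_hf_to_gguf.py", HF_MODEL_DIR,
     "--outfile", F16_GGUF, "--outtype", "f16"],
    check=True,
)

# 2. Quantize the GGUF file with llama.cpp's quantization tool.
subprocess.run(
    ["./llama-quantize", F16_GGUF, QUANT_GGUF, "Q4_K_M"],
    check=True,
)
```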

Training

The model is trained on the RLHFlow/Deepseek-PRM-Data dataset for one epoch with a global batch size of 32 and a learning rate of 2e-6; training samples are packed and split into chunks of 8,192 tokens. Detailed training parameters and configurations can be found in the linked YAML configuration file.
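The "pack and split into 8,192-token chunks" step can be pictured with the minimal sketch below. The function and variable names are hypothetical and not taken from the actual training recipe; it simply concatenates tokenized samples and cuts the stream into fixed-size blocks.

```python
# Illustrative sketch of sample packing into fixed-size blocks; names are
# hypothetical, not taken from the actual training recipe.
from itertools import chain

BLOCK_SIZE = 8192  # chunk length used during training

def pack_and_chunk(tokenized_samples):
    """Concatenate tokenized samples and split them into fixed-size blocks."""
    all_ids = list(chain.from_iterable(s["input_ids"] for s in tokenized_samples))
    # Drop the trailing remainder so every block is exactly BLOCK_SIZE tokens.
    n_blocks = len(all_ids) // BLOCK_SIZE
    return [
        {"input_ids": all_ids[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]}
        for i in range(n_blocks)
    ]
```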

Evaluation

Evaluation results show improved performance with inference-time scaling methods, including majority voting and process-supervised reward model (PRM) best-of-N selection, measured across different numbers of sampled candidate solutions.
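To make the two selection strategies concrete, the sketch below contrasts majority voting over N sampled answers with picking the answer whose reasoning the PRM scores highest. The `prm_score` argument is a hypothetical stand-in for the actual reward-model call.

```python
# Hedged sketch of the two inference-scaling strategies mentioned above;
# `prm_score` is a hypothetical stand-in for the real reward-model call.
from collections import Counter

def majority_vote(answers):
    """Return the most frequent final answer among N sampled solutions."""
    return Counter(answers).most_common(1)[0][0]

def prm_best_of_n(solutions, prm_score):
    """Return the final answer of the solution the PRM scores highest.

    `solutions` is a list of (reasoning_steps, final_answer) pairs and
    `prm_score` maps a list of reasoning steps to a scalar reward.
    """
    best_steps, best_answer = max(solutions, key=lambda s: prm_score(s[0]))
    return best_answer
```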

Guide: Running Locally

  1. Clone the repository from Hugging Face or the GitHub link provided.
  2. Install dependencies, primarily ensuring that the transformers library is installed.
  3. Load the model using the transformers library and initialize it with the desired configuration, as shown in the sketch after this list.
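A minimal loading sketch is shown below, assuming a recent transformers release with GGUF support; the exact GGUF filename is illustrative and should be replaced with one of the files actually shipped in the repository.

```python
# Minimal loading sketch; assumes a transformers version with GGUF support.
# The GGUF filename is an assumption and must match a file in the repo.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "QuantFactory/Llama3.1-8B-PRM-Deepseek-Data-GGUF"
gguf_file = "Llama3.1-8B-PRM-Deepseek-Data.Q4_K_M.gguf"  # assumed filename

tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)
```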

For optimal performance, it is recommended to use cloud GPUs, such as those offered by AWS, Google Cloud, or Azure, to handle the computational requirements.

License

The model and associated training recipes are available under open-source licenses. Refer to the GitHub repository for specific licensing details and citation requirements if the training recipe is used in research or production.
