RLHFlow/Llama3.1-8B-PRM-Deepseek-Data
Introduction
The Llama3.1-8B-PRM-Deepseek-Data model is a process-supervised reward model (PRM) developed using Mistral-generated data from the RLHFlow/RLHF-Reward-Modeling project. It is based on meta-llama/Llama-3.1-8B-Instruct and trained on the Deepseek-PRM-Data dataset.
Architecture
This model is built on the Transformers library and is served through standard text-generation inference endpoints. It is distributed in the safetensors format for safe model loading. As a process-supervised reward model, it scores intermediate reasoning steps rather than only final answers, which allows it to rerank or guide multi-step solutions.
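For context, process supervision pairs each reasoning step with a correctness label. The record below is a hypothetical illustration of such step-labeled data in a conversation-style layout; the field names, the "+"/"-" labels, and the example problem are assumptions, and the authoritative schema is the Deepseek-PRM-Data dataset itself.

```python
# Hypothetical process-supervision record: each user turn carries one reasoning
# step and each assistant turn carries a correctness label ("+" or "-").
# Field names and labels are illustrative assumptions, not the dataset schema.
example = {
    "conversations": [
        {"role": "user", "content": "Janet has 3 apples and buys 2 more. How many does she have? Step 1: She starts with 3 apples."},
        {"role": "assistant", "content": "+"},
        {"role": "user", "content": "Step 2: 3 + 2 = 5, so she has 5 apples."},
        {"role": "assistant", "content": "+"},
    ]
}
```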
Training
The model was trained for one epoch with a global batch size of 32 and a learning rate of 2e-6, with the training data packed into blocks of 8,192 tokens. Detailed training configurations are available in the training YAML file. Reported evaluations show how answer-selection methods such as majority voting and PRM-based reranking improve accuracy on the GSM8K and MATH benchmarks; a simplified comparison of the two strategies is sketched below.
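The following is a minimal sketch of those two selection strategies, assuming each sampled solution carries per-step scores produced by the reward model. The aggregation rule (taking the minimum step score) and the toy data are assumptions for illustration; the project's evaluation scripts define the exact procedure.

```python
from collections import Counter

def majority_vote(candidates):
    # Pick the most frequent final answer among the sampled solutions.
    answers = [c["answer"] for c in candidates]
    return Counter(answers).most_common(1)[0][0]

def prm_rerank(candidates):
    # Pick the answer from the solution with the highest aggregated step score.
    # Aggregating by the minimum step score is an illustrative assumption.
    best = max(candidates, key=lambda c: min(c["step_scores"]))
    return best["answer"]

# Hypothetical sampled solutions for a single GSM8K-style problem.
candidates = [
    {"answer": "5", "step_scores": [0.96, 0.91, 0.88]},
    {"answer": "6", "step_scores": [0.95, 0.40, 0.35]},
    {"answer": "6", "step_scores": [0.90, 0.42, 0.30]},
]

print(majority_vote(candidates))  # "6": two of the three candidates agree
print(prm_rerank(candidates))     # "5": the highest-scoring reasoning chain wins
```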
Guide: Running Locally
- Clone Repository: Download the model and related files from the Hugging Face repository.
- Install Dependencies: Ensure that the Transformers library and other necessary packages are installed.
- Load Model: Use the provided scripts or examples from the repository to load and interact with the model.
- Run Inference: Execute the model on your input data to produce outputs; a minimal, hedged example follows this list.
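The sketch below loads the model with the Transformers library and scores the steps of one candidate solution. The prompt layout, the use of "+"/"-" as label tokens, and the example question are assumptions made for illustration; consult the repository's scoring scripts for the exact format.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model id on the Hugging Face Hub (assumed from the model name in this card).
model_name = "RLHFlow/Llama3.1-8B-PRM-Deepseek-Data"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"  # device_map needs `accelerate`
)

# Assumed labeling convention: the PRM emits "+" for a correct step and "-" otherwise.
plus_id = tokenizer.encode("+", add_special_tokens=False)[-1]
minus_id = tokenizer.encode("-", add_special_tokens=False)[-1]

# Hypothetical problem and candidate reasoning steps.
question = "Janet has 3 apples and buys 2 more. How many apples does she have?"
steps = [
    "She starts with 3 apples.",
    "She buys 2 more, so 3 + 2 = 5.",
    "The answer is 5.",
]

conversation = []
step_scores = []
for i, step in enumerate(steps):
    # The question is prepended to the first step; later turns carry only the step.
    content = f"{question} {step}" if i == 0 else step
    conversation.append({"role": "user", "content": content})
    input_ids = tokenizer.apply_chat_template(
        conversation, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        logits = model(input_ids).logits[0, -1, [plus_id, minus_id]]
    # Probability mass on "+" relative to "-" is taken as the step score.
    step_scores.append(logits.softmax(dim=-1)[0].item())
    conversation.append({"role": "assistant", "content": "+"})

print(step_scores)
```

Each score is the probability the model assigns to the "+" label for that step; how the scores are aggregated into a single solution-level reward is left to the downstream application.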
For optimal performance, especially for large models like this one, consider using cloud-based GPU services such as AWS, Google Cloud, or Azure.
License
The model and its training data are subject to the terms outlined in the GitHub repository of the RLHFlow/RLHF-Reward-Modeling project. Users are encouraged to review these terms before utilizing the model in their applications.