O L Mo 2 1124 13 B Instruct R L V R1
allenaiIntroduction
The OLMo-2-1124-13B-Instruct-RLVR1 is a language model developed by Allen Institute for AI. It is part of the OLMo series, aimed at advancing the science of language models by enabling comprehensive research. This model is trained primarily on English text and is designed for text generation tasks.
Architecture
The model is a 13 billion parameter language model that builds upon the OLMo-2 architecture. It has undergone multiple stages of training, including supervised fine-tuning on a variant of the Tülu 3 dataset, DPO training, and finally, RLVR training. The model is compatible with the Transformers library, utilizing AutoModelForCausalLM
for loading and inference.
Training
OLMo-2-1124-13B-Instruct-RLVR1 has been fine-tuned using a mix of publicly available, synthetic, and human-created datasets. The fine-tuning process includes several key stages:
- Supervised Fine-Tuning (SFT): Trained on the Tülu 3 dataset variant for diverse task performance.
- DPO Training: Further optimization on the preference mix dataset.
- RLVR Training: Enhanced using the RLVR-GSM-MATH-IF-Mixed-Constraints dataset for improved performance in specific domains such as MATH, GSM8K, and IFEval.
Guide: Running Locally
To run the model locally, follow these steps:
-
Install the Transformers Library:
pip install --upgrade git+https://github.com/huggingface/transformers.git
-
Load the Model:
from transformers import AutoModelForCausalLM olmo_model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-1124-13B-Instruct-RLVR1")
-
Set Up GPU:
- For optimal performance, it is recommended to use cloud GPUs such as AWS EC2 P3 instances or Google Cloud's AI Platform.
License
The OLMo-2-1124-13B-Instruct-RLVR1 model is licensed under the Apache 2.0 license. It is intended for research and educational use, with additional terms applicable to outputs generated from third-party models. For more information, refer to the Responsible Use Guidelines provided by the Allen Institute for AI.