O L Mo 2 1124 7 B Instruct preview
allenaiIntroduction
OLMo-2-1124-7B-Instruct is a post-trained variant of the OLMo-2 7B model, developed by AllenAI. It is designed for text generation and has undergone several layers of fine-tuning and reinforcement learning to enhance its performance on a variety of tasks. This model primarily supports the English language and is part of the OLMo series, which aims to advance the science of language models.
Architecture
The model is built on the Transformers library and is fine-tuned from the allenai/OLMo-2-7B-1124-DPO base model. It employs a mixture of publicly available, synthetic, and human-created datasets, including the Tülu 3 dataset, to achieve state-of-the-art performance in conversational and task-specific settings.
Training
OLMo-2-1124-7B-Instruct has undergone multiple training stages:
- Supervised finetuning on the Tülu 3 dataset.
- DPO (Direct Preference Optimization) training on a preference mix of this dataset.
- RLVR (Reinforcement Learning with Variable Rewards) training using the RLVR-GSM dataset.
The training process includes hyperparameter tuning with settings such as a learning rate of 3 × 10⁻⁷, PPO update iterations set to 4, and a batch size of 512.
Guide: Running Locally
-
Installation: Ensure you have the latest version of the Transformers library by running:
pip install --upgrade git+https://github.com/huggingface/transformers.git
-
Loading the Model: Use the following Python code to load the model:
from transformers import AutoModelForCausalLM olmo_model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-1124-7B-Instruct")
-
Cloud GPU Suggestion: For optimal performance, consider using cloud-based GPUs from providers like AWS EC2, Google Cloud Platform, or Azure to handle the model's computational demands.
License
OLMo-2-1124-7B-Instruct is licensed under the Apache 2.0 license, making it suitable for research and educational purposes. Users must adhere to the Responsible Use Guidelines and additional terms outlined in the Gemma Terms of Use.