OLMo-2-1124-7B
Introduction
OLMo 2 is a series of autoregressive language models developed by the Allen Institute for AI. The models come in 7B and 13B parameter variants and improve on the original OLMo 7B model, including a 9-point increase in MMLU score, achieved by training on the OLMo-mix-1124 and Dolmino-mix-1124 datasets with a staged training approach.
Architecture
OLMo 2 models are Transformer-style autoregressive language models, available in two sizes:
- OLMo 2-7B: 32 layers, 4096 hidden size, 32 attention heads, context length of 4096.
- OLMo 2-13B: 40 layers, 5120 hidden size, 40 attention heads, context length of 4096.
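These hyperparameters can be checked against the published config. The lines below are a minimal sketch assuming the standard Hugging Face config attribute names (num_hidden_layers, hidden_size, num_attention_heads, max_position_embeddings); the OLMo 2 config class may expose additional fields.
# Sketch: read the architecture hyperparameters from the released config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("allenai/OLMo-2-1124-7B")
print(config.num_hidden_layers)        # expected: 32 for the 7B variant
print(config.hidden_size)              # expected: 4096
print(config.num_attention_heads)      # expected: 32
print(config.max_position_embeddings)  # expected: 4096 (context length)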
Training
Training involves two pretraining stages for the base models, with a separate post-training recipe for the Instruct variants:
- Pretraining on the OLMo-Mix-1124 and Dolmino-Mix-1124 datasets.
- Post-training for the Instruct models, using a mixture of techniques including SFT, DPO, and PPO with preference mixes.
Stage 1: Initial Pretraining
- 7B Model: Trained for ~1 epoch on 4 trillion tokens.
- 13B Model: Trained for 1.2 epochs on 5 trillion tokens.
Stage 2: Fine-Tuning
- Involves several Dolmino-Mix-1124 training mixes of up to 300 billion tokens, focusing on high-quality data and academic content (both mixes are published datasets; see the sketch below).
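Both training mixes are released on the Hugging Face Hub. Below is a minimal sketch for peeking at the Stage 2 mix with the datasets library; the Hub ID allenai/dolmino-mix-1124, the default configuration, the split name, and the "text" field are assumptions and may differ from the actual release.
# Sketch: stream a few records from the Dolmino mix without downloading it.
# Dataset ID, config, split, and field names are assumptions; check the
# allenai organization on the Hugging Face Hub for the exact names.
from datasets import load_dataset

dolmino = load_dataset("allenai/dolmino-mix-1124", split="train", streaming=True)
for i, record in enumerate(dolmino):
    print(record.get("text", record))
    if i >= 2:
        break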
Guide: Running Locally
To run OLMo 2 models locally, follow these steps:
- Install the Transformers Library:
pip install --upgrade git+https://github.com/huggingface/transformers.git
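A quick way to confirm the install succeeded and that the build is recent enough to include OLMo 2 support (the exact minimum version is not pinned here):
# Verify that Transformers imports and report its version.
import transformers
print(transformers.__version__)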
- Load the Model:
from transformers import AutoModelForCausalLM, AutoTokenizer

olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-1124-7B")
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-2-1124-7B")
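On a GPU machine, the model can also be loaded in half precision and placed on available devices automatically; this sketch assumes the accelerate package is installed for device_map="auto".
# Optional: load in bfloat16 and let accelerate place the weights on GPUs.
import torch
from transformers import AutoModelForCausalLM

olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-2-1124-7B",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the accelerate package
)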
- Inference Example:
message = ["Language modeling is "]
inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
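Equivalently, the high-level pipeline API wraps tokenization and generation in a single call; a minimal sketch with the same sampling settings:
# Same generation via the text-generation pipeline.
from transformers import pipeline

generator = pipeline("text-generation", model="allenai/OLMo-2-1124-7B")
output = generator("Language modeling is ", max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(output[0]["generated_text"])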
- Quantization for Faster Performance:
import torch

olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-2-1124-7B",
    torch_dtype=torch.float16,
    load_in_8bit=True,
)
Requires the bitsandbytes package.
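Recent Transformers versions prefer passing a quantization config object rather than the load_in_8bit flag directly; a sketch of the same 8-bit load (bitsandbytes is still required):
# Equivalent 8-bit load using BitsAndBytesConfig.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-2-1124-7B",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    torch_dtype=torch.float16,
)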
- Cloud GPU Recommendation:
Use cloud services like AWS, GCP, or Azure for access to GPUs if local resources are insufficient.
License
The OLMo 2 models and code are released under the Apache 2.0 License, allowing for broad usage and modification with attribution.