OLMo-2-1124-7B
Introduction
OLMo 2 is a series of autoregressive language models developed by the Allen Institute for AI. The models come in 7B and 13B parameter variants and improve on the original OLMo 7B model, including a 9-point increase in MMLU score, achieved by training on the OLMo-mix-1124 and Dolmino-mix-1124 datasets with a staged training approach.
Architecture
OLMo 2 models are Transformer-style autoregressive language models, available in two sizes:
- OLMo 2-7B: 32 layers, 4096 hidden size, 32 attention heads, context length of 4096.
- OLMo 2-13B: 40 layers, 5120 hidden size, 40 attention heads, context length of 4096.
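These hyperparameters can be checked against the published config. The lines below are a minimal sketch assuming the standard Hugging Face config attribute names (num_hidden_layers, hidden_size, num_attention_heads, max_position_embeddings); the OLMo 2 config class may expose additional fields.
# Sketch: read the architecture hyperparameters from the released config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("allenai/OLMo-2-1124-7B")
print(config.num_hidden_layers)        # expected: 32 for the 7B variant
print(config.hidden_size)              # expected: 4096
print(config.num_attention_heads)      # expected: 32
print(config.max_position_embeddings)  # expected: 4096 (context length)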
Training
Training involves two pretraining stages for the base models, with a separate post-training recipe for the Instruct variants:
- Pretraining on the OLMo-Mix-1124 and Dolmino-Mix-1124 datasets.
- Post-training for the Instruct models, using a mixture of techniques including SFT, DPO, and PPO with preference mixes.
Stage 1: Initial Pretraining
- 7B Model: Trained for ~1 epoch on 4 trillion tokens.
- 13B Model: Trained for 1.2 epochs on 5 trillion tokens.
Stage 2: Fine-Tuning
- Involves several Dolmino-Mix-1124 training mixes of up to 300 billion tokens, focusing on high-quality data and academic content (both mixes are published datasets; see the sketch below).
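Both training mixes are released on the Hugging Face Hub. Below is a minimal sketch for peeking at the Stage 2 mix with the datasets library; the Hub ID allenai/dolmino-mix-1124, the default configuration, the split name, and the "text" field are assumptions and may differ from the actual release.
# Sketch: stream a few records from the Dolmino mix without downloading it.
# Dataset ID, config, split, and field names are assumptions; check the
# allenai organization on the Hugging Face Hub for the exact names.
from datasets import load_dataset

dolmino = load_dataset("allenai/dolmino-mix-1124", split="train", streaming=True)
for i, record in enumerate(dolmino):
    print(record.get("text", record))
    if i >= 2:
        break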
Guide: Running Locally
To run OLMo 2 models locally, follow these steps:
- Install the Transformers Library:
pip install --upgrade git+https://github.com/huggingface/transformers.git
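A quick way to confirm the install succeeded and that the build is recent enough to include OLMo 2 support (the exact minimum version is not pinned here):
# Verify that Transformers imports and report its version.
import transformers
print(transformers.__version__)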
- Load the Model:
from transformers import AutoModelForCausalLM, AutoTokenizer

olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-1124-7B")
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-2-1124-7B")
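On a GPU machine, the model can also be loaded in half precision and placed on available devices automatically; this sketch assumes the accelerate package is installed for device_map="auto".
# Optional: load in bfloat16 and let accelerate place the weights on GPUs.
import torch
from transformers import AutoModelForCausalLM

olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-2-1124-7B",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the accelerate package
)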
- Inference Example:
message = ["Language modeling is "]
inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
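Equivalently, the high-level pipeline API wraps tokenization and generation in a single call; a minimal sketch with the same sampling settings:
# Same generation via the text-generation pipeline.
from transformers import pipeline

generator = pipeline("text-generation", model="allenai/OLMo-2-1124-7B")
output = generator("Language modeling is ", max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(output[0]["generated_text"])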
- Quantization for Faster Performance:
import torch

olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-2-1124-7B",
    torch_dtype=torch.float16,
    load_in_8bit=True,
)
Requires the bitsandbytes package.
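Recent Transformers versions prefer passing a quantization config object rather than the load_in_8bit flag directly; a sketch of the same 8-bit load (bitsandbytes is still required):
# Equivalent 8-bit load using BitsAndBytesConfig.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-2-1124-7B",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    torch_dtype=torch.float16,
)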
- Cloud GPU Recommendation:
Use cloud services like AWS, GCP, or Azure for access to GPUs if local resources are insufficient.
License
The OLMo 2 models and code are released under the Apache 2.0 License, allowing for broad usage and modification with attribution.