FineMath-Llama-3B
HuggingFaceTB
Introduction
FineMath-Llama-3B is a continual pre-training of Llama-3.2-3B, optimized for mathematical tasks. It was trained on the high-quality FineMath dataset combined with FineWeb-Edu to strengthen mathematical reasoning while preserving performance on language, reasoning, and common-sense benchmarks.
Architecture
- Model Architecture: Llama3
- Pretraining Steps: 160k
- Pretraining Tokens: 160B
- Precision: bfloat16
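The figures listed above can be cross-checked against the checkpoint's configuration. Below is a minimal sketch using the Transformers AutoConfig API, assuming the checkpoint exposes a standard Llama config; the exact field values depend on the published model.

from transformers import AutoConfig

# Load only the configuration; no model weights are downloaded.
config = AutoConfig.from_pretrained("HuggingFaceTB/FineMath-Llama-3B")

# Standard Llama config fields; values depend on the published checkpoint.
print(config.model_type)           # expected: "llama"
print(config.hidden_size)
print(config.num_hidden_layers)
print(config.num_attention_heads)
print(config.torch_dtype)          # expected: bfloat16 per the card; may be None if not recorded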
Training
FineMath-Llama-3B was trained using nanotron on 64 H100 GPUs. The training involved 160 billion tokens, with a dataset composition of 40% FineWeb-Edu and 60% FineMath, incorporating subsets such as FineMath-4+ and InfiWebMath-4+. The model was evaluated using the SmolLM2 setup with lighteval.
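The 40/60 data mixture described above can be approximated with the datasets library. This is an illustrative sketch, not the actual nanotron training pipeline; the repository name HuggingFaceTB/finemath, the config names finemath-4plus and infiwebmath-4plus, the FineWeb-Edu repository name, and the internal 30/30 split between the two math subsets are assumptions based on the subsets named above.

from datasets import load_dataset, interleave_datasets

# Stream the (assumed) dataset configs rather than downloading them fully.
fineweb_edu = load_dataset("HuggingFaceFW/fineweb-edu", split="train", streaming=True)
finemath = load_dataset("HuggingFaceTB/finemath", "finemath-4plus", split="train", streaming=True)
infiwebmath = load_dataset("HuggingFaceTB/finemath", "infiwebmath-4plus", split="train", streaming=True)

# Approximate the 40% FineWeb-Edu / 60% math composition described in the card;
# the 0.3/0.3 split between the two math subsets is a guess.
mixture = interleave_datasets(
    [fineweb_edu, finemath, infiwebmath],
    probabilities=[0.4, 0.3, 0.3],
    seed=42,
)

# Peek at a few mixed samples (assumes each subset has a "text" column).
for example in mixture.take(3):
    print(example["text"][:200])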
Guide: Running Locally
To run the FineMath-Llama-3B model locally:
- Install Dependencies:
pip install -q transformers
- Load Model:
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/FineMath-Llama-3B"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("Machine Learning is", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
- Hardware Suggestion: For optimal performance, use a cloud GPU service such as AWS, Google Cloud, or Azure with access to GPUs like NVIDIA Tesla V100 or A100.
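Because the model was trained in bfloat16 (see Architecture above), loading it in that precision roughly halves GPU memory use compared with float32. Below is a minimal sketch, assuming a CUDA-capable GPU and that the accelerate package is installed for device_map support; the prompt is only an example.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/FineMath-Llama-3B"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.bfloat16,  # matches the training precision listed above
    device_map="auto",           # requires accelerate; places weights on available GPUs
)

prompt = "What is the derivative of x^2?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))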
License
The FineMath-Llama-3B model is licensed under the Apache-2.0 license, allowing for wide use and distribution with minimal restrictions.