FineMath-Llama-3B
HuggingFaceTB
Introduction
FineMath-Llama-3B is a continual pre-training of Llama-3.2-3B, optimized for mathematical tasks. It was trained on the high-quality FineMath dataset combined with FineWeb-Edu to strengthen mathematical reasoning while preserving performance on language, reasoning, and common-sense benchmarks.
Architecture
- Model Architecture: Llama3
- Pretraining Steps: 160k
- Pretraining Tokens: 160B
- Precision: bfloat16
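The figures listed above can be cross-checked against the checkpoint's configuration. Below is a minimal sketch using the Transformers AutoConfig API, assuming the checkpoint exposes a standard Llama config; the exact field values depend on the published model.

from transformers import AutoConfig

# Load only the configuration; no model weights are downloaded.
config = AutoConfig.from_pretrained("HuggingFaceTB/FineMath-Llama-3B")

# Standard Llama config fields; values depend on the published checkpoint.
print(config.model_type)           # expected: "llama"
print(config.hidden_size)
print(config.num_hidden_layers)
print(config.num_attention_heads)
print(config.torch_dtype)          # expected: bfloat16 per the card; may be None if not recorded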
Training
FineMath-Llama-3B was trained using nanotron on 64 H100 GPUs. The training involved 160 billion tokens, with a dataset composition of 40% FineWeb-Edu and 60% FineMath, incorporating subsets such as FineMath-4+ and InfiWebMath-4+. The model was evaluated using the SmolLM2 setup with lighteval.
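The 40/60 data mixture described above can be approximated with the datasets library. This is an illustrative sketch, not the actual nanotron training pipeline; the repository name HuggingFaceTB/finemath, the config names finemath-4plus and infiwebmath-4plus, the FineWeb-Edu repository name, and the internal 30/30 split between the two math subsets are assumptions based on the subsets named above.

from datasets import load_dataset, interleave_datasets

# Stream the (assumed) dataset configs rather than downloading them fully.
fineweb_edu = load_dataset("HuggingFaceFW/fineweb-edu", split="train", streaming=True)
finemath = load_dataset("HuggingFaceTB/finemath", "finemath-4plus", split="train", streaming=True)
infiwebmath = load_dataset("HuggingFaceTB/finemath", "infiwebmath-4plus", split="train", streaming=True)

# Approximate the 40% FineWeb-Edu / 60% math composition described in the card;
# the 0.3/0.3 split between the two math subsets is a guess.
mixture = interleave_datasets(
    [fineweb_edu, finemath, infiwebmath],
    probabilities=[0.4, 0.3, 0.3],
    seed=42,
)

# Peek at a few mixed samples (assumes each subset has a "text" column).
for example in mixture.take(3):
    print(example["text"][:200])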
Guide: Running Locally
To run the FineMath-Llama-3B model locally:
- Install Dependencies:
pip install -q transformers
- Load Model:
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/FineMath-Llama-3B"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("Machine Learning is", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
- Hardware Suggestion: For optimal performance, use a cloud GPU service such as AWS, Google Cloud, or Azure with access to GPUs like NVIDIA Tesla V100 or A100.
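Because the model was trained in bfloat16 (see Architecture above), loading it in that precision roughly halves GPU memory use compared with float32. Below is a minimal sketch, assuming a CUDA-capable GPU and that the accelerate package is installed for device_map support; the prompt is only an example.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/FineMath-Llama-3B"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.bfloat16,  # matches the training precision listed above
    device_map="auto",           # requires accelerate; places weights on available GPUs
)

prompt = "What is the derivative of x^2?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))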
License
The FineMath-Llama-3B model is licensed under the Apache-2.0 license, allowing for wide use and distribution with minimal restrictions.