FineMath-Llama-3B

HuggingFaceTB

Introduction

FineMath-Llama-3B is a continual pre-training of Llama-3.2-3B, optimized for mathematical tasks. It is trained on the high-quality FineMath dataset mixed with FineWeb-Edu, which improves mathematical reasoning while preserving performance on language, reasoning, and common-sense benchmarks.

Architecture

  • Model Architecture: Llama3
  • Pretraining Steps: 160k
  • Pretraining Tokens: 160B
  • Precision: bfloat16

Training

FineMath-Llama-3B was trained with nanotron on 64 H100 GPUs. Training covered 160 billion tokens with a dataset mix of 40% FineWeb-Edu and 60% FineMath (drawing on the FineMath-4+ and InfiWebMath-4+ subsets). The model was evaluated with lighteval using the SmolLM2 evaluation setup.
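
To make the stated numbers concrete, the sketch below works out the 40/60 token split over the 160B-token budget and the average tokens per step. It is a purely illustrative calculation; the variable names are not taken from the training configuration.

    # Back-of-the-envelope breakdown of the stated training budget.
    # Figures come from the sections above; names are illustrative only.
    TOTAL_TOKENS = 160e9    # 160B pretraining tokens
    TOTAL_STEPS = 160_000   # 160k pretraining steps
    
    mix = {"FineWeb-Edu": 0.40, "FineMath": 0.60}
    
    for dataset, fraction in mix.items():
        print(f"{dataset}: ~{fraction * TOTAL_TOKENS / 1e9:.0f}B tokens")
    
    # Average tokens consumed per optimizer step (~1M)
    print(f"~{TOTAL_TOKENS / TOTAL_STEPS / 1e6:.0f}M tokens per step")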

Guide: Running Locally

To run the FineMath-Llama-3B model locally:

  1. Install Dependencies:
    pip install -q transformers
    
  2. Load Model (a memory-saving variant appears after this list):
    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    checkpoint = "HuggingFaceTB/FineMath-Llama-3B"
    device = "cuda" # for GPU usage or "cpu" for CPU usage
    
    # Load the tokenizer and model weights from the Hugging Face Hub
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
    
    # Encode a prompt, generate a continuation, and decode it back to text
    inputs = tokenizer.encode("Machine Learning is", return_tensors="pt").to(device)
    outputs = model.generate(inputs)
    print(tokenizer.decode(outputs[0]))
    
  3. Hardware Suggestion: For optimal performance, use a cloud provider such as AWS, Google Cloud, or Azure with access to GPUs like an NVIDIA V100 or A100.
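
If GPU memory is tight, loading the weights in bfloat16 (the precision the model was trained in) roughly halves the footprint compared with float32. The sketch below is an optional variant of step 2 using the standard transformers options torch_dtype and device_map; device_map="auto" additionally requires the accelerate package.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    checkpoint = "HuggingFaceTB/FineMath-Llama-3B"
    
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    # Load weights in bfloat16 and let accelerate place them on available devices
    model = AutoModelForCausalLM.from_pretrained(
        checkpoint,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    
    inputs = tokenizer("Machine Learning is", return_tensors="pt").to(model.device)
    # Cap generation length so the example terminates quickly
    outputs = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))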

License

The FineMath-Llama-3B model is licensed under the Apache-2.0 license, allowing for wide use and distribution with minimal restrictions.
