Tulu-MathLingo-8B-GGUF LLM Model

Introduction

Tulu-MathLingo-8B-GGUF is a quantized version of the Tulu-MathLingo-8B model, which is itself fine-tuned from meta-llama/Llama-3.1-8B. The model is optimized for solving mathematical word problems and reasoning tasks in English and Tulu, with an emphasis on language understanding and reasoning.

Architecture

The model uses the Llama-3.1-8B architecture and targets text generation, in particular complex math problems and reasoning tasks. The GGUF files were produced by quantizing the model with llama.cpp. The Tulu-MathLingo-8B weights loaded through Transformers are stored in the Safetensors format, which avoids executing arbitrary code when a checkpoint is opened and speeds up weight loading.

Training

Base Model: meta-llama/Llama-3.1-8B
Fine-Tuning: Utilizes Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) techniques.
Dataset: Trained on the Microsoft Orca Math Word Problems Dataset, comprising 200k word problems.
Model Size: 8.03 billion parameters, with weights stored in the FP16 tensor type.
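
The parameter count above translates directly into a memory footprint for the weights. The following is a rough sketch of that arithmetic, not a measurement: the 8-bit and 4-bit rows are illustrative assumptions about possible quantization levels (real GGUF quantization types store some extra scaling data per block), and `weight_memory_gib` is a hypothetical helper name.

```python
# Rough weight-memory estimate for an 8.03B-parameter model.
# The parameter count comes from the model card; the 8-bit and 4-bit widths
# are illustrative assumptions, not a list of the files actually published.

PARAMS = 8.03e9  # parameter count stated in the Training section


def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Return the memory needed for the weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1024**3


for label, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>5}: {weight_memory_gib(PARAMS, bits):.1f} GiB")
```

These figures cover the weights only. The KV cache and activations need additional memory on top, and that overhead grows with context length.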

Guide: Running Locally

Install the Transformers Library (PyTorch is also required to load the model):
```
pip install transformers torch
```

Load the Model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Tulu-MathLingo-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# torch_dtype expects a torch dtype; "fp16" is not a valid value
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
```

Run a Sample Query:

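```python
query = "If a train travels 60 miles in 2 hours, what is its average speed?"
# Tokenize the prompt and move the tensors to the same device as the model
inputs = tokenizer(query, return_tensors="pt").to(model.device)
# max_new_tokens limits only the generated text; max_length would also count the prompt
outputs = model.generate(**inputs, max_new_tokens=100)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Answer:", response)
```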
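
Because this repository distributes GGUF files, they can also be run through llama.cpp bindings instead of Transformers. The sketch below uses the llama-cpp-python package and assumes a GGUF file has already been downloaded locally. The file path, the helper names `build_prompt` and `answer_with_gguf`, the context size, and the plain question/answer prompt layout are all assumptions for illustration, not settings documented for this model.

```python
# Sketch: running a local GGUF file with llama-cpp-python
# (pip install llama-cpp-python). Path and prompt layout are assumptions.


def build_prompt(question: str) -> str:
    """Wrap a word problem in a plain question/answer prompt (assumed format)."""
    return f"Question: {question.strip()}\nAnswer:"


def answer_with_gguf(model_path: str, question: str, max_tokens: int = 128) -> str:
    """Load a GGUF file and return the generated completion text."""
    # Imported lazily so build_prompt can be used without the dependency.
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, n_ctx=2048, verbose=False)
    result = llm(build_prompt(question), max_tokens=max_tokens, temperature=0.0)
    return result["choices"][0]["text"].strip()


# Example call (hypothetical path; substitute the GGUF file you downloaded):
# print(answer_with_gguf("path/to/model.gguf",
#                        "If a train travels 60 miles in 2 hours, "
#                        "what is its average speed?"))
```

Setting `temperature=0.0` keeps decoding greedy, which tends to suit arithmetic questions where a single deterministic answer is wanted.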

Hardware Requirements:
- Use a GPU with at least 24 GB of VRAM.
- Consider using cloud GPUs such as NVIDIA A100 for optimal performance.

Optimization:
- Load the model in half precision (fp16) to reduce memory usage.
- Split inference across multiple GPUs if necessary.
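
The two optimization points can be sketched together as follows. This assumes `torch`, `transformers`, and `accelerate` are installed (`device_map="auto"` relies on Accelerate to spread layers across the visible GPUs). The helper names `plan_loading` and `load_model` are hypothetical, and the fp32 fallback on CPU is a cautious assumption rather than a recommendation from the model card.

```python
# Sketch: half-precision loading with automatic placement across GPUs.
# Helper names are hypothetical; heavy imports stay inside load_model.


def plan_loading(num_gpus: int) -> dict:
    """Pick a dtype name and device placement for the given GPU count."""
    if num_gpus == 0:
        # Assumption: stay in fp32 on CPU, where fp16 is often slow.
        return {"dtype": "float32", "device_map": None}
    # With one or more GPUs, load in fp16 and let Accelerate place the layers.
    return {"dtype": "float16", "device_map": "auto"}


def load_model(model_name: str = "prithivMLmods/Tulu-MathLingo-8B"):
    """Load the model according to the plan for the current machine."""
    import torch
    from transformers import AutoModelForCausalLM

    plan = plan_loading(torch.cuda.device_count())
    return AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=getattr(torch, plan["dtype"]),
        device_map=plan["device_map"],
    )
```

Keeping the decision in a small pure function makes it easy to check the plan on a machine without a GPU before committing to a download of the full weights.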

License

This model is licensed under the creativeml-openrail-m license.
