llemma_7b
EleutherAI

Introduction
Llemma 7B is a language model specialized for mathematics, initialized from Code Llama 7B weights and then trained on 200 billion tokens from the Proof-Pile-2 dataset. A larger variant, Llemma 34B, is also available.
Architecture
Llemma shares its architecture with Code Llama, from which its weights were initialized. The models excel at mathematical reasoning, outperforming models such as Llama-2 and Minerva on chain-of-thought mathematics benchmarks, and they are also able to use computational tools such as the Python interpreter and formal theorem provers.
Training
Llemma was trained on the Proof-Pile-2 dataset to strengthen its mathematical reasoning and theorem-proving abilities; the 7B version was trained for 200 billion tokens.
Guide: Running Locally
- Clone the repository: obtain the model weights from the Hugging Face model hub.
- Set up the environment: install the required dependencies, such as Python, PyTorch, and the Transformers library.
- Load the model: use the Transformers library to load the model and tokenizer.
- Inference: pass mathematical problems to the model and generate solutions (see the sketch after this list).
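A minimal sketch of the load and inference steps, assuming the transformers, torch, and accelerate packages are installed and that the Hub ID is EleutherAI/llemma_7b (check the model page for the exact ID and preferred prompt format):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model ID on the Hugging Face Hub (assumed; verify against the model page).
model_id = "EleutherAI/llemma_7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to reduce GPU memory use
    device_map="auto",          # place layers on available GPU(s)/CPU (requires accelerate)
)

# Pose a mathematical problem as a plain-text prompt.
prompt = "Problem: Compute the derivative of f(x) = x^3 * sin(x).\nSolution:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,  # greedy decoding for reproducible output
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```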
For optimal performance, consider using cloud GPUs, such as those offered by AWS or Google Cloud.
License
The Llemma model is licensed under the Llama 2 license.