llemma_7b
EleutherAI

Introduction
Llemma 7B is a language model specialized for mathematics, initialized from Code Llama 7B weights and then trained on 200 billion tokens from the Proof-Pile-2 dataset. A larger variant, Llemma 34B, is also available.
Architecture
Llemma shares its architecture with Code Llama, from which its weights were initialized. The models excel at mathematical reasoning, outperforming models such as Llama-2 and Minerva on chain-of-thought mathematics benchmarks, and they are also able to use computational tools such as the Python interpreter and formal theorem provers.
Training
Llemma was trained on the Proof-Pile-2 dataset to strengthen its mathematical reasoning and theorem-proving abilities; the 7B version was trained for 200 billion tokens.
Guide: Running Locally
- Clone the repository: obtain the model weights from the Hugging Face model hub.
- Set up the environment: install the required dependencies, such as Python, PyTorch, and the Transformers library.
- Load the model: use the Transformers library to load the model and tokenizer.
- Inference: pass mathematical problems to the model and generate solutions (see the sketch after this list).
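A minimal sketch of the load and inference steps, assuming the transformers, torch, and accelerate packages are installed and that the Hub ID is EleutherAI/llemma_7b (check the model page for the exact ID and preferred prompt format):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model ID on the Hugging Face Hub (assumed; verify against the model page).
model_id = "EleutherAI/llemma_7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to reduce GPU memory use
    device_map="auto",          # place layers on available GPU(s)/CPU (requires accelerate)
)

# Pose a mathematical problem as a plain-text prompt.
prompt = "Problem: Compute the derivative of f(x) = x^3 * sin(x).\nSolution:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,  # greedy decoding for reproducible output
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```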
For optimal performance, consider using cloud GPUs, such as those offered by AWS or Google Cloud.
License
The Llemma model is licensed under the Llama 2 license.