Cerebras-GPT-13B
Introduction
Cerebras-GPT-13B is part of the Cerebras-GPT family, released to facilitate research into large language model (LLM) scaling laws using open architectures and datasets. The models in this family range from 111M to 13B parameters and are designed to demonstrate the scalability of training LLMs on the Cerebras software and hardware stack.
Architecture
Cerebras-GPT models are transformer-based language models following a GPT-3 style architecture. They use a Byte Pair Encoding (BPE) tokenizer with a vocabulary size of 50,257 and a sequence length of 2,048, together with learned positional embeddings. Training uses the AdamW optimizer, with hyperparameters tuned per model size, on English-language datasets.
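As a quick sanity check of the tokenizer described above, the sketch below loads the published tokenizer from the Hugging Face Hub and inspects it. This assumes the Transformers library and the `cerebras/Cerebras-GPT-13B` repository used in the guide further down; the printed values are what this card leads one to expect, not guaranteed output:

```python
from transformers import AutoTokenizer

# Loads only the tokenizer files, not the 13B model weights
tokenizer = AutoTokenizer.from_pretrained("cerebras/Cerebras-GPT-13B")

# Per this card, the BPE vocabulary should hold 50,257 entries
print(tokenizer.vocab_size)

# Round trip: BPE splits text into subword IDs and reconstructs it
ids = tokenizer("Cerebras-GPT uses Byte Pair Encoding.")["input_ids"]
print(ids)
print(tokenizer.decode(ids))
```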
Training
The training data for Cerebras-GPT models is sourced from the Pile dataset, preprocessed and tokenized with the GPT-2 vocabulary. Training follows the Chinchilla scaling law of roughly 20 tokens per model parameter and uses Cerebras' weight streaming technology for efficient training across nodes. Training was conducted on the Andromeda AI supercomputer.
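To make the 20-tokens-per-parameter rule concrete, here is a back-of-the-envelope calculation for the 13B model (the round numbers are illustrative, not the exact reported token count):

```python
# Chinchilla-style rule of thumb from this card: ~20 training tokens per parameter
params = 13e9            # 13B parameters
tokens_per_param = 20
training_tokens = params * tokens_per_param
print(f"~{training_tokens / 1e9:.0f}B training tokens")  # -> ~260B training tokens
```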
Guide: Running Locally
To run the Cerebras-GPT-13B model locally, follow these steps:
- Install the Transformers library:

  ```bash
  pip install transformers
  ```
- Load the model:

  ```python
  from transformers import AutoTokenizer, AutoModelForCausalLM

  # Downloads (or loads from the local cache) the tokenizer and the 13B checkpoint
  tokenizer = AutoTokenizer.from_pretrained("cerebras/Cerebras-GPT-13B")
  model = AutoModelForCausalLM.from_pretrained("cerebras/Cerebras-GPT-13B")
  ```
- Generate text:

  ```python
  from transformers import pipeline

  pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

  # Greedy decoding (do_sample=False), capped at 50 tokens,
  # with repeated bigrams blocked via no_repeat_ngram_size
  text = "Generative AI is "
  generated_text = pipe(text, max_length=50, do_sample=False, no_repeat_ngram_size=2)[0]
  print(generated_text["generated_text"])
  ```
Using cloud GPUs is recommended for handling large models like Cerebras-GPT-13B due to the significant computational resources required.
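At full float32 precision, the 13B parameters alone occupy roughly 52 GB (13B parameters x 4 bytes), so loading on a single commodity GPU will typically fail. A common mitigation is to load the weights in half precision and let Transformers shard them across available devices; a minimal sketch, assuming a recent `transformers` with the `accelerate` package installed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("cerebras/Cerebras-GPT-13B")

# float16 halves the weight memory to ~26 GB; device_map="auto"
# (requires accelerate) spreads layers across available GPUs and CPU
model = AutoModelForCausalLM.from_pretrained(
    "cerebras/Cerebras-GPT-13B",
    torch_dtype=torch.float16,
    device_map="auto",
)
```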
License
Cerebras-GPT-13B is released under the Apache 2.0 license, allowing for free use and modification by the community.