Introduction

The Pythia Scaling Suite, developed by EleutherAI, is a collection of 16 large language models designed to facilitate research into model interpretability. The suite spans eight model sizes, each trained on the Pile dataset in two variants: one on the original data and one on a deduplicated copy.

Architecture

Pythia models are decoder-only transformers implemented using the GPT-NeoX library. They are available in eight sizes: 70M, 160M, 410M, 1B, 1.4B, 2.8B, 6.9B, and 12B parameters. Every model in the suite is trained on the same data, seen in exactly the same order, which makes behavior directly comparable across model sizes.
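The eight sizes and two data variants yield the 16 models in the suite. The sketch below illustrates the Hugging Face Hub naming convention the models follow (the ID pattern is inferred from the released repositories; verify names against the Hub before use):

```python
# Illustrative sketch of the Pythia suite's Hub ID naming convention.
# The pattern EleutherAI/pythia-{size}[-deduped] is an assumption based
# on the released repositories; check the Hub for the authoritative list.
SIZES = ["70m", "160m", "410m", "1b", "1.4b", "2.8b", "6.9b", "12b"]

def model_id(size: str, deduped: bool = False) -> str:
    """Build a Hugging Face model ID for a Pythia model."""
    suffix = "-deduped" if deduped else ""
    return f"EleutherAI/pythia-{size}{suffix}"

# Eight sizes, each in deduplicated and non-deduplicated variants.
ids = [model_id(s, deduped=d) for s in SIZES for d in (False, True)]
print(len(ids))  # 16
```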

Training

Pythia models are trained on the Pile, an 825 GiB dataset drawn from diverse text sources. Each model processes 299,892,736,000 tokens over 143,000 steps at a batch size of 2,097,152 (roughly 2M) tokens, with checkpoints saved at intermediate steps throughout training. Training details, including hyperparameters and procedures, are documented in the Pythia GitHub repository.
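The total token count follows directly from the step count and batch size; a quick sanity check:

```python
# Sanity check: 143,000 optimizer steps at a batch size of 2,097,152
# tokens ("2M" = 2 * 1024 * 1024) reproduces the quoted total.
steps = 143_000
batch_tokens = 2_097_152
total = steps * batch_tokens
print(total)  # 299892736000
```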

Guide: Running Locally

To run Pythia models locally, follow these steps:

  1. Install the Transformers Library: Ensure the transformers library is installed in your Python environment (pip install transformers).
  2. Load the Model and Tokenizer: Use the GPTNeoXForCausalLM and AutoTokenizer classes from the transformers library to load the model and tokenizer.
  3. Generate Text: Input text can be tokenized and passed to the model to generate predictions.
from transformers import GPTNeoXForCausalLM, AutoTokenizer

# Load a specific training checkpoint by passing its revision.
model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/pythia-70m-deduped",
    revision="step3000",
    cache_dir="./pythia-70m-deduped/step3000",
)
tokenizer = AutoTokenizer.from_pretrained(
    "EleutherAI/pythia-70m-deduped",
    revision="step3000",
    cache_dir="./pythia-70m-deduped/step3000",
)

# Tokenize a prompt, generate a continuation, and decode it.
inputs = tokenizer("Hello, I am", return_tensors="pt")
tokens = model.generate(**inputs)
print(tokenizer.decode(tokens[0]))
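Per the Pythia release, each model provides 154 checkpoints: step0, log-spaced steps 1 through 512, then every 1,000 steps up to 143,000. A small helper can enumerate the revision strings accepted by from_pretrained (the enumeration is an assumption based on the published checkpoint schedule; verify against the model repository):

```python
def pythia_revisions() -> list:
    """Enumerate Pythia checkpoint revision names, assuming the
    published schedule: step0, powers of two up to step512, then
    every 1,000 steps up to step143000."""
    steps = [0] + [2 ** i for i in range(10)] + list(range(1000, 143_001, 1000))
    return [f"step{s}" for s in steps]

revs = pythia_revisions()
print(len(revs))           # 154
print("step3000" in revs)  # True
```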

For optimal performance, consider using cloud GPUs available on platforms like AWS, Google Cloud, or Azure.

License

The Pythia models are distributed under the Apache 2.0 license. This allows for both academic and commercial use, provided compliance with the terms of the license, which includes proper attribution and a disclaimer of warranties.
