Pythia 70M
EleutherAI

Introduction
The Pythia Scaling Suite, developed by EleutherAI, is a collection of models designed to facilitate research into the interpretability of large language models. This suite includes models of varying sizes trained on the Pile dataset, with both deduplicated and non-deduplicated versions available.
Architecture
Pythia models are based on the transformer architecture and are implemented using the GPT-NeoX library. The suite includes models at 70M, 160M, 410M, 1B, 1.4B, 2.8B, 6.9B, and 12B parameters. All models are trained on the same data, presented in the same order, which makes results directly comparable across model sizes.
Training
Pythia models are trained on the Pile, an 825 GiB dataset comprising diverse text sources. Training processes 299,892,736,000 tokens over 143,000 steps at a batch size of roughly 2M tokens, with intermediate checkpoints saved at intervals throughout training. Training details, including hyperparameters and procedures, are available in the Pythia GitHub repository.
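As a sanity check, the quoted token count follows directly from the step count and batch size, assuming the "2M" batch is exactly 1,024 sequences of 2,048 tokens (2^21 = 2,097,152 tokens; the sequence and context lengths are an assumption here, not stated above):

```python
# Verify that steps x batch size reproduces the quoted total token count.
steps = 143_000
batch_size_tokens = 1024 * 2048  # 1,024 sequences x 2,048-token context = 2,097,152
total_tokens = steps * batch_size_tokens
print(total_tokens)  # → 299892736000
```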
Guide: Running Locally
To run Pythia models locally, follow these steps:
- Install the Transformers library: ensure the transformers library is installed in your Python environment.
- Load the model and tokenizer: use the GPTNeoXForCausalLM and AutoTokenizer classes from the transformers library.
- Generate text: tokenize the input text and pass it to the model to generate predictions.
from transformers import GPTNeoXForCausalLM, AutoTokenizer

# Load the 70M deduplicated model at the step-3000 training checkpoint.
model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/pythia-70m-deduped",
    revision="step3000",
    cache_dir="./pythia-70m-deduped/step3000",
)
tokenizer = AutoTokenizer.from_pretrained(
    "EleutherAI/pythia-70m-deduped",
    revision="step3000",
    cache_dir="./pythia-70m-deduped/step3000",
)

# Tokenize a prompt and generate a continuation.
inputs = tokenizer("Hello, I am", return_tensors="pt")
tokens = model.generate(**inputs)
print(tokenizer.decode(tokens[0]))
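The revision argument above selects one of the intermediate checkpoints published for each model. A minimal sketch of the full set of revision names, assuming the schedule documented for Pythia (step0, power-of-two steps up to 512, then every 1,000 steps through 143,000 — 154 checkpoints in total):

```python
# Build the list of checkpoint revision names available per Pythia model.
# Assumes the documented schedule: step0, log-spaced steps 1..512,
# then linearly spaced steps every 1,000 from step1000 to step143000.
log_spaced = [0] + [2**i for i in range(10)]       # 0, 1, 2, 4, ..., 512
linear = list(range(1000, 143_000 + 1, 1000))      # 1000, 2000, ..., 143000
revisions = [f"step{s}" for s in log_spaced + linear]
print(len(revisions))  # → 154
```

Any of these names (e.g. "step3000" above) can be passed as revision to from_pretrained to study how model behavior evolves over training.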
For optimal performance, consider using cloud GPUs available on platforms like AWS, Google Cloud, or Azure.
License
The Pythia models are distributed under the Apache 2.0 license. This allows for both academic and commercial use, provided compliance with the terms of the license, which includes proper attribution and a disclaimer of warranties.