T0pp (bigscience)
Introduction
T0pp is a model developed by the BigScience Workshop, designed for zero-shot task generalization in natural language processing. It outperforms GPT-3 on many tasks despite being 16 times smaller. The model is based on the T5 architecture and fine-tuned on a mixture of datasets that cover a wide range of NLP tasks.
Architecture
T0pp is an encoder-decoder model based on the T5 architecture, with 11 billion parameters. It processes input text through an encoder and generates target text through a decoder. It is fine-tuned to generate target sequences autoregressively using maximum likelihood training.
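As a quick sanity check of this architecture, the model configuration can be inspected without downloading the 11-billion-parameter weights. The sketch below assumes the standard T5 configuration fields exposed by the Transformers library (d_model, num_layers, num_decoder_layers, num_heads); it only fetches the small configuration file.

```python
from transformers import AutoConfig

# Fetch only the configuration (a small JSON file), not the 11B-parameter checkpoint.
config = AutoConfig.from_pretrained("bigscience/T0pp")

print(config.model_type)          # "t5": T0pp reuses the T5 encoder-decoder architecture
print(config.d_model)             # hidden size
print(config.num_layers)          # encoder layers
print(config.num_decoder_layers)  # decoder layers
print(config.num_heads)           # attention heads per layer
```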
Training
The T0pp model is fine-tuned from the LM-adapted T5 language model on a multitask mixture of datasets. Fine-tuning runs for 12,200 steps with an input sequence length of 1,024 tokens and a target sequence length of 256 tokens, using the Adafactor optimizer with a learning rate of 1e-3 and a dropout rate of 0.1. Activations are kept in bf16, and the resulting model is evaluated on a wide range of held-out tasks to assess both performance and bias.
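To make these hyperparameters concrete, the snippet below sketches a single maximum-likelihood fine-tuning step using the Adafactor implementation shipped in Transformers. It is a minimal illustration rather than the actual training setup: t5-small stands in for the 11B checkpoint, and the (prompt, target) pair is invented for the example.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from transformers.optimization import Adafactor

# Stand-in model: t5-small instead of the 11B T0pp checkpoint (assumption for illustration).
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Mirrors the hyperparameters reported above: Adafactor with a fixed learning rate of 1e-3.
optimizer = Adafactor(
    model.parameters(),
    lr=1e-3,
    scale_parameter=False,
    relative_step=False,
    warmup_init=False,
)

# One illustrative maximum-likelihood step on a single (prompt, target) pair,
# truncated to the 1,024-token input / 256-token target lengths mentioned above.
inputs = tokenizer(
    "Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy",
    return_tensors="pt", truncation=True, max_length=1024,
)
labels = tokenizer("Positive", return_tensors="pt",
                   truncation=True, max_length=256).input_ids

loss = model(**inputs, labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```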
Guide: Running Locally
- Install dependencies: make sure PyTorch and the Hugging Face Transformers library are installed.

```bash
pip install torch transformers
```
- Load the model: use the following code snippet to load the model and run a zero-shot query.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("bigscience/T0pp")
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0pp")

inputs = tokenizer.encode(
    "Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy",
    return_tensors="pt",
)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
- Cloud GPU recommendation: due to its 11-billion-parameter size, running T0pp efficiently typically requires large-memory GPUs, such as those provided by AWS, GCP, or Azure.
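If a single GPU cannot hold the full-precision weights, one common workaround (an assumption here, not part of the original guide) is to load the checkpoint in bfloat16 and let Accelerate spread the layers across the available devices:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Requires the accelerate package: pip install accelerate
tokenizer = AutoTokenizer.from_pretrained("bigscience/T0pp")
model = AutoModelForSeq2SeqLM.from_pretrained(
    "bigscience/T0pp",
    torch_dtype=torch.bfloat16,  # matches the bf16 activations used during training
    device_map="auto",           # spreads layers over available GPUs / CPU RAM
)

inputs = tokenizer(
    "Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy",
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Even in bfloat16, the 11 billion parameters occupy roughly 22 GB, so a single large-memory GPU or several smaller ones are still needed.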
License
The T0pp model is distributed under the Apache 2.0 license, allowing for both commercial and non-commercial use.