T0_3B
bigscience/T0_3B
Introduction
T0_3B is part of the T0* series of encoder-decoder models designed for zero-shot task generalization from English natural language prompts. Despite being much smaller (3 billion parameters), it outperforms larger models such as GPT-3 on many tasks. T0_3B is fine-tuned on a diverse set of tasks so that it performs well on unseen tasks specified in natural language.
Architecture
T0_3B is based on the T5 architecture, a Transformer-based encoder-decoder model. The underlying T5 model is pre-trained on the C4 dataset, and T0_3B is then fine-tuned on a multitask mixture covering various NLP tasks. The model was trained with bf16 activations, so fp32 or bf16 precision is recommended for inference (fp16 is discouraged).
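For instance, the precision recommendation can be followed by loading the weights directly in bfloat16. This is a minimal sketch using the standard torch_dtype argument of from_pretrained, assuming the Hugging Face Hub id bigscience/T0_3B for this checkpoint:

    # Minimal sketch: load T0_3B for inference in bf16 (fp32 also works; fp16 is discouraged).
    import torch
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("bigscience/T0_3B")
    model = AutoModelForSeq2SeqLM.from_pretrained(
        "bigscience/T0_3B",
        torch_dtype=torch.bfloat16,  # or torch.float32
    )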
Training
The T0* models are fine-tuned from pre-trained T5 checkpoints using a mixture of datasets converted into natural language prompts. The training involved 12,200 fine-tuning steps with a batch size of 1,024 sequences. The datasets used cover multiple NLP tasks such as QA, summarization, sentiment analysis, and more. The T0_3B model specifically starts from a T5-LM XL pre-trained checkpoint with 3 billion parameters.
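To make the prompted fine-tuning setup concrete, here is a minimal sketch of how a single dataset example might be rendered into an input/target text pair. The template and field names are illustrative assumptions, not the exact prompts used to train T0*:

    # Hypothetical prompt template for a sentiment example (illustration only).
    def to_prompted_pair(example):
        # example: {"text": "...", "label": 0 or 1}, as in a typical sentiment dataset
        source = "Is the following review positive or negative?\nReview: " + example["text"]
        target = "positive" if example["label"] == 1 else "negative"
        return {"input": source, "target": target}

    print(to_prompted_pair({"text": "Best cast iron skillet you will ever buy.", "label": 1}))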
Guide: Running Locally
- Set Up Environment: Install PyTorch and Hugging Face's Transformers library (pip install torch transformers).
- Load Model:
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("bigscience/T0pp")
    model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0pp")

    # Pose the task as a natural language prompt and generate the answer.
    inputs = tokenizer.encode(
        "Is this review positive or negative? "
        "Review: this is the best cast iron skillet you will ever buy",
        return_tensors="pt",
    )
    outputs = model.generate(inputs)
    print(tokenizer.decode(outputs[0]))
- Choose Model Checkpoint: Replace the path passed to AutoTokenizer and AutoModelForSeq2SeqLM if you want a different checkpoint, e.g. "bigscience/T0_3B" for this model (see the sketch after this list).
- Hardware Recommendation: Due to the model's size, using a cloud GPU service such as AWS, Google Cloud, or Azure is advisable.
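Putting the last two points together, here is a sketch that swaps in the T0_3B checkpoint and runs the same review query on a GPU when one is available; the checkpoint id and the .to(device) placement are standard Transformers/PyTorch usage, and actual memory requirements depend on your hardware:

    # Sketch: use the 3B checkpoint and move the model to a GPU if available.
    import torch
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    checkpoint = "bigscience/T0_3B"
    device = "cuda" if torch.cuda.is_available() else "cpu"

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint).to(device)

    # Inputs must live on the same device as the model before calling generate().
    inputs = tokenizer.encode(
        "Is this review positive or negative? "
        "Review: this is the best cast iron skillet you will ever buy",
        return_tensors="pt",
    ).to(device)
    outputs = model.generate(inputs)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))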
License
The T0_3B model is released under the Apache 2.0 License, allowing for wide use and modification with proper attribution.