T0_3B

Introduction

T0_3B is part of the T0* series of encoder-decoder models designed for zero-shot task generalization using English natural language prompts. Despite its comparatively small size of 3 billion parameters, it outperforms much larger models such as GPT-3 on many held-out tasks. The model is fine-tuned on a diverse set of prompted tasks so that it can perform unseen tasks specified in natural language.

Architecture

T0_3B is based on T5, a Transformer-based encoder-decoder architecture. The underlying T5 model was pre-trained on the C4 dataset; T0_3B starts from an LM-adapted T5 checkpoint and is then fine-tuned on a multitask mixture covering a variety of NLP tasks. The model was trained with bf16 activations, so running inference in fp16 is discouraged; use fp32 or bf16 precision instead, as sketched below.
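As a minimal sketch of the precision recommendation (assuming the transformers library and the bigscience/T0_3B checkpoint used in the guide below), the weights can be loaded directly in bf16 via the torch_dtype argument of from_pretrained:

    import torch
    from transformers import AutoModelForSeq2SeqLM
    
    # Load the weights in bfloat16 for inference; fp16 is discouraged
    # because the model was trained with bf16 activations.
    model = AutoModelForSeq2SeqLM.from_pretrained(
        "bigscience/T0_3B",
        torch_dtype=torch.bfloat16,
    )

Loading in bf16 roughly halves the weight memory compared to fp32, at no cost to the precision regime the model was trained in.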

Training

The T0* models are fine-tuned from pre-trained T5 checkpoints on a mixture of datasets converted into natural language prompts. Training ran for 12,200 fine-tuning steps with a batch size of 1,024 sequences. The datasets cover a range of NLP tasks, including question answering, summarization, and sentiment analysis. T0_3B specifically starts from the 3-billion-parameter T5-LM XL pre-trained checkpoint.
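To illustrate how a dataset example is converted into a prompt, here is a hypothetical template for a sentiment-analysis example; the wording is illustrative only, not one of the exact templates used in training:

    # Hypothetical template: the real training prompts come from a large
    # pool of crowd-sourced templates, with wording that varies per task.
    example = {"text": "This movie was a delight from start to finish.", "label": 1}
    prompt = f"Is the following review positive or negative? Review: {example['text']}"
    target = "positive" if example["label"] == 1 else "negative"
    # Fine-tuning teaches the model to map `prompt` -> `target` as plain text.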

Guide: Running Locally

  1. Set Up Environment: Install PyTorch and Hugging Face's Transformers library, e.g. pip install torch transformers.
  2. Load Model:
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
    
    # Load the T0_3B checkpoint from the Hugging Face Hub
    tokenizer = AutoTokenizer.from_pretrained("bigscience/T0_3B")
    model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0_3B")
    
    # Encode a zero-shot prompt and generate the model's answer
    inputs = tokenizer.encode("Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy", return_tensors="pt")
    outputs = model.generate(inputs)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    
  3. Choose Model Checkpoint: Replace the checkpoint name passed to AutoTokenizer and AutoModelForSeq2SeqLM if using a different T0* variant (e.g., bigscience/T0pp for the 11-billion-parameter model).
  4. Hardware Recommendation: Given the model's size, a cloud GPU service such as AWS, Google Cloud, or Azure is advisable; see the sizing sketch after this list.
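As a rough sizing estimate, 3 billion parameters occupy about 12 GB in fp32 (4 bytes per parameter) or about 6 GB in bf16, before counting activation memory. Below is a minimal sketch for single-GPU inference, assuming a CUDA device with enough free memory:

    import torch
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
    
    tokenizer = AutoTokenizer.from_pretrained("bigscience/T0_3B")
    # bf16 halves the weight memory relative to fp32 (see Architecture notes)
    model = AutoModelForSeq2SeqLM.from_pretrained(
        "bigscience/T0_3B", torch_dtype=torch.bfloat16
    ).to("cuda")
    
    # Move the tokenized prompt to the same device before generating
    inputs = tokenizer("Question: What is the capital of France? Answer:", return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))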

License

The T0_3B model is released under the Apache 2.0 License, allowing for wide use and modification with proper attribution.
