Introduction

T0 is a series of encoder-decoder models developed by the BigScience Workshop for zero-shot task generalization using English natural language prompts. The 11B-parameter T0 is roughly 16x smaller than GPT-3, yet matches or exceeds its zero-shot performance on many benchmarks.

Architecture

T0 models are based on the T5 architecture, a Transformer-based encoder-decoder language model. Input text is processed by the encoder and target text is generated by the decoder. Starting from a pretrained, LM-adapted T5 checkpoint, the models are fine-tuned on a multitask mixture of prompted datasets, which allows them to predict outputs for unseen tasks specified in natural language.
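
For illustration, a task is posed to the model simply by phrasing it as an English instruction in the encoder input. The prompts below are our own examples, not the exact templates from the promptsource/P3 collection used during training:

    # Illustrative prompts only; the real training templates come from promptsource/P3.
    sentiment_prompt = "Is this review positive or negative? Review: the battery died after two days."
    nli_prompt = "Suppose 'The streets are soaked after the storm.' Can we conclude that it rained? Yes or no?"
    qa_prompt = "Answer the question: What is the capital of France?"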

Training

The T0 models were fine-tuned for 12,200 steps on a multitask mixture of datasets covering tasks such as multiple-choice QA, extractive QA, and sentiment analysis, among others. Training used the Adafactor optimizer with a learning rate of 1e-3 and a dropout rate of 0.1.
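
The original runs used the authors' own training pipeline; purely as a sketch of how the reported hyperparameters (Adafactor, learning rate 1e-3, dropout 0.1) map onto PyTorch and the Transformers library, the setup might look like the following (the checkpoint name is an assumption for illustration):

    from transformers import Adafactor, AutoModelForSeq2SeqLM
    
    # Assumption: start from an LM-adapted T5 checkpoint, as T0 did.
    model = AutoModelForSeq2SeqLM.from_pretrained(
        "google/t5-xl-lm-adapt",
        dropout_rate=0.1,          # dropout reported for T0
    )
    
    # Adafactor with the fixed learning rate of 1e-3 reported for T0.
    optimizer = Adafactor(
        model.parameters(),
        lr=1e-3,
        scale_parameter=False,     # turn off Adafactor's built-in LR scaling
        relative_step=False,       # use the fixed lr rather than a relative schedule
    )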

Guide: Running Locally

  1. Install Dependencies: Ensure you have PyTorch and the Transformers library installed (a sample install command is shown after this guide).

  2. Load Model: Use the following code to load the T0pp model and run inference on a sample prompt:

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
    
    # Download the tokenizer and the 11B-parameter T0pp checkpoint from the Hugging Face Hub.
    tokenizer = AutoTokenizer.from_pretrained("bigscience/T0pp")
    model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0pp")
    
    # The task is expressed entirely in natural language and tokenized for the encoder.
    inputs = tokenizer.encode("Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy", return_tensors="pt")
    
    # The decoder generates the answer (e.g. "Positive") token by token.
    outputs = model.generate(inputs)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    
  3. Hardware Recommendations: T0pp has 11 billion parameters (over 40 GB of weights in full precision), so it is recommended to run inference on cloud GPUs, for example AWS EC2 or Google Cloud Platform instances with ample GPU memory; a lower-precision loading option is sketched after this guide.
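
A typical environment setup for step 1, assuming pip and a recent Python (the model card does not pin exact versions; sentencepiece is included because the T5-style tokenizer commonly needs it):

    pip install torch transformers sentencepiece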
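
If a single GPU cannot hold the full-precision weights (step 3), one common option, shown here as an assumption rather than part of the original card, is to load the checkpoint in bfloat16 and let Accelerate place layers across the available devices:

    import torch
    from transformers import AutoModelForSeq2SeqLM
    
    # bfloat16 roughly halves memory use compared with fp32; device_map="auto"
    # (requires the `accelerate` package) spreads layers across available GPUs
    # and, if necessary, CPU memory.
    model = AutoModelForSeq2SeqLM.from_pretrained(
        "bigscience/T0pp",
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

Alternatively, the smaller bigscience/T0_3B checkpoint can be used when memory is tight.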

License

T0 is released under the Apache 2.0 License, which allows for free use, modification, and distribution of the software.
