T5-Small Model Documentation
Introduction
T5-Small is a variant of the Text-To-Text Transfer Transformer (T5) model developed by Google. It is designed to handle various NLP tasks by framing them in a unified text-to-text format. With 60 million parameters, T5-Small can perform tasks like machine translation, document summarization, question answering, and classification.
Architecture
T5-Small is an encoder-decoder language model built on a text-to-text framework, which allows the same model, loss function, and hyperparameters to be applied across diverse tasks. This approach contrasts with models like BERT, which can only output a class label or a span of the input. T5 supports multiple languages, including English, French, Romanian, and German.
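For instance, a few tasks cast into this shared format might look like the following (illustrative input/target pairs; the task prefixes follow the conventions used in the T5 paper):

```python
# Illustrative (input, target) pairs in T5's unified text-to-text format.
# Every task, from translation to classification to regression, becomes
# "generate the target string from a prefixed input string".
examples = [
    ("translate English to German: That is good.", "Das ist gut."),
    ("cola sentence: The course is jumping well.", "not acceptable"),
    ("stsb sentence1: The rhino grazed on the grass. "
     "sentence2: A rhino is grazing in a field.", "3.8"),
]
```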
Training
T5-Small was pre-trained on the Colossal Clean Crawled Corpus (C4) with a mixture of unsupervised and supervised tasks. The unsupervised objective used datasets such as C4 and Wiki-DPR, while the supervised tasks covered sentence acceptability judgment, sentiment analysis, paraphrasing, natural language inference, sentence completion, word sense disambiguation, and question answering. Training combined these into a multi-task mixture, with every task cast into the unified text-to-text format.
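As a concrete illustration of the unsupervised objective, the sketch below (adapted from the examples in the Hugging Face transformers documentation) shows span corruption: dropped-out spans in the input are replaced by sentinel tokens, and the target reconstructs those spans.

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Corrupted input: sentinel tokens <extra_id_n> mark the dropped spans.
input_ids = tokenizer(
    "The <extra_id_0> walks in <extra_id_1> park", return_tensors="pt"
).input_ids
# Target: the missing spans, delimited by the same sentinel tokens.
labels = tokenizer(
    "<extra_id_0> cute dog <extra_id_1> the <extra_id_2>", return_tensors="pt"
).input_ids

# Passing labels makes the forward pass compute the denoising loss directly.
loss = model(input_ids=input_ids, labels=labels).loss
```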
Guide: Running Locally
To run T5-Small locally, follow these steps:
- Install the Transformers library: ensure you have the `transformers` library installed via pip:

```bash
pip install transformers
```
- Load the model and tokenizer:

```python
from transformers import T5Tokenizer, T5Model

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5Model.from_pretrained("t5-small")
```
- Prepare input data:

```python
input_ids = tokenizer(
    "Studies have shown that owning a dog is good for you", return_tensors="pt"
).input_ids
# T5Model is the bare encoder-decoder, so the decoder must be primed explicitly.
decoder_input_ids = tokenizer("Studies show that", return_tensors="pt").input_ids
```
- Perform a forward pass:

```python
outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
# Hidden states of the decoder's final layer, shape (batch, seq_len, d_model).
last_hidden_states = outputs.last_hidden_state
```
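Note that `T5Model` returns raw hidden states rather than generated text. For actual text generation, `T5ForConditionalGeneration` together with its `generate` method is the usual choice; a minimal sketch:

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# The task is selected entirely by the input prefix.
input_ids = tokenizer(
    "translate English to German: Studies have shown that owning a dog is good for you",
    return_tensors="pt",
).input_ids

output_ids = model.generate(input_ids, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```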
For enhanced performance, consider using cloud-based GPUs from providers like AWS, Google Cloud, or Azure.
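On a machine with a CUDA-capable GPU, moving the model and its inputs to the device is a small change (a sketch, assuming a PyTorch backend with CUDA available):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)          # move the weights to the GPU
input_ids = input_ids.to(device)  # inputs must live on the same device
output_ids = model.generate(input_ids, max_new_tokens=40)
```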
License
The T5-Small model is open-sourced under the Apache 2.0 License, allowing for wide use and distribution in both commercial and non-commercial applications.