T5-Large Model
Introduction
T5-Large is a neural network model belonging to the Text-To-Text Transfer Transformer (T5) family. It reframes all NLP tasks into a text-to-text format, allowing for consistent application across various tasks such as translation, summarization, and classification.
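As an illustration of this framing, the task is selected purely by a natural-language prefix on the input text. The short sketch below is illustrative only; it assumes the Hugging Face Transformers library and the t5-large checkpoint, and the prefixes shown are just two of the tasks the model was trained on.

  from transformers import T5Tokenizer, T5ForConditionalGeneration

  tokenizer = T5Tokenizer.from_pretrained("t5-large")
  model = T5ForConditionalGeneration.from_pretrained("t5-large")

  # Different tasks share one model; only the text prefix changes.
  for prompt in [
      "translate English to German: The house is wonderful.",
      "summarize: Studies have shown that owning a dog is good for you because ...",
  ]:
      input_ids = tokenizer(prompt, return_tensors="pt").input_ids
      output_ids = model.generate(input_ids, max_new_tokens=40)
      print(tokenizer.decode(output_ids[0], skip_special_tokens=True))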
Architecture
T5-Large is an encoder-decoder Transformer with approximately 770 million parameters, the large variant in Google's T5 model lineup. It handles multiple languages, including English, French, Romanian, and German, and uses a unified model architecture, loss function, and hyperparameters across diverse NLP tasks.
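As a quick sanity check of the model's size, the checkpoint can be loaded and its parameter tensors summed. This is a minimal sketch assuming PyTorch and the Hugging Face t5-large checkpoint; the exact count reported may differ somewhat from the rounded 770 million figure depending on how tied embeddings are counted.

  from transformers import T5ForConditionalGeneration

  model = T5ForConditionalGeneration.from_pretrained("t5-large")
  num_params = sum(p.numel() for p in model.parameters())  # total weight count
  print(f"{num_params / 1e6:.0f}M parameters")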
Training
The model is pre-trained on the Colossal Clean Crawled Corpus (C4) using a multi-task mixture of unsupervised and supervised objectives. The unsupervised denoising objective draws on datasets such as C4 and Wiki-DPR, while the supervised text-to-text tasks cover areas such as sentiment analysis, natural language inference, and question answering, using datasets like SST-2, MNLI, and BoolQ.
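The unsupervised objective is span-corruption denoising: contiguous spans of the input are replaced by sentinel tokens, and the target reproduces the dropped spans after their matching sentinels. A minimal sketch of one such training step, assuming the Hugging Face Transformers API:

  from transformers import T5Tokenizer, T5ForConditionalGeneration

  tokenizer = T5Tokenizer.from_pretrained("t5-large")
  model = T5ForConditionalGeneration.from_pretrained("t5-large")

  # Corrupted input: masked spans become sentinel tokens <extra_id_0>, <extra_id_1>, ...
  input_ids = tokenizer(
      "The <extra_id_0> walks in <extra_id_1> park", return_tensors="pt"
  ).input_ids
  # Target: each sentinel is followed by the span it replaced.
  labels = tokenizer(
      "<extra_id_0> cute dog <extra_id_1> the <extra_id_2>", return_tensors="pt"
  ).input_ids

  # Supplying labels makes the forward pass return the cross-entropy denoising loss.
  loss = model(input_ids=input_ids, labels=labels).loss
  print(loss.item())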
Guide: Running Locally
To run T5-Large locally, you can use the following steps:
- Install the Transformers library:
  pip install transformers
- Load the model and tokenizer:
  from transformers import T5Tokenizer, T5Model
  tokenizer = T5Tokenizer.from_pretrained("t5-large")
  model = T5Model.from_pretrained("t5-large")
- Prepare input data and perform inference:
  input_ids = tokenizer("Studies have been shown that owning a dog is good for you", return_tensors="pt").input_ids
  decoder_input_ids = tokenizer("Studies show that", return_tensors="pt").input_ids
  outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
  last_hidden_states = outputs.last_hidden_state
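Note that T5Model returns encoder-decoder hidden states rather than text. To generate text, T5ForConditionalGeneration (which adds the language-modeling head and a generate() method) can be loaded instead; a minimal sketch continuing from the snippet above:

  from transformers import T5ForConditionalGeneration

  gen_model = T5ForConditionalGeneration.from_pretrained("t5-large")
  # generate() performs autoregressive decoding from the encoder input above
  output_ids = gen_model.generate(input_ids, max_new_tokens=30)
  print(tokenizer.decode(output_ids[0], skip_special_tokens=True))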
Using a cloud GPU, such as those available on Google Cloud Platform, can be beneficial for running the model efficiently.
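If a CUDA-capable GPU is available, moving the model and its inputs onto it speeds up inference considerably. A minimal sketch assuming PyTorch, continuing from the snippets above:

  import torch

  device = "cuda" if torch.cuda.is_available() else "cpu"
  model = model.to(device)            # move the model weights to the GPU
  input_ids = input_ids.to(device)    # inputs must live on the same device as the model
  outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids.to(device))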
License
T5-Large is licensed under the Apache 2.0 License, allowing for free use, modification, and distribution with proper attribution.