t5-v1_1-small
Google

Introduction
T5 Version 1.1 is an improved version of Google's Text-to-Text Transfer Transformer (T5), designed to enhance transfer learning in natural language processing (NLP). This model reformulates various language tasks into a text-to-text format, facilitating fine-tuning on specific tasks after pre-training on a large corpus.
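For illustration, here are a few tasks cast in the text-to-text format. The task prefixes below follow the conventions of the original T5 paper and are only examples; because T5 Version 1.1 is pre-trained without supervised task mixing, such prompts only work after fine-tuning.

```python
# Illustrative (input, target) pairs in the text-to-text format.
# The prefixes are conventions from the original T5 setup, not built into v1.1.
examples = [
    ("translate English to German: That is good.", "Das ist gut."),
    ("cola sentence: The course is jumping well.", "not acceptable"),
    ("summarize: <article text>", "<short summary>"),
]
```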
Architecture
T5 Version 1.1 introduces several architectural improvements over the original T5 model:
- GEGLU Activation: Utilizes GEGLU activation in the feed-forward hidden layer instead of ReLU (see the sketch after this list).
- Dropout: Disabled during pre-training for quality improvement, but should be re-enabled during fine-tuning.
- Dataset: Pre-trained exclusively on the C4 dataset without mixing with downstream tasks.
- No Parameter Sharing: Embedding and classifier layers do not share parameters.
- Model Variants: Introduces "xl" and "xxl" to replace the previous "3B" and "11B" models, with a larger d_model and smaller num_heads and d_ff.
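For reference, here is a minimal PyTorch sketch of a GEGLU feed-forward block. The layer names are illustrative (loosely modeled on Hugging Face's T5 implementation), not an exact reproduction of the model's internals:

```python
import torch.nn as nn
import torch.nn.functional as F

class GEGLUFeedForward(nn.Module):
    """Gated-GELU feed-forward block, as used in T5 v1.1 (illustrative names)."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.wi_0 = nn.Linear(d_model, d_ff, bias=False)  # gate projection
        self.wi_1 = nn.Linear(d_model, d_ff, bias=False)  # value projection
        self.wo = nn.Linear(d_ff, d_model, bias=False)    # output projection

    def forward(self, x):
        # GEGLU: GELU(x W_0) multiplied elementwise by (x W_1), then projected back.
        return self.wo(F.gelu(self.wi_0(x)) * self.wi_1(x))
```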
Training
The model was pre-trained on the C4 dataset using only an unsupervised objective, with no supervised downstream tasks mixed in. As a result, the checkpoint must be fine-tuned on downstream tasks before it achieves optimal performance.
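T5's unsupervised objective is span corruption: spans of the input are replaced with sentinel tokens, and the model learns to reconstruct them. Below is a minimal sketch of querying the raw checkpoint in that format, assuming the Hugging Face Transformers library with a PyTorch backend (output quality will be rough, since the model is not fine-tuned):

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_id = "google/t5-v1_1-small"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

# Sentinel tokens (<extra_id_0>, <extra_id_1>, ...) mark the corrupted spans.
text = "The <extra_id_0> walks in <extra_id_1> park."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```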
Guide: Running Locally
To run the T5 Version 1.1 model locally, follow these steps:
- Install Dependencies: Ensure you have Python, the Hugging Face Transformers library, and a backend such as PyTorch, TensorFlow, or JAX installed.
- Download the Model: Use the Hugging Face Model Hub to download a T5 Version 1.1 checkpoint (e.g., google/t5-v1_1-small).
- Load the Model: Load the model into your environment using the appropriate library (e.g., transformers for PyTorch).
- Fine-tune the Model: Enable dropout and fine-tune the model on your specific task dataset (see the sketch after this list).
- Inference: Once fine-tuned, you can perform inference on your text data.
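The following is a minimal sketch of the load, fine-tune, and inference steps using Transformers with PyTorch. The model id, the dropout rate of 0.1, and the toy summarization pair are illustrative assumptions; in practice you would loop over a real dataset with an optimizer:

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_id = "google/t5-v1_1-small"  # illustrative choice of checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Explicitly set dropout for fine-tuning, as recommended above (rate is illustrative).
model = T5ForConditionalGeneration.from_pretrained(model_id, dropout_rate=0.1)

# One fine-tuning step on a toy input/target pair.
inputs = tokenizer("summarize: The quick brown fox jumps over the lazy dog.",
                   return_tensors="pt")
labels = tokenizer("A fox jumps over a dog.", return_tensors="pt").input_ids

model.train()
loss = model(**inputs, labels=labels).loss  # cross-entropy on the target text
loss.backward()                             # feed into your optimizer step

# Inference after fine-tuning.
model.eval()
with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```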
For training and inference, cloud GPUs from providers like AWS, Google Cloud, or Azure can be beneficial for handling large datasets and model sizes efficiently.
License
T5 Version 1.1 is released under the Apache 2.0 License, permitting usage, distribution, and modification under the terms of the license.