IT5 Small
Introduction
The IT5 model family is an effort to pretrain large-scale sequence-to-sequence transformer models specifically for the Italian language. It follows the methodology of the original T5 model and is part of the project "IT5: Text-to-Text Pretraining for Italian Language Understanding and Generation", developed by Gabriele Sarti and Malvina Nissim with support from Hugging Face and Google's TPU Research Cloud.
Architecture
The IT5 models are based on the T5 architecture and come in several variants: it5-small, it5-base, it5-large, and it5-base-oscar. They are pretrained on the Thoroughly Cleaned Italian mC4 Corpus, with it5-small using the google/t5-v1_1-small configuration. Like T5 v1.1, the models use gated-GELU activation functions, and pretraining relies on the Adafactor optimizer.
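A minimal sketch of how one might inspect these settings from the published checkpoint, assuming the transformers library is installed; the field names follow the standard T5Config, and the values shown in comments are what one would expect rather than guaranteed outputs.

```python
from transformers import AutoConfig

# Load the configuration shipped with the it5-small checkpoint.
config = AutoConfig.from_pretrained("gsarti/it5-small")

print(config.model_type)         # expected: "t5"
print(config.feed_forward_proj)  # expected: "gated-gelu" (T5 v1.1-style activation)
print(config.d_model, config.num_layers)  # model width and encoder depth
```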
Training
Training was conducted on a single TPU v3-8 VM on Google Cloud. The it5-small model was trained for one epoch (about 1,050,000 steps) over roughly 36 hours. The pretraining dataset contains around 41 billion words (~275 GB) from the cleaned Italian mC4 Corpus. Full training details are available on GitHub.
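For a quick look at the kind of data used for pretraining, the corpus can be streamed instead of downloaded in full. This is a hedged sketch: the dataset id "gsarti/clean_mc4_it" and the "small" configuration are assumptions based on the project's public releases and may need adjusting to match the actual hub listing.

```python
from datasets import load_dataset

# Stream documents from the cleaned Italian mC4 corpus (avoids the ~275 GB download).
corpus = load_dataset("gsarti/clean_mc4_it", "small", split="train", streaming=True)

for i, doc in enumerate(corpus):
    print(doc["text"][:80])  # peek at the start of each document
    if i == 2:
        break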
Guide: Running Locally
To use the IT5-small model locally, follow these steps:
- Install the transformers library: pip install transformers
- Load the model and tokenizer:
  from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
  tokenizer = AutoTokenizer.from_pretrained("gsarti/it5-small")
  model = AutoModelForSeq2SeqLM.from_pretrained("gsarti/it5-small")
- Fine-tune the model on your specific seq2seq task (a minimal fine-tuning sketch follows this list).
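Because it5-small is a pretrained-only checkpoint, it must be fine-tuned before it produces useful task output. The following is a minimal, hypothetical sketch of seq2seq fine-tuning with a plain PyTorch loop; the toy summarization pairs, the "riassumi:" prefix, and the hyperparameters are purely illustrative and not the project's official training setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("gsarti/it5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("gsarti/it5-small")

# Toy Italian source/target pairs, shown only to illustrate the fine-tuning pattern.
pairs = [
    ("riassumi: Il gatto dorme sul divano tutto il giorno.", "Il gatto dorme."),
    ("riassumi: La squadra ha vinto la partita dopo i rigori.", "La squadra ha vinto."),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()

for epoch in range(3):
    for source, target in pairs:
        inputs = tokenizer(source, return_tensors="pt", truncation=True)
        labels = tokenizer(text_target=target, return_tensors="pt", truncation=True).input_ids
        outputs = model(**inputs, labels=labels)  # loss is computed against the labels
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Quick check: generate from one of the training inputs.
model.eval()
generated = model.generate(**tokenizer(pairs[0][0], return_tensors="pt"), max_new_tokens=20)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```

In practice one would replace the toy pairs with a real dataset, batch and pad the inputs (using -100 for padded label positions), and use a higher-level trainer for evaluation and checkpointing.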
For enhanced performance, consider using cloud GPUs such as those provided by AWS, Google Cloud, or Azure.
License
The IT5 model is released under the Apache-2.0 license, which allows for free use, modification, and distribution of the software.