Introduction

The IT5 model family is an effort to pretrain large-scale sequence-to-sequence transformer models specifically for the Italian language, following the methodology of the original T5 model. It is part of the project "IT5: Text-to-Text Pretraining for Italian Language Understanding and Generation," developed by Gabriele Sarti and Malvina Nissim with the support of Hugging Face and Google's TPU Research Cloud.

Architecture

The IT5 models are based on the T5 architecture and come in four variants: it5-small, it5-base, it5-large, and it5-base-oscar. They are pretrained on the Thoroughly Cleaned Italian mC4 Corpus; the it5-small variant uses the google/t5-v1_1-small configuration. Key features inherited from the t5-v1_1 setup include gated-gelu feed-forward activations, while pretraining relies on the Adafactor optimizer.
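
As a quick sanity check, the published configuration can be inspected with the transformers library. This is a minimal sketch: the attribute names follow the standard T5Config, and the commented values reflect what the t5-v1_1 setup would suggest rather than verified outputs.

    from transformers import AutoConfig

    # Fetch the configuration of the small variant from the Hugging Face Hub
    config = AutoConfig.from_pretrained("gsarti/it5-small")

    # t5-v1_1-style models use a gated-gelu feed-forward projection
    print(config.model_type)         # "t5"
    print(config.feed_forward_proj)  # expected: "gated-gelu"
    print(config.d_model, config.num_layers)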

Training

Training was conducted on a single TPU v3-8 VM on Google Cloud. The it5-small model was trained for one epoch, equivalent to 1,050,000 steps, over approximately 36 hours. The pretraining dataset is the Thoroughly Cleaned Italian mC4 Corpus, containing around 41 billion words (~275GB). Full details of the training process are available on GitHub.
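
The pretraining corpus is distributed through the Hugging Face Hub, so it can be inspected without downloading it in full by streaming it with the datasets library. The sketch below is illustrative only: the dataset identifier gsarti/clean_mc4_it, the "small" configuration name, and the "text" field are assumptions and may differ from the actual release.

    from datasets import load_dataset

    # Stream the cleaned Italian mC4 corpus instead of downloading ~275GB
    # (dataset id and configuration name are assumptions)
    dataset = load_dataset("gsarti/clean_mc4_it", "small", split="train", streaming=True)

    # Peek at the first document
    first_doc = next(iter(dataset))
    print(first_doc["text"][:200])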

Guide: Running Locally

To use the IT5-small model locally, follow these steps:

  1. Install the transformers library:
    pip install transformers
    
  2. Load the model and tokenizer:
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
    
    # Download the pretrained IT5-small checkpoint and tokenizer from the Hugging Face Hub
    tokenizer = AutoTokenizer.from_pretrained("gsarti/it5-small")
    model = AutoModelForSeq2SeqLM.from_pretrained("gsarti/it5-small")
    
  3. Fine-tune the model on your specific seq2seq task (a minimal fine-tuning sketch is shown below).
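
As an illustration of step 3, the sketch below fine-tunes it5-small on a toy summarization-style pair with the standard Seq2SeqTrainer API. The in-memory example data, the output directory, and the hyperparameters are placeholders, not the settings used for the published IT5 checkpoints; a recent version of transformers and the datasets library is assumed.

    from datasets import Dataset
    from transformers import (
        AutoModelForSeq2SeqLM,
        AutoTokenizer,
        DataCollatorForSeq2Seq,
        Seq2SeqTrainer,
        Seq2SeqTrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained("gsarti/it5-small")
    model = AutoModelForSeq2SeqLM.from_pretrained("gsarti/it5-small")

    # Toy Italian input/target pair; replace with your own seq2seq dataset
    raw_data = Dataset.from_dict({
        "source": ["riassumi: Il modello IT5 è stato preaddestrato su un ampio corpus italiano."],
        "target": ["IT5 è un modello preaddestrato per l'italiano."],
    })

    def preprocess(examples):
        # Tokenize inputs and targets; length limits are placeholders
        model_inputs = tokenizer(examples["source"], max_length=512, truncation=True)
        labels = tokenizer(text_target=examples["target"], max_length=128, truncation=True)
        model_inputs["labels"] = labels["input_ids"]
        return model_inputs

    tokenized = raw_data.map(preprocess, batched=True, remove_columns=["source", "target"])

    training_args = Seq2SeqTrainingArguments(
        output_dir="it5-small-finetuned",  # placeholder output directory
        per_device_train_batch_size=8,
        learning_rate=5e-5,
        num_train_epochs=3,
    )

    trainer = Seq2SeqTrainer(
        model=model,
        args=training_args,
        train_dataset=tokenized,
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )

    trainer.train()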

For enhanced performance, consider using cloud GPUs such as those provided by AWS, Google Cloud, or Azure.

License

The IT5 model is released under the Apache-2.0 license, which allows for free use, modification, and distribution of the software.
