t5-base-qa-squad-v1.1-portuguese

Introduction
The t5-base-qa-squad-v1.1-portuguese model is a Portuguese Question Answering (QA) model, finetuned on January 27, 2022, in Google Colab. It is based on the unicamp-dl/ptt5-base-portuguese-vocab model and was trained on the SQuAD v1.1 dataset in Portuguese from the Deep Learning Brasil group. The model performs Text2Text Generation with a particular focus on QA. Because of the limited size of the T5 base model and of its finetuning dataset, overfitting occurred before training completed. The final metrics on the validation dataset are an F1 score of 79.3 and an exact match of 67.3983.
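For reference, these metrics follow the standard SQuAD evaluation: exact match checks whether the normalized prediction equals the normalized gold answer, and F1 measures token overlap between the two. A minimal sketch of that computation follows; the normalization shown, including the choice of Portuguese articles to strip, is an assumption (the official SQuAD script strips English articles):

    import re
    import string
    from collections import Counter

    def normalize(text):
        """Lowercase, drop punctuation and (assumed) Portuguese articles, collapse whitespace."""
        text = text.lower()
        text = "".join(ch for ch in text if ch not in set(string.punctuation))
        text = re.sub(r"\b(a|o|as|os|um|uma)\b", " ", text)  # article list is an assumption
        return " ".join(text.split())

    def exact_match(pred, gold):
        # 1.0 if the normalized strings are identical, else 0.0
        return float(normalize(pred) == normalize(gold))

    def f1_score(pred, gold):
        # Token-level F1 over the bag of normalized tokens
        pred_toks, gold_toks = normalize(pred).split(), normalize(gold).split()
        common = Counter(pred_toks) & Counter(gold_toks)
        num_same = sum(common.values())
        if num_same == 0:
            return 0.0
        precision = num_same / len(pred_toks)
        recall = num_same / len(gold_toks)
        return 2 * precision * recall / (precision + recall)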
Architecture
The model uses the T5 encoder-decoder architecture, which casts every task as text-to-text generation, and is implemented with PyTorch and the Transformers library. It was adapted specifically to the SQuAD v1.1 dataset translated into Portuguese, with a focus on QA.
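In this text-to-text framing, the question and the context are packed into a single input string and the answer is generated as the output string. A hypothetical example pair (not taken from the dataset):

    # Illustrative input/target pair in the T5 text-to-text format (hypothetical example)
    input_text  = "question: Quem descobriu o Brasil? context: O Brasil foi descoberto por Pedro Álvares Cabral em 1500."
    target_text = "Pedro Álvares Cabral"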
Training
The model was trained with a batch size of 4, gradient accumulation over 3 steps (an effective batch size of 12), a learning rate of 1e-4, and a weight decay of 0.01 for 10 epochs. Logs and checkpoints were written every 3,000 steps, and the F1 metric was used to select the best checkpoint. The training set contained 87,510 examples, giving a total of 72,920 optimization steps (about 7,292 steps per epoch at the effective batch size of 12).
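As a rough reconstruction, these hyperparameters map onto the Transformers Seq2SeqTrainingArguments as sketched below; the output_dir and the steps-based evaluation cadence are assumptions, since the actual training script is not reproduced here:

    from transformers import Seq2SeqTrainingArguments

    # Hypothetical reconstruction of the reported hyperparameters
    args = Seq2SeqTrainingArguments(
        output_dir="t5-base-qa-squad-v1.1-portuguese",  # assumption
        per_device_train_batch_size=4,
        gradient_accumulation_steps=3,   # effective batch size of 12
        learning_rate=1e-4,
        weight_decay=0.01,
        num_train_epochs=10,
        logging_steps=3000,
        save_steps=3000,
        evaluation_strategy="steps",     # assumption: evaluate at the same cadence
        eval_steps=3000,
        load_best_model_at_end=True,
        metric_for_best_model="f1",      # best checkpoint selected by F1
        predict_with_generate=True,
    )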
Key training results after various steps are as follows:
- Step 3000: F1 = 75.11, Exact Match = 61.81
- Step 18000: F1 = 79.29, Exact Match = 67.58
- Step 27000: F1 = 79.33, Exact Match = 66.97
Guide: Running Locally
To run the model locally, follow these steps:
- Install PyTorch: Visit PyTorch's official site for installation instructions.
- Install Transformers: Use the command

    pip install transformers

- Load Model and Tokenizer:
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    # Download the finetuned model and its tokenizer from the Hugging Face Hub
    model_name = "pierreguillou/t5-base-qa-squad-v1.1-portuguese"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
- Perform Inference:
    # The model expects the input in the form "question: ... context: ..."
    input_text = 'question: Quando foi descoberta a Covid-19? context: A pandemia...'
    inputs = tokenizer(input_text, return_tensors="pt")

    # Generate an answer (num_beams=1 means greedy decoding)
    outputs = model.generate(inputs["input_ids"], max_length=32, num_beams=1, early_stopping=True)
    pred = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print('Prediction:', pred)
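The same inference can also be wrapped in the Transformers pipeline API, which handles tokenization and decoding internally:

    from transformers import pipeline

    # text2text-generation pipeline around the same checkpoint
    qa = pipeline("text2text-generation", model="pierreguillou/t5-base-qa-squad-v1.1-portuguese")
    result = qa("question: Quando foi descoberta a Covid-19? context: A pandemia...")
    print(result[0]["generated_text"])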
For enhanced performance, leveraging cloud GPUs such as those available from AWS, GCP, or Azure is recommended.
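If a GPU is available, moving the model and inputs to it speeds up generation; a minimal sketch reusing the tokenizer, model, and input_text loaded above:

    import torch

    # Select a CUDA device when present, otherwise fall back to CPU
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    # Inputs must live on the same device as the model
    inputs = tokenizer(input_text, return_tensors="pt").to(device)
    outputs = model.generate(inputs["input_ids"], max_length=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))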
License
This model is available under the terms specified by its creator and the hosting platform. Specific licensing details can be found on the model's Hugging Face page.