t5-base-qa-squad-v1.1-portuguese

Introduction
The t5-base-qa-squad-v1.1-portuguese model is a Portuguese Question Answering (QA) model, finetuned on January 27, 2022, in Google Colab. It is based on the unicamp-dl/ptt5-base-portuguese-vocab model and was trained on the SQuAD v1.1 dataset in Portuguese from the Deep Learning Brasil group. The model performs Text2Text Generation with a particular focus on QA. Because of the limited size of the T5 base model and of its finetuning dataset, overfitting occurred before training completed. The final metrics on the validation dataset are an F1 score of 79.3 and an exact match of 67.3983.
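For reference, these metrics follow the standard SQuAD evaluation: exact match checks whether the normalized prediction equals the normalized gold answer, and F1 measures token overlap between the two. A minimal sketch of that computation follows; the normalization shown, including the choice of Portuguese articles to strip, is an assumption (the official SQuAD script strips English articles):

    import re
    import string
    from collections import Counter

    def normalize(text):
        """Lowercase, drop punctuation and (assumed) Portuguese articles, collapse whitespace."""
        text = text.lower()
        text = "".join(ch for ch in text if ch not in set(string.punctuation))
        text = re.sub(r"\b(a|o|as|os|um|uma)\b", " ", text)  # article list is an assumption
        return " ".join(text.split())

    def exact_match(pred, gold):
        # 1.0 if the normalized strings are identical, else 0.0
        return float(normalize(pred) == normalize(gold))

    def f1_score(pred, gold):
        # Token-level F1 over the bag of normalized tokens
        pred_toks, gold_toks = normalize(pred).split(), normalize(gold).split()
        common = Counter(pred_toks) & Counter(gold_toks)
        num_same = sum(common.values())
        if num_same == 0:
            return 0.0
        precision = num_same / len(pred_toks)
        recall = num_same / len(gold_toks)
        return 2 * precision * recall / (precision + recall)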
Architecture
The model uses the T5 encoder-decoder architecture, which casts every task as text-to-text generation, and is implemented with PyTorch and the Transformers library. It was adapted specifically to the SQuAD v1.1 dataset translated into Portuguese, with a focus on QA.
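In this text-to-text framing, the question and the context are packed into a single input string and the answer is generated as the output string. A hypothetical example pair (not taken from the dataset):

    # Illustrative input/target pair in the T5 text-to-text format (hypothetical example)
    input_text  = "question: Quem descobriu o Brasil? context: O Brasil foi descoberto por Pedro Álvares Cabral em 1500."
    target_text = "Pedro Álvares Cabral"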
Training
The model was trained with a batch size of 4, gradient accumulation over 3 steps (an effective batch size of 12), a learning rate of 1e-4, and a weight decay of 0.01 for 10 epochs. Logs and checkpoints were written every 3,000 steps, and the F1 metric was used to select the best checkpoint. The training set contained 87,510 examples, giving a total of 72,920 optimization steps (about 7,292 steps per epoch at the effective batch size of 12).
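As a rough reconstruction, these hyperparameters map onto the Transformers Seq2SeqTrainingArguments as sketched below; the output_dir and the steps-based evaluation cadence are assumptions, since the actual training script is not reproduced here:

    from transformers import Seq2SeqTrainingArguments

    # Hypothetical reconstruction of the reported hyperparameters
    args = Seq2SeqTrainingArguments(
        output_dir="t5-base-qa-squad-v1.1-portuguese",  # assumption
        per_device_train_batch_size=4,
        gradient_accumulation_steps=3,   # effective batch size of 12
        learning_rate=1e-4,
        weight_decay=0.01,
        num_train_epochs=10,
        logging_steps=3000,
        save_steps=3000,
        evaluation_strategy="steps",     # assumption: evaluate at the same cadence
        eval_steps=3000,
        load_best_model_at_end=True,
        metric_for_best_model="f1",      # best checkpoint selected by F1
        predict_with_generate=True,
    )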
Key training results after various steps are as follows:
- Step 3000: F1 = 75.11, Exact Match = 61.81
- Step 18000: F1 = 79.29, Exact Match = 67.58
- Step 27000: F1 = 79.33, Exact Match = 66.97
Guide: Running Locally
To run the model locally, follow these steps:
- Install PyTorch: Visit PyTorch's official site for installation instructions.
- Install Transformers: Use the command

    pip install transformers

- Load Model and Tokenizer:
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    # Download the finetuned model and its tokenizer from the Hugging Face Hub
    model_name = "pierreguillou/t5-base-qa-squad-v1.1-portuguese"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
- Perform Inference:
    # The model expects the input in the form "question: ... context: ..."
    input_text = 'question: Quando foi descoberta a Covid-19? context: A pandemia...'
    inputs = tokenizer(input_text, return_tensors="pt")

    # Generate an answer (num_beams=1 means greedy decoding)
    outputs = model.generate(inputs["input_ids"], max_length=32, num_beams=1, early_stopping=True)
    pred = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print('Prediction:', pred)
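The same inference can also be wrapped in the Transformers pipeline API, which handles tokenization and decoding internally:

    from transformers import pipeline

    # text2text-generation pipeline around the same checkpoint
    qa = pipeline("text2text-generation", model="pierreguillou/t5-base-qa-squad-v1.1-portuguese")
    result = qa("question: Quando foi descoberta a Covid-19? context: A pandemia...")
    print(result[0]["generated_text"])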
For enhanced performance, leveraging cloud GPUs such as those available from AWS, GCP, or Azure is recommended.
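If a GPU is available, moving the model and inputs to it speeds up generation; a minimal sketch reusing the tokenizer, model, and input_text loaded above:

    import torch

    # Select a CUDA device when present, otherwise fall back to CPU
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    # Inputs must live on the same device as the model
    inputs = tokenizer(input_text, return_tensors="pt").to(device)
    outputs = model.generate(inputs["input_ids"], max_length=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))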
License
This model is available under the terms specified by its creator and the hosting platform. Specific licensing details can be found on the model's Hugging Face page.