mt5-base-wikinewssum-portuguese
airKlizz
Introduction
The MT5-BASE-WIKINEWSSUM-PORTUGUESE model is a fine-tuned version of Google's mT5-base, optimized for summarization tasks in Portuguese. The model was trained to generate concise summaries from text inputs; its summarization performance is reported as ROUGE scores in the Training Results section below.
Architecture
The model is based on the mT5 architecture, which is a multilingual variant of the T5 (Text-to-Text Transfer Transformer) model. This architecture allows for text-to-text generation tasks and is particularly suited for multilingual applications, leveraging a transformer-based framework.
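As a quick illustration, the architecture details shipped with the checkpoint can be inspected through its configuration. This is a minimal sketch, assuming network access to the Hugging Face Hub; the attribute names follow the standard MT5Config class in Transformers:

    from transformers import MT5Config

    # Download the configuration that ships with the checkpoint.
    config = MT5Config.from_pretrained("airKlizz/mt5-base-wikinewssum-portuguese")

    # mT5-base uses a 12-layer encoder/decoder with a 768-dimensional hidden size
    # and a large multilingual SentencePiece vocabulary (~250k tokens).
    print(config.num_layers, config.d_model, config.vocab_size)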
Training
Training Procedure
The model was trained with the following hyperparameters:
- Learning Rate: 5.6e-05
- Training Batch Size: 4
- Evaluation Batch Size: 4
- Seed: 42
- Gradient Accumulation Steps: 2
- Total Train Batch Size: 8
- Optimizer: Adam (betas=(0.9,0.999), epsilon=1e-08)
- LR Scheduler Type: Linear
- Number of Epochs: 8
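These hyperparameters map directly onto the Hugging Face Trainer API. The sketch below shows one way to express them with Seq2SeqTrainingArguments under the pinned library versions listed later; it is an illustration, not the author's original training script, and the output_dir is a placeholder:

    from transformers import Seq2SeqTrainingArguments

    training_args = Seq2SeqTrainingArguments(
        output_dir="./mt5-base-wikinewssum-portuguese",  # placeholder path
        learning_rate=5.6e-05,
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        seed=42,
        gradient_accumulation_steps=2,  # yields the effective train batch size of 8
        adam_beta1=0.9,
        adam_beta2=0.999,
        adam_epsilon=1e-08,
        lr_scheduler_type="linear",
        num_train_epochs=8,
    )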
Training Results
Training metrics were recorded across 8 epochs, with final results as follows:
- Loss: 2.0428
- ROUGE-1: 9.4966
- ROUGE-2: 4.2224
- ROUGE-L: 7.9845
- ROUGE-Lsum: 8.8641
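For reference, ROUGE scores of this kind can be computed with the rouge metric bundled with the Datasets library. A minimal sketch, assuming the rouge_score package is also installed; the example strings are illustrative only:

    from datasets import load_metric

    # The "rouge" metric wraps the rouge_score package.
    rouge = load_metric("rouge")

    predictions = ["O governo anunciou novas medidas."]  # model summaries (illustrative)
    references = ["O governo anunciou um pacote de novas medidas."]  # gold summaries

    scores = rouge.compute(predictions=predictions, references=references)
    # Each entry is an AggregateScore; mid.fmeasure is the point estimate.
    print(round(scores["rouge1"].mid.fmeasure * 100, 4))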
Guide: Running Locally
1. Setup Environment: Ensure you have the required framework versions installed:
   - Transformers 4.13.0
   - PyTorch 1.10.1
   - Datasets 1.16.1
   - Tokenizers 0.10.3
2. Install Dependencies: Use pip to install the necessary libraries:

   pip install transformers==4.13.0 torch==1.10.1 datasets==1.16.1 tokenizers==0.10.3
3. Load the Model: Use the Hugging Face Transformers library to load the model and tokenizer:

   from transformers import MT5ForConditionalGeneration, MT5Tokenizer

   model = MT5ForConditionalGeneration.from_pretrained("airKlizz/mt5-base-wikinewssum-portuguese")
   tokenizer = MT5Tokenizer.from_pretrained("airKlizz/mt5-base-wikinewssum-portuguese")
4. Run Inference: Input your text and generate a summary:

   input_text = "Your text to summarize here"
   input_ids = tokenizer.encode(input_text, return_tensors="pt")
   summary_ids = model.generate(input_ids)
   summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
   print(summary)
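   Note that by default generate caps output at the checkpoint's configured max_length (commonly 20 tokens), so explicit generation settings usually help for summarization. The values below are assumptions for illustration, not settings published with this model:

   # Beam search with an explicit length budget; these values are illustrative.
   summary_ids = model.generate(
       input_ids,
       max_length=128,      # upper bound on summary length (assumption)
       num_beams=4,         # beam search instead of greedy decoding
       length_penalty=2.0,  # favor somewhat longer summaries
       early_stopping=True,
   )
   summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)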
5. Use a Cloud GPU (optional): For improved performance, consider cloud-based GPU services such as AWS EC2, Google Cloud's AI Platform, or Azure Machine Learning.
License
The model is released under the Apache-2.0 license, which allows for both personal and commercial use, distribution, modification, and private use, provided that the license terms are followed.