mt5 base finetuned Spanish
Introduction
The MT5-BASE-FINETUNED-SPANISH model, published by eslamxm on the Hugging Face Hub, is a fine-tuned variant of Google's mT5-Base adapted to Spanish using the wiki_lingua dataset. It is designed for abstractive summarization and is evaluated with ROUGE scores and BERTScore.
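As a hedged illustration of the evaluation setup, the snippet below computes ROUGE with the datasets library (version 2.3.0, as listed under Framework Versions). The example sentences are placeholders, and the rouge_score package must be installed separately.

```python
# Minimal ROUGE sketch using datasets.load_metric; requires `pip install rouge_score`.
from datasets import load_metric

rouge = load_metric("rouge")
scores = rouge.compute(
    predictions=["España ganó el partido de ayer."],              # model output (example)
    references=["La selección española ganó el partido de ayer."],  # reference summary (example)
)
print(scores["rouge1"].mid.fmeasure)  # aggregated ROUGE-1 F-measure
```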
Architecture
mT5 is the multilingual variant of T5 ("Text-To-Text Transfer Transformer"), a family of models that casts every NLP task as a text-to-text problem: both the input and the output are plain text sequences. This checkpoint fine-tunes the mT5-Base architecture for Spanish text summarization.
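To make the text-to-text framing concrete, the sketch below tokenizes a (document, summary) pair the way a summarization fine-tune would. The Hub id eslamxm/mt5-base-finetuned-Spanish and the example sentences are assumptions; adjust them to the actual repository and data.

```python
# Text-to-text framing: the source document and the target summary are both plain text.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("eslamxm/mt5-base-finetuned-Spanish")  # assumed Hub id

document = "La selección española ganó el partido de ayer por dos goles a cero ..."
summary = "España ganó el partido 2-0."

inputs = tokenizer(document, max_length=512, truncation=True, return_tensors="pt")
with tokenizer.as_target_tokenizer():          # target side uses the same tokenizer for mT5
    labels = tokenizer(summary, max_length=64, truncation=True, return_tensors="pt")
```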
Training
Training Hyperparameters
- Learning Rate: 0.0005
- Train Batch Size: 4
- Eval Batch Size: 4
- Seed: 42
- Gradient Accumulation Steps: 8
- Total Train Batch Size: 32
- Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- LR Scheduler Type: Linear
- LR Scheduler Warmup Steps: 250
- Number of Epochs: 5
- Label Smoothing Factor: 0.1
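The hyperparameters above map directly onto transformers' Seq2SeqTrainingArguments. The sketch below is a hedged reconstruction: the output directory and any evaluation/checkpointing settings are assumptions not stated in the original card.

```python
# Hedged reconstruction of the training configuration from the hyperparameters above.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-base-finetuned-Spanish",  # assumed output path
    learning_rate=5e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=8,            # 4 * 8 = total train batch size of 32
    warmup_steps=250,
    lr_scheduler_type="linear",
    num_train_epochs=5,
    label_smoothing_factor=0.1,
    # Adam betas (0.9, 0.999) and epsilon 1e-08 are the transformers defaults.
)
```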
Framework Versions
- Transformers: 4.19.4
- PyTorch: 1.11.0+cu113
- Datasets: 2.3.0
- Tokenizers: 0.12.1
Guide: Running Locally
- Clone Repository: Start by cloning the model repository from Hugging Face Hub.
- Set Up Environment: Install PyTorch, Transformers, Datasets, and Tokenizers matching the versions listed under Framework Versions.
- Load Model: Use the Hugging Face Transformers library to load the model.
- Data Preparation: Prepare your dataset in a format compatible with the model's input requirements.
- Inference: Run the model to generate summaries from text inputs, as sketched below.
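The following is a minimal inference sketch. It assumes the model lives at the Hub id eslamxm/mt5-base-finetuned-Spanish and uses generic generation settings (beam search, length limits) rather than any official decoding configuration.

```python
# Minimal local inference sketch; install pinned versions first, e.g.
#   pip install transformers==4.19.4 torch datasets==2.3.0
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "eslamxm/mt5-base-finetuned-Spanish"  # assumed repository name; adjust if it differs
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "Texto en español que se desea resumir ..."  # replace with your input article
inputs = tokenizer(text, max_length=512, truncation=True, return_tensors="pt")
summary_ids = model.generate(
    **inputs,
    max_length=64,            # assumed generation settings
    num_beams=4,
    no_repeat_ngram_size=2,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```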
Suggested Cloud GPUs
For optimal performance, consider cloud GPUs such as those offered by AWS, Google Cloud, or Azure, which provide the flexibility and scalability needed for larger datasets and further fine-tuning.
License
The MT5-BASE-FINETUNED-SPANISH model is licensed under the Apache-2.0 License, allowing for both personal and commercial use, modification, and distribution.