farleyknight/cnn_dailymail-summarization-t5-small-2022-09-05
Introduction
The cnn_dailymail-summarization-t5-small-2022-09-05 model is a fine-tuned version of t5-small
on the CNN/DailyMail dataset, version 3.0.0. It is designed for abstractive text summarization and achieves competitive ROUGE scores.
Architecture
The model is based on the T5
architecture, a text-to-text transformer. It is implemented in PyTorch and was fine-tuned using Hugging Face's Trainer.
Training
The model was trained using the following hyperparameters:
- Learning Rate: 5e-05
- Train Batch Size: 8
- Eval Batch Size: 8
- Seed: 42
- Optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- LR Scheduler Type: Linear
- Number of Epochs: 3.0
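For intuition, the linear scheduler listed above decays the learning rate from 5e-05 toward zero over the course of training. A minimal sketch in plain Python (assuming no warmup steps, which the card does not mention):

```python
def linear_lr(step, total_steps, base_lr=5e-05):
    """Linearly decay base_lr to 0 over total_steps (no warmup assumed)."""
    remaining = max(0.0, 1.0 - step / total_steps)
    return base_lr * remaining

# Halfway through training, the learning rate is half the base rate.
print(linear_lr(500, 1000))  # 2.5e-05
```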
The evaluation results on the dataset show:
- Loss: 1.6455
- ROUGE-1: 41.4235
- ROUGE-2: 19.0263
- ROUGE-L: 29.2892
- ROUGE-Lsum: 38.6338
- Gen Len: 73.7273
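For context, ROUGE-1 measures unigram overlap between a generated summary and a reference. The sketch below is a simplified F1 computation for intuition only; the reported scores come from the standard ROUGE implementation, which also applies stemming and other normalization:

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    """Simplified ROUGE-1 F1: unigram overlap between candidate and reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped count of shared unigrams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1("the cat sat on the mat", "the cat lay on the mat"))  # ~0.833
```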
Guide: Running Locally
To run the model locally, follow these steps:
- Clone the repository and navigate to the main directory.
- Install the necessary Python packages using pip:
pip install transformers torch datasets
- Load the model using the transformers library:

from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("farleyknight/cnn_dailymail-summarization-t5-small-2022-09-05")
- Use the tokenizer and model for summarization tasks.
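Putting the steps together, here is a minimal summarization sketch. Note that T5 expects the "summarize: " task prefix on the input; the generation parameters below (beam count, length limits) are illustrative assumptions, not the settings used to produce the reported scores:

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

def build_input(article: str) -> str:
    # T5 expects a task prefix in front of the text to summarize.
    return "summarize: " + article

if __name__ == "__main__":
    tokenizer = T5Tokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained(
        "farleyknight/cnn_dailymail-summarization-t5-small-2022-09-05"
    )

    article = "..."  # replace with any news article text
    inputs = tokenizer(build_input(article), return_tensors="pt",
                       truncation=True, max_length=512)
    # Beam search settings here are illustrative, not the evaluation settings.
    summary_ids = model.generate(inputs.input_ids, num_beams=4,
                                 max_new_tokens=142, no_repeat_ngram_size=3)
    print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```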
For enhanced performance, consider using cloud GPUs such as AWS EC2 instances with NVIDIA GPUs, Google Cloud Platform, or Azure.
License
This model is licensed under the Apache 2.0 License.