flan-t5-base-samsum
philschmid
Introduction
The FLAN-T5-BASE-SAMSUM model is a fine-tuned version of Google's FLAN-T5-BASE, trained on the SAMSum dataset of messenger-style dialogues paired with human-written summaries. It is a sequence-to-sequence (text-to-text) model: given a dialogue as input, it generates a short summary. Its reported ROUGE scores on the evaluation set are listed below.
Architecture
FLAN-T5-BASE-SAMSUM uses the T5 encoder-decoder architecture and is fine-tuned on the SAMSum dataset for dialogue summarization. It can be loaded with the Hugging Face Transformers library and runs on PyTorch.
Training
The model was fine-tuned on the SAMSum dataset with the following hyperparameters:
- Learning Rate: 5e-05
- Train Batch Size: 8
- Eval Batch Size: 8
- Seed: 42
- Optimizer: Adam with betas (0.9, 0.999) and epsilon 1e-08
- LR Scheduler Type: Linear
- Number of Epochs: 5
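To make the "Linear" scheduler concrete, here is a minimal pure-Python sketch of a linear learning-rate schedule using the card's learning rate, batch size, and epoch count. Its formula follows the same shape as the lambda behind get_linear_schedule_with_warmup in Transformers. The train-split size (14,732 dialogues), a single device, no gradient accumulation, and zero warmup steps are assumptions that the card does not state, and the helper names are hypothetical.

```python
import math

# Hyperparameters taken from the model card.
LEARNING_RATE = 5e-5
TRAIN_BATCH_SIZE = 8
NUM_EPOCHS = 5

# ASSUMPTIONS (not stated in the card): SAMSum train split of 14,732 dialogues,
# one device, no gradient accumulation, and no warmup steps.
NUM_TRAIN_EXAMPLES = 14_732
WARMUP_STEPS = 0


def total_training_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """One optimizer step per (possibly partial) batch, for every epoch."""
    return math.ceil(num_examples / batch_size) * epochs


def linear_lr(step: int, total_steps: int, base_lr: float = LEARNING_RATE,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Ramp linearly up to base_lr during warmup, then decay linearly to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = total_training_steps(NUM_TRAIN_EXAMPLES, TRAIN_BATCH_SIZE, NUM_EPOCHS)
```

Under these assumptions the run takes 9,210 optimizer steps (1,842 per epoch), and the learning rate falls from 5e-05 to 2.5e-05 at the halfway point and reaches 0 at the final step.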
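Evaluation Results
The following scores were reported on the SAMSum evaluation set (ROUGE values are F-measures scaled to 0 to 100):
- Loss: 1.3716
- ROUGE-1: 47.2358
- ROUGE-2: 23.5135
- ROUGE-L: 39.6266
- ROUGE-Lsum: 43.3458
- Gen Len (average length of the generated summaries, in tokens): 17.3907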
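The ROUGE figures are reported as percentages, so 47.2358 corresponds to an F-measure of about 0.472. The sketch below is a simplified pure-Python illustration of how ROUGE-1 (clipped unigram overlap) and ROUGE-L (longest common subsequence) F-measures are computed for one candidate/reference pair. The reported numbers were presumably produced with the rouge_score package, which applies its own tokenization and optional stemming, so this sketch will not reproduce them exactly; it also omits ROUGE-2 and ROUGE-Lsum. All function names are hypothetical.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and keep alphanumeric runs (simplified: no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def f_measure(overlap: int, cand_len: int, ref_len: int) -> float:
    """Harmonic mean of precision (vs. candidate) and recall (vs. reference)."""
    if overlap == 0:
        return 0.0
    precision = overlap / cand_len
    recall = overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: unigram overlap, each token counted at most min(count) times."""
    cand, ref = tokenize(candidate), tokenize(reference)
    overlap = sum((Counter(cand) & Counter(ref)).values())
    return f_measure(overlap, len(cand), len(ref))


def lcs_length(a: List[str], b: List[str]) -> int:
    """Longest common subsequence length via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> float:
    """ROUGE-L F1: LCS length used as the overlap count."""
    cand, ref = tokenize(candidate), tokenize(reference)
    return f_measure(lcs_length(cand, ref), len(cand), len(ref))
```

Word order matters only for ROUGE-L: "cat the sat" against "the cat sat" scores 1.0 on ROUGE-1 but about 0.667 on ROUGE-L.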
Guide: Running Locally
- Set Up Environment: Make sure Python is installed, then create and activate a virtual environment.
- Install Dependencies: Use pip to install the Transformers, Datasets, and PyTorch libraries:
pip install transformers datasets torch
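- Download the Model: Load the model and tokenizer with the Transformers library; the weights are downloaded from the Hugging Face Hub on first use:
from transformers import AutoTokenizer, T5ForConditionalGeneration
# AutoTokenizer picks the tokenizer class stored with the checkpoint; T5Tokenizer would also require the sentencepiece package
model = T5ForConditionalGeneration.from_pretrained("philschmid/flan-t5-base-samsum")
tokenizer = AutoTokenizer.from_pretrained("philschmid/flan-t5-base-samsum")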
- Run Inference: Tokenize the input dialogue, generate output token IDs with model.generate, and decode them back into text to obtain the summary.
- Cloud GPU: For faster processing, consider using cloud GPU services such as AWS, GCP, or Azure.
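The inference steps can be sketched as follows, assuming the dialogue is formatted as one "Speaker: message" line per turn (as in SAMSum) and that a T5-style "summarize: " task prefix is used; both choices, the 512-token truncation, max_new_tokens=60, and the helper names build_dialogue and summarize are illustrative assumptions rather than settings confirmed by the card. The function uses a GPU when PyTorch reports one, which is where a cloud GPU instance helps.

```python
from typing import List, Tuple

MODEL_ID = "philschmid/flan-t5-base-samsum"


def build_dialogue(turns: List[Tuple[str, str]]) -> str:
    """Join (speaker, message) pairs into 'Speaker: message' lines, one per turn."""
    return "\n".join(f"{speaker}: {message}" for speaker, message in turns)


def summarize(dialogue: str, prefix: str = "summarize: ", max_new_tokens: int = 60) -> str:
    """Generate a summary for one dialogue with the fine-tuned checkpoint."""
    # Imported here so build_dialogue works without torch/transformers installed.
    import torch
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    # Use a GPU (e.g. a cloud instance) when available, otherwise fall back to CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = T5ForConditionalGeneration.from_pretrained(MODEL_ID).to(device)

    # ASSUMPTION: T5-style "summarize: " task prefix; pass prefix="" to disable it.
    inputs = tokenizer(prefix + dialogue, return_tensors="pt",
                       truncation=True, max_length=512).to(device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example usage (downloads the checkpoint on first call):
# dialogue = build_dialogue([("Amanda", "I baked cookies. Do you want some?"),
#                            ("Jerry", "Sure!")])
# print(summarize(dialogue))
```

The sketch reloads the model on every call to stay self-contained; when summarizing many dialogues, load the model and tokenizer once and reuse them.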
License
The FLAN-T5-BASE-SAMSUM model is licensed under the Apache-2.0 License, which permits use, modification, and distribution subject to its conditions. See the LICENSE file in the repository for details.