DistilBART-CNN-12-6 (sshleifer/distilbart-cnn-12-6)
Introduction
DistilBART-CNN-12-6 is a transformer-based model from Hugging Face designed for text summarization. It offers a practical trade-off between summarization quality and inference time, making it well suited for efficient text generation tasks. The model is pre-trained and can be loaded through the Hugging Face Transformers library, with support for frameworks such as PyTorch and JAX.
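For a quick start, the summarization pipeline from the transformers library can run this checkpoint in a few lines. This is a minimal sketch: the article text is a placeholder and the max_length / min_length values are illustrative, not required settings.

from transformers import pipeline

# Load the checkpoint into a summarization pipeline (weights are downloaded on first use).
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

article = "Your long article text goes here ..."  # placeholder input
# Illustrative length limits (in tokens) for the generated summary.
result = summarizer(article, max_length=60, min_length=20, do_sample=False)
print(result[0]["summary_text"])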
Architecture
DistilBART is a distilled version of the BART model, reducing its size while retaining most of its summarization quality. The "12-6" in the name refers to 12 encoder layers and 6 decoder layers, compared with 12 of each in BART-large. This particular checkpoint targets summarization on CNN/DailyMail; companion DistilBART checkpoints exist for XSum. The architecture optimizes for speed and efficiency, providing a significant reduction in parameters and inference time compared to its larger counterparts such as BART-large-CNN.
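As a rough way to verify these layer counts locally, the loaded model's configuration exposes the encoder and decoder depths, and the parameter count can be summed directly. This is a sketch using the transformers BartConfig attribute names; the printed numbers depend on the downloaded checkpoint.

from transformers import BartForConditionalGeneration

model = BartForConditionalGeneration.from_pretrained("sshleifer/distilbart-cnn-12-6")

# Encoder/decoder depth as recorded in the model configuration.
print(model.config.encoder_layers, "encoder layers")
print(model.config.decoder_layers, "decoder layers")

# Total parameter count, for comparison against the full-size BART-large-CNN model.
print(sum(p.numel() for p in model.parameters()), "parameters")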
Training
The DistilBART models are produced by distilling knowledge from larger fine-tuned BART models on summarization datasets such as CNN/DailyMail and XSum, yielding a smaller, faster model with competitive performance. Quality is reported with ROUGE-2 and ROUGE-L scores, which measure bigram and longest-common-subsequence overlap between generated and reference summaries.
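As an illustration of how such scores can be computed locally, the rouge-score package (a separate pip install, not bundled with transformers) scores a generated summary against a reference. The example strings below are placeholders.

from rouge_score import rouge_scorer  # pip install rouge-score

reference = "The cat sat on the mat near the window."    # placeholder reference summary
generated = "A cat was sitting on a mat by the window."  # placeholder model output

# ROUGE-2 measures bigram overlap; ROUGE-L measures longest-common-subsequence overlap.
scorer = rouge_scorer.RougeScorer(["rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, generated)
print("ROUGE-2 F1:", scores["rouge2"].fmeasure)
print("ROUGE-L F1:", scores["rougeL"].fmeasure)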
Guide: Running Locally
- Setup Environment:
  - Install Python and the necessary libraries.
  - Use the command:
    pip install transformers torch
- Load the Model:
  - Use Hugging Face's transformers library to load the model:
    from transformers import BartForConditionalGeneration
    model = BartForConditionalGeneration.from_pretrained('sshleifer/distilbart-cnn-12-6')
- Inference:
  - Prepare your input data and run inference using the model's generation capabilities (a minimal end-to-end sketch follows this list).
- Hardware Suggestions:
  - For optimal performance, consider cloud GPUs from AWS, Google Cloud, or Azure, which can significantly reduce inference time.
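The following minimal sketch ties the steps above together. The article text is a placeholder, and the generation parameters (beam count and summary length limits) are illustrative values rather than required settings for this checkpoint.

from transformers import BartForConditionalGeneration, BartTokenizer

model_name = "sshleifer/distilbart-cnn-12-6"
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

article = "Your long article text goes here ..."  # placeholder input

# Tokenize, truncating to the model's 1024-token input limit.
inputs = tokenizer(article, max_length=1024, truncation=True, return_tensors="pt")

# Beam search with illustrative length constraints on the generated summary.
summary_ids = model.generate(
    inputs["input_ids"],
    num_beams=4,
    min_length=56,
    max_length=142,
    early_stopping=True,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))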
License
The DistilBART-CNN-12-6 model is distributed under the Apache 2.0 license. This allows for both personal and commercial use, modification, and distribution of the model.