bart-large-finetuned-filtered-spotify-podcast-summ
Introduction
The bart-large-finetuned-filtered-spotify-podcast-summ model is a fine-tuned version of facebook/bart-large-cnn, designed for automatic podcast summarization using the Spotify Podcast Dataset. It aims to provide concise, human-readable summaries of podcast transcripts for quick viewing on devices such as smartphones.
Architecture
The model is built on the BART architecture, specifically the facebook/bart-large-cnn variant. It combines extractive and abstractive summarization: an extractive module first identifies the key segments of a transcript, which are then passed to the abstractive summarizer to produce the final summary.
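The card does not ship the extractive component as separate code, so the following is only a rough sketch of the two-stage idea: a simple word-frequency sentence scorer (a hypothetical stand-in, not the authors' extractive module) pre-selects sentences, and the fine-tuned model then produces the abstractive summary. Function and variable names here are illustrative.

```python
import re
from collections import Counter

from transformers import pipeline

def select_key_sentences(transcript, max_sentences=20):
    """Score sentences by average word frequency and keep the top ones in original order."""
    sentences = re.split(r"(?<=[.!?])\s+", transcript)
    freq = Counter(re.findall(r"\w+", transcript.lower()))

    def score(sentence):
        words = re.findall(r"\w+", sentence.lower())
        return sum(freq[w] for w in words) / max(len(words), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)

summarizer = pipeline(
    "summarization",
    model="gmurro/bart-large-finetuned-filtered-spotify-podcast-summ",
)

# Replace with a full episode transcript; this short string is a placeholder.
podcast_transcript = "Welcome to the show. Today we discuss how podcasts can be summarized automatically."

filtered = select_key_sentences(podcast_transcript)           # extractive step
summary = summarizer(filtered, min_length=39, max_length=250)  # abstractive step
print(summary[0]["summary_text"])
```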
Training
The training set comprises 69,336 episodes, with a validation set of 7,705 episodes; of the 1,027 episodes in the test set, 1,025 were used. The model was trained with the AdamWeightDecay optimizer and a learning rate of 2e-05, reaching a training loss of 2.2967 and a validation loss of 2.8316 over two epochs. The framework versions used were Transformers 4.19.4, TensorFlow 2.9.1, Datasets 2.3.1, and Tokenizers 0.12.1.
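The training script itself is not part of the card. The sketch below shows one way the reported hyperparameters (AdamWeightDecay, learning rate 2e-05, two epochs) could be wired together with the TensorFlow classes in transformers. It uses current API names (e.g. prepare_tf_dataset), which may differ slightly from the 4.19.4 release listed above, and a tiny in-memory dummy dataset in place of the Spotify Podcast Dataset, which must be obtained separately; the column names transcript and description are placeholders.

```python
# Illustrative sketch only; not the authors' training script.
from datasets import Dataset, DatasetDict
from transformers import (
    AdamWeightDecay,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    TFAutoModelForSeq2SeqLM,
)

checkpoint = "facebook/bart-large-cnn"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = TFAutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Dummy data standing in for the Spotify Podcast Dataset (column names are hypothetical).
dummy = Dataset.from_dict({
    "transcript": ["Host and guest discuss how podcast episodes are summarized."] * 8,
    "description": ["A short chat about podcast summarization."] * 8,
})
dataset = DatasetDict({"train": dummy, "validation": dummy})

def preprocess(batch):
    # Tokenize transcripts as model inputs and episode descriptions as targets.
    model_inputs = tokenizer(batch["transcript"], max_length=1024, truncation=True)
    labels = tokenizer(text_target=batch["description"], max_length=256, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=["transcript", "description"])
collator = DataCollatorForSeq2Seq(tokenizer, model=model, return_tensors="tf")

tf_train = model.prepare_tf_dataset(tokenized["train"], batch_size=2, shuffle=True, collate_fn=collator)
tf_val = model.prepare_tf_dataset(tokenized["validation"], batch_size=2, shuffle=False, collate_fn=collator)

# Hyperparameters reported in the card: AdamWeightDecay, lr 2e-05, 2 epochs.
model.compile(optimizer=AdamWeightDecay(learning_rate=2e-5))
model.fit(tf_train, validation_data=tf_val, epochs=2)
```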
Guide: Running Locally
To use the model for summarization, follow these steps:
- Install Required Libraries: ensure the transformers library is installed. You can do this via pip:

      pip install transformers

- Load the Model:

      from transformers import pipeline

      summarizer = pipeline(
          "summarization",
          model="gmurro/bart-large-finetuned-filtered-spotify-podcast-summ",
          tokenizer="gmurro/bart-large-finetuned-filtered-spotify-podcast-summ",
      )

- Summarize a Transcript:

      summary = summarizer(podcast_transcript, min_length=39, max_length=250)
      print(summary[0]['summary_text'])

- Consider Using Cloud GPUs: for faster processing, consider cloud services such as AWS, Google Cloud, or Azure, which offer GPU instances (a brief usage sketch follows this list).
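On a machine or cloud instance with a CUDA GPU, the pipeline can be placed on the GPU via its device argument; the snippet below is a minimal sketch, and the transcript variable is a placeholder.

```python
from transformers import pipeline

# device=0 runs the pipeline on the first CUDA GPU; omit it (or pass device=-1) to stay on CPU.
summarizer = pipeline(
    "summarization",
    model="gmurro/bart-large-finetuned-filtered-spotify-podcast-summ",
    device=0,
)

podcast_transcript = "Full episode transcript goes here."  # placeholder
summary = summarizer(podcast_transcript, min_length=39, max_length=250)
print(summary[0]["summary_text"])
```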
License
The model is released under the MIT License, allowing for broad usage and modification.