pszemraj/led-large-book-summary
Introduction
The led-large-book-summary model is a fine-tuned version of the allenai/led-large-16384 model on the BookSum dataset. It is designed for summarizing long-form text in both academic and general contexts, handling inputs of up to 16,384 tokens.
Architecture
This model utilizes the Longformer Encoder-Decoder (LED) architecture, specifically fine-tuned for long document summarization. It supports input sequences up to 16,384 tokens, making it suitable for lengthy documents.
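Because the 16,384-token window is a hard limit on the encoder input, it can be worth checking how long a document is after tokenization before summarizing it. Below is a minimal sketch using the checkpoint's own tokenizer; the example text and the helper function name are placeholders:

```python
from transformers import AutoTokenizer

# Tokenizer that ships with the checkpoint
tokenizer = AutoTokenizer.from_pretrained("pszemraj/led-large-book-summary")

MAX_TOKENS = 16_384  # encoder input limit for LED-large

def fits_in_context(text: str) -> bool:
    """Return True if the tokenized text fits within the 16,384-token window."""
    n_tokens = len(tokenizer.encode(text, truncation=False))
    return n_tokens <= MAX_TOKENS

print(fits_in_context("A chapter of a novel goes here..."))
```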
Training
The model was fine-tuned on the BookSum dataset, focusing on chapter-level inputs and summary outputs. Training spanned more than 13 epochs, with hyperparameters adjusted across stages:
- Initial Stages: Low learning rate of 5e-05, batch size of 1, and a linear scheduler.
- Middle Stages: Learning rate adjusted to 4e-05, batch size increased to 2, using a cosine scheduler.
- Final Stages: Learning rate further reduced to 2e-05, with the batch size reverted to 1.
The training utilized the Transformers library (version 4.19.2) and PyTorch (version 1.11.0+cu113).
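For readers who want to approximate a similar fine-tuning setup, the staged hyperparameters above map onto the Transformers Seq2SeqTrainingArguments class. The sketch below is not the author's actual training script; it only illustrates the final-stage settings (learning rate 2e-05, batch size 1), and the output directory, scheduler choice, and per-stage epoch count are assumptions:

```python
from transformers import Seq2SeqTrainingArguments

# Final-stage settings from the description above; other values are assumptions.
training_args = Seq2SeqTrainingArguments(
    output_dir="./led-large-book-summary-ft",  # hypothetical output path
    learning_rate=2e-5,                        # final-stage learning rate
    per_device_train_batch_size=1,             # final-stage batch size
    lr_scheduler_type="linear",                # assumed; earlier stages used linear and cosine schedules
    num_train_epochs=1,                        # assumed length of this stage (13+ epochs overall)
    predict_with_generate=True,                # generate summaries during evaluation
)
```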
Guide: Running Locally
To run the model locally, follow these steps:
- Install Dependencies:

```bash
pip install torch transformers
```
- Load the Model:

```python
import torch
from transformers import pipeline

hf_name = 'pszemraj/led-large-book-summary'

summarizer = pipeline(
    "summarization",
    hf_name,
    device=0 if torch.cuda.is_available() else -1,
)
```
- Summarize Text (a note on reading the output follows these steps):

```python
wall_of_text = "your words here"

result = summarizer(
    wall_of_text,
    min_length=16,
    max_length=256,
    no_repeat_ngram_size=3,
    encoder_no_repeat_ngram_size=3,
    repetition_penalty=3.5,
    num_beams=4,
    early_stopping=True,
)
```
- Consider Using Cloud GPUs:
For optimal performance, especially with large text inputs, consider running the model on cloud GPUs from providers such as AWS, Google Cloud, or Azure.
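As noted above, the summarization pipeline returns a list with one dictionary per input, and the generated summary is stored under the "summary_text" key. A short usage example, continuing from the `result` object in the previous step:

```python
# `result` is a list of dicts, one entry per input sequence
summary = result[0]["summary_text"]
print(summary)
```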
License
This model is released under the Apache 2.0 and BSD-3-Clause licenses, allowing for both commercial and non-commercial use with proper attribution.