led-large-book-summary

pszemraj

Introduction

led-large-book-summary is a fine-tuned version of the allenai/led-large-16384 model on the BookSum dataset. It is designed for summarizing long-form text in both academic and general contexts and accepts inputs of up to 16,384 tokens.

Architecture

This model uses the Longformer Encoder-Decoder (LED) architecture, fine-tuned here for long-document summarization. LED's efficient attention allows input sequences of up to 16,384 tokens, making the model suitable for lengthy documents such as book chapters.
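
As a quick sanity check, the maximum input length can be read off the model configuration. This is a minimal sketch; the attribute name below is the standard one from the Transformers LED config, not something stated in this card:

    from transformers import AutoConfig

    # load the model's config from the Hub and inspect the encoder's input limit
    config = AutoConfig.from_pretrained("pszemraj/led-large-book-summary")
    print(config.max_encoder_position_embeddings)  # expected: 16384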

Training

The model was fine-tuned on the BookSum dataset, using chapter-level inputs paired with their summaries. Training ran for more than 13 epochs, with hyperparameters adjusted across stages:

  • Initial Stages: Low learning rate of 5e-05, batch size of 1, and a linear scheduler.
  • Middle Stages: Learning rate adjusted to 4e-05, batch size increased to 2, using a cosine scheduler.
  • Final Stages: Learning rate further reduced to 2e-05, batch size reverted to 1.

The training utilized the Transformers library (version 4.19.2) and PyTorch (version 1.11.0+cu113).
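
For illustration, the initial-stage settings above map onto a Seq2SeqTrainingArguments configuration roughly like the following. This is a sketch only: the original training script is not shown in this card, so the use of the Trainer API, the output_dir, and the epoch count are assumptions.

    from transformers import Seq2SeqTrainingArguments

    # Sketch of the initial-stage hyperparameters described above.
    # output_dir and num_train_epochs are illustrative placeholders.
    training_args = Seq2SeqTrainingArguments(
        output_dir="./led-large-book-summary",  # placeholder path
        learning_rate=5e-5,                 # initial-stage learning rate
        per_device_train_batch_size=1,      # batch size of 1
        lr_scheduler_type="linear",         # linear scheduler
        num_train_epochs=1,                 # one stage of the 13+ total epochs
    )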

Guide: Running Locally

To run the model locally, follow these steps:

  1. Install Dependencies:

    pip install torch transformers
    
  2. Load the Model (an explicit-loading alternative is sketched after these steps):

    import torch
    from transformers import pipeline

    hf_name = 'pszemraj/led-large-book-summary'
    summarizer = pipeline(
        "summarization",
        model=hf_name,
        device=0 if torch.cuda.is_available() else -1,  # GPU 0 if available, else CPU
    )
    
  3. Summarize Text:

    wall_of_text = "your words here"
    result = summarizer(
        wall_of_text,
        min_length=16,
        max_length=256,
        no_repeat_ngram_size=3,          # block repeated 3-grams in the summary
        encoder_no_repeat_ngram_size=3,  # discourage copying 3-grams verbatim from the input
        repetition_penalty=3.5,
        num_beams=4,
        early_stopping=True,
    )
    # the pipeline returns a list of dicts with a "summary_text" key
    print(result[0]["summary_text"])
    
  4. Consider Using Cloud GPUs:
    For reasonable inference speed, especially on long inputs, run the model on a GPU; cloud providers such as AWS, Google Cloud, and Azure offer suitable instances.
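
As an alternative to the pipeline used in steps 2 and 3, the model and tokenizer can be loaded explicitly, which makes the 16,384-token limit and LED's attention handling visible. This is a minimal sketch following the Transformers LED documentation; placing global attention on the first token is the library's recommended convention for LED, not something stated in this card:

    import torch
    from transformers import AutoTokenizer, LEDForConditionalGeneration

    hf_name = "pszemraj/led-large-book-summary"
    tokenizer = AutoTokenizer.from_pretrained(hf_name)
    model = LEDForConditionalGeneration.from_pretrained(hf_name)

    inputs = tokenizer(
        "your words here",
        return_tensors="pt",
        truncation=True,
        max_length=16384,  # the model's maximum input length
    )
    # LED convention: global attention on the first (<s>) token
    global_attention_mask = torch.zeros_like(inputs["input_ids"])
    global_attention_mask[:, 0] = 1

    summary_ids = model.generate(
        inputs["input_ids"],
        global_attention_mask=global_attention_mask,
        num_beams=4,
        max_length=256,
    )
    print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))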

License

This model is dual-licensed under Apache 2.0 and BSD-3-Clause, allowing both commercial and non-commercial use with proper attribution.
