long-t5-tglobal-base-16384-book-summary

Introduction
The long-t5-tglobal-base-16384-book-summary model, published by pszemraj, is a fine-tuned version of google/long-t5-tglobal-base designed to summarize lengthy documents such as books and academic papers. It is optimized for processing long input sequences and generating concise summaries, offering a scalable approach to long-document text analysis.
Architecture
The model is based on LongT5, a transformer architecture optimized for handling long sequences efficiently. Instead of full self-attention, the encoder uses transient global attention: each token attends to a local window of neighbors plus a set of global tokens that summarize blocks of the input, avoiding the quadratic cost of standard attention. The model was fine-tuned on the kmfoda/booksum dataset using V100/A100 GPUs.
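The scaling benefit can be made concrete with a little arithmetic. The sketch below compares the number of attention-score computations for full self-attention versus a local-window-plus-block-summary pattern like LongT5's; the local radius (127) and block size (16) are the defaults in the transformers LongT5 configuration, used here as illustrative assumptions rather than values confirmed for this checkpoint:

```python
def full_attention_pairs(n):
    # standard self-attention: every token attends to every token
    return n * n

def tglobal_attention_pairs(n, local_radius=127, block_size=16):
    # each token attends to its local window plus one transient
    # global (block-summary) token per block of the input
    window = 2 * local_radius + 1
    n_blocks = -(-n // block_size)  # ceiling division
    return n * (window + n_blocks)

print(full_attention_pairs(16384))     # 268435456 pairs
print(tglobal_attention_pairs(16384))  # 20955136 pairs, ~13x fewer
```

At the full 16,384-token window, the sparse pattern needs roughly an order of magnitude fewer attention computations than full self-attention, which is what makes book-length inputs tractable.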
Training
Fine-tuning ran for more than 30 epochs from the base model on V100/A100 GPUs, using the kmfoda/booksum dataset with input sequences of up to 16,384 tokens and output sequences capped at 1,024 tokens. Training examples whose reference summaries exceeded the maximum output length were filtered out, so the model never learned to produce truncated, partial summaries.
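The filtering step described above can be sketched as follows; the function and field names are hypothetical (the actual training script is not shown here), and a plain whitespace splitter stands in for the real tokenizer:

```python
def filter_long_summaries(examples, tokenize, max_target_tokens=1024):
    """Drop examples whose reference summary would not fit the
    1,024-token output budget, so the model never trains on
    summaries it would have to cut off mid-text.
    (Sketch; names are illustrative, not from the training code.)"""
    return [ex for ex in examples
            if len(tokenize(ex["summary"])) <= max_target_tokens]

# toy usage with a whitespace tokenizer standing in for the real one
toy = [
    {"summary": "A short, complete summary."},
    {"summary": "word " * 2000},  # far over the 1,024-token cap
]
kept = filter_long_summaries(toy, tokenize=str.split)
print(len(kept))  # 1
```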
Guide: Running Locally
- Install Dependencies: Ensure you have Python installed along with the necessary libraries. Install the transformers library using:

  ```bash
  pip install -U transformers
  ```
- Load the Model:

  ```python
  import torch
  from transformers import pipeline

  summarizer = pipeline(
      "summarization",
      "pszemraj/long-t5-tglobal-base-16384-book-summary",
      device=0 if torch.cuda.is_available() else -1,
  )
  ```
- Summarize Text:

  ```python
  long_text = "Your long document text here."

  result = summarizer(long_text)
  print(result[0]["summary_text"])
  ```
- Cloud GPUs: For large-scale processing, consider cloud GPU services such as AWS, GCP, or Azure to handle the computational demands efficiently.
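Since the model's input window is 16,384 tokens, a quick length pre-check can help decide whether a document needs to be split before summarization. A rough sketch using a whitespace heuristic (the 1.3 tokens-per-word ratio is an assumption, not a measured property of this model's tokenizer, which would give the exact count):

```python
def rough_token_estimate(text, tokens_per_word=1.3):
    # subword tokenizers usually emit somewhat more tokens than
    # whitespace-separated words; 1.3 is a heuristic guess, not a
    # measured value for this model's tokenizer
    return int(len(text.split()) * tokens_per_word)

def fits_input_window(text, max_tokens=16384):
    return rough_token_estimate(text) <= max_tokens

print(fits_input_window("A short document."))  # True
print(fits_input_window("word " * 20000))      # False: ~26,000 tokens
```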
License
The model is released under the Apache 2.0 and BSD 3-Clause licenses, allowing for both commercial and non-commercial use with appropriate attribution.