legal-pegasus
Introduction
legal-pegasus is a fine-tuned version of Google's PEGASUS model, specifically adapted for legal document summarization. It performs abstractive summarization with a focus on legal texts, allowing for the effective summarization of complex legal documents.
Architecture
The legal-pegasus model is built upon PEGASUS, a Transformer-based model optimized for text summarization. It handles input sequences with a maximum length of 1024 tokens, making it suitable for processing extensive legal documents.
Training
The model is trained using the SEC's litigation releases and complaints dataset, which includes over 2700 documents. This dataset provides a robust foundation for the model to learn the nuances of legal language and summarization.
Guide: Running Locally
- Install the Transformers library: Ensure you have the transformers library installed, which can be done using pip:

  ```shell
  pip install transformers
  ```
- Load the Model and Tokenizer: Use the following Python code to load the tokenizer and model:

  ```python
  from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

  tokenizer = AutoTokenizer.from_pretrained("nsi319/legal-pegasus")
  model = AutoModelForSeq2SeqLM.from_pretrained("nsi319/legal-pegasus")
  ```
- Prepare and Tokenize Text: Input your legal text and tokenize it, truncating at the model's 1024-token limit:

  ```python
  text = """Your legal document text here."""
  input_tokenized = tokenizer.encode(text, return_tensors='pt', max_length=1024, truncation=True)
  ```
- Generate Summary: Generate a summary using the model:

  ```python
  summary_ids = model.generate(
      input_tokenized,
      num_beams=9,
      no_repeat_ngram_size=3,
      length_penalty=2.0,
      min_length=150,
      max_length=250,
      early_stopping=True,
  )
  summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=False)
  ```
- Cloud GPUs: For more efficient processing, especially with large texts, consider using cloud GPU services such as AWS EC2, Google Cloud Platform, or Azure.
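Since inputs are truncated at 1024 tokens, any content beyond that limit is simply dropped. One possible workaround, sketched below, is to split a long document into overlapping chunks and summarize each chunk with the steps above. Note the assumptions: the `chunk_text` helper is illustrative (not part of the Transformers library), and it uses a whitespace word count as a rough stand-in for the tokenizer's actual token count.

```python
# Hypothetical helper: split a long document into overlapping word-based
# chunks so each piece stays comfortably under the 1024-token input limit.
# Whitespace-separated words only approximate real tokenizer tokens, so the
# word budget is set below 1024 to leave headroom.
def chunk_text(text, max_words=900, overlap=100):
    words = text.split()
    if len(words) <= max_words:
        return [text]
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        # Consecutive chunks share `overlap` words so sentences cut at a
        # boundary still appear intact in one of the two chunks.
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks
```

Each chunk can then be tokenized and passed through `model.generate` as shown above, and the per-chunk summaries concatenated (or summarized a second time) to cover the full document.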
License
The legal-pegasus model is licensed under the MIT License, allowing for broad use and modification with minimal restrictions.