legal-pegasus

nsi319

Introduction

legal-pegasus is a version of Google's PEGASUS model fine-tuned for legal document summarization. It performs abstractive summarization: rather than extracting passages verbatim, it generates new sentences that condense a legal text.

Architecture

The legal-pegasus model is built on PEGASUS, a Transformer-based encoder-decoder model designed for abstractive summarization. It accepts input sequences of up to 1024 tokens, so longer legal documents have to be truncated or split before they are summarized.
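
Because of the 1024-token limit, a long document has to be shortened or split before it reaches the model. The following is a minimal sketch of one possible approach, not something the model card prescribes: it greedily packs paragraphs into chunks that fit a token budget. The helper name split_into_chunks and the count_tokens callable are hypothetical; with the real tokenizer, count_tokens could be something like lambda s: len(tokenizer.encode(s)).

```python
from typing import Callable, List


def split_into_chunks(
    text: str,
    count_tokens: Callable[[str], int],
    max_tokens: int = 1024,
) -> List[str]:
    """Greedily pack paragraphs into chunks that stay within max_tokens.

    A single paragraph longer than max_tokens is kept as its own chunk,
    so the tokenizer would still truncate it.
    """
    chunks: List[str] = []
    current: List[str] = []
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue  # skip empty paragraphs
        candidate = "\n\n".join(current + [paragraph])
        if current and count_tokens(candidate) > max_tokens:
            # The budget would be exceeded: close the current chunk
            # and start a new one with this paragraph.
            chunks.append("\n\n".join(current))
            current = [paragraph]
        else:
            current.append(paragraph)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each chunk can then be summarized separately. How the per-chunk summaries are combined afterwards is a design choice this sketch leaves open.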

Training

The model was trained on the SEC's litigation releases and complaints, a dataset of more than 2700 documents. These documents expose the model to the vocabulary and structure of legal writing and to the way such texts are summarized.

Guide: Running Locally

  1. Install the dependencies: Install the transformers library with pip. The code below also needs PyTorch (for return_tensors='pt'), and the PEGASUS tokenizer typically requires sentencepiece:

    pip install transformers torch sentencepiece
    
  2. Load the Model and Tokenizer: Use the following Python code to load the tokenizer and model:

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
    
    tokenizer = AutoTokenizer.from_pretrained("nsi319/legal-pegasus")  
    model = AutoModelForSeq2SeqLM.from_pretrained("nsi319/legal-pegasus")
    
  3. Prepare and Tokenize Text: Provide your legal text and tokenize it. Input beyond 1024 tokens is truncated:

    text = """Your legal document text here."""
    input_tokenized = tokenizer.encode(text, return_tensors='pt', max_length=1024, truncation=True)
    
  4. Generate Summary: Generate a summary using the model:

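    # Beam search with 9 beams; the summary is constrained to 150-250 tokens
    summary_ids = model.generate(input_tokenized, num_beams=9, no_repeat_ngram_size=3, length_penalty=2.0, min_length=150, max_length=250, early_stopping=True)
    # Decode the first (and only) generated sequence back into text
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=False)
    print(summary)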
    
  5. Cloud GPUs: Beam search over long inputs is slow on a CPU. For faster processing, especially with long texts or many documents, consider running the model on a GPU, for example through a cloud service such as AWS EC2, Google Cloud Platform, or Azure.
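
Steps 3 and 4 can be collected into a single helper, which also makes it easy to run on a GPU when one is available. This is a minimal sketch under stated assumptions: the names summarize and GENERATION_KWARGS are hypothetical, and the generation settings are simply those from step 4. The helper takes an already loaded tokenizer and model, so loading (step 2) stays outside it.

```python
from typing import Any

# Generation settings copied from step 4 of the guide.
GENERATION_KWARGS = {
    "num_beams": 9,
    "no_repeat_ngram_size": 3,
    "length_penalty": 2.0,
    "min_length": 150,
    "max_length": 250,
    "early_stopping": True,
}


def summarize(text: str, tokenizer: Any, model: Any, **overrides: Any) -> str:
    """Summarize one document with an already loaded tokenizer and model."""
    # Truncate to the 1024-token input limit, as in step 3.
    input_ids = tokenizer.encode(
        text, return_tensors="pt", max_length=1024, truncation=True
    )
    # Keep the inputs on the same device as the model (CPU or GPU).
    input_ids = input_ids.to(model.device)
    # Caller-supplied keyword arguments replace the defaults above.
    kwargs = {**GENERATION_KWARGS, **overrides}
    summary_ids = model.generate(input_ids, **kwargs)
    # Decode the first (and only) generated sequence back into text.
    return tokenizer.decode(
        summary_ids[0],
        skip_special_tokens=True,
        clean_up_tokenization_spaces=False,
    )


# Usage (downloads the model on first run):
#
#   import torch
#   from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
#
#   device = "cuda" if torch.cuda.is_available() else "cpu"
#   tokenizer = AutoTokenizer.from_pretrained("nsi319/legal-pegasus")
#   model = AutoModelForSeq2SeqLM.from_pretrained("nsi319/legal-pegasus").to(device)
#   print(summarize("Your legal document text here.", tokenizer, model))
```

Passing keyword arguments such as num_beams=4 overrides the defaults for a single call, which can help when trading summary quality for speed on a CPU.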

License

The legal-pegasus model is licensed under the MIT License, allowing for broad use and modification with minimal restrictions.
