bigbird pegasus large arxiv

google

Introduction

BigBird-Pegasus is a transformer model developed by Google, designed to handle long sequences using sparse attention. It is particularly effective for tasks such as document summarization and question-answering with extended contexts. This model was introduced in the paper "Big Bird: Transformers for Longer Sequences" and is available on Hugging Face under the model identifier google/bigbird-pegasus-large-arxiv.

Architecture

BigBird extends traditional transformer models with block sparse attention, which lets it handle sequences of up to 4096 tokens at a much lower computational cost than full-attention models such as BERT, whose attention cost grows quadratically with sequence length. The architecture targets tasks involving long documents, and the attention type, block size, and number of random blocks are all configurable.
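
To make the idea of block sparse attention more concrete, here is a minimal pure-Python sketch of a block-level attention pattern. It is a simplified illustration, not the Transformers implementation: the function names block_sparse_mask and density are hypothetical, and the sketch assumes that the first and last blocks are global, that each block attends to a sliding window of three blocks, and that num_random_blocks extra blocks are picked at random.

```python
import random


def block_sparse_mask(num_blocks, num_random_blocks=3, seed=0):
    """Return mask[i][j] == True if query block i attends to key block j.

    Illustrative only; assumes num_blocks >= 3.
    """
    rng = random.Random(seed)
    mask = [[False] * num_blocks for _ in range(num_blocks)]
    for i in range(num_blocks):
        # Global blocks (assumed: first and last) attend to every block
        if i in (0, num_blocks - 1):
            mask[i] = [True] * num_blocks
            continue
        # Every other block attends to the two global blocks
        mask[i][0] = mask[i][num_blocks - 1] = True
        # Sliding window: the block itself and its two neighbours
        for j in (i - 1, i, i + 1):
            mask[i][j] = True
        # Random blocks, drawn from blocks not already attended to
        candidates = [j for j in range(num_blocks) if not mask[i][j]]
        picks = rng.sample(candidates, min(num_random_blocks, len(candidates)))
        for j in picks:
            mask[i][j] = True
    return mask


def density(mask):
    """Fraction of (query block, key block) pairs that are attended to."""
    return sum(sum(row) for row in mask) / (len(mask) ** 2)


# 4096 tokens split into 64-token blocks gives 64 blocks
mask = block_sparse_mask(num_blocks=4096 // 64)
print(f"fraction of block pairs attended: {density(mask):.3f}")
```

Under these assumptions, each non-global block attends to at most eight blocks regardless of sequence length, which is why the cost grows roughly linearly rather than quadratically with the number of tokens.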

Training

This checkpoint was fine-tuned for summarization on the arXiv dataset from the scientific_papers collection. The BigBird authors report state-of-the-art results on tasks involving very long sequences, such as long-document summarization and question-answering with long contexts.

Guide: Running Locally

To run BigBird-Pegasus locally, follow these steps:

  1. Install Transformers Library:
    Ensure you have the Hugging Face Transformers library installed, along with PyTorch, which the examples below rely on (return_tensors='pt'):

    pip install transformers torch
    
  2. Load the Model and Tokenizer:
    Use the following code to load the model and tokenizer:

    from transformers import BigBirdPegasusForConditionalGeneration, AutoTokenizer
    
    tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-arxiv")
    model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-arxiv")
    
  3. Generate Predictions:
    Prepare your input text and generate predictions:

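    text = "Replace me by any text you'd like."
    # Tokenize, truncating anything beyond the 4096-token input limit
    inputs = tokenizer(text, return_tensors='pt', truncation=True, max_length=4096)
    # Generate summary token IDs
    prediction = model.generate(**inputs)
    # Decode the IDs back to text, dropping special tokens such as padding
    prediction = tokenizer.batch_decode(prediction, skip_special_tokens=True)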
    
  4. Optional Configurations:
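    Adjust attention_type, block_size, and num_random_blocks as needed. By default the encoder uses attention_type="block_sparse" with block_size=64 and num_random_blocks=3. The block_size and num_random_blocks settings take effect only with block sparse attention; setting attention_type="original_full" switches the encoder to full attention instead:

    # block_size and num_random_blocks apply only when attention_type is "block_sparse"
    model = BigBirdPegasusForConditionalGeneration.from_pretrained(
        "google/bigbird-pegasus-large-arxiv", attention_type="block_sparse", block_size=16, num_random_blocks=2
    )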
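
When choosing block_size and num_random_blocks, it can help to check whether an input is long enough for block sparse attention to apply at all. The following is a rough sketch with hypothetical helper names. It assumes that the encoder falls back to full attention when the input is no longer than (5 + 2 * num_random_blocks) * block_size tokens (two global blocks, three sliding-window blocks, and the random blocks). That threshold is an assumption about the Transformers source and should be verified against the installed version.

```python
def min_tokens_for_block_sparse(block_size=64, num_random_blocks=3):
    """Hypothetical helper: assumed fallback threshold in tokens.

    Assumption: inputs of this length or shorter are processed with
    full attention instead of block sparse attention.
    """
    return (5 + 2 * num_random_blocks) * block_size


def uses_block_sparse(seq_len, block_size=64, num_random_blocks=3):
    """True if an input of seq_len tokens exceeds the assumed threshold."""
    return seq_len > min_tokens_for_block_sparse(block_size, num_random_blocks)


# Compare the default settings with the smaller configuration from step 4
for block_size, num_random_blocks in [(64, 3), (16, 2)]:
    threshold = min_tokens_for_block_sparse(block_size, num_random_blocks)
    print(f"block_size={block_size}, num_random_blocks={num_random_blocks}: "
          f"block sparse attention needs more than {threshold} tokens")
```

Smaller blocks and fewer random blocks lower this threshold, so shorter documents can still use sparse attention, at the possible cost of summary quality.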
    

For long inputs or large batches, a GPU speeds up generation considerably. Cloud GPUs from providers such as AWS, Google Cloud, or Azure are one option.

License

The BigBird-Pegasus model is licensed under the Apache 2.0 License, allowing for both commercial and non-commercial use with appropriate attribution.
