bigbird pegasus large arxiv
google
Introduction
BigBird-Pegasus is a transformer model developed by Google that handles long sequences using sparse attention. It is particularly effective for tasks such as long-document summarization and question answering over extended contexts. The model was introduced in the paper "Big Bird: Transformers for Longer Sequences" and is available on Hugging Face under the model identifier google/bigbird-pegasus-large-arxiv.
Architecture
BigBird extends transformer models such as BERT with block sparse attention, which lets it handle sequences of up to 4096 tokens at a much lower computational cost than full attention. The architecture targets tasks involving long documents, and its attention type, block size, and number of random blocks are configurable.
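To make the cost argument concrete, here is a minimal sketch of a block sparse attention pattern built from global, sliding-window, and random blocks. It is a simplified illustration, not the Transformers implementation: the function name block_sparse_pattern, the single global block, the three-block window, and the values block_size=64 and num_random_blocks=3 are all assumptions chosen for the example.

```python
import random


def block_sparse_pattern(num_blocks, num_random_blocks=3, window_blocks=3,
                         num_global_blocks=1, seed=0):
    """Return, for each query block, the set of key blocks it may attend to.

    Hypothetical helper: global blocks + sliding window + random blocks.
    """
    rng = random.Random(seed)
    half = window_blocks // 2
    pattern = []
    for q in range(num_blocks):
        if q < num_global_blocks:
            # Global query blocks attend to the whole sequence.
            pattern.append(set(range(num_blocks)))
            continue
        # Every block attends to the global blocks.
        keys = set(range(num_global_blocks))
        # Sliding window: the block itself and its immediate neighbours.
        keys.update(k for k in range(q - half, q + half + 1) if 0 <= k < num_blocks)
        # Random blocks drawn from the blocks not covered yet.
        remaining = [k for k in range(num_blocks) if k not in keys]
        keys.update(rng.sample(remaining, min(num_random_blocks, len(remaining))))
        pattern.append(keys)
    return pattern


seq_len, block_size = 4096, 64  # illustrative values
num_blocks = seq_len // block_size
pattern = block_sparse_pattern(num_blocks)

# Count the query-key token pairs scored under each scheme.
sparse_pairs = sum(len(keys) for keys in pattern) * block_size ** 2
full_pairs = seq_len ** 2
print(f"block sparse scores {sparse_pairs / full_pairs:.1%} of the pairs that full attention would")
```

Each non-global query block attends to a fixed number of key blocks regardless of sequence length, so under these assumptions the number of scored pairs grows roughly linearly with the sequence rather than quadratically.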
Training
The BigBird-Pegasus model was fine-tuned for summarization on the arXiv dataset from the scientific_papers collection. BigBird has reported state-of-the-art results on tasks that require processing long sequences.
Guide: Running Locally
To run BigBird-Pegasus locally, follow these steps:
- Install Transformers Library:
  Ensure you have the Hugging Face Transformers library installed, along with PyTorch, which the examples in this guide use for tensors:

      pip install transformers torch

- Load the Model and Tokenizer:
  Use the following code to load the model and tokenizer:

      from transformers import BigBirdPegasusForConditionalGeneration, AutoTokenizer

      tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-arxiv")
      model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-arxiv")

- Generate Predictions:
  Prepare your input text and generate predictions:

      text = "Replace me by any text you'd like."
      inputs = tokenizer(text, return_tensors='pt')    # tokenize to PyTorch tensors
      prediction = model.generate(**inputs)            # generated token IDs
      prediction = tokenizer.batch_decode(prediction)  # decode IDs back to text

- Optional Configurations:
  Adjust attention_type, block_size, and num_random_blocks as needed. block_size and num_random_blocks only take effect with block sparse attention; setting attention_type="original_full" switches the encoder to full attention instead.

      # block sparse attention with smaller blocks and fewer random blocks
      model = BigBirdPegasusForConditionalGeneration.from_pretrained(
          "google/bigbird-pegasus-large-arxiv",
          attention_type="block_sparse",
          block_size=16,
          num_random_blocks=2,
      )
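When tuning block_size and num_random_blocks, it can help to know whether an input is long enough for block sparse attention to be used at all. The sketch below assumes the fallback rule found in the Transformers BigBird code, where inputs no longer than (5 + 2 * num_random_blocks) * block_size tokens are processed with full attention instead. Treat the formula as an assumption and check it against your installed library version. The helper names and the default values of 64 and 3 are hypothetical, chosen for illustration.

```python
def min_block_sparse_length(block_size=64, num_random_blocks=3):
    """Sequence length that must be exceeded for block sparse attention.

    Assumed rule: 2 global blocks + 3 sliding blocks + 2 * num_random_blocks
    (random blocks plus an equal-sized buffer), all times block_size.
    """
    return (5 + 2 * num_random_blocks) * block_size


def uses_block_sparse(seq_len, block_size=64, num_random_blocks=3):
    """True if an input of seq_len tokens is long enough for block sparse attention."""
    return seq_len > min_block_sparse_length(block_size, num_random_blocks)


for block_size, num_random_blocks in [(64, 3), (16, 2)]:
    threshold = min_block_sparse_length(block_size, num_random_blocks)
    print(f"block_size={block_size}, num_random_blocks={num_random_blocks}: "
          f"inputs need more than {threshold} tokens")
```

Under this assumption, a one-sentence input like the placeholder text in the generation step would fall below the threshold, so the benefit of sparse attention only shows up on genuinely long documents.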
For more demanding tasks, consider using cloud GPUs for better performance, such as those offered by AWS, Google Cloud, or Azure.
License
The BigBird-Pegasus model is licensed under the Apache 2.0 License, allowing for both commercial and non-commercial use with appropriate attribution.