bigbird-base-trivia-itc
Introduction
The bigbird-base-trivia-itc model is a fine-tuned version of bigbird-roberta-base, optimized for question answering on the trivia_qa dataset. It adds a BigBirdForQuestionAnsweringHead on top of the base encoder and is designed to perform well on long input sequences.
Architecture
The model is based on the BigBird architecture, a Transformer variant that uses block sparse attention to handle much longer sequences efficiently. It features (these values can be read directly from the checkpoint's config, as sketched after this list):
- 12 attention heads
- 12 hidden layers
- A hidden size of 768
- A maximum sequence length of 4096 tokens
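A minimal sketch for reading these values off the released checkpoint's configuration (assumes only that the transformers library is installed):

```python
from transformers import BigBirdConfig

# Fetch just the configuration of the released checkpoint (no weights needed)
config = BigBirdConfig.from_pretrained("google/bigbird-base-trivia-itc")

print(config.num_attention_heads)      # 12 attention heads
print(config.num_hidden_layers)        # 12 hidden layers
print(config.hidden_size)              # hidden size of 768
print(config.max_position_embeddings)  # maximum sequence length of 4096
```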
Training
The model was trained using the following configuration; the sketch after this list shows how these counts map onto the checkpoint's block sparse attention settings:
- Number of global tokens: 128
- Window length: 192
- Number of random tokens: 192
- Batch size: 32
- Loss function: Cross-entropy with noisy spans
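These token counts follow from BigBird's block sparse attention layout, in which each query attends to 2 global blocks, a 3-block sliding window, and a number of random blocks. A small sketch of that correspondence, assuming the block_size and num_random_blocks attributes exposed by the Hugging Face BigBirdConfig:

```python
from transformers import BigBirdConfig

config = BigBirdConfig.from_pretrained("google/bigbird-base-trivia-itc")

# Assumed layout: 2 global blocks, a 3-block sliding window, and
# num_random_blocks randomly attended blocks per query block.
print("global tokens:", 2 * config.block_size)                         # expected 128
print("window tokens:", 3 * config.block_size)                         # expected 192
print("random tokens:", config.num_random_blocks * config.block_size)  # expected 192
```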
Guide: Running Locally
To use the model locally with PyTorch:
- Install the Transformers library:

  ```bash
  pip install transformers
  ```
- Load the model and tokenizer, then run a question/context pair through it:

  ```python
  from transformers import BigBirdForQuestionAnswering, BigBirdTokenizer

  # Load the fine-tuned checkpoint and its tokenizer
  model = BigBirdForQuestionAnswering.from_pretrained("google/bigbird-base-trivia-itc")
  tokenizer = BigBirdTokenizer.from_pretrained("google/bigbird-base-trivia-itc")

  question = "Replace me by any text you'd like."
  context = "Put some context for answering"

  # Encode the (question, context) pair and run a forward pass
  encoded_input = tokenizer(question, context, return_tensors='pt')
  output = model(**encoded_input)
  ```
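  The forward pass returns start and end logits over the input tokens. Continuing from the snippet above, a minimal greedy decode into an answer string (this decoding step is an illustration, not part of the original card):

  ```python
  import torch

  # Pick the highest-scoring start and end positions (a simple greedy decode;
  # a more careful decoder would enforce end >= start and cap the span length)
  start_idx = int(torch.argmax(output.start_logits))
  end_idx = int(torch.argmax(output.end_logits))

  answer_ids = encoded_input["input_ids"][0][start_idx : end_idx + 1]
  print(tokenizer.decode(answer_ids, skip_special_tokens=True))
  ```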
- Optional configurations:
  - Switch attention from block sparse to full attention:

    ```python
    model = BigBirdForQuestionAnswering.from_pretrained("google/bigbird-base-trivia-itc", attention_type="original_full")
    ```

  - Adjust the block size and the number of random blocks:

    ```python
    model = BigBirdForQuestionAnswering.from_pretrained("google/bigbird-base-trivia-itc", block_size=16, num_random_blocks=2)
    ```
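After loading with either option, you can check which attention settings the model actually uses; this sketch relies on the attention_type, block_size, and num_random_blocks fields of the Hugging Face BigBirdConfig:

```python
# Inspect the effective attention configuration of the loaded model
print(model.config.attention_type)     # "block_sparse" (default) or "original_full"
print(model.config.block_size)         # block size used by block sparse attention
print(model.config.num_random_blocks)  # number of random blocks per query block
```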
For enhanced performance, consider using cloud GPU services such as AWS, GCP, or Azure.
License
This model is released under the Apache-2.0 license, which permits both personal and commercial use provided the license and copyright notices are retained.