bigbird-base-trivia-itc
Introduction
The bigbird-base-trivia-itc model is a fine-tuned version of bigbird-roberta-base, optimized for question answering on the trivia_qa dataset. It adds a BigBirdForQuestionAnsweringHead on top of the base encoder and is designed to perform well on long input sequences.
Architecture
The model is based on the BigBird architecture, a Transformer variant that uses block sparse attention to handle much longer sequences efficiently. It features (these values can be read directly from the checkpoint's config, as sketched after this list):
- 12 attention heads
- 12 hidden layers
- A hidden size of 768
- A maximum sequence length of 4096 tokens
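A minimal sketch for reading these values off the released checkpoint's configuration (assumes only that the transformers library is installed):

```python
from transformers import BigBirdConfig

# Fetch just the configuration of the released checkpoint (no weights needed)
config = BigBirdConfig.from_pretrained("google/bigbird-base-trivia-itc")

print(config.num_attention_heads)      # 12 attention heads
print(config.num_hidden_layers)        # 12 hidden layers
print(config.hidden_size)              # hidden size of 768
print(config.max_position_embeddings)  # maximum sequence length of 4096
```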
Training
The model was trained using the following configuration; the sketch after this list shows how these counts map onto the checkpoint's block sparse attention settings:
- Number of global tokens: 128
- Window length: 192
- Number of random tokens: 192
- Batch size: 32
- Loss function: Cross-entropy with noisy spans
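These token counts follow from BigBird's block sparse attention layout, in which each query attends to 2 global blocks, a 3-block sliding window, and a number of random blocks. A small sketch of that correspondence, assuming the block_size and num_random_blocks attributes exposed by the Hugging Face BigBirdConfig:

```python
from transformers import BigBirdConfig

config = BigBirdConfig.from_pretrained("google/bigbird-base-trivia-itc")

# Assumed layout: 2 global blocks, a 3-block sliding window, and
# num_random_blocks randomly attended blocks per query block.
print("global tokens:", 2 * config.block_size)                         # expected 128
print("window tokens:", 3 * config.block_size)                         # expected 192
print("random tokens:", config.num_random_blocks * config.block_size)  # expected 192
```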
Guide: Running Locally
To use the model locally with PyTorch:
- Install the Transformers library:

  ```bash
  pip install transformers
  ```
- Load the model and tokenizer, then run a question/context pair through it:

  ```python
  from transformers import BigBirdForQuestionAnswering, BigBirdTokenizer

  # Load the fine-tuned checkpoint and its tokenizer
  model = BigBirdForQuestionAnswering.from_pretrained("google/bigbird-base-trivia-itc")
  tokenizer = BigBirdTokenizer.from_pretrained("google/bigbird-base-trivia-itc")

  question = "Replace me by any text you'd like."
  context = "Put some context for answering"

  # Encode the (question, context) pair and run a forward pass
  encoded_input = tokenizer(question, context, return_tensors='pt')
  output = model(**encoded_input)
  ```
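  The forward pass returns start and end logits over the input tokens. Continuing from the snippet above, a minimal greedy decode into an answer string (this decoding step is an illustration, not part of the original card):

  ```python
  import torch

  # Pick the highest-scoring start and end positions (a simple greedy decode;
  # a more careful decoder would enforce end >= start and cap the span length)
  start_idx = int(torch.argmax(output.start_logits))
  end_idx = int(torch.argmax(output.end_logits))

  answer_ids = encoded_input["input_ids"][0][start_idx : end_idx + 1]
  print(tokenizer.decode(answer_ids, skip_special_tokens=True))
  ```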
- Optional configurations:
  - Switch attention from block sparse to full attention:

    ```python
    model = BigBirdForQuestionAnswering.from_pretrained("google/bigbird-base-trivia-itc", attention_type="original_full")
    ```

  - Adjust the block size and the number of random blocks:

    ```python
    model = BigBirdForQuestionAnswering.from_pretrained("google/bigbird-base-trivia-itc", block_size=16, num_random_blocks=2)
    ```
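After loading with either option, you can check which attention settings the model actually uses; this sketch relies on the attention_type, block_size, and num_random_blocks fields of the Hugging Face BigBirdConfig:

```python
# Inspect the effective attention configuration of the loaded model
print(model.config.attention_type)     # "block_sparse" (default) or "original_full"
print(model.config.block_size)         # block size used by block sparse attention
print(model.config.num_random_blocks)  # number of random blocks per query block
```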
For enhanced performance, consider using cloud GPU services such as AWS, GCP, or Azure.
License
This model is released under the Apache-2.0 license, which permits both personal and commercial use provided the license and copyright notices are retained.