bigbird-base-trivia-itc

Google

Introduction

The bigbird-base-trivia-itc model is a fine-tuned version of bigbird-roberta-base, optimized for question answering on the trivia_qa dataset. It adds a BigBirdForQuestionAnsweringHead on top of the BigBird encoder to improve performance on longer sequences.

Architecture

The model is based on the BigBird architecture, a sparse-attention variant of the Transformer designed to handle longer sequences efficiently. It features:

  • 12 attention heads
  • 12 hidden layers
  • A hidden size of 768
  • A maximum sequence length of 4096 tokens
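These dimensions match the defaults of `BigBirdConfig` in the transformers library (the backbone here is bigbird-roberta-base); a quick sketch to confirm them locally, assuming transformers is installed:

```python
from transformers import BigBirdConfig

# The default BigBirdConfig mirrors bigbird-roberta-base,
# the backbone this model was fine-tuned from
config = BigBirdConfig()

print(config.num_attention_heads)      # 12
print(config.num_hidden_layers)        # 12
print(config.hidden_size)              # 768
print(config.max_position_embeddings)  # 4096
```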

Training

The model was trained using the following configuration:

  • Number of global tokens: 128
  • Window length: 192
  • Number of random tokens: 192
  • Batch size: 32
  • Loss function: Cross-entropy with noisy spans
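One way to read the token counts above (an assumption, not stated explicitly in this card) is through the block-sparse implementation's defaults of `block_size = 64` and `num_random_blocks = 3`: each query block then sees 2 global blocks, a 3-block sliding window, and 3 random blocks, which reproduces the numbers listed:

```python
# Assumption: counts derive from block_size = 64 and num_random_blocks = 3;
# these hyperparameters are not stated in the card itself.
block_size = 64
num_random_blocks = 3

global_tokens = 2 * block_size                   # 2 global blocks
window_length = 3 * block_size                   # current block + one block on each side
random_tokens = num_random_blocks * block_size   # randomly attended blocks

print(global_tokens, window_length, random_tokens)  # 128 192 192
```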

Guide: Running Locally

To use the model locally with PyTorch:

  1. Install the Transformers library:

    pip install transformers torch sentencepiece
    
  2. Load the model:

    from transformers import BigBirdForQuestionAnswering, BigBirdTokenizer
    
    # Download the fine-tuned weights and the matching SentencePiece tokenizer
    model = BigBirdForQuestionAnswering.from_pretrained("google/bigbird-base-trivia-itc")
    tokenizer = BigBirdTokenizer.from_pretrained("google/bigbird-base-trivia-itc")
    
    question = "Replace me by any text you'd like."
    context = "Put some context for answering"
    encoded_input = tokenizer(question, context, return_tensors='pt')
    output = model(**encoded_input)  # output.start_logits / output.end_logits score answer spans
    
  3. Optional configurations:

    • Switch to full (quadratic) attention instead of block-sparse attention:
      model = BigBirdForQuestionAnswering.from_pretrained("google/bigbird-base-trivia-itc", attention_type="original_full")
      
    • Adjust block size and random blocks:
      model = BigBirdForQuestionAnswering.from_pretrained("google/bigbird-base-trivia-itc", block_size=16, num_random_blocks=2)
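The forward pass in step 2 returns `start_logits` and `end_logits`; to get an answer string, you pick a span and decode its tokens with the tokenizer. A minimal sketch of greedy span selection (an illustration, not the exact decoding used for the paper's evaluation):

```python
import torch

def best_span(start_logits: torch.Tensor, end_logits: torch.Tensor) -> tuple:
    """Greedy span selection: independent argmax over start and end logits.

    Assumes a single example; production decoders usually score joint
    (start, end) pairs and constrain the span length.
    """
    start = int(torch.argmax(start_logits, dim=-1))
    end = int(torch.argmax(end_logits, dim=-1))
    if end < start:  # guard against an inverted span from independent argmax
        start, end = end, start
    return (start, end)

# Toy logits standing in for output.start_logits[0] / output.end_logits[0]:
s = torch.tensor([0.1, 2.0, 0.3, 0.0])
e = torch.tensor([0.0, 0.1, 3.0, 0.2])
print(best_span(s, e))  # (1, 2)
```

With a real model output, the answer text would be recovered via `tokenizer.decode(encoded_input["input_ids"][0][start:end + 1])`.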
      

For enhanced performance, consider using cloud GPU services such as AWS, GCP, or Azure.

License

This model is licensed under the Apache-2.0 license, allowing for both personal and commercial use with attribution.
