Introduction

The Closed Book Trivia-QA T5 base is a T5-base model fine-tuned on the No Context TriviaQA dataset. It is designed to answer trivia questions from knowledge stored in its parameters rather than by consulting external context. The model was pre-trained on the C4 (Colossal Clean Crawled Corpus) dataset, which is derived from Common Crawl, and was then fine-tuned for trivia question answering.

Architecture

The model uses the T5 (Text-to-Text Transfer Transformer) architecture, which casts every task as text-to-text generation. It is implemented in PyTorch and configured for closed-book question answering, meaning it generates answers from information learned during training rather than from context supplied at inference time.
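
In the text-to-text paradigm, both the question and the answer are plain strings, so no task-specific heads are needed. The sketch below illustrates this framing; the answer string is illustrative only, and whether the model expects a task prefix is not specified here (the inference example below feeds the raw question).

    from transformers import AutoTokenizer
    
    tokenizer = AutoTokenizer.from_pretrained("t5-base")
    
    # Both sides of the task are ordinary strings: the model learns a
    # direct string-to-string mapping, with no retrieval component.
    question = "Who directed the movie Jaws?"
    answer = "Steven Spielberg"  # illustrative target, not model output
    
    inputs = tokenizer(question, return_tensors="pt")
    targets = tokenizer(answer, return_tensors="pt")
    print(inputs.input_ids.shape, targets.input_ids.shape)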

Training

The model was trained for 135 epochs with a batch size of 32 and a learning rate of 1e-3. Input and output sequences were truncated to 25 and 10 tokens, respectively. The model achieved an Exact Match (EM) score of 17 and a Subset Match score of 24.5. A more detailed account of the training process is available in the accompanying blog post.
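
For reference, the two metrics can be computed roughly as follows. This is a hedged sketch: the exact text normalization and the precise definition of Subset Match used in the original evaluation are assumptions; Subset Match is read here as one normalized string containing the other.

    def normalize(text: str) -> str:
        # Lowercase and collapse whitespace; the original evaluation may
        # also strip punctuation and articles (an assumption here).
        return " ".join(text.lower().strip().split())
    
    def exact_match(prediction: str, target: str) -> bool:
        # EM: prediction and target are identical after normalization.
        return normalize(prediction) == normalize(target)
    
    def subset_match(prediction: str, target: str) -> bool:
        # Assumed reading: a match when either normalized string contains
        # the other, e.g. "spielberg" vs. "steven spielberg".
        p, t = normalize(prediction), normalize(target)
        return p in t or t in p
    
    print(exact_match("Steven Spielberg", "steven spielberg"))  # True
    print(subset_match("Spielberg", "Steven Spielberg"))        # True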

Guide: Running Locally

To run the model locally, follow these steps:

  1. Install the Transformers Library: Ensure you have the transformers and torch libraries installed.
    pip install transformers torch
    
  2. Load the Model:
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
    import torch
    
    # AutoModelWithLMHead is deprecated; AutoModelForSeq2SeqLM is the
    # appropriate class for an encoder-decoder model such as T5.
    tokenizer = AutoTokenizer.from_pretrained("deep-learning-analytics/triviaqa-t5-base")
    model = AutoModelForSeq2SeqLM.from_pretrained("deep-learning-analytics/triviaqa-t5-base")
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    
  3. Prepare and Run Inference:
    text = "Who directed the movie Jaws?"
    preprocess_text = text.strip().replace("\n", "")
    tokenized_text = tokenizer.encode(preprocess_text, return_tensors="pt").to(device)
    
    # Call generate() on the model itself; max_length matches the
    # 10-token output length used during training.
    outs = model.generate(
        tokenized_text,
        max_length=10,
        num_beams=2,
        early_stopping=True
    )
    
    # skip_special_tokens removes padding and end-of-sequence markers.
    dec = [tokenizer.decode(ids, skip_special_tokens=True) for ids in outs]
    print("Predicted Answer:", dec[0])
    
  4. Suggestions: For optimal performance, consider using cloud GPUs from providers such as AWS, GCP, or Azure. For higher throughput, multiple questions can also be answered in one batch, as shown in the sketch after this list.
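
The following batched-inference sketch reuses the tokenizer, model, and device from steps 2 and 3; the question list and generation settings are illustrative.

    # Batched inference: pad questions to a common length and decode all
    # answers at once. Assumes tokenizer, model, and device from above.
    questions = [
        "Who directed the movie Jaws?",
        "What is the capital of Australia?",
    ]
    batch = tokenizer(questions, return_tensors="pt", padding=True).to(device)
    outs = model.generate(
        batch.input_ids,
        attention_mask=batch.attention_mask,
        max_length=10,
        num_beams=2,
        early_stopping=True,
    )
    answers = tokenizer.batch_decode(outs, skip_special_tokens=True)
    for question, answer in zip(questions, answers):
        print(question, "->", answer)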

License

The model and associated code are subject to the licensing terms provided by Hugging Face and the creators of the model. Ensure compliance with these terms when using the model for personal or commercial purposes.
