roberta-base-squad2
Introduction
The roberta-base-squad2 model by deepset is a fine-tuned version of the roberta-base model, optimized for extractive question answering on the SQuAD 2.0 dataset. It is specifically trained to handle both answerable and unanswerable questions.
Architecture
- Base Model: FacebookAI/roberta-base
- Language: English
- Task: Extractive Question Answering
- Training Data: SQuAD 2.0
- Evaluation Data: SQuAD 2.0
The model uses the RoBERTa architecture (a robustly optimized BERT pretraining approach) with a span-prediction head on top for extractive question answering.
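As a quick sanity check, the checkpoint can be loaded with Transformers and its configuration inspected. This is a minimal sketch; the printed values in the comments are what the standard roberta-base configuration would give.

```python
from transformers import AutoConfig, AutoModelForQuestionAnswering

model_id = "deepset/roberta-base-squad2"

# The QA head is a linear layer producing start/end span logits
# on top of the roberta-base encoder.
model = AutoModelForQuestionAnswering.from_pretrained(model_id)
config = AutoConfig.from_pretrained(model_id)

print(config.model_type)         # "roberta"
print(config.num_hidden_layers)  # 12 (roberta-base encoder layers)
print(config.hidden_size)        # 768
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")  # ~124M
```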
Training
Training was performed on 4x Tesla V100 GPUs with the following hyperparameters:
- Batch Size: 96
- Epochs: 2
- Max Sequence Length: 386
- Learning Rate: 3e-5
- Learning Rate Schedule: LinearWarmup
- Warmup Proportion: 0.2
- Document Stride: 128
- Max Query Length: 64
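For readers who want to reproduce a comparable run with the Hugging Face Trainer, the hyperparameters above map roughly onto TrainingArguments as sketched below. This is an approximation, not deepset's original training script (the original run used deepset's FARM framework), and the output directory name is hypothetical.

```python
from transformers import TrainingArguments

# Rough mapping of the listed hyperparameters onto TrainingArguments.
args = TrainingArguments(
    output_dir="roberta-base-squad2",  # hypothetical output path
    per_device_train_batch_size=24,    # 24 per GPU x 4 GPUs = batch size 96
    num_train_epochs=2,
    learning_rate=3e-5,
    lr_scheduler_type="linear",        # linear decay after warmup
    warmup_ratio=0.2,                  # warmup proportion
)

# These belong to SQuAD-style tokenization/preprocessing,
# not to TrainingArguments itself:
max_seq_length = 386
doc_stride = 128
max_query_length = 64
```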
Guide: Running Locally
Using Haystack
- Install Haystack and Transformers:
```bash
pip install haystack-ai "transformers[torch,sentencepiece]"
```
- Load the model in Haystack:
```python
from haystack import Document
from haystack.components.readers import ExtractiveReader

docs = [Document(content="Python is a popular programming language")]

reader = ExtractiveReader(model="deepset/roberta-base-squad2")
reader.warm_up()

question = "What is a popular programming language?"
result = reader.run(query=question, documents=docs)
```
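The run() call returns a dictionary keyed by "answers". Assuming the Haystack 2.x ExtractiveReader API, each ExtractedAnswer exposes the matched text and a confidence score:

```python
# Each ExtractedAnswer exposes the matched span and a confidence score;
# a trailing no-answer candidate may appear with data=None.
for answer in result["answers"]:
    print(answer.data, answer.score)
```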
Using Transformers
- Load the model and tokenizer via Transformers:
```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

model_name = "deepset/roberta-base-squad2"
nlp = pipeline("question-answering", model=model_name, tokenizer=model_name)

QA_input = {
    "question": "Why is model conversion important?",
    "context": "The option to convert models between FARM and transformers "
               "gives freedom to the user and lets people easily switch "
               "between frameworks.",
}
res = nlp(QA_input)
```
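Because the model is trained on SQuAD 2.0, it can also signal that a question is unanswerable. The question-answering pipeline exposes this through its handle_impossible_answer flag; the question below is an illustrative example:

```python
# A question the context cannot answer; with handle_impossible_answer=True
# the pipeline is allowed to return an empty answer string.
res = nlp(
    question="What is the capital of France?",
    context="The option to convert models between FARM and transformers "
            "gives freedom to the user.",
    handle_impossible_answer=True,
)
print(res)  # e.g. {'score': ..., 'start': 0, 'end': 0, 'answer': ''}
```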
Suggested Cloud GPUs
- Tesla V100
- NVIDIA A100
License
The roberta-base-squad2 model is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).