deepset/xlm-roberta-large-squad2
Introduction
The deepset/xlm-roberta-large-squad2 model is a multilingual, large-scale language model for extractive question answering. It is based on XLM-RoBERTa large and fine-tuned on the SQuAD 2.0 dataset, enabling question answering across many languages, including questions that have no answer in the given context.
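Extractive question answering means the model does not generate text: it scores token positions in the given context and selects the span most likely to answer the question. The selection step can be sketched in plain Python (the scores below are made up for illustration; a real model produces one start and one end score per token):

```python
# Toy span selection: pick the (start, end) pair with the highest combined score.
tokens = ["The", "capital", "of", "France", "is", "Paris", "."]
start_scores = [0.1, 0.0, 0.0, 0.2, 0.1, 0.9, 0.0]
end_scores   = [0.0, 0.1, 0.0, 0.1, 0.0, 0.9, 0.1]

# Only consider valid spans, where the end does not come before the start.
best = max(
    ((s, e) for s in range(len(tokens)) for e in range(s, len(tokens))),
    key=lambda pair: start_scores[pair[0]] + end_scores[pair[1]],
)
answer = " ".join(tokens[best[0]:best[1] + 1])
print(answer)  # -> Paris
```

Real implementations add refinements (length limits on the span, and for SQuAD 2.0 a "no answer" score), but the core idea is this argmax over candidate spans.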
Architecture
- Model: XLM-RoBERTa Large
- Task: Extractive Question Answering
- Language Support: Multilingual
- Training Data: SQuAD 2.0
- Evaluation Data: SQuAD dev set, German MLQA, German XQuAD
Training
Training used the following key hyperparameters:
- Batch Size: 32
- Epochs: 3
- Learning Rate: 1e-5
- Max Sequence Length: 256
- Learning Rate Schedule: Linear Warmup
- Warmup Proportion: 0.2
- Doc Stride: 128
- Max Query Length: 64
The training utilized four Tesla V100 GPUs.
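With a max sequence length of 256 and a doc stride of 128, contexts longer than one window are split into overlapping windows, so an answer that straddles a window boundary still appears whole in at least one window. A plain-Python sketch of the windowing (a list of integers stands in for real tokenizer output; with these particular settings the step between windows and the overlap both come out to 128 tokens):

```python
def sliding_windows(tokens, max_len=256, stride=128):
    """Split a token sequence into overlapping fixed-size windows.

    Each window starts `stride` tokens after the previous one, so
    consecutive windows overlap by `max_len - stride` tokens.
    """
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break
        start += stride
    return windows

# A 500-token "document" split with the model card's settings:
chunks = sliding_windows(list(range(500)), max_len=256, stride=128)
print([len(c) for c in chunks])  # -> [256, 256, 244]
```

At inference time the model scores each window separately and the best-scoring span across all windows is returned.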
Guide: Running Locally
To run the model locally, you can use frameworks like Haystack or Transformers.
Using Transformers
- Install the required libraries:

```shell
pip install transformers torch
```

- Run the model:

```python
from transformers import pipeline

model_name = "deepset/xlm-roberta-large-squad2"
nlp = pipeline("question-answering", model=model_name, tokenizer=model_name)

QA_input = {
    "question": "Why is model conversion important?",
    "context": (
        "The option to convert models between FARM and transformers gives "
        "freedom to the user and let people easily switch between frameworks."
    ),
}
res = nlp(QA_input)
```
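The pipeline returns a dict with the answer text, a confidence score, and character offsets into the context. A minimal sketch of how to use that result, with a hypothetical output (the values below are made up for illustration, but match the shape the question-answering pipeline returns):

```python
context = "Paris is the capital of France."

# Hypothetical pipeline result; `start` and `end` are character offsets
# into the original context string.
res = {"score": 0.97, "start": 0, "end": 5, "answer": "Paris"}

# The offsets let you locate the answer inside the original context.
assert context[res["start"]:res["end"]] == res["answer"]
print(res["answer"])  # -> Paris
```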
Using Haystack
- Install Haystack:

```shell
pip install haystack-ai "transformers[torch,sentencepiece]"
```

- Load the model:

```python
from haystack import Document
from haystack.components.readers import ExtractiveReader

docs = [Document(content="Python is a popular programming language")]

reader = ExtractiveReader(model="deepset/xlm-roberta-large-squad2")
reader.warm_up()

question = "What is a popular programming language?"
result = reader.run(query=question, documents=docs)
```
Cloud GPUs
For optimal performance, consider using cloud-based GPUs such as those offered by AWS, GCP, or Azure.
License
The model is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).