xlm-roberta-base-squad2-distilled
Introduction
xlm-roberta-base-squad2-distilled is a multilingual model for extractive question answering (QA). It was distilled with deepset's Haystack framework, using deepset/xlm-roberta-large-squad2 as the teacher model, and trained on the SQuAD 2.0 dataset, which makes it suitable for use cases spanning multiple languages.
Architecture
The model is based on the XLM-RoBERTa base architecture, which supports roughly 100 languages, and this checkpoint is fine-tuned for extractive question answering. It is designed to run within the Haystack framework, which allows it to be integrated into larger LLM applications.
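For example, the reader can be combined with a retriever in a Haystack pipeline. The sketch below is a minimal illustration assuming Haystack 2.x with its in-memory document store and BM25 retriever; the documents and question are placeholders, not content from this model card:

```python
from haystack import Document, Pipeline
from haystack.components.readers import ExtractiveReader
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Index a couple of illustrative documents into an in-memory store.
store = InMemoryDocumentStore()
store.write_documents([
    Document(content="Python is a popular programming language"),
    Document(content="Berlin is the capital of Germany"),
])

# The retriever narrows down candidate documents; the distilled reader
# then extracts the answer span from them.
qa = Pipeline()
qa.add_component("retriever", InMemoryBM25Retriever(document_store=store))
qa.add_component("reader", ExtractiveReader(model="deepset/xlm-roberta-base-squad2-distilled"))
qa.connect("retriever.documents", "reader.documents")

question = "What is the capital of Germany?"
result = qa.run({"retriever": {"query": question}, "reader": {"query": question}})
```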
Training
The model was trained and evaluated on the SQuAD 2.0 dataset with the following hyperparameters:
- Batch Size: 56
- Number of Epochs: 4
- Maximum Sequence Length: 384
- Learning Rate: 3e-5
- Learning Rate Schedule: LinearWarmup
- Embedding Dropout Probability: 0.1
- Temperature: 3
- Distillation Loss Weight: 0.75
Performance was evaluated on the SQuAD 2.0 dev set, achieving an exact match score of 74.07% and an F1 score of 76.40%.
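The temperature and distillation loss weight above control how the teacher's soft predictions are blended with the gold labels. As a rough illustration (this is standard temperature-scaled distillation practice, not code from Haystack; the function name, the T² scaling, and the per-logit application are assumptions):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=3.0, alpha=0.75):
    # Soften both distributions with the temperature before comparing them.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between the softened distributions; the T**2 factor
    # keeps gradient magnitudes comparable across temperatures.
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
    # Standard cross-entropy against the gold answer positions.
    ce = F.cross_entropy(student_logits, labels)
    # alpha corresponds to the distillation loss weight (0.75 above).
    return alpha * kd + (1 - alpha) * ce

# For extractive QA, this would be applied separately to the start and end
# logits, with the two losses averaged.
```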
Guide: Running Locally
- Install Dependencies
  Use pip to install the necessary packages:

  ```bash
  pip install haystack-ai "transformers[torch,sentencepiece]"
  ```
- Using Haystack
  Implement a simple QA system:

  ```python
  from haystack import Document
  from haystack.components.readers import ExtractiveReader

  docs = [Document(content="Python is a popular programming language")]
  reader = ExtractiveReader(model="deepset/xlm-roberta-base-squad2-distilled")
  reader.warm_up()

  question = "What is a popular programming language?"
  result = reader.run(query=question, documents=docs)
  ```
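  The reader returns a ranked list of extracted answers. A quick way to inspect the top answer (attribute names follow Haystack 2.x's ExtractedAnswer dataclass; treat them as an assumption if you run a different version):

  ```python
  top = result["answers"][0]
  print(top.data, top.score)  # answer span text and its confidence score
  ```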
- Using Transformers
  Load and use the model with Transformers:

  ```python
  from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

  model_name = "deepset/xlm-roberta-base-squad2-distilled"

  # Inference via the question-answering pipeline.
  nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)
  QA_input = {'question': 'Why is model conversion important?', 'context': '...'}
  res = nlp(QA_input)

  # Alternatively, load the model and tokenizer directly.
  model = AutoModelForQuestionAnswering.from_pretrained(model_name)
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  ```
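  The pipeline returns a dict with the answer span and its metadata, using the standard transformers question-answering output keys 'answer', 'score', 'start', and 'end':

  ```python
  print(res['answer'], res['score'])  # extracted span and model confidence
  ```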
- Hardware Recommendations
  For optimal performance, consider using a cloud GPU such as an NVIDIA Tesla V100.
License
This model is licensed under the MIT License, allowing for wide usage and modification with attribution.