mcsabai/huBert-fine-tuned-hungarian-squadv2
Introduction
The huBert-fine-tuned-hungarian-squadv2 model is a fine-tuned version of the huBERT base model, designed for question answering tasks in Hungarian. It was fine-tuned on a Hungarian SQuADv2 dataset, which includes both answerable and unanswerable questions.
Architecture
The model builds upon the huBERT base model (cased) and uses the matching huBERT tokenizer. Its training data is a machine-translated version of the SQuAD dataset, produced with the Google Translate API to adapt the dataset to Hungarian.
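Machine-translating a QA dataset involves more than translating strings: the character offsets of each answer must be re-located inside the translated context, and examples whose translated answer no longer appears verbatim are typically dropped. A minimal sketch of that realignment step (the helper name and example sentence are hypothetical, not taken from the actual dataset-building code):

```python
def realign_answer(translated_context: str, translated_answer: str):
    """Locate a translated answer span inside the translated context.

    Returns (start, end) character offsets, or None when the translated
    answer no longer appears verbatim in the translated context.
    """
    start = translated_context.find(translated_answer)
    if start == -1:
        return None  # span lost in translation; example would be dropped
    return start, start + len(translated_answer)

# Hypothetical translated context/answer pair:
context = "Budapest Magyarország fővárosa és legnagyobb városa."
print(realign_answer(context, "Magyarország fővárosa"))  # (9, 30)
print(realign_answer(context, "Bécs"))                   # None
```

Real pipelines are usually fuzzier than an exact `find` (translation can reorder or inflect the answer), but this captures why offsets must be recomputed after translation.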
Training
This model was trained on the SQuAD2.0 dataset, which combines answerable and unanswerable questions. The dataset consists of 100,000 questions from SQuAD1.1 and over 50,000 additional unanswerable questions. The training objective requires the model to answer questions when possible and recognize when a question cannot be answered based on the given context.
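The "answer when possible, abstain otherwise" objective is usually decoded by comparing the best answer span's score against a null ("no answer") score taken at the [CLS] position. A simplified sketch of that SQuAD2.0-style decoding (my own illustration, not the model's actual inference code; real implementations also cap the span length):

```python
import math

def decode_squad2(start_logits, end_logits, null_threshold=0.0):
    """Pick the best answer span, or no answer, SQuAD2.0-style.

    Index 0 stands in for the [CLS] position: its start+end score is the
    'no answer' score. If no valid span beats it by more than
    null_threshold, the question is predicted unanswerable.
    """
    null_score = start_logits[0] + end_logits[0]
    best_score, best_span = -math.inf, None
    for i in range(1, len(start_logits)):
        for j in range(i, len(end_logits)):  # end must not precede start
            score = start_logits[i] + end_logits[j]
            if score > best_score:
                best_score, best_span = score, (i, j)
    if best_score - null_score > null_threshold:
        return best_span
    return None  # unanswerable

# Toy logits: positions 2-3 clearly beat the null position.
print(decode_squad2([1.0, 0.2, 3.0, 0.1], [1.0, 0.1, 0.2, 2.5]))  # (2, 3)
# Toy logits: the null position dominates, so no answer is returned.
print(decode_squad2([5.0, 0.1, 0.2, 0.1], [5.0, 0.2, 0.1, 0.3]))  # None
```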
Guide: Running Locally
To run the model locally, follow these steps:
- Install the Hugging Face Transformers library:

```shell
pip install transformers
```
- Import the pipeline and configure it for question answering:

```python
from transformers import pipeline

qa_pipeline = pipeline(
    "question-answering",
    model="mcsabai/huBert-fine-tuned-hungarian-squadv2",
    tokenizer="mcsabai/huBert-fine-tuned-hungarian-squadv2",
    topk=1,
    handle_impossible_answer=True
)
```
- Use the pipeline to make predictions:

```python
predictions = qa_pipeline({
    'context': "Máté vagyok és Budapesten élek már több mint 4 éve.",
    'question': "Hol lakik Máté?"
})
print(predictions)
```
Example output:
```python
{'score': 0.9892364144325256, 'start': 16, 'end': 26, 'answer': 'Budapesten'}
```
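The `start` and `end` fields are character offsets into the context, so the answer text can also be recovered by slicing. Shown here on a hypothetical, hand-built prediction dict rather than a live model call:

```python
# Hypothetical pipeline-style output for a made-up context:
context = "Budapest Magyarország fővárosa."
prediction = {"score": 0.98, "start": 0, "end": 8, "answer": "Budapest"}

# 'start' and 'end' index into the original context string:
print(context[prediction["start"]:prediction["end"]])  # Budapest
```

With `handle_impossible_answer=True`, the pipeline may instead return an empty answer (with `start == end == 0`) when the context does not contain an answer.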
For best performance, consider using cloud-based GPUs, such as those provided by AWS, GCP, or Azure, to handle the computational requirements.
License
The model and its components are hosted on Hugging Face's Model Hub. Review the specific license agreements associated with the huBERT base model and the SQuAD dataset to ensure compliance with their usage terms.