mcsabai/huBert-fine-tuned-hungarian-squadv2
Introduction
The huBert-fine-tuned-hungarian-squadv2 model is a fine-tuned version of the huBERT base model, designed for question answering tasks in Hungarian. It was fine-tuned on a Hungarian SQuADv2 dataset, which includes both answerable and unanswerable questions.
Architecture
The model builds upon the huBERT base model (cased) and uses the matching huBERT tokenizer. Its training data is a machine-translated version of the SQuAD dataset, produced with the Google Translate API to adapt the dataset to Hungarian.
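Machine-translating a QA dataset involves more than translating strings: the character offsets of each answer must be re-located inside the translated context, and examples whose translated answer no longer appears verbatim are typically dropped. A minimal sketch of that realignment step (the helper name and example sentence are hypothetical, not taken from the actual dataset-building code):

```python
def realign_answer(translated_context: str, translated_answer: str):
    """Locate a translated answer span inside the translated context.

    Returns (start, end) character offsets, or None when the translated
    answer no longer appears verbatim in the translated context.
    """
    start = translated_context.find(translated_answer)
    if start == -1:
        return None  # span lost in translation; example would be dropped
    return start, start + len(translated_answer)

# Hypothetical translated context/answer pair:
context = "Budapest Magyarország fővárosa és legnagyobb városa."
print(realign_answer(context, "Magyarország fővárosa"))  # (9, 30)
print(realign_answer(context, "Bécs"))                   # None
```

Real pipelines are usually fuzzier than an exact `find` (translation can reorder or inflect the answer), but this captures why offsets must be recomputed after translation.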
Training
This model was trained on the SQuAD2.0 dataset, which combines answerable and unanswerable questions. The dataset consists of 100,000 questions from SQuAD1.1 and over 50,000 additional unanswerable questions. The training objective requires the model to answer questions when possible and recognize when a question cannot be answered based on the given context.
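The "answer when possible, abstain otherwise" objective is usually decoded by comparing the best answer span's score against a null ("no answer") score taken at the [CLS] position. A simplified sketch of that SQuAD2.0-style decoding (my own illustration, not the model's actual inference code; real implementations also cap the span length):

```python
import math

def decode_squad2(start_logits, end_logits, null_threshold=0.0):
    """Pick the best answer span, or no answer, SQuAD2.0-style.

    Index 0 stands in for the [CLS] position: its start+end score is the
    'no answer' score. If no valid span beats it by more than
    null_threshold, the question is predicted unanswerable.
    """
    null_score = start_logits[0] + end_logits[0]
    best_score, best_span = -math.inf, None
    for i in range(1, len(start_logits)):
        for j in range(i, len(end_logits)):  # end must not precede start
            score = start_logits[i] + end_logits[j]
            if score > best_score:
                best_score, best_span = score, (i, j)
    if best_score - null_score > null_threshold:
        return best_span
    return None  # unanswerable

# Toy logits: positions 2-3 clearly beat the null position.
print(decode_squad2([1.0, 0.2, 3.0, 0.1], [1.0, 0.1, 0.2, 2.5]))  # (2, 3)
# Toy logits: the null position dominates, so no answer is returned.
print(decode_squad2([5.0, 0.1, 0.2, 0.1], [5.0, 0.2, 0.1, 0.3]))  # None
```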
Guide: Running Locally
To run the model locally, follow these steps:
- Install the Hugging Face Transformers library:

```shell
pip install transformers
```
- Import the pipeline and configure it for question answering:

```python
from transformers import pipeline

qa_pipeline = pipeline(
    "question-answering",
    model="mcsabai/huBert-fine-tuned-hungarian-squadv2",
    tokenizer="mcsabai/huBert-fine-tuned-hungarian-squadv2",
    topk=1,
    handle_impossible_answer=True
)
```
- Use the pipeline to make predictions:

```python
predictions = qa_pipeline({
    'context': "Máté vagyok és Budapesten élek már több mint 4 éve.",
    'question': "Hol lakik Máté?"
})
print(predictions)
```
Example output:
```python
{'score': 0.9892364144325256, 'start': 16, 'end': 26, 'answer': 'Budapesten'}
```
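The `start` and `end` fields are character offsets into the context, so the answer text can also be recovered by slicing. Shown here on a hypothetical, hand-built prediction dict rather than a live model call:

```python
# Hypothetical pipeline-style output for a made-up context:
context = "Budapest Magyarország fővárosa."
prediction = {"score": 0.98, "start": 0, "end": 8, "answer": "Budapest"}

# 'start' and 'end' index into the original context string:
print(context[prediction["start"]:prediction["end"]])  # Budapest
```

With `handle_impossible_answer=True`, the pipeline may instead return an empty answer (with `start == end == 0`) when the context does not contain an answer.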
For best performance, consider using cloud-based GPUs, such as those provided by AWS, GCP, or Azure, to handle the computational requirements.
License
The model and its components are hosted on Hugging Face's Model Hub. Review the specific license agreements associated with the huBERT base model and the SQuAD dataset to ensure compliance with their usage terms.