mt5-small-turkish-squad
Introduction
The MT5-Small Turkish Question Answering System is a fine-tuned version of Google's Multilingual T5-small model, specifically adapted for Turkish question-answering tasks. It uses the TQUAD dataset and is implemented with PyTorch Lightning.
Architecture
The model is based on Google's mT5-small architecture, which has roughly 300 million parameters and a checkpoint size of approximately 1.2 GB. mT5-small was pre-trained on the multilingual mC4 corpus and therefore requires task-specific fine-tuning, such as for question answering in Turkish.
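As a rough sanity check on these figures, the parameter count can be read directly from the checkpoint. This is a minimal sketch, assuming `transformers` and `torch` are installed; it downloads the base `google/mt5-small` weights:

```python
from transformers import AutoModelForSeq2SeqLM

# Load the base mT5-small checkpoint and count its parameters.
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")
num_params = sum(p.numel() for p in model.parameters())
print(f"mT5-small parameters: {num_params / 1e6:.0f}M")  # roughly 300 million
```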
Training
The model was fine-tuned on the Turkish Question Answering dataset (TQUAD) using PyTorch Lightning. Fine-tuning allows the model to perform well on the question-answering downstream task by adapting the pre-trained mT5-small model to the specifics of the Turkish language and the nature of the dataset.
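The original training script is not reproduced on this card, so the following is only a sketch of what a PyTorch Lightning fine-tuning loop for this setup could look like. The `QAFineTuner` class, the toy single-example stand-in for TQuAD preprocessing, the hyperparameters, and the label masking are illustrative assumptions, not the author's actual code:

```python
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "google/mt5-small"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)


class QAFineTuner(pl.LightningModule):
    """Wraps mT5-small so PyTorch Lightning can drive the fine-tuning loop."""

    def __init__(self, lr=1e-4):
        super().__init__()
        self.model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
        self.lr = lr

    def training_step(self, batch, batch_idx):
        input_ids, attention_mask, labels = batch
        # The seq2seq model returns the cross-entropy loss when labels are given.
        outputs = self.model(input_ids=input_ids,
                             attention_mask=attention_mask,
                             labels=labels)
        self.log("train_loss", outputs.loss)
        return outputs.loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=self.lr)


# A single toy question/context/answer triple stands in for the TQuAD pipeline.
source = tokenizer("Türkiye'nin başkenti neresidir?",
                   "Türkiye'nin başkenti Ankara'dır.",
                   max_length=512, padding="max_length",
                   truncation="only_second", return_tensors="pt")
target = tokenizer("Ankara", max_length=16, padding="max_length",
                   truncation=True, return_tensors="pt")
labels = target["input_ids"]
labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss

dataset = TensorDataset(source["input_ids"], source["attention_mask"], labels)
trainer = pl.Trainer(max_epochs=1, log_every_n_steps=1)
trainer.fit(QAFineTuner(), DataLoader(dataset, batch_size=1))
```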
Guide: Running Locally
To run the model locally, follow these steps:
- Install the `transformers` library from Hugging Face and `torch`.
- Load the tokenizer and model:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("ozcangundes/mt5-small-turkish-squad")
model = AutoModelForSeq2SeqLM.from_pretrained("ozcangundes/mt5-small-turkish-squad")
```
- Implement the function to get answers from the model:

```python
def get_answer(question, context):
    # Encode the question together with its context as a single input sequence.
    source_encoding = tokenizer(
        question,
        context,
        max_length=512,
        padding="max_length",
        truncation="only_second",
        return_attention_mask=True,
        add_special_tokens=True,
        return_tensors="pt")
    # Generate the answer tokens with the encoder-decoder model.
    generated_ids = model.generate(
        input_ids=source_encoding["input_ids"],
        attention_mask=source_encoding["attention_mask"],
        max_length=120)
    preds = [tokenizer.decode(gen_id, skip_special_tokens=True,
                              clean_up_tokenization_spaces=True)
             for gen_id in generated_ids]
    return "".join(preds)
```
- Use the function with your question and context to get an answer, as in the example below.
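As a usage illustration (the question and passage here are only examples; the exact output depends on the checkpoint):

```python
context = "Mustafa Kemal Atatürk, Türkiye Cumhuriyeti'nin kurucusu ve ilk cumhurbaşkanıdır."
question = "Türkiye Cumhuriyeti'nin kurucusu kimdir?"

# Prints the model's generated answer for the given question and context.
print(get_answer(question, context))
```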
For optimal performance, consider using cloud GPUs from providers like AWS, GCP, or Azure to handle the computational demands of running such a model.
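When a GPU is available, the model and the encoded inputs should also be moved onto it before generation. A minimal sketch, reusing the tokenizer and model loaded above; the `get_answer_on_device` helper is illustrative and not part of the original card:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)


def get_answer_on_device(question, context):
    # Same encoding as before, but tensors are placed on the selected device.
    source_encoding = tokenizer(
        question,
        context,
        max_length=512,
        padding="max_length",
        truncation="only_second",
        return_attention_mask=True,
        add_special_tokens=True,
        return_tensors="pt").to(device)
    generated_ids = model.generate(
        input_ids=source_encoding["input_ids"],
        attention_mask=source_encoding["attention_mask"],
        max_length=120)
    return tokenizer.decode(generated_ids[0], skip_special_tokens=True,
                            clean_up_tokenization_spaces=True)
```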
License
This project is licensed under the MIT License.