mt5-small-turkish-squad
Introduction
The MT5-Small Turkish Question Answering System is a fine-tuned version of Google's Multilingual T5-small model, specifically adapted for Turkish question-answering tasks. It uses the TQUAD dataset and is implemented with PyTorch Lightning.
Architecture
The model is based on Google's mT5-small architecture, which has roughly 300 million parameters and a checkpoint size of approximately 1.2 GB. mT5-small was pre-trained on the multilingual mC4 corpus and therefore requires task-specific fine-tuning, such as for question answering in Turkish.
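As a rough sanity check on these figures, the parameter count can be read directly from the checkpoint. This is a minimal sketch, assuming `transformers` and `torch` are installed; it downloads the base `google/mt5-small` weights:

```python
from transformers import AutoModelForSeq2SeqLM

# Load the base mT5-small checkpoint and count its parameters.
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")
num_params = sum(p.numel() for p in model.parameters())
print(f"mT5-small parameters: {num_params / 1e6:.0f}M")  # roughly 300 million
```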
Training
The model was fine-tuned on the Turkish Question Answering dataset (TQUAD) using PyTorch Lightning. Fine-tuning allows the model to perform well on the question-answering downstream task by adapting the pre-trained mT5-small model to the specifics of the Turkish language and the nature of the dataset.
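The original training script is not reproduced on this card, so the following is only a sketch of what a PyTorch Lightning fine-tuning loop for this setup could look like. The `QAFineTuner` class, the toy single-example stand-in for TQuAD preprocessing, the hyperparameters, and the label masking are illustrative assumptions, not the author's actual code:

```python
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "google/mt5-small"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)


class QAFineTuner(pl.LightningModule):
    """Wraps mT5-small so PyTorch Lightning can drive the fine-tuning loop."""

    def __init__(self, lr=1e-4):
        super().__init__()
        self.model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
        self.lr = lr

    def training_step(self, batch, batch_idx):
        input_ids, attention_mask, labels = batch
        # The seq2seq model returns the cross-entropy loss when labels are given.
        outputs = self.model(input_ids=input_ids,
                             attention_mask=attention_mask,
                             labels=labels)
        self.log("train_loss", outputs.loss)
        return outputs.loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=self.lr)


# A single toy question/context/answer triple stands in for the TQuAD pipeline.
source = tokenizer("Türkiye'nin başkenti neresidir?",
                   "Türkiye'nin başkenti Ankara'dır.",
                   max_length=512, padding="max_length",
                   truncation="only_second", return_tensors="pt")
target = tokenizer("Ankara", max_length=16, padding="max_length",
                   truncation=True, return_tensors="pt")
labels = target["input_ids"]
labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss

dataset = TensorDataset(source["input_ids"], source["attention_mask"], labels)
trainer = pl.Trainer(max_epochs=1, log_every_n_steps=1)
trainer.fit(QAFineTuner(), DataLoader(dataset, batch_size=1))
```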
Guide: Running Locally
To run the model locally, follow these steps:
- Install the `transformers` library from Hugging Face and `torch`.
- Load the tokenizer and model:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("ozcangundes/mt5-small-turkish-squad")
model = AutoModelForSeq2SeqLM.from_pretrained("ozcangundes/mt5-small-turkish-squad")
```
- Implement the function to get answers from the model:

```python
def get_answer(question, context):
    # Encode the question together with its context as a single input sequence.
    source_encoding = tokenizer(
        question,
        context,
        max_length=512,
        padding="max_length",
        truncation="only_second",
        return_attention_mask=True,
        add_special_tokens=True,
        return_tensors="pt")
    # Generate the answer tokens with the encoder-decoder model.
    generated_ids = model.generate(
        input_ids=source_encoding["input_ids"],
        attention_mask=source_encoding["attention_mask"],
        max_length=120)
    preds = [tokenizer.decode(gen_id, skip_special_tokens=True,
                              clean_up_tokenization_spaces=True)
             for gen_id in generated_ids]
    return "".join(preds)
```
- Use the function with your question and context to get an answer, as in the example below.
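As a usage illustration (the question and passage here are only examples; the exact output depends on the checkpoint):

```python
context = "Mustafa Kemal Atatürk, Türkiye Cumhuriyeti'nin kurucusu ve ilk cumhurbaşkanıdır."
question = "Türkiye Cumhuriyeti'nin kurucusu kimdir?"

# Prints the model's generated answer for the given question and context.
print(get_answer(question, context))
```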
For optimal performance, consider using cloud GPUs from providers like AWS, GCP, or Azure to handle the computational demands of running such a model.
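When a GPU is available, the model and the encoded inputs should also be moved onto it before generation. A minimal sketch, reusing the tokenizer and model loaded above; the `get_answer_on_device` helper is illustrative and not part of the original card:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)


def get_answer_on_device(question, context):
    # Same encoding as before, but tensors are placed on the selected device.
    source_encoding = tokenizer(
        question,
        context,
        max_length=512,
        padding="max_length",
        truncation="only_second",
        return_attention_mask=True,
        add_special_tokens=True,
        return_tensors="pt").to(device)
    generated_ids = model.generate(
        input_ids=source_encoding["input_ids"],
        attention_mask=source_encoding["attention_mask"],
        max_length=120)
    return tokenizer.decode(generated_ids[0], skip_special_tokens=True,
                            clean_up_tokenization_spaces=True)
```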
License
This project is licensed under the MIT License.