mukayese/mbart-large-turkish-summarization

Introduction

The mbart-large-turkish-summarization model is a fine-tuned version of facebook/mbart-large-50, trained on the Turkish portion of the MLSUM dataset. It is designed for text summarization in Turkish and reports ROUGE-1: 46.7011, ROUGE-2: 34.0087, ROUGE-L: 41.5475, and ROUGE-Lsum: 43.2108.
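
For context, here is a minimal sketch of loading the Turkish MLSUM split with the datasets library. Field names follow the published MLSUM schema; newer datasets releases may additionally require trust_remote_code=True:

    from datasets import load_dataset

    # Turkish ("tu") configuration of MLSUM; "text" holds the article
    # body and "summary" the reference summary
    mlsum_tr = load_dataset("mlsum", "tu")
    sample = mlsum_tr["train"][0]
    print(sample["text"][:200])
    print(sample["summary"])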

Architecture

This model is built on the mBART architecture, a multilingual variant of the BART sequence-to-sequence transformer. It uses the standard encoder-decoder attention mechanism and is well suited to text generation and summarization across multiple languages. The base checkpoint used for fine-tuning is facebook/mbart-large-50, which covers 50 languages.
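
A quick way to confirm the architecture is to inspect the checkpoint's configuration without downloading the weights; the attribute names below are standard MBartConfig fields:

    from transformers import AutoConfig

    # Loads only the model configuration, not the weights
    config = AutoConfig.from_pretrained("mukayese/mbart-large-turkish-summarization")
    print(config.model_type)       # "mbart"
    print(config.encoder_layers)   # encoder depth
    print(config.d_model)          # hidden size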

Training

The model was trained with the following hyperparameters (sketched as code after the list):

  • Learning Rate: 5e-05
  • Batch Sizes: Train - 2, Eval - 4
  • Seed: 42
  • Distributed Type: Multi-GPU with 8 devices
  • Gradient Accumulation Steps: 4
  • Total Train Batch Size: 64
  • Total Eval Batch Size: 32
  • Optimizer: Adam (betas=(0.9,0.999), epsilon=1e-08)
  • Learning Rate Scheduler Type: Linear
  • Number of Epochs: 10.0
  • Mixed Precision Training: Native AMP
  • Label Smoothing Factor: 0.1
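
As a reference, these settings map onto Transformers' Seq2SeqTrainingArguments roughly as follows; the original training script is not included in this card, so output_dir and the surrounding setup are illustrative assumptions:

    from transformers import Seq2SeqTrainingArguments

    # Illustrative mapping of the reported hyperparameters; not the
    # authors' actual training script
    training_args = Seq2SeqTrainingArguments(
        output_dir="mbart-large-turkish-summarization",  # assumed
        learning_rate=5e-5,
        per_device_train_batch_size=2,   # 2 x 8 GPUs x 4 accumulation = 64 total
        per_device_eval_batch_size=4,    # 4 x 8 GPUs = 32 total
        gradient_accumulation_steps=4,
        seed=42,
        num_train_epochs=10.0,
        lr_scheduler_type="linear",
        fp16=True,                       # native AMP mixed precision
        label_smoothing_factor=0.1,
    )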

Framework versions used include Transformers 4.11.3, PyTorch 1.8.2+cu111, Datasets 1.14.0, and Tokenizers 0.10.3.

Guide: Running Locally

  1. Install Dependencies: Ensure you have Python and pip installed, then use pip to install the required libraries (sentencepiece is needed by the mBART tokenizer):

    pip install transformers datasets torch sentencepiece
    
  2. Download the Model: Use the Hugging Face Transformers library to load the model:

    from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

    # Download the fine-tuned checkpoint and its tokenizer from the Hub
    model = MBartForConditionalGeneration.from_pretrained("mukayese/mbart-large-turkish-summarization")
    tokenizer = MBart50TokenizerFast.from_pretrained("mukayese/mbart-large-turkish-summarization")
    
  3. Run Inference: Prepare your input text and generate the summary:

    # mBART-50 tokenizers expect a source-language code; "tr_TR" is Turkish
    tokenizer.src_lang = "tr_TR"
    input_text = "Your Turkish text here."
    inputs = tokenizer(input_text, return_tensors="pt")
    # Beam search typically yields better summaries than greedy decoding
    summary_ids = model.generate(**inputs, num_beams=4, max_length=128)
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    print(summary)
    
  4. Cloud GPUs: For efficient training and inference, consider using cloud-based GPU services such as AWS EC2, Google Cloud Platform, or Microsoft Azure.
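
As a shortcut, steps 2 and 3 can be wrapped in the high-level summarization pipeline. This is a minimal sketch; decoding defaults may differ from the explicit generate() call above:

    from transformers import pipeline

    # The pipeline handles tokenization, generation, and decoding in one call
    summarizer = pipeline("summarization", model="mukayese/mbart-large-turkish-summarization")
    result = summarizer("Your Turkish text here.", max_length=128)
    print(result[0]["summary_text"])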

License

Licensing information for this model is not explicitly stated. Users should check the model's page on the Hugging Face Hub or the base facebook/mbart-large-50 model card for the applicable terms.
