mbart large turkish summarization
mukayese
Introduction
The mukayese/mbart-large-turkish-summarization model is a fine-tuned version of the facebook/mbart-large-50 model, trained on the Turkish portion of the mlsum dataset. It is designed for Turkish text summarization and reports the following scores: ROUGE-1: 46.7011, ROUGE-2: 34.0087, ROUGE-L: 41.5475, and ROUGE-Lsum: 43.2108.
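To give a feel for what the ROUGE scores quoted above measure, the following is a minimal sketch of ROUGE-N as an n-gram overlap F1 score. It assumes plain whitespace tokenization and a single reference, and it is not the scoring tool used to produce the reported numbers. The function names and the example sentences are illustrative only. Scores here fall between 0 and 1, while the reported figures appear to be on a 0 to 100 scale.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate, reference, n=1):
    """Simplified ROUGE-N F1 using whitespace tokens (illustrative only)."""
    cand = ngram_counts(candidate.split(), n)
    ref = ngram_counts(reference.split(), n)
    # Clipped overlap: each n-gram counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical example sentences, not taken from the dataset.
reference = "kedi halının üstünde uyuyor"
candidate = "kedi halının üstünde"

print(round(rouge_n_f1(candidate, reference, n=1), 4))  # 0.8571
print(round(rouge_n_f1(candidate, reference, n=2), 4))  # 0.8
```

ROUGE-L and ROUGE-Lsum are based on the longest common subsequence rather than fixed-length n-grams, so they are not covered by this sketch.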
Architecture
This model is built on the mBART architecture, a multilingual variant of the BART transformer model. As an encoder-decoder transformer, it uses attention mechanisms and is suited to text generation and summarization across multiple languages. The base model used for fine-tuning is facebook/mbart-large-50.
Training
The model was trained using the following hyperparameters:
- Learning Rate: 5e-05
- Batch Sizes: Train - 2, Eval - 4
- Seed: 42
- Distributed Type: Multi-GPU with 8 devices
- Gradient Accumulation Steps: 4
- Total Train Batch Size: 64
- Total Eval Batch Size: 32
- Optimizer: Adam (betas=(0.9,0.999), epsilon=1e-08)
- Learning Rate Scheduler Type: Linear
- Number of Epochs: 10.0
- Mixed Precision Training: Native AMP
- Label Smoothing Factor: 0.1
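The total batch sizes in the list above follow from the per-device sizes, the device count, and the gradient accumulation steps. The sketch below reproduces that arithmetic and illustrates a linear learning rate schedule. The list does not mention warmup or the total number of optimizer steps, so `warmup_steps=0` and the `total_steps` argument are assumptions made for illustration.

```python
# Values taken from the hyperparameter list.
per_device_train_batch_size = 2
per_device_eval_batch_size = 4
num_devices = 8
gradient_accumulation_steps = 4
base_lr = 5e-5

# Each optimizer step sees the samples of all devices over all accumulation steps.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
# Evaluation does not accumulate gradients, so only the device count multiplies.
total_eval_batch_size = per_device_eval_batch_size * num_devices


def linear_lr(step, total_steps, base=base_lr, warmup_steps=0):
    """Linear schedule: optional linear warmup, then linear decay to zero.

    warmup_steps=0 is an assumption; the source does not state a warmup value.
    """
    if step < warmup_steps:
        return base * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base * remaining / max(1, total_steps - warmup_steps)


print(total_train_batch_size)  # 64
print(total_eval_batch_size)   # 32
```

With a hypothetical `total_steps` of 1000, `linear_lr(500, 1000)` gives half the base rate and `linear_lr(1000, 1000)` gives zero.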
Framework versions used include Transformers 4.11.3, PyTorch 1.8.2+cu111, Datasets 1.14.0, and Tokenizers 0.10.3.
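When reproducing results, it can help to compare the locally installed packages against the versions listed above. The following is a minimal sketch using only the Python standard library (Python 3.8 or later). It assumes the pip distribution names are `transformers`, `torch`, `datasets`, and `tokenizers`. The helper name `installed_versions` is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported for the original training run.
REPORTED_VERSIONS = {
    "transformers": "4.11.3",
    "torch": "1.8.2+cu111",
    "datasets": "1.14.0",
    "tokenizers": "0.10.3",
}


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, installed in installed_versions(REPORTED_VERSIONS).items():
    print(f"{name}: installed={installed}, reported={REPORTED_VERSIONS[name]}")
```

Newer library versions may well work for inference. A mismatch is only a hint to check when outputs differ from expectations.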
Guide: Running Locally
- Install Dependencies: Ensure you have Python and pip installed. Use pip to install the required libraries:
    pip install transformers datasets torch
- Download the Model: Use the Hugging Face Transformers library to load the model:
    from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
    model = MBartForConditionalGeneration.from_pretrained('mukayese/mbart-large-turkish-summarization')
    tokenizer = MBart50TokenizerFast.from_pretrained('mukayese/mbart-large-turkish-summarization')
- Run Inference: Prepare your input text and generate the summary:
    input_text = "Your Turkish text here."
    # Tokenize, truncating inputs longer than the tokenizer's maximum length
    inputs = tokenizer(input_text, return_tensors="pt", truncation=True)
    # Pass the input IDs together with the attention mask
    summary_ids = model.generate(**inputs)
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    print(summary)
- Cloud GPUs: For efficient training and inference, consider using cloud-based GPU services such as AWS EC2, Google Cloud Platform, or Microsoft Azure.
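The loading and inference steps above can be wrapped into two small helpers. This is a sketch under stated assumptions, not the authors' recommended setup: the helper names `summarize` and `load_summarizer` are hypothetical, and the generation settings (`num_beams=4`, a 128-token summary limit, `no_repeat_ngram_size=3`) are illustrative defaults rather than values from the model card.

```python
def summarize(text, model, tokenizer, max_input_tokens=1024,
              max_summary_tokens=128, num_beams=4):
    """Summarize `text` with a seq2seq model and its matching tokenizer.

    The default limits and beam count are illustrative assumptions.
    """
    # Truncate long articles so they fit the encoder's input window.
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=max_input_tokens,
    )
    # Beam search with an n-gram repetition block to reduce repeated phrases.
    summary_ids = model.generate(
        **inputs,
        num_beams=num_beams,
        max_length=max_summary_tokens,
        no_repeat_ngram_size=3,
        early_stopping=True,
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)


def load_summarizer(name="mukayese/mbart-large-turkish-summarization"):
    """Load the model and tokenizer (downloads weights on first use)."""
    # Imported lazily so that defining these helpers does not require the library.
    from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

    model = MBartForConditionalGeneration.from_pretrained(name)
    tokenizer = MBart50TokenizerFast.from_pretrained(name)
    return model, tokenizer
```

With the dependencies installed, `model, tokenizer = load_summarizer()` followed by `print(summarize("Your Turkish text here.", model, tokenizer))` would run the full pipeline. Keeping the model and tokenizer as arguments means they are loaded once and reused across many calls.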
License
No license is explicitly stated for this model. Users should refer to the model's page on the Hugging Face platform or the base model's page for detailed licensing terms.