Mukayese: Transformer Turkish Summarization

Introduction

The Mukayese Transformer Turkish Summarization model is designed for text summarization in the Turkish language. It is uncased and was trained solely on the mlsum/tu dataset, without any pre-training. Performance is reported with the ROUGE family of metrics: ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-Lsum.
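
As a rough sketch of the evaluation setup (not the authors' published script), the mlsum Turkish split and the ROUGE metric can be loaded with the datasets library. Only the dataset name and metric names come from this card; the rest is illustrative.

  # Illustrative sketch: load the MLSUM Turkish split and the ROUGE metric.
  # Requires the datasets and rouge_score packages; not the authors' evaluation script.
  from datasets import load_dataset, load_metric

  mlsum_tu = load_dataset("mlsum", "tu", split="test")  # fields include "text" and "summary"
  rouge = load_metric("rouge")                          # reports rouge1, rouge2, rougeL, rougeLsum

  # Placeholder prediction: score a reference against itself just to show the API shape.
  scores = rouge.compute(
      predictions=[mlsum_tu[0]["summary"]],
      references=[mlsum_tu[0]["summary"]],
  )
  print({name: agg.mid.fmeasure for name, agg in scores.items()})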

Architecture

The model architecture is based on the BART transformer framework for text-to-text generation. It operates on uncased input, and training was performed in a multi-GPU distributed setup (see the Training section below).
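
If the checkpoint is available, the underlying seq2seq architecture can be inspected from its configuration. The repository identifier below is an assumption based on the model name and is not confirmed by this card.

  # Inspect the model configuration; the Hub ID is assumed and may differ.
  from transformers import AutoConfig

  config = AutoConfig.from_pretrained("mukayese/transformer-turkish-summarization")  # assumed ID
  print(config.model_type)                                    # expected: a BART-style seq2seq
  print(config.encoder_layers, config.decoder_layers, config.d_model)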

Training

The model was trained with the following hyperparameters (a configuration sketch follows the list):

  • Learning rate: 0.0001
  • Train batch size: 4
  • Eval batch size: 8
  • Seed: 42
  • Distributed type: multi-GPU with 8 devices
  • Gradient accumulation steps: 2
  • Total train batch size: 64 (4 per device × 8 GPUs × 2 gradient accumulation steps); total eval batch size: 64
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • LR scheduler type: linear
  • Number of epochs: 15
  • Mixed precision training: Native AMP
  • Label smoothing factor: 0.1
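
For orientation, the list above maps roughly onto Hugging Face Seq2SeqTrainingArguments. The sketch below is an assumption about how such a configuration could be written, not the authors' training script; the output directory name is hypothetical.

  # Hypothetical mapping of the reported hyperparameters onto Seq2SeqTrainingArguments.
  # An illustrative sketch, not the actual training script used for this model.
  from transformers import Seq2SeqTrainingArguments

  training_args = Seq2SeqTrainingArguments(
      output_dir="./transformer-turkish-summarization",  # hypothetical path
      learning_rate=1e-4,
      per_device_train_batch_size=4,   # 4 per device × 8 GPUs × 2 accumulation = 64 effective
      per_device_eval_batch_size=8,
      gradient_accumulation_steps=2,
      seed=42,
      adam_beta1=0.9,
      adam_beta2=0.999,
      adam_epsilon=1e-8,
      lr_scheduler_type="linear",
      num_train_epochs=15,
      fp16=True,                       # Native AMP mixed precision
      label_smoothing_factor=0.1,
      predict_with_generate=True,
  )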

Training used the following framework versions:

  • Transformers 4.11.3
  • PyTorch 1.8.2+cu111
  • Datasets 1.14.0
  • Tokenizers 0.10.3
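
A quick way to check that a local environment matches these versions:

  # Print installed versions to compare against the list above.
  import datasets
  import tokenizers
  import torch
  import transformers

  print("Transformers:", transformers.__version__)
  print("PyTorch:", torch.__version__)
  print("Datasets:", datasets.__version__)
  print("Tokenizers:", tokenizers.__version__)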

Guide: Running Locally

To run the Mukayese Transformer Turkish Summarization model locally:

  1. Ensure you have Python and the necessary libraries installed.
  2. Install Hugging Face Transformers, PyTorch, Datasets, and Tokenizers.
  3. Download the model from Hugging Face's model hub.
  4. Load the model in a Python script and prepare the input data.
  5. Run the model to perform text summarization, as sketched in the example after this list.
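
The sketch below covers steps 3–5 using the summarization pipeline. The Hub repository identifier is an assumption based on the model name; replace it with the actual ID if it differs.

  # Minimal local inference sketch; the Hub ID below is an assumption.
  # Install the dependencies first, e.g. the Transformers and PyTorch versions listed above.
  from transformers import pipeline

  summarizer = pipeline(
      "summarization",
      model="mukayese/transformer-turkish-summarization",  # assumed repository ID
  )

  text = (
      "Türkiye genelinde yaz aylarında turizm hareketliliği geçen yıla göre "
      "belirgin biçimde arttı; sektör temsilcileri rezervasyonların erken dolduğunu belirtiyor."
  )
  # The model is uncased; lowercasing the input may help if results look off.
  summary = summarizer(text.lower(), max_length=64, min_length=10, do_sample=False)
  print(summary[0]["summary_text"])

On a machine with a CUDA GPU, passing device=0 to pipeline runs inference on the GPU.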

For optimal performance, consider using cloud services offering GPU support, such as AWS, Google Cloud, or Azure.

License

The model is released under the MIT License, allowing for wide usage and modification while maintaining attribution to the original authors.
