bert2bert_shared turkish summarization

mrm8488

Introduction

The BERT2BERT_SHARED-TURKISH-SUMMARIZATION model is designed for summarizing Turkish text, with a particular focus on news articles. It is a BERT-based encoder-decoder model for Turkish, fine-tuned on the Turkish portion of the MLSUM dataset.

Architecture

The model employs a BERT-to-BERT (bert2bert) encoder-decoder architecture in which both the encoder and the decoder are initialized from the dbmdz/bert-base-turkish-cased checkpoint; as the "shared" in the name indicates, the encoder and decoder weights are tied. This setup is tailored for text-to-text generation tasks such as summarization.
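
The original fine-tuning script is not included in this card, but a minimal sketch of how such a shared bert2bert model can be assembled with the Transformers EncoderDecoderModel API is shown below. The tie_encoder_decoder flag is what shares the encoder and decoder weights; the special-token settings are illustrative assumptions, not the checkpoint's exact configuration.

    from transformers import BertTokenizerFast, EncoderDecoderModel

    # Illustrative only: warm-start a bert2bert model from the Turkish BERT checkpoint
    # with tied (shared) encoder/decoder weights, as the released model's name suggests.
    encoder_decoder = EncoderDecoderModel.from_encoder_decoder_pretrained(
        "dbmdz/bert-base-turkish-cased",
        "dbmdz/bert-base-turkish-cased",
        tie_encoder_decoder=True,
    )

    tokenizer = BertTokenizerFast.from_pretrained("dbmdz/bert-base-turkish-cased")

    # Generation-relevant special tokens must be set before training or inference
    # (assumed values for a BERT vocabulary).
    encoder_decoder.config.decoder_start_token_id = tokenizer.cls_token_id
    encoder_decoder.config.eos_token_id = tokenizer.sep_token_id
    encoder_decoder.config.pad_token_id = tokenizer.pad_token_id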

Training

The model is fine-tuned on the Turkish portion of MLSUM, a large-scale multilingual summarization dataset with over 1.5 million article-summary pairs across several languages, including Turkish. Performance is evaluated with the ROUGE metric, and the model card reports precision, recall, and F1 for ROUGE-2 on the test set.
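
The evaluation script itself is not part of the card. As a rough sketch of how the Turkish test split can be loaded and a summary scored with ROUGE-2 (assuming the datasets and rouge_score packages, which the card does not mention):

    from datasets import load_dataset
    from rouge_score import rouge_scorer

    # Load the Turkish ("tu") configuration of MLSUM; columns include "text" and "summary".
    test_set = load_dataset("mlsum", "tu", split="test")

    # Score a candidate summary against the reference summary with ROUGE-2.
    scorer = rouge_scorer.RougeScorer(["rouge2"], use_stemmer=False)
    sample = test_set[0]
    candidate = sample["text"][:200]  # placeholder; in practice, the model's generated summary
    scores = scorer.score(sample["summary"], candidate)
    print(scores["rouge2"])  # Score(precision=..., recall=..., fmeasure=...)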

Guide: Running Locally

  1. Install Prerequisites: Ensure Python and PyTorch are installed. Install the transformers library from Hugging Face.
    pip install torch transformers
    
  2. Load Model and Tokenizer:
    import torch
    from transformers import BertTokenizerFast, EncoderDecoderModel
    
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    ckpt = 'mrm8488/bert2bert_shared-turkish-summarization'
    tokenizer = BertTokenizerFast.from_pretrained(ckpt)
    model = EncoderDecoderModel.from_pretrained(ckpt).to(device)
    
  3. Generate Summary:
    def generate_summary(text):
        inputs = tokenizer([text], padding="max_length", truncation=True, max_length=512, return_tensors="pt")
        input_ids = inputs.input_ids.to(device)
        attention_mask = inputs.attention_mask.to(device)
        output = model.generate(input_ids, attention_mask=attention_mask)
        return tokenizer.decode(output[0], skip_special_tokens=True)
    
    text = "Your text here..."
    print(generate_summary(text))
    

For faster inference, a cloud GPU is recommended, for example an AWS EC2 GPU instance or Google Cloud's AI Platform.
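
The helper in step 3 relies on whatever generation settings are stored in the checkpoint's configuration. If different summary lengths or beam sizes are needed, parameters can be passed to model.generate explicitly. The variant below reuses the tokenizer, model, and device from step 2; the specific values are illustrative assumptions, not the model card's defaults.

    def generate_summary_with_params(text):
        inputs = tokenizer([text], padding="max_length", truncation=True, max_length=512, return_tensors="pt")
        # Illustrative generation settings; tune them for your own use case.
        output = model.generate(
            inputs.input_ids.to(device),
            attention_mask=inputs.attention_mask.to(device),
            max_length=128,            # cap on summary length in tokens
            num_beams=4,               # beam search instead of greedy decoding
            no_repeat_ngram_size=3,    # avoid repeating 3-grams in the output
            early_stopping=True,
        )
        return tokenizer.decode(output[0], skip_special_tokens=True)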

License

The BERT2BERT_SHARED-TURKISH-SUMMARIZATION model is made available under the terms specified by its creators. Users should refer to the Hugging Face model page for detailed license information.
