BERT2BERT Turkish Paraphrase Generation

ahmetbagci

Introduction

The BERT2BERT-TURKISH-PARAPHRASE-GENERATION model is designed for generating paraphrases in Turkish. It pairs BERT checkpoints in an encoder-decoder (seq2seq) configuration and is implemented in PyTorch with the Hugging Face Transformers library.

Architecture

This model employs an encoder-decoder architecture, a common structure for text-to-text generation tasks. A BERT model serves as both the encoder and the decoder (the "BERT2BERT" configuration), with cross-attention layers letting the decoder attend to the encoded input while generating the paraphrase.
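To make the BERT2BERT wiring concrete, the sketch below builds a tiny, randomly initialized encoder-decoder with the Transformers `EncoderDecoderConfig`/`EncoderDecoderModel` API. The small layer sizes are illustrative assumptions, not the released checkpoint's (which is based on full-size `dbmdz/bert-base-turkish-cased`):

```python
import torch
from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel

# Tiny, randomly initialized configs -- sizes are illustrative only.
enc_cfg = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                     num_attention_heads=2, intermediate_size=64)
dec_cfg = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                     num_attention_heads=2, intermediate_size=64)

# from_encoder_decoder_configs marks the decoder as a decoder and adds
# cross-attention layers so it can attend to the encoder's outputs.
config = EncoderDecoderConfig.from_encoder_decoder_configs(enc_cfg, dec_cfg)
model = EncoderDecoderModel(config=config)

src = torch.tensor([[2, 5, 6, 7, 3]])   # dummy source token ids
tgt = torch.tensor([[2, 8, 9, 3]])      # dummy target token ids
out = model(input_ids=src, decoder_input_ids=tgt)
print(out.logits.shape)  # one row of vocabulary logits per target position
```

The released model loads pretrained weights instead of random ones; the forward pass produces a `(batch, target_length, vocab_size)` logits tensor either way.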

Training

The model was trained on a dataset that combines a Turkish translation of the Quora Question Pairs (QQP) dataset with manually generated paraphrase pairs, with the aim of improving the quality and accuracy of Turkish paraphrase generation. A link to the training dataset is provided in the model repository.

Guide: Running Locally

To use the BERT2BERT-TURKISH-PARAPHRASE-GENERATION model locally, follow these steps:

  1. Install Dependencies:
    Install the transformers library and a PyTorch backend using pip:

    pip install transformers torch
    
  2. Load Tokenizer and Model:
    Use the following Python script to load the tokenizer and model:

    from transformers import BertTokenizerFast, EncoderDecoderModel
    
    tokenizer = BertTokenizerFast.from_pretrained("dbmdz/bert-base-turkish-cased")
    model = EncoderDecoderModel.from_pretrained("ahmetbagci/bert2bert-turkish-paraphrase-generation")
    
  3. Generate Paraphrases:
    Prepare your input text and generate paraphrases as shown:

    text = "son model arabalar çevreye daha mı az zarar veriyor?"
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
    
  4. Cloud GPUs:
    For efficient processing, consider using cloud-based GPU services such as AWS EC2, Google Cloud Platform, or Azure.
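The snippet in step 3 returns a single greedy output. To obtain several candidate paraphrases, pass beam-search arguments to `model.generate`. The sketch below uses a tiny randomly initialized BERT2BERT (so it runs without downloading the checkpoint) and hypothetical special-token ids; with the released model, the same `generate` arguments apply after loading it as in step 2:

```python
import torch
from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel

# Tiny random stand-in for the released checkpoint (assumption: real usage
# would load "ahmetbagci/bert2bert-turkish-paraphrase-generation" instead).
enc_cfg = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                     num_attention_heads=2, intermediate_size=64)
dec_cfg = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                     num_attention_heads=2, intermediate_size=64)
config = EncoderDecoderConfig.from_encoder_decoder_configs(enc_cfg, dec_cfg)
model = EncoderDecoderModel(config=config)

input_ids = torch.tensor([[2, 5, 6, 7, 3]])  # dummy tokenized question
outputs = model.generate(
    input_ids,
    decoder_start_token_id=2,   # hypothetical [CLS]-style id for this demo
    pad_token_id=0,
    eos_token_id=3,
    num_beams=5,                # beam search instead of greedy decoding
    num_return_sequences=3,     # return the 3 best beams as candidates
    max_length=12,
)
print(outputs.shape[0])  # 3 candidate sequences
```

With the real checkpoint, decoding each row of `outputs` via `tokenizer.decode(..., skip_special_tokens=True)` yields three alternative paraphrases to choose from.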

License

Specific license details for the model and associated data are not provided in the documentation. Check the model repository or contact the author for accurate licensing information.
