Re-Punctuate

SJ-Ray

Introduction

Re-Punctuate is a T5-based model developed to restore correct capitalization and punctuation in sentences. It is fine-tuned on the DialogSum dataset, enabling it to fix punctuation and capitalization errors accurately.

Architecture

Re-Punctuate utilizes the T5 (Text-To-Text Transfer Transformer) architecture, a transformer-based model optimized for text-to-text tasks. It is implemented using TensorFlow and integrates with Hugging Face's Transformers library.

Training

The model is fine-tuned using the DialogSum dataset, which contains 115,056 records. This dataset helps the model learn accurate punctuation and capitalization corrections in various textual contexts.
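The exact preprocessing used for Re-Punctuate is not documented here, but fine-tuning T5 on this task requires (input, target) text pairs. A minimal sketch of one plausible way to build them, assuming inputs are derived by stripping punctuation and case from the reference text and prepending the "punctuate: " task prefix:

```python
import re

def make_training_pair(sentence: str) -> tuple[str, str]:
    # Corrupt the reference sentence by removing punctuation and lowercasing,
    # producing a (model input, expected output) pair for fine-tuning.
    corrupted = re.sub(r"[^\w\s]", "", sentence).lower()
    corrupted = re.sub(r"\s+", " ", corrupted).strip()
    return "punctuate: " + corrupted, sentence

src, tgt = make_training_pair("Hi, Tom! How are you?")
# src == "punctuate: hi tom how are you"
# tgt == "Hi, Tom! How are you?"
```

Pairs like these teach the model to map unpunctuated, lowercased text back to its well-formed original.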

Guide: Running Locally

  1. Install Dependencies:

    • Ensure Python is installed.
    • Install the Transformers library, TensorFlow, and SentencePiece (required by the T5 tokenizer) via pip:
      pip install transformers tensorflow sentencepiece
      
  2. Run the Model:

    • Use the following Python script to load and run the Re-Punctuate model:
      from transformers import T5Tokenizer, TFT5ForConditionalGeneration
      
      # Load the tokenizer and TensorFlow model from the Hugging Face Hub
      tokenizer = T5Tokenizer.from_pretrained('SJ-Ray/Re-Punctuate')
      model = TFT5ForConditionalGeneration.from_pretrained('SJ-Ray/Re-Punctuate')
      
      input_text = 'your input text here'
      # Prepend the task prefix and encode the input as TensorFlow tensors
      inputs = tokenizer.encode("punctuate: " + input_text, return_tensors="tf")
      result = model.generate(inputs, max_length=256)
      # Decode the generated token IDs back into text
      decoded_output = tokenizer.decode(result[0], skip_special_tokens=True)
      print(decoded_output)
      
  3. GPU Recommendations:

    • For efficient processing, consider using cloud GPU services such as AWS EC2, Google Cloud Platform, or Azure. These platforms offer GPU instances that can significantly speed up model inference.
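Whether locally or on a cloud instance, you can check that TensorFlow actually sees a GPU before running inference; this short snippet uses TensorFlow's device-listing API:

```python
import tensorflow as tf

# List GPUs visible to TensorFlow; an empty list means CPU-only inference.
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    print(f"{len(gpus)} GPU(s) available: {gpus}")
else:
    print("No GPU found; inference will run on CPU.")
```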

License

The Re-Punctuate model is licensed under the Apache-2.0 License, allowing for broad use and modification with proper attribution.
