Re-Punctuate
Introduction
Re-Punctuate (SJ-Ray/Re-Punctuate) is a T5 model developed to correct capitalization and punctuation in sentences, for example restoring "hey how are you doing" to "Hey, how are you doing?". It is fine-tuned on the DialogSum dataset, which enables it to address punctuation and capitalization errors accurately.
Architecture
Re-Punctuate utilizes the T5 (Text-To-Text Transfer Transformer) architecture, a transformer-based model optimized for text-to-text tasks. It is implemented using TensorFlow and integrates with Hugging Face's Transformers library.
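A quick way to verify the text-to-text setup described above is to inspect the checkpoint's configuration. The snippet below is a minimal sketch; it assumes the hosted checkpoint exposes the standard T5 configuration fields.

from transformers import T5Config

# Load the configuration stored with the Re-Punctuate checkpoint.
config = T5Config.from_pretrained("SJ-Ray/Re-Punctuate")

# Standard T5 fields: model family, number of encoder/decoder layers, hidden size.
print(config.model_type, config.num_layers, config.d_model)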
Training
The model is fine-tuned using the DialogSum dataset, which contains 115,056 records. This dataset helps the model learn accurate punctuation and capitalization corrections in various textual contexts.
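The original training script is not reproduced here, but a fine-tuning run of this kind could look roughly like the sketch below. It assumes the DialogSum copy hosted at knkarthick/dialogsum (with a "dialogue" text field), a generic t5-base starting checkpoint, and that input/target pairs are built by stripping punctuation and casing from the reference text; these are illustrative assumptions, not documented details of how Re-Punctuate was trained.

import re
import tensorflow as tf
from datasets import load_dataset
from transformers import T5Tokenizer, TFT5ForConditionalGeneration

# Assumed starting point; the actual base checkpoint is not documented.
tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = TFT5ForConditionalGeneration.from_pretrained("t5-base")

def strip_punct(text):
    # Build the "broken" input side by removing punctuation and casing.
    return re.sub(r"[^\w\s]", "", text).lower()

# Assumed Hub location of DialogSum.
dataset = load_dataset("knkarthick/dialogsum", split="train")

def to_features(example):
    inputs = tokenizer("punctuate: " + strip_punct(example["dialogue"]),
                       max_length=256, truncation=True, padding="max_length")
    labels = tokenizer(example["dialogue"],
                       max_length=256, truncation=True, padding="max_length")
    inputs["labels"] = labels["input_ids"]
    return inputs

tf_dataset = dataset.map(to_features).to_tf_dataset(
    columns=["input_ids", "attention_mask", "labels"],
    batch_size=8, shuffle=True)

# Transformers TF models compute the seq2seq loss internally when labels are provided.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=3e-4))
model.fit(tf_dataset, epochs=1)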
Guide: Running Locally
- Install Dependencies:
- Ensure Python is installed.
- Install the required libraries via pip (sentencepiece is needed by the T5 tokenizer):
pip install transformers tensorflow sentencepiece
- Run the Model:
- Use the following Python script to load and utilize the Re-Punctuate model:
from transformers import T5Tokenizer, TFT5ForConditionalGeneration

# Load the Re-Punctuate tokenizer and TensorFlow model from the Hugging Face Hub.
tokenizer = T5Tokenizer.from_pretrained('SJ-Ray/Re-Punctuate')
model = TFT5ForConditionalGeneration.from_pretrained('SJ-Ray/Re-Punctuate')

# Prefix the input with the "punctuate:" task tag.
input_text = 'your input text here'
inputs = tokenizer.encode("punctuate: " + input_text, return_tensors="tf")

# Generate the corrected sentence and decode it back to plain text.
result = model.generate(inputs)
decoded_output = tokenizer.decode(result[0], skip_special_tokens=True)
print(decoded_output)
- GPU Recommendations:
- For efficient processing, consider cloud GPU services such as AWS EC2, Google Cloud Platform, or Azure; their GPU instances can significantly speed up model inference. A quick availability check is sketched below.
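If a GPU instance is used, it is worth confirming that TensorFlow can actually see the device before running inference; the check below uses only standard TensorFlow calls.

import tensorflow as tf

# List GPUs visible to TensorFlow; an empty list means inference will fall back to CPU.
gpus = tf.config.list_physical_devices("GPU")
print(f"GPUs available: {len(gpus)}")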
License
The Re-Punctuate model is licensed under the Apache-2.0 License, allowing for broad use and modification with proper attribution.