t5 base uk to us english
EnglishVoice
Introduction
The T5-BASE-UK-TO-US-ENGLISH model by English Voice AI Labs is designed to convert UK English sentences into US English. It modifies both spelling and vocabulary to align with American English conventions.
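To see which words a conversion touched, input and output can be compared word by word. The following is a minimal sketch using Python's standard difflib module; the helper name word_changes is hypothetical, and the US sentence is a hand-written illustration rather than recorded model output.

```python
import difflib
from typing import List, Tuple


def word_changes(uk: str, us: str) -> List[Tuple[str, str]]:
    """Return (original, converted) word runs that differ between two sentences."""
    a, b = uk.split(), us.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Keep only the spans that were replaced, inserted, or deleted
        if tag != "equal":
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


# Hand-written pair for illustration, not recorded model output.
changes = word_changes("My favourite colour is yellow.", "My favorite color is yellow.")
print(changes)
```

Adjacent changed words are reported as a single run, so a spelling change and a vocabulary change that sit next to each other appear in the same tuple.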
Architecture
This model uses the T5 architecture, an encoder-decoder Transformer that treats every task as a text-to-text problem. It is listed under the paraphrase-generation and text-generation-inference tags, and it is compatible with frameworks like PyTorch, TensorFlow, and JAX.
Training
The model was trained using a dataset of 264,519 sentences with UK English spelling and their corresponding US English translations. This dataset was created by English Voice AI Labs and is available for download from their website.
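The source does not describe how the sentence pairs were fed to the model. As a rough sketch only, one pair could be represented as a source/target record, assuming (this is not confirmed) that training used the same "UK to US: " prefix that appears in the inference example later in this guide. The helper name to_example and the field names are hypothetical, and the pair shown is not taken from the actual dataset.

```python
from typing import Dict

# Prefix taken from the inference example; assumed, not confirmed, to match training.
TASK_PREFIX = "UK to US: "


def to_example(uk: str, us: str, prefix: str = TASK_PREFIX) -> Dict[str, str]:
    """Turn one UK/US sentence pair into a source/target record for seq2seq training."""
    return {"source": prefix + uk.strip(), "target": us.strip()}


# Hypothetical pair for illustration; not drawn from the 264,519-sentence dataset.
example = to_example("My favourite colour is yellow.", "My favorite color is yellow.")
print(example)
```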
Guide: Running Locally
- Environment Setup:
  - Ensure Python and PyTorch are installed.
  - Install the transformers library, plus sentencepiece, which T5Tokenizer requires: pip install transformers sentencepiece
- Device Configuration:
  - Check for GPU availability and set the device:
      import torch
      # Use the GPU when one is available, otherwise fall back to the CPU
      device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
- Model Loading:
  - Load the model and tokenizer:
      from transformers import T5ForConditionalGeneration, T5Tokenizer
      model = T5ForConditionalGeneration.from_pretrained("EnglishVoice/t5-base-uk-to-us-english").to(device)
      tokenizer = T5Tokenizer.from_pretrained("EnglishVoice/t5-base-uk-to-us-english")
- Inference Example:
  - Encode input and generate output:
      input_text = "My favourite colour is yellow."
      # Prepend the task prefix before the sentence
      text = "UK to US: " + input_text
      encoding = tokenizer(text, return_tensors="pt")
      input_ids = encoding["input_ids"].to(device)
      attention_masks = encoding["attention_mask"].to(device)
      # early_stopping only takes effect with beam search (num_beams > 1)
      beam_outputs = model.generate(input_ids=input_ids, attention_mask=attention_masks, early_stopping=True)
      # Drop padding and end-of-sequence tokens from the decoded text
      result = tokenizer.decode(beam_outputs[0], skip_special_tokens=True)
      print(result)
- Cloud GPUs:
  - For faster inference, consider a cloud GPU service such as AWS, GCP, or Azure.
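The steps above handle one sentence at a time. For several sentences, they could be wrapped into a small batch helper along the following lines. This is a sketch under assumptions, not the authors' code: the function names build_prompts, convert, and load are hypothetical, and the max_new_tokens default of 128 is an arbitrary choice that does not come from the model card.

```python
from typing import List

PREFIX = "UK to US: "  # task prefix from the inference example
MODEL_ID = "EnglishVoice/t5-base-uk-to-us-english"


def build_prompts(sentences: List[str]) -> List[str]:
    """Prepend the task prefix to each non-empty sentence, stripped of whitespace."""
    return [PREFIX + s.strip() for s in sentences if s.strip()]


def convert(sentences: List[str], model, tokenizer, device,
            max_new_tokens: int = 128) -> List[str]:
    """Convert a batch of UK English sentences with a single generate() call."""
    prompts = build_prompts(sentences)
    if not prompts:
        return []
    import torch  # imported lazily so build_prompts works without PyTorch

    # padding=True pads to the longest prompt; the attention mask marks the padding
    batch = tokenizer(prompts, return_tensors="pt", padding=True).to(device)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model.generate(**batch, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)


def load():
    """Load model and tokenizer as in the guide; downloads weights on first use."""
    import torch
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    tokenizer = T5Tokenizer.from_pretrained(MODEL_ID)
    model = T5ForConditionalGeneration.from_pretrained(MODEL_ID).to(device).eval()
    return model, tokenizer, device
```

With the dependencies installed, calling model, tokenizer, device = load() followed by convert(["My favourite colour is yellow."], model, tokenizer, device) would return a list with one converted sentence per non-empty input. Batching trades a little padding overhead for fewer generate() calls, which tends to matter most on a GPU.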
License
The T5-BASE-UK-TO-US-ENGLISH model is licensed under the Apache 2.0 License, allowing for both personal and commercial use.