SoundChoice-G2P
speechbrain
Introduction
SoundChoice-G2P is a grapheme-to-phoneme (G2P) model that converts English text into phonetic transcriptions and performs semantic disambiguation, so that words whose pronunciation depends on meaning are transcribed according to their context. It is built with SpeechBrain and trained on data derived from LibriSpeech Alignments and Google Wikipedia.
Architecture
The SoundChoice-G2P model is built on SpeechBrain, a toolkit for speech processing. It comes with a high-level wrapper for G2P conversion that accepts either a single text or a batch of texts. The model was trained on LibriG2P data, which is derived from existing datasets such as LibriSpeech.
Training
To train the model from scratch:
- Clone SpeechBrain:
git clone https://github.com/speechbrain/speechbrain/
- Install SpeechBrain:
cd speechbrain
pip install -r requirements.txt
pip install -e .
- Run Training:
cd recipes/LibriSpeech/G2P
python train.py hparams/hparams_g2p_rnn.yaml --data_folder=your_data_folder
Adjust hyperparameters as needed by passing additional arguments.
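The training command above can also be assembled programmatically, which may help when launching several runs with different settings. The following is a minimal sketch, not part of the recipe itself. It assumes that additional hyperparameters are passed in the same --key=value form as --data_folder, and the helper name build_train_command and the batch_size key are hypothetical, chosen only for illustration. Any key passed this way would need to exist in the YAML file.

```python
import shlex


def build_train_command(data_folder, overrides=None,
                        hparams_file="hparams/hparams_g2p_rnn.yaml"):
    """Assemble the argument list for the training command shown above."""
    argv = ["python", "train.py", hparams_file, f"--data_folder={data_folder}"]
    # Assumption: each extra hyperparameter is appended as --key=value,
    # mirroring the --data_folder argument from the original command.
    for key, value in (overrides or {}).items():
        argv.append(f"--{key}={value}")
    return argv


# "batch_size" is a hypothetical key used only to illustrate an override.
command = build_train_command("your_data_folder", {"batch_size": 8})
print(shlex.join(command))
```

The resulting list can be handed to subprocess.run from inside recipes/LibriSpeech/G2P, or printed and pasted into a shell.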
Guide: Running Locally
- Install SpeechBrain:
pip install speechbrain
pip install transformers
- Perform G2P Conversion:
from speechbrain.inference.text import GraphemeToPhoneme
g2p = GraphemeToPhoneme.from_hparams("speechbrain/soundchoice-g2p", savedir="pretrained_models/soundchoice-g2p")
text = "To be or not to be, that is the question"
phonemes = g2p(text)
- Batch Processing:
items = [
    "All's Well That Ends Well",
    "The Merchant of Venice",
    "The Two Gentlemen of Verona",
    "The Comedy of Errors"
]
transcriptions = g2p(items)
- Inference on GPU: Add run_opts={"device":"cuda"} when calling the from_hparams method for GPU support.
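The single-text and batch calls in the steps above can be wrapped in one small helper that keeps each input next to its transcription. This is a sketch under assumptions: it assumes that calling the model on a list returns one phoneme sequence per input, in the same order. The helper name transcribe is hypothetical, and fake_g2p is a stand-in used so the example runs without downloading the model; it returns upper-cased letters, not real phonemes.

```python
def transcribe(g2p, texts):
    """Return (text, phonemes) pairs for one string or a list of strings."""
    batch = [texts] if isinstance(texts, str) else list(texts)
    # Assumption: g2p(list) yields one phoneme sequence per input, in order.
    outputs = g2p(batch)
    return list(zip(batch, outputs))


def fake_g2p(batch):
    # Stand-in for the real model: upper-cased letters instead of phonemes.
    return [[ch.upper() for ch in text if ch.isalpha()] for text in batch]


pairs = transcribe(fake_g2p, ["The Comedy of Errors", "The Merchant of Venice"])
for title, phonemes in pairs:
    print(title, "->", " ".join(phonemes))
```

With the real model loaded as g2p, the same call would be transcribe(g2p, items).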
Cloud GPUs: Consider using cloud-based GPU services like AWS, Google Cloud, or Azure for enhanced performance during training and inference.
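Whether the GPU is local or cloud-hosted, the device can be chosen at run time so that the same script still works on a CPU-only machine. The sketch below builds the run_opts dictionary mentioned in the GPU step and falls back to "cpu" when CUDA is not available. The helper names pick_run_opts and load_g2p are hypothetical. The SpeechBrain import is deferred into load_g2p, which is not called here, so nothing is downloaded when the snippet runs.

```python
import importlib.util


def pick_run_opts(prefer_gpu=True):
    """Build a run_opts dict, falling back to CPU when CUDA is unavailable."""
    device = "cpu"
    if prefer_gpu and importlib.util.find_spec("torch") is not None:
        import torch

        if torch.cuda.is_available():
            device = "cuda"
    return {"device": device}


def load_g2p(prefer_gpu=True):
    # Imported lazily so this module can be imported without SpeechBrain.
    from speechbrain.inference.text import GraphemeToPhoneme

    return GraphemeToPhoneme.from_hparams(
        "speechbrain/soundchoice-g2p",
        savedir="pretrained_models/soundchoice-g2p",
        run_opts=pick_run_opts(prefer_gpu),
    )


print(pick_run_opts(prefer_gpu=False))  # {'device': 'cpu'}
```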
License
The SoundChoice-G2P model is licensed under the Apache-2.0 License. This permissive license allows broad use, modification, and redistribution, provided that the license text and the existing copyright and attribution notices are retained.