soundchoice g2p

speechbrain

Introduction

SoundChoice-G2P is a grapheme-to-phoneme (G2P) model that converts English text into phoneme sequences. It uses the meaning of the surrounding sentence to disambiguate homographs, that is, words that are spelled the same but pronounced differently (for example, "read" in the present and past tense). It was developed with SpeechBrain and trained on data derived from LibriSpeech Alignments and Google Wikipedia.

Architecture

The SoundChoice-G2P model is built with the SpeechBrain speech-processing toolkit. It ships with a high-level inference wrapper that accepts either a single string or a batch of strings. The model was trained on LibriG2P, a dataset derived from existing corpora such as LibriSpeech.

Training

To train the model from scratch:

  1. Clone SpeechBrain:
    git clone https://github.com/speechbrain/speechbrain/
    
  2. Install SpeechBrain:
    cd speechbrain
    pip install -r requirements.txt
    pip install -e .
    
  3. Run Training:
    cd recipes/LibriSpeech/G2P
    python train.py hparams/hparams_g2p_rnn.yaml --data_folder=your_data_folder
    
    To adjust a hyperparameter, pass it as an additional command-line argument, in the same way as --data_folder above.
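
The training command from step 3 can also be assembled programmatically, which helps when running several configurations. The following is a minimal sketch, not part of the recipe: build_train_command is a hypothetical helper, and the seed override is only an illustrative key that is assumed to exist in the YAML file. Check hparams/hparams_g2p_rnn.yaml for the keys that are actually defined.

```python
import sys
from typing import Dict, List, Optional


def build_train_command(
    hparams_file: str,
    data_folder: str,
    overrides: Optional[Dict[str, object]] = None,
) -> List[str]:
    """Build the argument list for the train.py call shown in step 3.

    Hypothetical helper. Each entry in `overrides` is appended as
    --key=value, the same form the recipe uses for --data_folder.
    """
    cmd = [sys.executable, "train.py", hparams_file, f"--data_folder={data_folder}"]
    for key, value in (overrides or {}).items():
        cmd.append(f"--{key}={value}")
    return cmd


# "seed" is an assumed hyperparameter name, used only for illustration.
command = build_train_command(
    "hparams/hparams_g2p_rnn.yaml",
    "your_data_folder",
    overrides={"seed": 1234},
)
```

The resulting list can be passed to subprocess.run(command, check=True, cwd="recipes/LibriSpeech/G2P") from the root of the cloned repository.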

Guide: Running Locally

  1. Install SpeechBrain and Transformers:
    pip install speechbrain
    pip install transformers
    
  2. Perform G2P Conversion:
    from speechbrain.inference.text import GraphemeToPhoneme
    g2p = GraphemeToPhoneme.from_hparams("speechbrain/soundchoice-g2p", savedir="pretrained_models/soundchoice-g2p")
    text = "To be or not to be, that is the question"
    phonemes = g2p(text)
    
  3. Batch Processing:
    items = [
        "All's Well That Ends Well",
        "The Merchant of Venice",
        "The Two Gentlemen of Verona",
        "The Comedy of Errors"
    ]
    transcriptions = g2p(items)
    
  4. Inference on GPU: To run inference on a GPU, pass run_opts={"device":"cuda"} to from_hparams.
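
Steps 2 to 4 can be combined into a small script that picks the device automatically and processes a long list in smaller batches. This is a minimal sketch under stated assumptions: pick_run_opts, load_g2p, transcribe_in_chunks, and the chunk_size parameter are hypothetical names introduced here, and the batch call is assumed to return one transcription per input item, in input order.

```python
from typing import Callable, List, Sequence


def pick_run_opts() -> dict:
    """Return run_opts for from_hparams: CUDA when available, otherwise CPU."""
    try:
        import torch  # normally installed together with SpeechBrain
    except ImportError:
        return {"device": "cpu"}
    return {"device": "cuda" if torch.cuda.is_available() else "cpu"}


def load_g2p(
    source: str = "speechbrain/soundchoice-g2p",
    savedir: str = "pretrained_models/soundchoice-g2p",
):
    """Load the pretrained model as in step 2, on the device chosen above."""
    # Imported lazily so the helpers below can be used without SpeechBrain.
    from speechbrain.inference.text import GraphemeToPhoneme

    return GraphemeToPhoneme.from_hparams(
        source, savedir=savedir, run_opts=pick_run_opts()
    )


def transcribe_in_chunks(
    g2p: Callable[[List[str]], list],
    items: Sequence[str],
    chunk_size: int = 32,
) -> list:
    """Call g2p on consecutive slices of `items` and concatenate the results.

    Assumes g2p(list_of_strings) returns one result per input string.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    results = []
    for start in range(0, len(items), chunk_size):
        results.extend(g2p(list(items[start:start + chunk_size])))
    return results
```

With the model loaded through g2p = load_g2p(), a call such as transcribe_in_chunks(g2p, items, chunk_size=2) would send the four titles from step 3 to the model two at a time. Smaller chunks limit peak memory use on the GPU at the cost of more calls.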

Cloud GPUs: If no local GPU is available, a GPU instance from a cloud provider such as AWS, Google Cloud, or Azure can speed up both training and inference.

License

The SoundChoice-G2P model is licensed under the Apache-2.0 License. This permissive license allows use, modification, and redistribution, provided that the license text and existing copyright notices are retained and that modified files are marked as changed.
