tts-tacotron2-ljspeech

speechbrain

Introduction

This repository provides tools for Text-to-Speech (TTS) with SpeechBrain's Tacotron2 model, pretrained on the LJSpeech dataset. The model converts input text to mel spectrograms, which a vocoder such as HiFi-GAN then converts to waveforms.

Architecture

The model uses Tacotron2, a sequence-to-sequence neural network with attention designed for speech synthesis. It generates mel spectrograms from text but does not produce audio itself; a separate vocoder converts the spectrograms to audio waveforms.
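Because the model outputs spectrogram frames rather than audio samples, it can help to estimate how long the synthesized audio will be from the number of frames. The sketch below assumes a hop length of 256 samples per frame, which is not stated in this document and should be checked against the model's hyperparameters; the 22050 Hz sample rate matches the value used when saving audio in the guide further down. The function name `frames_to_seconds` is illustrative only.

```python
SAMPLE_RATE = 22050  # Hz, the rate used when saving audio in this guide
HOP_LENGTH = 256     # ASSUMPTION: samples advanced per mel frame; verify in the model hparams


def frames_to_seconds(n_frames: int,
                      hop_length: int = HOP_LENGTH,
                      sample_rate: int = SAMPLE_RATE) -> float:
    """Approximate audio duration for a mel spectrogram with n_frames frames."""
    if n_frames < 0:
        raise ValueError("n_frames must be non-negative")
    # Each frame corresponds to hop_length new audio samples.
    return n_frames * hop_length / sample_rate
```

For example, `frames_to_seconds(mel_output.shape[-1])` would give a rough duration if the last dimension of the spectrogram tensor is the frame axis, which is also an assumption here.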

Training

The model was trained using the SpeechBrain framework. For custom training:

  1. Clone the SpeechBrain repository:
    git clone https://github.com/speechbrain/speechbrain/
    
  2. Install dependencies:
    cd speechbrain
    pip install -r requirements.txt
    pip install -e .
    
  3. Run the training script:
    cd recipes/LJSpeech/TTS/tacotron2/
    python train.py --device=cuda:0 --max_grad_norm=1.0 --data_folder=/your_folder/LJSpeech-1.1 hparams/train.yaml
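Before launching the training script, it can be worth confirming that the folder passed as --data_folder looks like an extracted LJSpeech-1.1 archive. The sketch below checks only for the metadata.csv file and the wavs directory that ship with the dataset; the helper name `missing_ljspeech_items` is hypothetical and not part of SpeechBrain, and the recipe may perform its own, stricter checks.

```python
from pathlib import Path


def missing_ljspeech_items(data_folder: str) -> list:
    """Return the expected LJSpeech-1.1 entries that are absent from data_folder."""
    root = Path(data_folder)
    missing = []
    # Transcripts file shipped with the dataset
    if not (root / "metadata.csv").is_file():
        missing.append("metadata.csv")
    # Directory holding the audio clips
    if not (root / "wavs").is_dir():
        missing.append("wavs/")
    return missing
```

An empty list from `missing_ljspeech_items("/your_folder/LJSpeech-1.1")` suggests the layout is as expected; any returned names point to what is missing.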
    

Guide: Running Locally

  1. Install SpeechBrain:

    pip install speechbrain
    
  2. Perform Text-to-Speech:

    import torchaudio
    from speechbrain.inference.TTS import Tacotron2
    from speechbrain.inference.vocoders import HIFIGAN

    # Download the pretrained acoustic model and vocoder (cached in savedir)
    tacotron2 = Tacotron2.from_hparams(source="speechbrain/tts-tacotron2-ljspeech", savedir="tmpdir_tts")
    hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir="tmpdir_vocoder")

    # Text -> mel spectrogram
    mel_output, mel_length, alignment = tacotron2.encode_text("Mary had a little lamb")
    # Mel spectrogram -> waveform
    waveforms = hifi_gan.decode_batch(mel_output)
    # Save at 22050 Hz, the LJSpeech sample rate
    torchaudio.save("example_TTS.wav", waveforms.squeeze(1), 22050)
    
  3. Inference on GPU: Add run_opts={"device":"cuda"} to both from_hparams calls (Tacotron2 and HIFIGAN) so that the two models run on the GPU.

  4. Suggested Cloud GPUs: Services like AWS, Google Cloud, or Azure offer GPU instances suitable for TTS tasks.
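The GPU option from step 3 can be sketched as a small helper that falls back to the CPU when CUDA is unavailable. This is a minimal sketch, not the library's prescribed pattern: the function names `build_run_opts` and `load_models` are hypothetical, while the `from_hparams` arguments are the ones shown in this guide. The SpeechBrain imports are kept inside `load_models` so that the first helper can be used without the library installed.

```python
def build_run_opts(cuda_available: bool, prefer_gpu: bool = True) -> dict:
    """Build the run_opts dict passed to from_hparams (hypothetical helper)."""
    device = "cuda" if (prefer_gpu and cuda_available) else "cpu"
    return {"device": device}


def load_models(run_opts: dict):
    """Load the acoustic model and vocoder on the device named in run_opts."""
    # Local imports: only needed when the models are actually loaded.
    from speechbrain.inference.TTS import Tacotron2
    from speechbrain.inference.vocoders import HIFIGAN

    tacotron2 = Tacotron2.from_hparams(
        source="speechbrain/tts-tacotron2-ljspeech",
        savedir="tmpdir_tts",
        run_opts=run_opts,
    )
    hifi_gan = HIFIGAN.from_hparams(
        source="speechbrain/tts-hifigan-ljspeech",
        savedir="tmpdir_vocoder",
        run_opts=run_opts,
    )
    return tacotron2, hifi_gan
```

A typical call would be `load_models(build_run_opts(torch.cuda.is_available()))` after `import torch`. When the models run on the GPU, the waveform tensor may need to be moved back with `.cpu()` before it is passed to torchaudio.save.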

License

The project is licensed under the Apache 2.0 License, which permits personal and commercial use, modification, and distribution of the software, provided that the license text and attribution notices are retained.
