tts-tacotron2-ljspeech (SpeechBrain)
Introduction
This repository provides tools for Text-to-Speech (TTS) using SpeechBrain's Tacotron2 model pre-trained on the LJSpeech dataset. The model converts input text into mel spectrograms, which a vocoder such as HiFiGAN then transforms into audio waveforms.
Architecture
The model uses Tacotron2, a sequence-to-sequence neural network for speech synthesis. It generates mel spectrograms from text; a separate vocoder converts those spectrograms into audio waveforms.
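For a quick look at the first stage in isolation, the sketch below loads the pretrained checkpoint used later in this card and inspects the mel spectrogram that Tacotron2 produces before any vocoder is applied. The 80-mel-channel shape noted in the comment is an assumption based on the standard Tacotron2 configuration, not something stated on this card.
from speechbrain.inference.TTS import Tacotron2

# Load the pretrained text-to-spectrogram model (same checkpoint as in the guide below).
tacotron2 = Tacotron2.from_hparams(source="speechbrain/tts-tacotron2-ljspeech", savedir="tmpdir_tts")

# Stage 1 only: text -> mel spectrogram; a vocoder (stage 2) would turn this into audio.
mel_output, mel_length, alignment = tacotron2.encode_text("Speech synthesis is fun")

# mel_output is typically [batch, n_mel_channels, frames]; n_mel_channels is 80
# in the standard Tacotron2 setup (assumed here).
print(mel_output.shape)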
Training
The model was trained using the SpeechBrain framework. For custom training:
- Clone the SpeechBrain repository:
git clone https://github.com/speechbrain/speechbrain/
- Install dependencies:
cd speechbrain
pip install -r requirements.txt
pip install -e .
- Run the training script:
cd recipes/LJSpeech/TTS/tacotron2/
python train.py --device=cuda:0 --max_grad_norm=1.0 --data_folder=/your_folder/LJSpeech-1.1 hparams/train.yaml
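The --data_folder argument should point at an extracted LJSpeech-1.1 directory, which ships as a metadata.csv transcript file plus a wavs/ folder of roughly 13,100 clips sampled at 22.05 kHz. As a minimal pre-flight check, assuming that standard LJSpeech layout (nothing below is specific to this recipe):
from pathlib import Path

# Assumed standard LJSpeech-1.1 layout: metadata.csv plus a wavs/ directory of .wav clips.
data_folder = Path("/your_folder/LJSpeech-1.1")

assert (data_folder / "metadata.csv").is_file(), "metadata.csv not found"
num_wavs = len(list((data_folder / "wavs").glob("*.wav")))
print(f"Found {num_wavs} wav files")  # the full LJSpeech-1.1 release has ~13,100 clips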
Guide: Running Locally
- Install SpeechBrain:
pip install speechbrain
- Perform Text-to-Speech:
import torchaudio
from speechbrain.inference.TTS import Tacotron2
from speechbrain.inference.vocoders import HIFIGAN

# Text-to-spectrogram model and vocoder.
tacotron2 = Tacotron2.from_hparams(source="speechbrain/tts-tacotron2-ljspeech", savedir="tmpdir_tts")
hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir="tmpdir_vocoder")

# Run TTS: text -> mel spectrogram -> waveform.
mel_output, mel_length, alignment = tacotron2.encode_text("Mary had a little lamb")
waveforms = hifi_gan.decode_batch(mel_output)

# Save the synthesized audio (LJSpeech models run at 22.05 kHz).
torchaudio.save('example_TTS.wav', waveforms.squeeze(1), 22050)
- Inference on GPU: Add run_opts={"device":"cuda"} to the from_hparams calls for GPU acceleration (a sketch is shown after this list).
- Suggested Cloud GPUs: Services like AWS, Google Cloud, or Azure offer GPU instances suitable for TTS tasks.
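A minimal sketch of the GPU option, assuming a CUDA-capable device is available; the run_opts value and the output filename are the only changes from the CPU example above:
import torchaudio
from speechbrain.inference.TTS import Tacotron2
from speechbrain.inference.vocoders import HIFIGAN

# run_opts places both models on the GPU; "cuda" assumes a CUDA device is present.
tacotron2 = Tacotron2.from_hparams(source="speechbrain/tts-tacotron2-ljspeech", savedir="tmpdir_tts", run_opts={"device": "cuda"})
hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir="tmpdir_vocoder", run_opts={"device": "cuda"})

mel_output, mel_length, alignment = tacotron2.encode_text("Mary had a little lamb")
waveforms = hifi_gan.decode_batch(mel_output)

# Move the waveform back to the CPU before writing it to disk.
torchaudio.save('example_TTS_gpu.wav', waveforms.squeeze(1).cpu(), 22050)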
License
The project is licensed under the Apache 2.0 License. This allows for both personal and commercial use, modification, and distribution of the software, provided that proper attribution is given.