tts-hifigan-ljspeech (speechbrain)

Introduction
This repository provides tools for using a HiFIGAN vocoder trained on the LJSpeech dataset. The vocoder converts input spectrograms into waveforms, typically following a TTS model that transforms text into a spectrogram. The sampling frequency for the output is 22050 Hz.
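Because each mel frame is decoded into a fixed number of audio samples, the output duration can be estimated from the spectrogram length alone. The sketch below assumes the LJSpeech recipe's hop length of 256 samples per frame (an assumption, not stated above) together with the 22050 Hz output rate:

```python
SAMPLE_RATE = 22050  # output sampling frequency of this vocoder
HOP_LENGTH = 256     # assumed hop length of the LJSpeech recipe

def estimated_duration(n_frames, hop_length=HOP_LENGTH, sample_rate=SAMPLE_RATE):
    """Estimate output audio length in seconds for a mel spectrogram with n_frames frames."""
    return n_frames * hop_length / sample_rate

print(estimated_duration(298))  # a 298-frame mel -> roughly 3.46 s of audio
```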
Architecture
The HiFIGAN vocoder is designed for speech synthesis, specifically converting spectrograms to waveforms. It is trained on the LJSpeech dataset, which features a single speaker. While it can generalize to different speakers, for optimal results, multi-speaker vocoders trained on datasets like LibriTTS are recommended.
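The generator's core idea can be sketched in a few lines: a stack of transposed convolutions progressively upsamples the 80-bin mel input until its time resolution matches audio (strides 8 x 8 x 2 x 2 = 256 samples per frame). This toy model is purely illustrative; the real HiFIGAN adds multi-receptive-field residual blocks and is trained adversarially:

```python
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Illustrative HiFi-GAN-style upsampler (not the real architecture)."""

    def __init__(self, n_mels=80, channels=64, strides=(8, 8, 2, 2)):
        super().__init__()
        layers = [nn.Conv1d(n_mels, channels, kernel_size=7, padding=3)]
        ch = channels
        for s in strides:
            # Each transposed conv multiplies the time dimension by its stride
            layers += [
                nn.LeakyReLU(0.1),
                nn.ConvTranspose1d(ch, ch // 2, kernel_size=2 * s, stride=s, padding=s // 2),
            ]
            ch //= 2
        layers += [nn.LeakyReLU(0.1), nn.Conv1d(ch, 1, kernel_size=7, padding=3), nn.Tanh()]
        self.net = nn.Sequential(*layers)

    def forward(self, mel):      # mel: (batch, n_mels, frames)
        return self.net(mel)     # (batch, 1, frames * 256)

g = TinyGenerator()
wav = g(torch.rand(2, 80, 10))
print(wav.shape)  # torch.Size([2, 1, 2560])
```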
Training
The model was trained using the SpeechBrain framework. To train the model from scratch, follow these steps:
- Clone SpeechBrain:

  git clone https://github.com/speechbrain/speechbrain/
- Install Dependencies:

  cd speechbrain
  pip install -r requirements.txt
  pip install -e .
- Run Training:

  cd recipes/LJSpeech/TTS/vocoder/hifi_gan/
  python train.py hparams/train.yaml --data_folder /path/to/LJspeech
Training results, including checkpoints and logs, are linked from the model card.
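For reference, the generator objective in HiFi-GAN training combines a least-squares adversarial term with a heavily weighted L1 mel-reconstruction term (the HiFi-GAN paper weights the mel term by 45). The function below is a simplified sketch that omits the feature-matching loss:

```python
import torch
import torch.nn.functional as F

def generator_loss(disc_scores_fake, mel_real, mel_fake, lambda_mel=45.0):
    # Least-squares GAN term: push discriminator scores on generated audio toward 1
    adv = torch.mean((disc_scores_fake - 1.0) ** 2)
    # L1 distance between the mel spectrograms of real and generated audio
    mel_l1 = F.l1_loss(mel_fake, mel_real)
    return adv + lambda_mel * mel_l1
```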
Guide: Running Locally
Basic Steps
- Install SpeechBrain:

  pip install speechbrain
- Basic Usage:

  import torch
  from speechbrain.inference.vocoders import HIFIGAN

  hifi_gan = HIFIGAN.from_hparams(
      source="speechbrain/tts-hifigan-ljspeech",
      savedir="pretrained_models/tts-hifigan-ljspeech",
  )
  mel_specs = torch.rand(2, 80, 298)  # (batch, n_mels, frames)
  waveforms = hifi_gan.decode_batch(mel_specs)
- Using with TTS:

  import torchaudio
  from speechbrain.inference.TTS import Tacotron2

  # hifi_gan is the vocoder loaded in the previous step
  tacotron2 = Tacotron2.from_hparams(
      source="speechbrain/tts-tacotron2-ljspeech",
      savedir="pretrained_models/tts-tacotron2-ljspeech",
  )
  mel_output, mel_length, alignment = tacotron2.encode_text("Mary had a little lamb")
  waveforms = hifi_gan.decode_batch(mel_output)
  torchaudio.save('example_TTS.wav', waveforms.squeeze(1), 22050)
Cloud GPUs
For enhanced performance, especially during inference, using cloud GPUs is recommended. Popular platforms include AWS, Google Cloud, and Azure, which provide access to powerful GPUs on demand.
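SpeechBrain pretrained models accept a run_opts dictionary for device placement, so the same script runs on a local CPU or a cloud GPU. A minimal sketch:

```python
import torch

# Pick a GPU when one is available (e.g. on a cloud instance), else fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Pass the device via run_opts when loading the pretrained vocoder, e.g.:
# hifi_gan = HIFIGAN.from_hparams(
#     source="speechbrain/tts-hifigan-ljspeech",
#     savedir="pretrained_models/tts-hifigan-ljspeech",
#     run_opts={"device": device},
# )
print(device)
```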
License
The project is licensed under the Apache-2.0 License, allowing for broad use and modification of the software.