tensorspeech/tts-tacotron2-kss-ko
Introduction
This repository provides a pretrained Tacotron2 model with Guided Attention, specifically trained on the KSS (Korean Single Speaker) dataset. It forms part of the TensorFlowTTS library, designed for text-to-speech applications.
Architecture
The Tacotron2 model is a neural network architecture that synthesizes speech from text by predicting mel spectrograms. The model utilizes Guided Attention mechanisms to improve the alignment between text and audio features, facilitating more natural speech synthesis.
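To make the alignment constraint concrete, the minimal numpy sketch below computes the guided-attention penalty matrix described by Tachibana et al. (2017). The sharpness value `g = 0.2` and the uniform placeholder alignment are illustrative assumptions, not the exact settings used to train this checkpoint.

```python
import numpy as np

def guided_attention_weights(text_len, mel_len, g=0.2):
    # Penalty matrix W[n, t]: near zero on the normalized diagonal n/N ~ t/T,
    # approaching one as the alignment drifts away from it.
    n = np.arange(text_len)[:, None] / text_len   # normalized character positions
    t = np.arange(mel_len)[None, :] / mel_len     # normalized mel-frame positions
    return 1.0 - np.exp(-((n - t) ** 2) / (2.0 * g ** 2))

# During training, the decoder's attention matrix is multiplied element-wise by W
# and averaged; adding this term to the loss pushes the alignment toward a
# monotonic, roughly diagonal text-to-frame path.
W = guided_attention_weights(text_len=50, mel_len=200)
attention = np.full((50, 200), 1.0 / 50)          # placeholder uniform alignment
guided_attention_loss = float(np.mean(attention * W))
```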
Training
The model was trained on KSS, a Korean single-speaker speech dataset. Details of the training methodology are documented in the Tacotron2 and Guided Attention papers.
Guide: Running Locally
- Install TensorFlowTTS

  Execute the following command to install the required library:

  ```bash
  pip install TensorFlowTTS
  ```
- Convert text to a mel spectrogram

  Import the necessary libraries and modules:

  ```python
  import numpy as np
  import soundfile as sf
  import yaml
  import tensorflow as tf

  from tensorflow_tts.inference import AutoProcessor, TFAutoModel
  ```

  Load the pretrained processor and model:

  ```python
  processor = AutoProcessor.from_pretrained("tensorspeech/tts-tacotron2-kss-ko")
  tacotron2 = TFAutoModel.from_pretrained("tensorspeech/tts-tacotron2-kss-ko")
  ```

  Convert text to a mel spectrogram (the resulting `mel_outputs` can then be passed to a vocoder; see the sketch after this list):

  ```python
  # "God is not concerned with our mathematical difficulties; He integrates empirically."
  text = "신은 우리의 수학 문제에는 관심이 없다. 신은 다만 경험적으로 통합할 뿐이다."

  # Convert the Korean text to a sequence of input IDs.
  input_ids = processor.text_to_sequence(text)

  # Predict the mel spectrogram and attention alignment.
  decoder_output, mel_outputs, stop_token_prediction, alignment_history = tacotron2.inference(
      input_ids=tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
      input_lengths=tf.convert_to_tensor([len(input_ids)], tf.int32),
      speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),
  )
  ```
- Cloud GPUs

  For efficient processing, consider using cloud-based GPUs such as those offered by Google Cloud, AWS, or Azure.
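Tacotron2 only predicts mel spectrograms, so a separate neural vocoder is needed to obtain a waveform. The sketch below assumes a KSS-compatible TensorFlowTTS vocoder checkpoint (the multi-band MelGAN name `tensorspeech/tts-mb_melgan-kss-ko` and the 22050 Hz sampling rate are assumptions) and reuses `mel_outputs` from the step above.

```python
# Minimal sketch: convert the predicted mel spectrogram to audio with a vocoder.
# The checkpoint name and 22050 Hz sampling rate are assumed here; substitute
# whichever KSS-compatible vocoder you actually use.
from tensorflow_tts.inference import TFAutoModel
import soundfile as sf

mb_melgan = TFAutoModel.from_pretrained("tensorspeech/tts-mb_melgan-kss-ko")

# The vocoder maps mel spectrograms [batch, frames, n_mels] to waveforms.
audio = mb_melgan.inference(mel_outputs)[0, :, 0]

# Save the result as a mono WAV file.
sf.write("./audio.wav", audio.numpy(), 22050, "PCM_16")
```

Any vocoder trained on the same mel-spectrogram configuration as this Tacotron2 checkpoint can be substituted in the same way.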
License
This project is licensed under the Apache-2.0 License, which allows for both personal and commercial use.