tensorspeech/TTS-FastSpeech2-KSS-KO
Introduction
TTS-FastSpeech2-KSS-KO is a pretrained FastSpeech2 model for Korean text-to-speech, trained on the KSS dataset. It is built for the TensorFlowTTS library, which provides the text processor and the model-loading utilities used to run it.
Architecture
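The model is based on FastSpeech2, a non-autoregressive architecture that converts text into mel spectrograms. Autoregressive models such as Tacotron 2 learn the alignment between text and audio through encoder-decoder attention. FastSpeech2 instead predicts an explicit duration (a number of mel frames) for each input token and expands the encoder output accordingly, so all frames are generated in parallel, which makes inference fast. A variance adaptor also predicts pitch (F0) and energy. The encoder and decoder themselves are feed-forward Transformer blocks, so self-attention is still used inside them.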
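The duration-based expansion described above is the "length regulator" from the FastSpeech papers. Each token's hidden vector is repeated once for every frame it is predicted to last. The sketch below is a minimal NumPy version of that step and assumes the durations have already been predicted. `length_regulate` and `alpha` are illustrative names, not TensorFlowTTS API. FastSpeech describes scaling durations by a factor like `alpha` to change speaking rate. Whether this library's `speed_ratios` argument does exactly this is an assumption.

```python
import numpy as np


def length_regulate(hidden: np.ndarray, durations: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Expand per-token hidden states to frame level (illustrative sketch).

    hidden:    (num_tokens, hidden_dim) encoder outputs, one row per input token
    durations: (num_tokens,) predicted number of mel frames per token
    alpha:     hypothetical speed factor; >1 lengthens (slower), <1 shortens (faster)
    """
    # Scale, round to whole frames, and clip negative values to zero.
    frames = np.maximum(np.rint(durations * alpha), 0).astype(np.int64)
    # Repeat row i of `hidden` frames[i] times along the time axis.
    return np.repeat(hidden, frames, axis=0)


# Three tokens with 2-dim hidden states lasting 2, 1, and 3 frames.
h = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
d = np.array([2, 1, 3])
out = length_regulate(h, d)
print(out.shape)  # (6, 2)
```

No attention weights are involved in this step. The output length is fully determined by the durations, which is why the decoder can produce every frame at once.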
Training
The model is trained on KSS, a single-speaker Korean speech dataset. It maps input text, first converted into a token sequence, to mel spectrograms. A mel spectrogram is not audio by itself. A separate vocoder must convert it into a waveform.
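The predicted durations also fix the length of the final audio. With FastSpeech-style length regulation, the sum of the per-token frame counts equals the number of mel frames. Each frame then advances the waveform by `hop_size` samples. The `duration_outputs` returned by the inference call in the guide can be used this way, assuming it holds the integer frame counts used for expansion. The helpers below are hypothetical sketches of that arithmetic. The defaults `hop_size=256` and `sample_rate=22050` are assumptions and should be checked against the model's configuration.

```python
from typing import Sequence


def total_frames(durations: Sequence[int]) -> int:
    # Length regulation repeats token i for durations[i] frames, so the
    # mel spectrogram has sum(durations) frames in total.
    return int(sum(durations))


def frames_to_seconds(num_frames: int, hop_size: int = 256, sample_rate: int = 22050) -> float:
    # Each mel frame advances the waveform by hop_size samples.
    # The hop_size and sample_rate defaults are assumptions, not read from the model.
    return num_frames * hop_size / sample_rate


# Hypothetical durations for a five-token input.
frames = total_frames([3, 7, 5, 12, 4])
print(frames)  # 31
print(frames_to_seconds(frames))
```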
Guide: Running Locally
- Install TensorFlowTTS with pip:
pip install TensorFlowTTS
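- Convert text to a mel spectrogram. The snippet loads the processor and the model from the tensorspeech/tts-fastspeech2-kss-ko checkpoint and runs inference on a Korean sentence: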
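import tensorflow as tf
from tensorflow_tts.inference import AutoProcessor, TFAutoModel

# Load the KSS text processor and the pretrained FastSpeech2 weights.
processor = AutoProcessor.from_pretrained("tensorspeech/tts-fastspeech2-kss-ko")
fastspeech2 = TFAutoModel.from_pretrained("tensorspeech/tts-fastspeech2-kss-ko")

# "God is not interested in our math problems. God merely integrates empirically."
text = "신은 우리의 수학 문제에는 관심이 없다. 신은 다만 경험적으로 통합할 뿐이다."
input_ids = processor.text_to_sequence(text)

# Batch of one utterance from the single KSS speaker (speaker id 0).
# Ratios of 1.0 keep the predicted speed, pitch (F0), and energy unchanged.
mel_before, mel_after, duration_outputs, _, _ = fastspeech2.inference(
    input_ids=tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
    speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),
    speed_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    f0_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    energy_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
)
# mel_before is the decoder output; mel_after is the post-net-refined
# spectrogram, which is normally the one passed to a vocoder.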
- Hardware: short inputs can be synthesized on a CPU. For batch synthesis, training, or production deployment, a GPU (local, or a cloud instance from a provider such as AWS, Google Cloud, or Azure) will shorten run times considerably.
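The guide stops at a mel spectrogram (`mel_after`). Turning it into audio requires a separate vocoder, which is not part of this model. Once a vocoder has produced a mono float waveform as a NumPy array, it can be written to disk as a 16-bit WAV file. The sketch below does this with the standard-library `wave` module and NumPy. `save_wav` is a hypothetical helper. Its 22050 Hz default is an assumption and must be set to the vocoder's actual output sample rate.

```python
import wave

import numpy as np


def save_wav(audio: np.ndarray, path, sample_rate: int = 22050) -> None:
    """Write a mono float waveform in [-1, 1] to a 16-bit PCM WAV file."""
    # Flatten to 1-D; pass a single mono utterance, not a batch.
    audio = np.asarray(audio, dtype=np.float32).reshape(-1)
    # Clip to the valid range, then scale to little-endian signed 16-bit ints.
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)      # mono
        wf.setsampwidth(2)      # 2 bytes = 16 bits per sample
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
```

Clipping before the integer conversion avoids wrap-around distortion if the vocoder output slightly exceeds [-1, 1].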
License
The project is licensed under the Apache-2.0 License, allowing for both personal and commercial use, modification, and distribution of the model and its codebase.