Indic Parler-TTS (ai4bharat) Model Documentation
Introduction
Indic Parler-TTS is a multilingual text-to-speech (TTS) model that extends Parler-TTS Mini. It supports 21 languages (20 Indic languages plus English) and is designed for regional language technologies. It uses an improved prompt tokenizer, which makes the model easier to extend to new languages and to train multilingually.
Architecture
The model takes two inputs: a transcript, which is the text to be spoken, and a caption, which describes how it should be spoken. It offers 69 unique voices and supports emotion rendering and accent flexibility. Attributes such as background noise, expressivity, and pitch can be customized through the caption.
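To make the two-input design concrete, here is a minimal sketch that pairs a transcript with a caption assembled from attribute phrases. `TTSRequest` and `build_caption` are hypothetical helpers, not part of the Parler-TTS library, and the sentence template is only an assumption modeled on the style of Parler-TTS descriptions; the model reads free-form text, so any clear description should work.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TTSRequest:
    """The two model inputs: what to say and how to say it (hypothetical container)."""
    transcript: str  # text to be spoken, in any supported language
    caption: str     # natural-language description of the voice and recording


def build_caption(
    gender: str = "female",
    expressivity: str = "slightly expressive",
    pace: str = "moderate",
    pitch: str = "moderate",
    recording: str = "very high quality, with the speaker's voice sounding clear and very close up",
    speaker: Optional[str] = None,
) -> str:
    """Compose a caption from attribute phrases (hypothetical template).

    Background noise is described through `recording`, e.g.
    "moderate quality, with some background noise".
    """
    subject = speaker if speaker else f"A {gender} speaker"
    return (
        f"{subject} delivers a {expressivity} speech with a {pace} speed "
        f"and {pitch} pitch. The recording is of {recording}."
    )
```

For example, `TTSRequest(transcript="नमस्ते", caption=build_caption(gender="male", pace="slow"))` keeps the spoken text and the voice description separate, mirroring how the model consumes them.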
Training
Indic Parler-TTS was fine-tuned on the Indic Parler Dataset, which contains 1,806 hours of multilingual Indic and English speech. The dataset covers 16 official Indian languages, along with English and Chhattisgarhi. The model was evaluated with a MOS-like (mean opinion score) framework and achieved high scores for naturalness and intelligibility.
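The documentation does not specify the rating scale or aggregation used in its MOS-like evaluation. As a sketch only, assuming the conventional 1-5 listener rating scale, a mean opinion score is the average rating, usually reported with a confidence interval. The function name and the normal-approximation interval below are illustrative choices, not the authors' method.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def mean_opinion_score(ratings: Sequence[float], z: float = 1.96) -> Tuple[float, float]:
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Assumes ratings on a 1-5 scale; the interval uses a normal approximation.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings for a confidence interval")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings are expected on a 1-5 scale")
    mos = mean(ratings)
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width
```

Scores would typically be computed separately per language and per criterion (for example, naturalness and intelligibility) and reported as MOS ± half-width.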
Guide: Running Locally
Installation:
- Install the Parler-TTS library: pip install git+https://github.com/huggingface/parler-tts.git
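After installing, a quick check can confirm that the modules used for local inference are importable. This sketch assumes the example needs `parler_tts`, `transformers`, `torch`, and `soundfile`. `soundfile` is used here only to write the WAV file and may need to be installed separately. `missing_modules` is a hypothetical helper built on the standard library's `importlib.util.find_spec`.

```python
import importlib.util
from typing import Iterable, List

# Top-level module names the local example is assumed to need.
REQUIRED_MODULES = ("parler_tts", "transformers", "torch", "soundfile")


def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec locates a top-level module without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]
```

If every required package is installed, `missing_modules()` returns an empty list; otherwise it names the modules to install.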
Running the Model:
- Import the necessary libraries.
- Load the model and its tokenizers.
- Prepare the transcript (text prompt) and the caption (voice description).
- Generate the audio and save it to a file.
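The four steps above can be sketched as one function, following the generation pattern used in Parler-TTS examples. Several details are assumptions to verify against the official model card: the Hub repo id `ai4bharat/indic-parler-tts`, the use of `model.config.text_encoder._name_or_path` to load the caption tokenizer, and the exact `generate` keyword arguments. `synthesize` is a hypothetical wrapper, and the imports are deferred inside it so the sketch can be loaded without the heavy dependencies installed.

```python
from typing import Optional

MODEL_ID = "ai4bharat/indic-parler-tts"  # assumed Hugging Face Hub repo id


def synthesize(transcript: str, caption: str, out_path: str, device: Optional[str] = None) -> str:
    """Speak `transcript` in the voice described by `caption`; write a WAV file to `out_path`."""
    # Step 1: import libraries (deferred so this module imports without them).
    import soundfile as sf
    import torch
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer

    if device is None:
        device = "cuda:0" if torch.cuda.is_available() else "cpu"

    # Step 2: load the model and its two tokenizers.
    model = ParlerTTSForConditionalGeneration.from_pretrained(MODEL_ID).to(device)
    prompt_tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)  # tokenizes the transcript
    # Assumption: the caption uses the tokenizer of the model's text encoder.
    caption_tokenizer = AutoTokenizer.from_pretrained(model.config.text_encoder._name_or_path)

    # Step 3: tokenize the caption (description) and transcript (prompt).
    caption_inputs = caption_tokenizer(caption, return_tensors="pt").to(device)
    prompt_inputs = prompt_tokenizer(transcript, return_tensors="pt").to(device)

    # Step 4: generate the waveform and save it at the model's sampling rate.
    generation = model.generate(
        input_ids=caption_inputs.input_ids,
        attention_mask=caption_inputs.attention_mask,
        prompt_input_ids=prompt_inputs.input_ids,
        prompt_attention_mask=prompt_inputs.attention_mask,
    )
    audio = generation.cpu().numpy().squeeze()
    sf.write(out_path, audio, model.config.sampling_rate)
    return out_path
```

A call such as `synthesize("नमस्ते, आप कैसे हैं?", "A female speaker delivers a slightly expressive speech with a moderate speed and pitch. The recording is of very high quality, with the speaker's voice sounding clear and very close up.", "out.wav")` would download the weights on first use and write `out.wav`. Loading the model once and reusing it across calls is advisable if you synthesize many utterances.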
Hardware Recommendations:
- A GPU is recommended for faster generation. If none is available locally, you can use a cloud GPU service such as AWS, Google Cloud, or Azure.
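To see which device PyTorch will use on a given machine, local or cloud, a short check like the following can help. It is a sketch that assumes PyTorch as the backend. The source states no minimum GPU memory, so the reported figure is informational only. `describe_accelerator` is a hypothetical helper.

```python
def describe_accelerator() -> str:
    """Report which device PyTorch can use for inference (sketch)."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        gib = props.total_memory / 1024**3  # bytes to GiB
        return f"cuda:0 ({props.name}, {gib:.1f} GiB)"
    return "cpu (no CUDA GPU detected; generation will be slower)"
```

Running this on a cloud instance before downloading the model confirms that the GPU is visible to PyTorch.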
License
This model is released under the Apache 2.0 license, a permissive license that allows use, modification, and redistribution, subject to its attribution and notice requirements.