Parler-TTS Large v1
Introduction
Parler-TTS Large v1 is a text-to-speech (TTS) model with 2.2 billion parameters. It produces high-quality, natural-sounding speech and uses a simple text prompt to control features such as gender, background noise, speaking rate, pitch, and reverberation. It was developed as part of the Parler-TTS project, which aims to provide the community with TTS training resources and dataset pre-processing code.
Architecture
Parler-TTS Large v1 generates speech whose features are controlled through text prompts. The model was trained on 45,000 hours of audio data and supports multiple speaker voices. Describing a speaker's characteristics in the prompt helps keep the voice consistent across generations.
Training
The model is trained with a focus on generating speech from natural language descriptions. It utilizes datasets like parler-tts/mls_eng and parler-tts/libritts_r_filtered, among others, to ensure a wide range of speaker and environmental characteristics.
Guide: Running Locally
Installation
To use Parler-TTS, install the library with the following command:
pip install git+https://github.com/huggingface/parler-tts.git
Running a Model
- Random Voice Generation:
  - Import the necessary libraries and load the model and tokenizer.
  - Define a text prompt and a description of the desired speech characteristics.
  - Generate audio and save it to a file.
- Specific Speaker Generation:
  - Specify the speaker's characteristics in the description to ensure consistency.
  - Follow the same steps as for random voice generation.
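The steps above can be sketched in Python roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes the checkpoint is published on the Hugging Face Hub as parler-tts/parler-tts-large-v1 and that torch, transformers, and soundfile are installed alongside the parler-tts library. The helper names build_description and synthesize, the default description wording, and the output file name are hypothetical and chosen only for illustration.

```python
"""Sketch: generating speech with Parler-TTS Large v1 under the assumptions stated above."""

MODEL_ID = "parler-tts/parler-tts-large-v1"  # assumed Hub checkpoint name


def build_description(speaker="A female speaker",
                      style="slightly expressive and animated",
                      pace="moderate speed and pitch",
                      quality="very clear and very close up"):
    """Hypothetical helper: compose the natural-language voice description."""
    return (f"{speaker} delivers {style} speech with a {pace}. "
            f"The recording is {quality}.")


def synthesize(prompt, description, out_path="parler_tts_out.wav"):
    """Generate speech for `prompt` in the voice given by `description`."""
    # Heavy dependencies are imported lazily so build_description works without them.
    import soundfile as sf
    import torch
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model = ParlerTTSForConditionalGeneration.from_pretrained(MODEL_ID).to(device)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

    # The description conditions the voice; the prompt is the text to be spoken.
    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

    generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    audio = generation.cpu().numpy().squeeze()

    # Save the waveform at the sampling rate stored in the model config.
    sf.write(out_path, audio, model.config.sampling_rate)
    return out_path
```

For a random voice, something like synthesize("Hey, how are you doing today?", build_description()) should be enough; the first call downloads the checkpoint. For a specific speaker, the same call can be reused with a description that pins the speaker down, for example build_description(speaker="Jon"), on the assumption that the named speaker is one the model recognizes.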
Hardware Suggestions
For optimal performance, consider using cloud GPU services such as AWS, Google Cloud, or Azure to handle the computational demands of generating speech.
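As a rough guide to sizing a GPU, the memory needed just to hold the weights can be estimated from the 2.2 billion parameter count. The sketch below is back-of-the-envelope arithmetic only: it ignores activations, generation caches, and framework overhead, and it assumes the model can be loaded in the listed precisions, which should be checked against the library documentation. The function name weight_memory_gib is hypothetical.

```python
PARAMS = 2.2e9  # parameter count stated in the introduction

# Bytes used to store one parameter in common floating-point formats.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gib(num_params, dtype):
    """Approximate memory for the weights alone, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gib(PARAMS, dtype):.1f} GiB of weights")
```

Actual usage during generation will be higher than these figures, so a GPU with headroom above the estimate is the safer choice.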
License
This model is released under the Apache 2.0 license, which permits use, modification, and redistribution under its terms.