Parler-TTS Mini v1.1
parler-tts
Introduction
Parler-TTS Mini v1.1 is a lightweight text-to-speech model that generates high-quality, natural-sounding speech. Features such as speaker gender, background noise, speaking rate, pitch, and reverberation are controlled through a plain-text description of the desired voice. Compared with its predecessor, the model uses an improved prompt tokenizer that makes multilingual training easier.
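Because the voice is controlled through free text, it can be convenient to assemble the description programmatically. The following is a minimal sketch only: `build_description` and its parameters are hypothetical helpers for illustration, not part of the Parler-TTS library, and the sentence it produces is just one plausible phrasing.

```python
def build_description(gender="female", rate="moderate", pitch="moderate",
                      noise="very little", reverb="almost no"):
    """Compose a free-text voice description (hypothetical helper)."""
    return (
        f"A {gender} speaker talks at a {rate} speed with a {pitch} pitch. "
        f"The recording has {noise} background noise and {reverb} reverberation."
    )

# Override only the attributes that should differ from the defaults.
description = build_description(gender="male", rate="slow")
print(description)
```

The resulting string would be passed to the description tokenizer in the same way as the hand-written description in the guide further down.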
Architecture
Parler-TTS Mini v1.1 employs two tokenizers: one for prompts (the text to be spoken) and another for descriptions (the desired voice characteristics). It supports 34 speakers with distinct characteristics, offering flexibility in voice generation. The prompt tokenizer is derived from the unsloth/llama-2-7b tokenizer, which gives it a larger vocabulary and byte fallback.
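To make the idea of byte fallback concrete, here is a toy sketch. It is not the model's tokenizer: the real one is a subword tokenizer, whereas this example works character by character with a tiny made-up vocabulary. The function names are hypothetical; the `<0xNN>` token spelling follows the convention used by SentencePiece-style tokenizers. The point is only that a character missing from the vocabulary is split into its UTF-8 bytes instead of being replaced by an unknown token, so no text is lost.

```python
def encode_with_byte_fallback(text, vocab):
    """Toy character-level tokenizer with byte fallback (illustration only)."""
    tokens = []
    for ch in text:
        if ch in vocab:
            tokens.append(ch)
        else:
            # Unknown character: emit one token per UTF-8 byte instead of <unk>.
            tokens.extend(f"<0x{b:02X}>" for b in ch.encode("utf-8"))
    return tokens


def decode_with_byte_fallback(tokens):
    """Invert the toy encoding by reassembling the raw bytes."""
    raw = bytearray()
    for tok in tokens:
        if len(tok) == 6 and tok.startswith("<0x") and tok.endswith(">"):
            raw.append(int(tok[3:5], 16))
        else:
            raw.extend(tok.encode("utf-8"))
    return raw.decode("utf-8")


# Assumed toy vocabulary: lowercase ASCII letters and the space character.
vocab = set("abcdefghijklmnopqrstuvwxyz ")
tokens = encode_with_byte_fallback("café", vocab)
print(tokens)
print(decode_with_byte_fallback(tokens))
```

Since every string can be expressed as bytes, a tokenizer built this way never has to drop characters from languages that were rare in its training data, which is one reason byte fallback helps with multilingual input.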
Training
The model was trained on 45,000 hours of audio data drawn from several datasets, including mls_eng and libritts. It keeps the same training configuration as its predecessor, Parler-TTS Mini v1; the only improvement is in tokenization, for better multilingual handling.
Guide: Running Locally
To run Parler-TTS locally, follow these steps:
- Install the Library:
    pip install git+https://github.com/huggingface/parler-tts.git
- Load the Model:
  Use the parler_tts library to load the model and the transformers library to load both tokenizers. A GPU is not required, but generation runs faster on one.
- Generate Speech:
    import torch
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer
    import soundfile as sf

    # Use the first GPU when available, otherwise fall back to the CPU.
    device = "cuda:0" if torch.cuda.is_available() else "cpu"

    model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tts-mini-v1.1").to(device)
    # Prompt tokenizer: encodes the text to be spoken.
    tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-v1.1")
    # Description tokenizer: encodes the voice description.
    description_tokenizer = AutoTokenizer.from_pretrained(model.config.text_encoder._name_or_path)

    prompt = "Hey, how are you doing today?"
    description = "A female speaker delivers a slightly expressive and animated speech with a moderate speed and pitch."

    input_ids = description_tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

    generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    audio_arr = generation.cpu().numpy().squeeze()
    # Save the waveform at the model's sampling rate.
    sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
- Cloud GPUs:
  For optimal performance, consider using cloud GPU services such as AWS EC2, Google Cloud Platform, or Azure.
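The Generate Speech step saves its output with soundfile. Where that dependency is not available, the waveform could instead be written with Python's standard `wave` module. The sketch below is an alternative technique, not what the original script does: `write_wav_pcm16` is a hypothetical helper, it assumes mono float samples in the range [-1.0, 1.0], and it uses a synthetic tone as a stand-in for model output so that it runs without the model.

```python
import math
import struct
import wave


def write_wav_pcm16(path, samples, sampling_rate):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM (hypothetical helper)."""
    frames = bytearray()
    for s in samples:
        # Clip to the valid range, then scale to the signed 16-bit range.
        clipped = max(-1.0, min(1.0, float(s)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 2 bytes per sample = 16-bit
        wav.setframerate(sampling_rate)
        wav.writeframes(bytes(frames))


# Stand-in for model output: one second of a 440 Hz tone.
# 44100 is an arbitrary rate chosen for this demo.
rate = 44100
tone = [0.3 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
write_wav_pcm16("tone_demo.wav", tone, rate)
```

With real output, `audio_arr` and `model.config.sampling_rate` from the generation script would take the place of `tone` and `rate`.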
License
Parler-TTS Mini v1.1 is released under the Apache 2.0 license, permitting free use, distribution, and modification with attribution.