tts_transformer-en-200_speaker-cv4
facebook
Introduction
TTS_TRANSFORMER-EN-200_SPEAKER-CV4 is a text-to-speech (TTS) model developed by Facebook with the fairseq library. It synthesizes English speech in 200 distinct male and female voices, selecting a speaker at random. The model is trained on the Common Voice v4 dataset.
Architecture
This model uses a Transformer architecture for text-to-speech (arXiv:1809.08895) and is built with fairseq S^2, fairseq's speech synthesis toolkit (arXiv:2109.06912). It supports multiple speakers and can be paired with a HiFi-GAN vocoder, which converts the predicted spectrograms into audio waveforms.
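Multi-speaker synthesis means every request is tied to a speaker index. The sketch below illustrates one plausible selection rule: pick a random index when none is given, otherwise clamp the requested index to the valid range. `resolve_speaker` is a hypothetical stand-in, not a fairseq function, and the 0 to 199 ID range is an assumption based on the 200 voices mentioned above. Some fairseq versions appear to accept an optional speaker index in `TTSHubInterface.get_model_input`; treat that as an assumption and check the signature in your installed version.

```python
import random
from typing import Optional

N_SPEAKERS = 200  # from the model description; IDs assumed to be 0..199


def resolve_speaker(
    speaker: Optional[int] = None,
    n_speakers: int = N_SPEAKERS,
    rng: Optional[random.Random] = None,
) -> int:
    """Return a valid speaker index (hypothetical helper, not part of fairseq)."""
    rng = rng or random
    if speaker is None:
        # No speaker requested: draw one uniformly (randint bounds are inclusive)
        return rng.randint(0, n_speakers - 1)
    # Speaker requested: clamp it into the valid range
    return max(0, min(speaker, n_speakers - 1))
```

Passing a seeded `random.Random` instance as `rng` makes the random choice reproducible, which helps when comparing outputs across runs.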
Training
The TTS_TRANSFORMER-EN-200_SPEAKER-CV4 model was trained on the English portion of the Mozilla Common Voice v4 dataset, a crowdsourced corpus of read speech from many speakers. Training was done with fairseq S^2, which its authors describe as a scalable and integrable speech synthesis toolkit.
Guide: Running Locally
To run the model locally, follow these steps:
- Install Dependencies: Ensure you have Python and the necessary libraries installed, including fairseq and IPython. You can install fairseq using pip:
    pip install fairseq
- Download the Model: Use the load_model_ensemble_and_task_from_hf_hub function from fairseq's checkpoint_utils to download and load the model:
    from fairseq.checkpoint_utils import load_model_ensemble_and_task_from_hf_hub
- Set Up the Model and Generator:
    models, cfg, task = load_model_ensemble_and_task_from_hf_hub(
        "facebook/tts_transformer-en-200_speaker-cv4",
        arg_overrides={"vocoder": "hifigan", "fp16": False},
    )
    model = models[0]
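- Generate Speech:
    from fairseq.models.text_to_speech.hub_interface import TTSHubInterface
    import IPython.display as ipd
    # Merge the task's data config into cfg before building the generator
    TTSHubInterface.update_cfg_with_data_cfg(cfg, task.data_cfg)
    # Some fairseq versions expect a list of models here; if this call fails, try [model]
    generator = task.build_generator(model, cfg)
    text = "Hello, this is a test run."
    sample = TTSHubInterface.get_model_input(task, text)
    wav, rate = TTSHubInterface.get_prediction(task, model, generator, sample)
    # Plays the audio inline when run in a Jupyter/IPython notebook
    ipd.Audio(wav, rate=rate)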
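`ipd.Audio` only plays sound inside a notebook. For a plain script, one option is to write the waveform to disk. The helper below is a minimal sketch using only the Python standard library; `save_wav` is a hypothetical name, and it assumes the samples are mono floats in the range [-1.0, 1.0], which should be checked against the actual output of `get_prediction`.

```python
import struct
import wave


def save_wav(samples, rate, path):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Returns the number of frames written. Hypothetical helper, not part of fairseq.
    """
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip out-of-range values
        frames += struct.pack("<h", int(round(s * 32767)))  # little-endian int16
    with wave.open(path, "wb") as f:
        f.setnchannels(1)          # mono
        f.setsampwidth(2)          # 2 bytes per sample = 16-bit
        f.setframerate(int(rate))
        f.writeframes(bytes(frames))
    return len(frames) // 2


# Usage with the guide's output, assuming wav is a 1-D torch tensor:
# save_wav(wav.detach().cpu().tolist(), rate, "output.wav")
```

Converting the tensor to a Python list keeps the helper free of extra dependencies; for long clips, a library such as soundfile or torchaudio would likely be faster.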
For faster synthesis, consider running on a GPU, for example through a cloud provider such as AWS, Google Cloud, or Azure.
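Before running the steps above, a quick preflight check can confirm that the required packages are importable and whether a GPU is visible. This is a sketch under stated assumptions: `missing_packages` and `pick_device` are hypothetical helper names, and the check only reports the situation. How to move the model, generator, and inputs onto a GPU depends on the installed fairseq version and is not shown here.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found on this system."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def pick_device():
    """Return "cuda" if PyTorch is installed and sees a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


# Packages named in the guide's dependency step
print("missing:", missing_packages(["fairseq", "IPython"]))
print("device:", pick_device())
```

An empty "missing" list means both packages from the dependency step were found; a "cpu" device means synthesis will still run, only more slowly.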
License
The model and its associated code are subject to the terms outlined by Facebook and fairseq. For detailed information, refer to the respective repositories and documentation.