MMS TTS
facebook
Introduction
The Massively Multilingual Speech (MMS) Text-to-Speech (TTS) models from Facebook cover more than 1,000 languages. This repository is part of Facebook's MMS project, which aims to bring speech technology to a much wider range of languages than earlier systems supported.
Architecture
The TTS models in this repository convert text into speech for 1,107 languages, with one model per language. They follow the VITS architecture, are built with the fairseq toolkit, and are distributed through the Hugging Face Hub.
Training
The documentation does not describe training details. For the model architecture and training methodology, refer to the MMS project's linked resources and documentation.
Guide: Running Locally
- Download the models: use the hf_hub_download function from the huggingface_hub library to download models from Hugging Face. The models folder contains the generator needed for TTS inference.
- Model checkpoints: full checkpoints, including the discriminator and optimizer states, are in the full_models folder.
- Inference: detailed instructions for running inference are in the fairseq documentation.
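The download step can be sketched as follows. This is a minimal sketch, assuming the huggingface_hub package is installed; the repository id facebook/mms-tts, the per-language subfolder layout under models/ and full_models/, the language code eng, and the file name G_100000.pth are assumptions for illustration, so check the repository's file listing for the actual paths. The helper names checkpoint_filename and download_checkpoint are hypothetical.

```python
from pathlib import PurePosixPath

# ASSUMPTION: repository id and "<folder>/<lang>/<file>" layout are illustrative.
REPO_ID = "facebook/mms-tts"


def checkpoint_filename(lang: str, filename: str, full: bool = False) -> str:
    """Return a repo-relative path such as "models/<lang>/<filename>".

    models/ holds the generator used for inference; full_models/ holds the
    full checkpoint (including discriminator and optimizer states).
    """
    if not lang or "/" in lang:
        raise ValueError(f"invalid language code: {lang!r}")
    folder = "full_models" if full else "models"
    return str(PurePosixPath(folder, lang, filename))


def download_checkpoint(lang: str, filename: str, full: bool = False) -> str:
    """Download one file with hf_hub_download and return its local cache path."""
    # Imported here so checkpoint_filename() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=checkpoint_filename(lang, filename, full),
    )


if __name__ == "__main__":
    # Hypothetical language code and file name; this only builds the path.
    print(checkpoint_filename("eng", "G_100000.pth"))
    # Fetching the file requires network access:
    # local_path = download_checkpoint("eng", "G_100000.pth")
```

For inference only the generator files are needed; passing full=True targets the full_models folder, which is mainly useful for resuming or fine-tuning training.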
For faster inference, consider running the models on a GPU, for example a cloud instance from AWS, Google Cloud, or Azure.
License
The models and resources in this repository are released under the CC BY-NC 4.0 license, which permits non-commercial use with appropriate attribution.