XTTS-V2-WOLOF
galsenai
Introduction
XTTS-V2-WOLOF is a text-to-speech model that synthesizes Wolof speech from Wolof text input. It is based on the xTTS V2 architecture and was trained on the Wolof-TTS dataset curated by GalsenAI Lab.
Architecture
The model uses the xTTS V2 framework, which supports efficient text-to-speech synthesis and language-specific adaptations. It generates synthetic voices from conditioning latents and speaker embeddings.
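As a rough illustration of how these two inputs are used, the sketch below follows the two-step flow exposed by the Coqui TTS `Xtts` class: `get_conditioning_latents` derives the conditioning latents and the speaker embedding from a reference recording, and `inference` consumes them together with the text. The helper name `synthesize`, the reference clip, and the language code are assumptions made for illustration; they are not taken from the model card.

```python
from typing import Any, Sequence


def synthesize(model: Any, text: str, reference_wav: str, language: str) -> Sequence[float]:
    """Speak `text` in the voice of `reference_wav` (hypothetical helper).

    `model` is expected to be a loaded Coqui TTS `Xtts` instance; any object
    exposing the same two methods works, which keeps the sketch easy to test.
    """
    # Step 1: derive the voice representation from a short reference clip.
    gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(
        audio_path=[reference_wav]
    )
    # Step 2: generate speech for the text, conditioned on that voice.
    result = model.inference(
        text=text,
        language=language,  # assumption: whichever code the checkpoint was trained with
        gpt_cond_latent=gpt_cond_latent,
        speaker_embedding=speaker_embedding,
    )
    # Xtts.inference returns a dict; the waveform is stored under "wav".
    return result["wav"]
```

Passing the model in as an argument, rather than relying on a global, lets the same function be exercised with a stub object before the real checkpoint is available.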
Training
The model was trained on the Cleaned Wolof-TTS dataset. The recordings in this dataset contain natural pauses, so synthesized speech may contain pauses as well. Contributors from the GalsenAI community prepared the dataset and the training notebook.
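If such pauses are unwanted in the output, one possible workaround is to shorten long near-silent runs in the generated waveform. This is not part of the authors' pipeline. The sketch below is a naive amplitude-threshold approach that assumes float samples in the range [-1, 1]. The function name and the default values for `max_pause_s` and `threshold` are illustrative assumptions and would need tuning on real audio.

```python
from typing import List, Sequence


def shorten_pauses(samples: Sequence[float], sample_rate: int,
                   max_pause_s: float = 0.3, threshold: float = 0.01) -> List[float]:
    """Cap every run of near-silent samples at `max_pause_s` seconds."""
    max_run = round(max_pause_s * sample_rate)  # longest silent run to keep, in samples
    out: List[float] = []
    silent_run = 0
    for s in samples:
        if abs(s) < threshold:
            silent_run += 1
            if silent_run > max_run:
                continue  # drop silence beyond the allowed pause length
        else:
            silent_run = 0  # speech resumed, so reset the counter
        out.append(s)
    return out
```

Because the check is per sample, very quiet speech can be mistaken for silence; an energy measure over short frames would be more robust at the cost of more code.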
Guide: Running Locally
To run the model locally, follow these steps:
- Clone the Repository:
  git clone https://github.com/Galsenaicommunity/Wolof-TTS.git
- Install Dependencies:
  cd Wolof-TTS/notebooks/Models/xTTS\ v2
  pip install -r requirements.txt
- Download and Unzip the Model Checkpoint:
  gdown <Checkpoint ID>
  unzip galsenai-xtts-wo-checkpoints.zip && rm galsenai-xtts-wo-checkpoints.zip
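- Load the Model:
  import torch
  from TTS.tts.configs.xtts_config import XttsConfig
  from TTS.tts.models.xtts import Xtts

  device = "cuda:0" if torch.cuda.is_available() else "cpu"

  # Build XTTS_MODEL from the unzipped checkpoint before moving it to the device.
  # The three paths are placeholders; point them at the files in the unzipped archive.
  config = XttsConfig()
  config.load_json("<path to config.json>")
  XTTS_MODEL = Xtts.init_from_config(config)
  XTTS_MODEL.load_checkpoint(config, checkpoint_path="<path to checkpoint .pth>", vocab_path="<path to vocab.json>", use_deepspeed=False)
  XTTS_MODEL.to(device)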
- Generate Synthetic Voice:
  result = XTTS_MODEL.inference(text="Your text here", ...)
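- Export the Audio:
  import soundfile as sf
  # audio_signal is the generated waveform (result["wav"]); sample_rate is the model's output sample rate.
  sf.write("generated_audio.wav", audio_signal, sample_rate)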
For better performance, consider running on a cloud GPU, such as an NVIDIA A100 40GB on Google Cloud.
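Loading the model requires paths to a configuration file, a vocabulary file, and a weights file inside the unzipped checkpoint archive. Since the archive layout is not described here, the sketch below searches a directory for candidates instead of hard-coding paths. The helper name `find_checkpoint_files` and the file names `config.json`, `vocab.json`, and `*.pth` are assumptions based on common Coqui TTS conventions and may not match the actual archive.

```python
from pathlib import Path
from typing import Dict, List


def find_checkpoint_files(root: str) -> Dict[str, List[Path]]:
    """List candidate config, vocabulary and weight files under `root`."""
    base = Path(root)
    found = {
        "config": sorted(base.rglob("config.json")),
        "vocab": sorted(base.rglob("vocab.json")),
        "checkpoint": sorted(base.rglob("*.pth")),
    }
    # Fail early with a clear message instead of a later load error.
    missing = [name for name, hits in found.items() if not hits]
    if missing:
        raise FileNotFoundError(f"missing under {base}: {', '.join(missing)}")
    return found
```

The function returns every match rather than picking one, because an archive can contain several `.pth` files and only one of them is the fine-tuned model; choosing among them is left to the reader.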
License
The model is subject to regulations governing personal data protection and Senegalese law. Users must ensure compliance when utilizing this model. GalsenAI disclaims liability for any misuse.