xTTS v2 Wolof

GalsenAI

Introduction

XTTS-V2-WOLOF is a text-to-speech model that synthesizes a Wolof-speaking voice from Wolof text input. It is based on the xTTS v2 architecture and was trained on the Wolof-TTS dataset curated by GalsenAI Lab.

Architecture

The model uses the xTTS v2 framework, which supports efficient text-to-speech synthesis with language-specific adaptation. It relies on conditioning latents and speaker embeddings to generate synthetic voices.
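The role of these two signals can be shown with a toy sketch (plain Python, not the actual xTTS internals): the speaker embedding is a fixed vector describing voice identity, while the text features describe what to say; the decoder sees both at every step. All names and numbers below are hypothetical.

```python
# Toy illustration of speaker conditioning (NOT the real xTTS code):
# each text token's feature vector is extended with a fixed speaker
# embedding, so the decoder sees "what to say" and "whose voice" together.

def condition_text_features(text_features, speaker_embedding):
    """Append the speaker embedding to every per-token feature vector."""
    return [token_vec + speaker_embedding for token_vec in text_features]

# Hypothetical tiny example: 3 tokens with 2-dim features, 4-dim speaker vector.
text_features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
speaker_embedding = [0.9, 0.1, 0.0, 0.7]

conditioned = condition_text_features(text_features, speaker_embedding)
print(len(conditioned), len(conditioned[0]))  # 3 tokens, 6 dims each
```

In the real model the conditioning latents are computed from a short reference recording of the target speaker, which is what lets the model clone a specific Wolof voice.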

Training

The model was trained on the cleaned Wolof-TTS dataset. The recordings contain natural pauses, which may carry over into synthesized speech. The training notebook and dataset preparation were contributed by members of the GalsenAI community.
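Because those pauses can surface in generated audio, one common workaround (a sketch, not part of the official pipeline) is to split long input text at sentence boundaries and synthesize each chunk separately, which keeps any inherited pause short:

```python
import re

def split_into_chunks(text, max_len=200):
    """Split text at sentence-ending punctuation, then pack sentences
    into chunks no longer than max_len characters."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_len:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

chunks = split_into_chunks("Salaam aleekum. Nanga def? Maa ngi fi rekk.", max_len=20)
print(chunks)
```

Each chunk can then be passed to the model's inference call and the resulting waveforms concatenated.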

Guide: Running Locally

To run the model locally, follow these steps:

  1. Clone the Repository:

    git clone https://github.com/Galsenaicommunity/Wolof-TTS.git
    
  2. Install Dependencies:

    cd Wolof-TTS/notebooks/Models/xTTS\ v2
    pip install -r requirements.txt
    
  3. Download and Unzip the Model Checkpoint:

    gdown <Checkpoint ID>
    unzip galsenai-xtts-wo-checkpoints.zip && rm galsenai-xtts-wo-checkpoints.zip
    
  4. Load the Model:

    import torch
    from TTS.tts.configs.xtts_config import XttsConfig
    from TTS.tts.models.xtts import Xtts

    # Paths assume the checkpoint directory unzipped in step 3.
    config = XttsConfig()
    config.load_json("galsenai-xtts-wo-checkpoints/config.json")
    XTTS_MODEL = Xtts.init_from_config(config)
    XTTS_MODEL.load_checkpoint(config, checkpoint_dir="galsenai-xtts-wo-checkpoints")

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    XTTS_MODEL.to(device)
    
  5. Generate Synthetic Voice:

    # gpt_cond_latent and speaker_embedding come from XTTS_MODEL.get_conditioning_latents(audio_path=[...])
    result = XTTS_MODEL.inference(text="Your text here", language="wo", gpt_cond_latent=gpt_cond_latent, speaker_embedding=speaker_embedding)
    
  6. Export the Audio:

    import soundfile as sf
    # XTTS v2 outputs 24 kHz audio; result["wav"] is the waveform from step 5.
    sf.write("generated_audio.wav", result["wav"], 24000)
    

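If soundfile is unavailable, the export in step 6 can also be done with Python's standard-library wave module. A minimal sketch, assuming a mono float waveform in [-1, 1] at 24 kHz (the rate XTTS v2 produces); the tone below is a stand-in for result["wav"]:

```python
import math
import struct
import wave

def write_wav(path, samples, sample_rate=24000):
    """Write mono float samples in [-1, 1] as 16-bit PCM."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)        # mono
        wf.setsampwidth(2)        # 16-bit samples
        wf.setframerate(sample_rate)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        wf.writeframes(frames)

# Stand-in waveform (one second of a 440 Hz tone); in practice,
# pass the synthesized waveform from the inference step instead.
tone = [0.5 * math.sin(2 * math.pi * 440 * t / 24000) for t in range(24000)]
write_wav("generated_audio.wav", tone)
```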
For better performance, consider using a cloud GPU, such as Google Cloud's A100 40GB.

License

The model is subject to Senegalese law and to regulations governing personal data protection. Users must ensure compliance when using the model. GalsenAI disclaims liability for any misuse.
