lb-de-fr-en-pt-coqui-vits-tts

mbarnig

Introduction

The LB-DE-FR-EN-PT-COQUI-VITS-TTS model is a text-to-speech (TTS) system developed by mbarnig. It supports five languages: Luxembourgish, German, French, English, and Portuguese. The model is built within the Coqui.ai ecosystem and synthesizes speech audio from text.

Architecture

The model is based on the Coqui-TTS multilingual VITS model recipe, version 0.7.1. It was trained without phonemes: text is processed as raw characters, using a custom character set that includes special characters and punctuation marks.
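To make the character-based input concrete, the following is a minimal sketch of how text can be turned into integer ids without a phonemizer. It is not Coqui-TTS's own tokenizer, and the character set shown is a made-up stand-in rather than the one the model was trained with. Dropping unknown characters is a choice of this sketch only.

```python
# Illustrative only: this character set is an assumption, not the model's real one.
PAD = "_"
LETTERS = "abcdefghijklmnopqrstuvwxyzäëéèêöüçãõ"
PUNCTUATION = " !',-.?"


def build_vocab(pad, letters, punctuation):
    """Map every symbol to an integer id; the pad symbol gets id 0."""
    symbols = [pad] + list(letters) + list(punctuation)
    if len(set(symbols)) != len(symbols):
        raise ValueError("character set contains duplicates")
    return {ch: i for i, ch in enumerate(symbols)}


def encode(text, vocab):
    """Lower-case the text and convert it to ids, dropping unknown characters."""
    return [vocab[ch] for ch in text.lower() if ch in vocab]


vocab = build_vocab(PAD, LETTERS, PUNCTUATION)
ids = encode("Moien!", vocab)
print(ids)
```

Because there is no phoneme step, any character missing from the set never reaches the model, which is why the set has to cover the accented letters of all five languages.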

Training

The model was trained from scratch on the customized mbarnig/lb-de-fr-en-pt-12800-TTS-CORPUS dataset. No phoneme-based processing was involved; text input relied on the character set. Training metrics and progress can be viewed in a live TensorBoard demonstration.
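When training on characters rather than phonemes, it can help to confirm that the character set covers every symbol in the transcripts. The sketch below shows one way to do that; the sample transcripts and character set are hypothetical and are not taken from the actual corpus or training configuration.

```python
from collections import Counter


def uncovered_characters(transcripts, charset):
    """Count characters in the transcripts that the character set does not contain."""
    missing = Counter()
    for line in transcripts:
        for ch in line:
            if ch not in charset:
                missing[ch] += 1
    return missing


# Hypothetical sample data; the real corpus and character set are not reproduced here.
charset = set("abcdefghijklmnopqrstuvwxyzäëé .,!?'")
transcripts = ["wéi geet et?", "guten tag.", "olá, tudo bem?"]

# Report each uncovered character with the number of times it occurs.
for ch, count in uncovered_characters(transcripts, charset).most_common():
    print(repr(ch), count)
```

Any character reported here would need to be added to the character set or normalized away before training.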

Guide: Running Locally

  1. Clone the Repository: Clone the repository to your local machine.
  2. Install Dependencies: Install the Coqui-TTS library and its dependencies; the model recipe targets version 0.7.1.
  3. Download the Model: Get the model files from the model's Hugging Face page.
  4. Run the Model: Synthesize audio from input text using a Python script or the command line.
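The last step can be sketched as a small Python wrapper around the Coqui `tts` command-line tool. This is a hedged example, not the author's documented procedure: the file names (`best_model.pth`, `config.json`, `speakers.pth`, `language_ids.json`), the `model` directory, and the `SPEAKER_NAME` and `LANGUAGE_ID` placeholders are assumptions to be replaced with the values from the downloaded files. The flags are those of the Coqui-TTS synthesis CLI; `tts --help` shows what the installed version accepts.

```python
import shlex
import shutil
import subprocess
from pathlib import Path


def build_tts_command(model_dir, text, out_path, speaker, language):
    """Assemble the argument list for the Coqui `tts` command-line tool."""
    model_dir = Path(model_dir)
    return [
        "tts",
        "--text", text,
        # The four file names below are assumptions; use the names you downloaded.
        "--model_path", str(model_dir / "best_model.pth"),
        "--config_path", str(model_dir / "config.json"),
        "--speakers_file_path", str(model_dir / "speakers.pth"),
        "--language_ids_file_path", str(model_dir / "language_ids.json"),
        "--speaker_idx", speaker,
        "--language_idx", language,
        "--out_path", str(out_path),
    ]


# SPEAKER_NAME and LANGUAGE_ID are placeholders, not real identifiers of this model.
cmd = build_tts_command("model", "Moien, wéi geet et?", "output.wav",
                        speaker="SPEAKER_NAME", language="LANGUAGE_ID")
print(shlex.join(cmd))

# Only run synthesis when the CLI and the model files are actually present.
if shutil.which("tts") and Path("model/config.json").exists():
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with text that contains commas, apostrophes, or accented characters.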

Suggested Cloud GPUs

For faster processing, consider a cloud GPU service such as AWS EC2 with NVIDIA GPUs, Google Cloud Platform, or Azure.

License

The model is distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (cc-by-nc-sa-4.0). The license permits sharing and adaptation provided that attribution is given, the use is non-commercial, and derived works are distributed under the same license.
