tts_transformer-tr-cv7 Text-to-Speech Model

Introduction

TTS_TRANSFORMER-TR-CV7 (published on the Hugging Face hub as facebook/tts_transformer-tr-cv7) is a text-to-speech model built with the fairseq library. It targets the Turkish language, provides a single-speaker male voice, and was trained on the Common Voice v7 dataset.

Architecture

The model uses a Transformer architecture to convert text to speech, as described in the fairseq S^2 paper. It is implemented in fairseq, whose speech synthesis extension (fairseq S^2) is presented as a scalable and integrable toolkit for this kind of task.

Training

The model is trained on the Common Voice v7 dataset, a crowd-sourced corpus of read speech recordings. The Turkish recordings in this corpus are what the model learns to reproduce as synthetic Turkish speech.

Guide: Running Locally

To use the TTS_TRANSFORMER-TR-CV7 model locally, follow these steps:

1. Install fairseq: Make sure the fairseq library is installed (for example, with pip install fairseq).
2. Load the Model: Use load_model_ensemble_and_task_from_hf_hub to load the model from the Hugging Face model hub.
3. Prepare the Text: Define the Turkish input text you want to convert to speech.
4. Generate Speech: Use TTSHubInterface to generate audio from the text.
5. Play the Audio: Use IPython.display.Audio to play the generated audio in a notebook.

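from fairseq.checkpoint_utils import load_model_ensemble_and_task_from_hf_hub
from fairseq.models.text_to_speech.hub_interface import TTSHubInterface
import IPython.display as ipd

# Download the checkpoint from the Hugging Face hub and select the HiFi-GAN vocoder.
models, cfg, task = load_model_ensemble_and_task_from_hf_hub(
    "facebook/tts_transformer-tr-cv7",
    arg_overrides={"vocoder": "hifigan", "fp16": False}
)
model = models[0]
TTSHubInterface.update_cfg_with_data_cfg(cfg, task.data_cfg)
# build_generator indexes into a list of models in recent fairseq releases,
# so pass a list; a bare model can raise a "not subscriptable" TypeError.
generator = task.build_generator([model], cfg)

# Turkish input text ("Hello, this is a test.")
text = "Merhaba, bu bir deneme çalışmasıdır."

# Convert the text to model input, then synthesize the waveform.
sample = TTSHubInterface.get_model_input(task, text)
wav, rate = TTSHubInterface.get_prediction(task, model, generator, sample)

# Play the result inline (works in a Jupyter/IPython notebook).
ipd.Audio(wav, rate=rate)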
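
IPython.display.Audio only plays audio inside a notebook. When running a plain script, one option is to write the waveform to a WAV file instead. The following is a minimal sketch using only the Python standard library; the helper name save_wav, the demo sample rate, and the output path are illustrative assumptions and not part of fairseq. It also assumes the samples are mono floats in the range [-1.0, 1.0].

```python
import math
import os
import struct
import tempfile
import wave


def save_wav(samples, rate, path):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip to the valid range
        frames += struct.pack("<h", int(round(s * 32767)))
    with wave.open(path, "wb") as f:
        f.setnchannels(1)          # mono
        f.setsampwidth(2)          # 2 bytes per sample = 16-bit
        f.setframerate(int(rate))
        f.writeframes(bytes(frames))
    return len(frames) // 2        # number of samples written


# Stand-in for the model output: one second of a 440 Hz tone.
# The rate below is an arbitrary demo value, not the model's actual rate.
demo_rate = 22050
demo = [0.5 * math.sin(2 * math.pi * 440 * n / demo_rate) for n in range(demo_rate)]
demo_path = os.path.join(tempfile.gettempdir(), "tts_demo.wav")
written = save_wav(demo, demo_rate, demo_path)
```

With the real model output, the call would look like save_wav(wav.tolist(), rate, "output.wav"), assuming wav is a 1-D tensor on the CPU as returned by get_prediction.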

For faster inference, consider running the model on a GPU, for example a cloud GPU instance from AWS, Google Cloud, or Azure.
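
Before paying for GPU time, it can help to measure how fast synthesis already is on the current machine. A common measure is the real-time factor: synthesis time divided by the duration of the generated audio, where values below 1 mean faster than real time. The sketch below shows the arithmetic; the helper names real_time_factor and timed are illustrative assumptions, and the numbers in the example are hypothetical, not measurements of this model.

```python
import time


def real_time_factor(num_samples, rate, elapsed_seconds):
    """Return synthesis time divided by the duration of the generated audio."""
    if num_samples <= 0 or rate <= 0:
        raise ValueError("num_samples and rate must be positive")
    audio_seconds = num_samples / rate
    return elapsed_seconds / audio_seconds


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Hypothetical numbers: 3 s of audio at 22050 Hz that took 1.5 s to generate.
rtf = real_time_factor(num_samples=3 * 22050, rate=22050, elapsed_seconds=1.5)
print(f"real-time factor: {rtf:.2f}")
```

With the model loaded as shown earlier, the synthesis call can be wrapped as (wav, rate), elapsed = timed(TTSHubInterface.get_prediction, task, model, generator, sample), followed by real_time_factor(len(wav), rate, elapsed). This assumes wav is a 1-D waveform, so that len(wav) is the number of samples.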

License

The model and its usage are subject to the licensing terms specified by fairseq, Facebook AI Research, and Hugging Face. Please refer to their respective licensing documents for detailed information.
