Artificial-StyleTTS2

dkounadis

Introduction

Artificial-StyleTTS2 is a text-to-audio model that generates audio from text input. It includes tools for text-to-speech synthesis and soundscape generation, allowing users to create rich audio experiences, including audio tracks for video.

Architecture

The model builds on a text-to-audio pipeline, combining the StyleTTS2 architecture with libraries such as AudioCraft. It supports several generation modes, including soundscapes and text-to-speech, and can overlay generated audio on video files via the landscape2soundscape.py script.
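Conceptually, overlaying a generated soundscape on a video (as landscape2soundscape.py does) amounts to muxing a new audio track onto an existing video stream. A minimal sketch of that step using ffmpeg, assuming ffmpeg is installed; the file names are placeholders, and the actual script may do more than this:

```python
import subprocess


def mux_command(video, audio, out):
    """Build an ffmpeg command that copies the video stream unchanged
    and replaces the audio track with the generated soundscape."""
    return [
        "ffmpeg", "-y",
        "-i", video,       # input video
        "-i", audio,       # generated soundscape
        "-c:v", "copy",    # keep the video stream untouched (no re-encode)
        "-map", "0:v:0",   # take video from the first input
        "-map", "1:a:0",   # take audio from the second input
        "-shortest",       # stop at the shorter of the two streams
        out,
    ]


# Example invocation (requires ffmpeg on PATH):
# subprocess.run(mux_command("scene.mp4", "soundscape.wav", "out.mp4"), check=True)
```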

Training

The model and its associated tools incorporate emotional analysis into text-to-speech (TTS) output, producing affective TTS and soundscapes. They support multiple voices and can adapt the emotional tone of the generated audio.

Guide: Running Locally

Basic Steps

  1. Clone the Repository:

    git clone https://huggingface.co/dkounadis/artificial-styletts2
    
  2. Set Up the Environment:

    virtualenv --python=python3 ~/.envs/.my_env
    source ~/.envs/.my_env/bin/activate
    cd artificial-styletts2/
    
  3. Install Requirements:

    pip install -r requirements.txt
    
  4. Run Flask API:

    CUDA_DEVICE_ORDER=PCI_BUS_ID HF_HOME=./hf_home CUDA_VISIBLE_DEVICES=2 python api.py
    
  5. Generate Soundscapes: Ensure api.py is running, then execute:

    python landscape2soundscape.py
    

Cloud GPUs

For better performance, especially with large datasets or real-time applications, consider using cloud GPU services such as AWS EC2, Google Cloud Platform, or Azure.

License

This project is licensed under the MIT License, allowing for flexibility in modification and redistribution.
