artificial-styletts2
by dkounadis

Introduction
Artificial-StyleTTS2 is a text-to-audio model that generates audio from text input. It provides tools for text-to-speech synthesis and soundscape generation, letting users build complex audio-visual experiences.
Architecture
The model uses a text-to-audio pipeline, leveraging libraries such as audiocraft and techniques like StyleTTS2. It supports several audio generation modes, including soundscapes and text-to-speech, and can overlay generated audio onto video files using landscape2soundscape.py.
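The pipeline described above can be sketched conceptually: speech and a background soundscape are generated separately, then overlaid into one waveform. This is an illustrative sketch only; the real pipeline runs StyleTTS2 and audiocraft models, so stand-in functions are used here so the mixing step can be shown with plain Python lists of samples.

```python
def fake_tts(text):
    # Stand-in for a StyleTTS2 forward pass: returns a waveform
    # as a list of float samples.
    return [0.5] * 8

def fake_soundscape(description):
    # Stand-in for an audiocraft sound-generation call.
    return [0.1] * 8

def mix(speech, background, bg_gain=0.3):
    # Overlay the soundscape under the speech by weighted summation,
    # padding the shorter signal and clipping to the [-1.0, 1.0]
    # range expected of float PCM audio.
    n = max(len(speech), len(background))
    speech = speech + [0.0] * (n - len(speech))
    background = background + [0.0] * (n - len(background))
    return [max(-1.0, min(1.0, s + bg_gain * b))
            for s, b in zip(speech, background)]

out = mix(fake_tts("Hello."), fake_soundscape("rain on a window"))
```

The background gain keeps the soundscape quieter than the speech; the real tools work on sampled audio arrays rather than Python lists, but the overlay idea is the same.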
Training
The model and its associated tools incorporate emotional analysis into text-to-speech (TTS) output, producing affective TTS and soundscapes. They support multiple voices and can adapt the emotional tone of the generated audio.
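To make the idea of affective, multi-voice synthesis concrete, here is a hypothetical sketch of how per-utterance voice and emotion metadata might be represented. The repository's actual configuration format is not shown in this document; the field names below are illustrative assumptions, not its API.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    # Hypothetical record pairing text with voice/emotion metadata.
    text: str
    voice: str        # speaker/voice identifier
    emotion: str      # e.g. "happy", "sad", "neutral"
    intensity: float  # 0.0 (flat) .. 1.0 (strongly expressive)

script = [
    Utterance("The storm finally passed.", voice="en_female_1",
              emotion="relieved", intensity=0.6),
    Utterance("And then it started again.", voice="en_female_1",
              emotion="annoyed", intensity=0.8),
]

# A synthesis loop would hand each record to the TTS model in turn.
for u in script:
    print(f"{u.voice} [{u.emotion} @ {u.intensity}]: {u.text}")
```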
Guide: Running Locally
Basic Steps
- Clone the Repository:
  git clone https://huggingface.co/dkounadis/artificial-styletts2

- Set Up the Environment:
  virtualenv --python=python3 ~/.envs/.my_env
  source ~/.envs/.my_env/bin/activate
  cd artificial-styletts2/

- Install Requirements:
  pip install -r requirements.txt

- Run Flask API:
  CUDA_DEVICE_ORDER=PCI_BUS_ID HF_HOME=./hf_home CUDA_VISIBLE_DEVICES=2 python api.py

- Generate Soundscapes: ensure api.py is running, then execute:
  python landscape2soundscape.py
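The final step overlays generated audio onto video. The internals of landscape2soundscape.py are not shown in this document, but the kind of muxing it performs can be sketched with a standard ffmpeg invocation. File names here are placeholders, and this is an illustration rather than the script's actual implementation.

```python
import subprocess

def build_overlay_cmd(video, audio, output):
    # -map 0:v takes the video stream from the first input,
    # -map 1:a takes the audio stream from the second input;
    # -c:v copy avoids re-encoding the video, and -shortest trims
    # the result to the shorter of the two streams.
    return ["ffmpeg", "-i", video, "-i", audio,
            "-map", "0:v", "-map", "1:a",
            "-c:v", "copy", "-shortest", output]

cmd = build_overlay_cmd("landscape.mp4", "soundscape.wav", "out.mp4")
# To actually run it (requires ffmpeg on PATH):
# subprocess.run(cmd, check=True)
```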
Cloud GPUs
For better performance, especially with large datasets or real-time applications, consider using cloud GPU services such as AWS EC2, Google Cloud Platform, or Azure.
License
This project is licensed under the MIT License, allowing for flexibility in modification and redistribution.