ChatTTS
2Noise
Introduction
ChatTTS by 2Noise is a text-to-audio model designed to convert text inputs into audio outputs. It is available under a non-commercial license and supports features like speaker specification and speech speed adjustment.
Architecture
The model leverages the ChatTTS library and Torchaudio for audio processing. It supports batching and can be configured to improve performance by compiling models during loading.
Training
2Noise is engaged in training larger-scale models and seeks computational resources and data support. Interested parties can contact them via their open-source email for collaboration opportunities.
Guide: Running Locally
To run ChatTTS locally, follow these steps:
- Clone the Repository
  git clone https://github.com/2noise/ChatTTS.git
- Model Inference
  - Import the necessary libraries.
  - Initialize and load the model with ChatTTS.Chat() and chat.load_models(compile=False).
  - Define the text input and perform inference.
  - Save the generated audio using Torchaudio.
- Cloud GPUs Recommendation
  For enhanced performance, consider using cloud GPUs like AWS EC2, Google Cloud Platform, or Azure.
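The Model Inference steps above can be sketched roughly as follows. This is a minimal illustration, not the project's reference script. Only ChatTTS.Chat() and chat.load_models(compile=...) come from the steps above. The chat.infer(texts) call, the 24 kHz sample rate, the waveform layout (a NumPy array per input text), the output file names, and the helper names load_chat, synthesize, and save_wavs are all assumptions made for illustration and may differ between ChatTTS versions.

```python
from pathlib import Path

# Assumed output sample rate; check the ChatTTS repository for the actual value.
SAMPLE_RATE = 24_000


def load_chat(compile_model=False):
    """Create a ChatTTS.Chat instance and load its weights.

    Passing compile_model=True compiles the models during loading, which
    takes longer up front but may improve inference speed.
    """
    import ChatTTS  # imported lazily so this module loads without the package

    chat = ChatTTS.Chat()
    chat.load_models(compile=compile_model)
    return chat


def synthesize(chat, texts):
    """Run batched inference; assumed to return one waveform per input text."""
    # Dropping blank entries is a choice made for this sketch, not library behavior.
    cleaned = [t.strip() for t in texts if t and t.strip()]
    if not cleaned:
        raise ValueError("need at least one non-empty text")
    return chat.infer(cleaned)  # assumption: infer() accepts a list of strings


def save_wavs(wavs, out_dir="."):
    """Write each waveform to output1.wav, output2.wav, ... using Torchaudio."""
    import torch
    import torchaudio

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, wav in enumerate(wavs, start=1):
        tensor = torch.from_numpy(wav)
        if tensor.dim() == 1:
            # torchaudio.save expects a (channels, samples) tensor.
            tensor = tensor.unsqueeze(0)
        path = out_dir / f"output{i}.wav"
        torchaudio.save(str(path), tensor, SAMPLE_RATE)
        paths.append(path)
    return paths
```

With ChatTTS, PyTorch, and Torchaudio installed, calling save_wavs(synthesize(load_chat(), ["PUT YOUR TEXT HERE"])) would be expected to write output1.wav to the current directory. Keeping the library imports inside the functions lets the text handling in synthesize be exercised without a GPU or model download.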
License
ChatTTS is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (cc-by-nc-4.0). The model is intended for academic purposes only, and commercial use is not permitted.