ChatTTS

2Noise

Introduction

ChatTTS by 2Noise is a text-to-audio model that converts text input into speech audio. It is released under a non-commercial license and supports features such as speaker specification and speech speed adjustment.

Architecture

The model is used through the ChatTTS library, with Torchaudio for audio processing. It supports batched inputs, and passing compile=True when loading the models can improve performance.
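As a rough illustration of the batching and compile options described above, the following Python sketch loads the model and feeds texts to it in fixed-size batches. The names load_chat and infer_in_batches are hypothetical helpers, the batch size is arbitrary, and the sketch assumes chat.infer accepts a list of strings and returns one waveform per string. Method names may differ between ChatTTS releases, so check the installed version.

```python
from typing import Any, List, Sequence


def load_chat(compile_models: bool = False) -> Any:
    """Load ChatTTS; compile_models=True may improve inference performance."""
    import ChatTTS  # imported lazily so the helper below works without the package

    chat = ChatTTS.Chat()
    chat.load_models(compile=compile_models)
    return chat


def infer_in_batches(chat: Any, texts: Sequence[str], batch_size: int = 4) -> List[Any]:
    """Run chat.infer on consecutive slices of texts and collect all waveforms.

    Assumes chat.infer(list_of_str) returns one waveform per input string.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    wavs: List[Any] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        wavs.extend(chat.infer(batch))
    return wavs
```

Because infer_in_batches only relies on an object exposing an infer method, it can be exercised with a stand-in object before the real model is downloaded.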

Training

2Noise is training larger-scale models and is seeking computational resources and data support. Interested parties can reach the team through its open-source contact email to discuss collaboration.

Guide: Running Locally

To run ChatTTS locally, follow these steps:

  1. Clone the Repository

    git clone https://github.com/2noise/ChatTTS.git
    
  2. Model Inference

    • Import necessary libraries.
    • Initialize and load the model with ChatTTS.Chat() and chat.load_models(compile=False).
    • Define text input and perform inference.
    • Save the generated audio using Torchaudio.

  3. Cloud GPU Recommendation

    For enhanced performance, consider using cloud GPU instances from providers such as AWS EC2, Google Cloud Platform, or Azure.
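
The inference steps in item 2 can be sketched in Python as follows. This is a minimal sketch, not the project's reference script: wav_path and synthesize_to_files are hypothetical helper names, the 24000 Hz sample rate is an assumption to verify against the model's documentation, and the code assumes chat.infer returns one NumPy waveform per input text.

```python
import os
from typing import List, Sequence

SAMPLE_RATE = 24000  # assumed output sample rate in Hz; verify for your model version


def wav_path(out_dir: str, index: int) -> str:
    """Return the output file path for the index-th text (1-based file names)."""
    return os.path.join(out_dir, f"output{index + 1}.wav")


def synthesize_to_files(texts: Sequence[str], out_dir: str = ".") -> List[str]:
    """Generate speech for each text and save it as a WAV file; returns the paths."""
    # Heavy dependencies are imported here so the module loads without them.
    import torch
    import torchaudio
    import ChatTTS

    chat = ChatTTS.Chat()
    chat.load_models(compile=False)  # compile=True may improve performance

    wavs = chat.infer(list(texts))  # assumed: one NumPy array per input text

    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, wav in enumerate(wavs):
        tensor = torch.from_numpy(wav)
        if tensor.dim() == 1:
            # torchaudio.save expects a (channels, samples) tensor
            tensor = tensor.unsqueeze(0)
        path = wav_path(out_dir, i)
        torchaudio.save(path, tensor, SAMPLE_RATE)
        paths.append(path)
    return paths
```

Calling synthesize_to_files(["Hello from ChatTTS."], "out") would write out/output1.wav, provided ChatTTS, PyTorch, and Torchaudio are installed and the model weights can be loaded.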

License

ChatTTS is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (cc-by-nc-4.0). The model is intended for academic purposes only, and commercial use is not permitted.
