OuteTTS-0.2-500M-GGUF
OuteAI
Introduction
OuteTTS-0.2-500M is a text-to-speech model developed by OuteAI and an improved version of the v0.1 release. It offers better prompt following, output coherence, and voice cloning, supports multiple languages, and produces natural-sounding speech.
Architecture
The model is built upon the Qwen-2.5-0.5B architecture with 500 million parameters. It supports English as its primary language, with experimental support for Chinese, Japanese, and Korean.
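As a quick check of the Qwen-2.5 lineage, the checkpoint's configuration can be inspected with transformers. This is a minimal sketch assuming the Hugging Face repo id OuteAI/OuteTTS-0.2-500M (not stated on this page) and network access:

```python
# Minimal sketch: confirm the Qwen-2.5 base architecture of the checkpoint.
# Assumption: the full-precision weights live at "OuteAI/OuteTTS-0.2-500M".
from transformers import AutoConfig

config = AutoConfig.from_pretrained("OuteAI/OuteTTS-0.2-500M")
print(config.model_type)                             # expected: "qwen2"
print(config.hidden_size, config.num_hidden_layers)  # Qwen-2.5-0.5B dimensions
```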
Training
The model was trained on a diverse array of datasets, including:
- Emilia-Dataset (CC BY-NC 4.0)
- LibriTTS-R (CC BY 4.0)
- Multilingual LibriSpeech (MLS) (CC BY 4.0)
These datasets give the model broad lexical coverage and underpin its improved voice-cloning and multilingual performance.
Guide: Running Locally
Installation:
- Install OuteTTS via pip: pip install outetts --upgrade
- For GGUF support, install llama-cpp-python manually; refer to its installation guide.
- For EXL2 support, install exllamav2 manually; refer to its installation guide.
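Since this page distributes the GGUF quantizations, here is a hedged sketch of loading them through llama-cpp-python. The class names outetts.GGUFModelConfig_v1 and outetts.InterfaceGGUF mirror the HF interface listed under Usage below but are assumptions, as are the parameter names and the local file path:

```python
import outetts

# Sketch of the GGUF path (assumed API, mirroring the HF interface below).
# Assumptions: the GGUFModelConfig_v1 / InterfaceGGUF class names, and a
# quantized file already downloaded from this repository.
model_config = outetts.GGUFModelConfig_v1(
    model_path="local/OuteTTS-0.2-500M-Q4_K_M.gguf",  # hypothetical local path
    language="en",
    n_gpu_layers=0,  # raise to offload layers to the GPU via llama-cpp-python
)
interface = outetts.InterfaceGGUF(model_version="0.2", cfg=model_config)
```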
Usage:
- Configure the model with outetts.HFModelConfig_v1.
- Initialize the interface with outetts.InterfaceHF.
- Load a default speaker and generate speech, as in the sketch below.
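A minimal sketch of that flow, reconstructed from the interface names above. The parameter names (model_path, language, the generate arguments), the load_default_speaker helper, and the speaker name male_1 are assumptions not confirmed by this page:

```python
import outetts

# Configure the model (class name from this page; parameters are assumptions).
model_config = outetts.HFModelConfig_v1(
    model_path="OuteAI/OuteTTS-0.2-500M",
    language="en",  # experimental: "zh", "ja", "ko"
)

# Initialize the Hugging Face interface.
interface = outetts.InterfaceHF(model_version="0.2", cfg=model_config)

# Load a default speaker profile (assumed helper and speaker name).
speaker = interface.load_default_speaker(name="male_1")

# Generate speech and write it to a WAV file.
output = interface.generate(
    text="Speech synthesis is the artificial production of human speech.",
    temperature=0.1,
    repetition_penalty=1.1,
    speaker=speaker,
)
output.save("output.wav")
```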
Hardware Recommendations:
- For optimal performance, consider using cloud GPUs, such as those from AWS, Google Cloud, or Azure, especially for large-scale or resource-intensive tasks.
License
OuteTTS-0.2-500M is released under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), permitting use for non-commercial purposes with appropriate credit.