OuteTTS-0.2-500M-GGUF

OuteAI

Introduction

OuteTTS-0.2-500M is a text-to-speech model developed by OuteAI and an improved successor to the v0.1 release. It offers better prompt following, output coherence, and voice cloning, supports multiple languages, and produces natural-sounding speech.

Architecture

The model is built on the Qwen2.5-0.5B architecture and has 500 million parameters. English is its primary language, with experimental support for Chinese, Japanese, and Korean.

Training

The model was trained on a diverse array of datasets, including:

  • Emilia-Dataset (CC BY-NC 4.0)
  • LibriTTS-R (CC BY 4.0)
  • Multilingual LibriSpeech (MLS) (CC BY 4.0)

These datasets contribute to the model's extensive vocabulary and enhanced performance in voice cloning and multilingual support.

Guide: Running Locally

  1. Installation:

    • Install OuteTTS via pip: pip install outetts --upgrade
    • For GGUF support, install llama-cpp-python manually; refer to its installation guide.
    • For EXL2 support, install exllamav2 manually; refer to its installation guide.
  2. Usage:

    • Configure the model with outetts.HFModelConfig_v1.
    • Initialize the interface with outetts.InterfaceHF.
    • Load a default speaker and generate speech, as shown in the sketch after this list.
  3. Hardware Recommendations:

    • For large-scale or resource-intensive workloads, consider cloud GPUs such as those offered by AWS, Google Cloud, or Azure.
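
The following is a minimal Python sketch of the Hugging Face workflow outlined in step 2, based on the outetts v0.2 interface; the model path, speaker name, and sampling parameters are illustrative and may need adjusting for your setup.

    import outetts

    # Configure the model; v0.2 supports en, with experimental zh, ja, ko
    model_config = outetts.HFModelConfig_v1(
        model_path="OuteAI/OuteTTS-0.2-500M",
        language="en",
    )

    # Initialize the Hugging Face interface for model version 0.2
    interface = outetts.InterfaceHF(model_version="0.2", cfg=model_config)

    # Load one of the bundled default speaker profiles
    speaker = interface.load_default_speaker(name="male_1")

    # Generate speech; lower temperatures tend to give a more stable tone
    output = interface.generate(
        text="Speech synthesis is the artificial production of human speech.",
        temperature=0.1,
        repetition_penalty=1.1,
        max_length=4096,
        speaker=speaker,
    )

    # Write the synthesized audio to a WAV file
    output.save("output.wav")

For voice cloning, the interface also exposes a way to build a speaker profile from a short reference clip (roughly 10-15 seconds) and its transcript, e.g. interface.create_speaker(audio_path=..., transcript=...), which can then be passed as the speaker argument above.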

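Since this page documents the GGUF release, a sketch of the corresponding GGUF configuration follows the same pattern, assuming llama-cpp-python is installed as described in step 1; the local .gguf path is a placeholder.

    import outetts

    # GGUF variant: point the config at a local .gguf file
    model_config = outetts.GGUFModelConfig_v1(
        model_path="local/path/to/OuteTTS-0.2-500M.gguf",
        language="en",
        n_gpu_layers=0,  # raise to offload transformer layers to the GPU
    )

    # The GGUF interface mirrors InterfaceHF
    interface = outetts.InterfaceGGUF(model_version="0.2", cfg=model_config)

Generation then proceeds exactly as in the Hugging Face example above.
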
License

OuteTTS-0.2-500M is released under the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0), which permits non-commercial use with appropriate credit.
