OuteTTS-0.2-500M
OuteAI
Introduction
OuteTTS-0.2-500M is an advanced text-to-speech (TTS) model designed to produce natural and coherent speech. It builds on the Qwen-2.5-0.5B foundation, improving performance across the board and adding multilingual support for English, Chinese, Japanese, and Korean.
Architecture
The model utilizes the Qwen-2.5-0.5B architecture, featuring 500 million parameters. It is designed to operate efficiently with a range of audio prompts, maintaining high accuracy in speech synthesis.
Training
OuteTTS-0.2-500M was trained on diverse datasets such as Emilia-Dataset, LibriTTS-R, and Multilingual LibriSpeech (MLS) to improve its accuracy, naturalness, and multilingual capabilities. The training process benefited from a GPU grant provided by Hugging Face.
Guide: Running Locally
- Installation:
  - Install the main package via pip:

    pip install outetts --upgrade

  - If using GGUF or EXL2 support, follow the respective installation guides.
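For GGUF-quantized weights, the setup is expected to mirror the Hugging Face interface shown below. This is a sketch only: the `GGUFModelConfig_v1` and `InterfaceGGUF` names follow the pattern of the package's HF interface, and the model path is a placeholder for a locally downloaded `.gguf` file.

```python
import outetts

# Sketch of a GGUF-backed configuration; assumes the package exposes a
# GGUF interface mirroring the HF one. The path below is a placeholder.
model_config = outetts.GGUFModelConfig_v1(
    model_path="local/path/to/model.gguf",  # placeholder .gguf file
    language="en",
    n_gpu_layers=0,  # raise to offload layers to the GPU
)
interface = outetts.InterfaceGGUF(model_version="0.2", cfg=model_config)
```

The resulting `interface` is then used the same way as the HF-backed one in the steps below.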
- Configuration:
  - Use the following Python code to configure and initialize the model:

    import outetts

    model_config = outetts.HFModelConfig_v1(
        model_path="OuteAI/OuteTTS-0.2-500M",
        language="en"
    )
    interface = outetts.InterfaceHF(model_version="0.2", cfg=model_config)
- Generate Speech:
  - Load a speaker and generate speech with custom settings:

    speaker = interface.load_default_speaker(name="male_1")
    output = interface.generate(
        text="Speech synthesis is the artificial production of human speech.",
        temperature=0.1,
        repetition_penalty=1.1,
        max_length=4096,
        speaker=speaker
    )
    output.save("output.wav")
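Beyond the bundled default speakers, the OuteTTS examples also describe building a speaker profile from a short reference recording and saving it for reuse. The sketch below assumes `create_speaker`, `save_speaker`, and `load_speaker` methods with these signatures; the audio path and transcript are placeholders.

```python
# Sketch: building a reusable speaker profile from a reference clip.
# Assumes create_speaker/save_speaker/load_speaker exist with these
# signatures; the path and transcript are placeholders.
speaker = interface.create_speaker(
    audio_path="reference.wav",  # short, clean recording of the target voice
    transcript="Text spoken in the reference clip."
)
interface.save_speaker(speaker, "speaker.json")  # persist the profile

# Later sessions can reload the saved profile instead of re-cloning:
speaker = interface.load_speaker("speaker.json")
output = interface.generate(
    text="Hello from a cloned voice.",
    temperature=0.1,
    repetition_penalty=1.1,
    speaker=speaker,
)
output.save("cloned.wav")
```

Accurate transcripts and clean reference audio generally matter more for cloning quality than clip length.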
- Cloud GPUs:
  - For better performance, consider using cloud-based GPU services such as AWS, Google Cloud, or Azure to handle intensive computations.
License
OuteTTS-0.2-500M is distributed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. This allows for sharing and adaptation with attribution, but not for commercial use.