OpenVoice V2
myshell-ai
Introduction
OpenVoice V2, developed by MyShell AI, is an advanced text-to-speech model that improves upon its predecessor, OpenVoice V1. Released in April 2024, it features enhanced audio quality and native support for multiple languages, and it is available for free commercial use under the MIT License. Its key capabilities are accurate tone color cloning, flexible voice style control, and zero-shot cross-lingual voice cloning.
Architecture
OpenVoice V2 adopts a different training strategy that delivers better audio quality than V1. It natively supports English, Spanish, French, Chinese, Japanese, and Korean, and it provides accurate tone color cloning and control over voice styles, including emotion, accent, rhythm, pauses, and intonation.
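As a small illustration of the language list, a caller might validate a requested language before handing text to the base TTS. The sketch assumes the MeloTTS language codes `EN`, `ES`, `FR`, `ZH`, `JP`, and `KR` as they appear in the V2 demo notebook; the mapping and the `melo_language_code` helper are hypothetical and not part of OpenVoice itself.

```python
# Hypothetical mapping from the six natively supported OpenVoice V2 languages
# to the MeloTTS language codes assumed from the V2 demo notebook.
SUPPORTED_LANGUAGES = {
    "english": "EN",
    "spanish": "ES",
    "french": "FR",
    "chinese": "ZH",
    "japanese": "JP",
    "korean": "KR",
}


def melo_language_code(language: str) -> str:
    """Return the assumed MeloTTS code for a language name, or raise ValueError."""
    key = language.strip().lower()
    if key not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"{language!r} is not natively supported; choose one of: {supported}")
    return SUPPORTED_LANGUAGES[key]


if __name__ == "__main__":
    print(melo_language_code("Japanese"))  # JP
```

Failing early on an unsupported language avoids a less obvious error later, when a model for that language would be loaded.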
Training
The model is trained on a massive-speaker multilingual dataset, which enables zero-shot cross-lingual voice cloning: neither the language of the generated speech nor the language of the reference speech needs to be present in the training data.
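The cross-lingual behavior follows from a two-stage design, sketched below under the assumption that it matches the V2 demo notebook: a base TTS speaks the text in the target language, and a tone color converter then re-colors that audio with an embedding extracted from the reference clip, whatever language the clip is in. To keep the sketch self-contained and testable, the model calls are injected as plain callables. The class, field, and file names are hypothetical; in the demo these roles appear to be filled by MeloTTS `TTS.tts_to_file`, `se_extractor.get_se`, per-speaker embeddings under `checkpoints_v2/base_speakers/ses/`, and `ToneColorConverter.convert`, but check the notebook for exact signatures.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for a tone color (speaker) embedding.
Embedding = List[float]


@dataclass
class CrossLingualCloner:
    """Sketch of a two-stage cloning pipeline; every callable is an injected stand-in."""

    synthesize: Callable[[str, str, str], None]  # (text, language, out_path): base-speaker TTS
    extract_embedding: Callable[[str], Embedding]  # (audio_path): tone color of a clip
    base_embedding: Callable[[str], Embedding]  # (language): tone color of that base speaker
    convert: Callable[[str, Embedding, Embedding, str], None]  # (src, src_emb, tgt_emb, out)

    def clone(self, reference_path: str, texts: Dict[str, str], out_dir: str) -> Dict[str, str]:
        # The target tone color is extracted once from the reference clip;
        # the clip's spoken language plays no further role.
        target = self.extract_embedding(reference_path)
        outputs = {}
        for language, text in texts.items():
            base_path = f"{out_dir}/base_{language}.wav"
            out_path = f"{out_dir}/cloned_{language}.wav"
            # Stage 1: a base speaker reads the text in the target language.
            self.synthesize(text, language, base_path)
            # Stage 2: swap the base speaker's tone color for the reference's.
            source = self.base_embedding(language)
            self.convert(base_path, source, target, out_path)
            outputs[language] = out_path
        return outputs


if __name__ == "__main__":
    # Dummy callables that only log what a real pipeline would do.
    log = []
    cloner = CrossLingualCloner(
        synthesize=lambda text, lang, path: log.append(("tts", lang, path)),
        extract_embedding=lambda path: [0.1, 0.2],
        base_embedding=lambda lang: [0.0, 0.0],
        convert=lambda src, s, t, out: log.append(("convert", src, out)),
    )
    print(cloner.clone("reference_en.mp3", {"FR": "Bonjour", "ES": "Hola"}, "outputs"))
```

Injecting the stages keeps the control flow visible and lets the pipeline be exercised without downloading checkpoints.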
Guide: Running Locally
Basic Steps
- Linux Installation: This method is primarily for developers and researchers familiar with Linux, Python, and PyTorch.
  - Create and activate a conda environment.
  - Clone the OpenVoice repository.
  - Install the package using pip.
  - Download and extract the relevant checkpoints for V1 or V2.
    conda create -n openvoice python=3.9
    conda activate openvoice
    git clone git@github.com:myshell-ai/OpenVoice.git
    cd OpenVoice
    pip install -e .
- OpenVoice V1:
  - Download the V1 checkpoints and refer to the demonstration notebooks for flexible style control and cross-lingual voice cloning.
- OpenVoice V2:
  - Download the V2 checkpoints.
  - Install MeloTTS and download the UniDic dictionary it relies on:
    pip install git+https://github.com/myshell-ai/MeloTTS.git
    python -m unidic download
  - Refer to the demo notebook for usage.
- Other Platforms:
  - Windows and Docker installation guides, contributed by the community, are available.
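After installation, a quick sanity check can confirm that the packages import and the V2 checkpoints are in place. This sketch assumes the import names `openvoice`, `melo`, and `unidic`, and a `checkpoints_v2` folder containing `converter/config.json`, `converter/checkpoint.pth`, and `base_speakers/ses/`, which is the layout the V2 demo notebook appears to use; V1 checkpoints are laid out differently and are not checked. The helper names are hypothetical.

```python
import importlib.util
import sys
from pathlib import Path
from typing import List

# Assumed V2 checkpoint layout (from the demo notebook); adjust if yours differs.
V2_REQUIRED_FILES = ["converter/config.json", "converter/checkpoint.pth"]
V2_REQUIRED_DIRS = ["base_speakers/ses"]


def missing_modules(names: List[str]) -> List[str]:
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_checkpoints(root: str) -> List[str]:
    """Return the expected V2 checkpoint entries that are absent under `root`."""
    base = Path(root)
    missing = [f for f in V2_REQUIRED_FILES if not (base / f).is_file()]
    missing += [d for d in V2_REQUIRED_DIRS if not (base / d).is_dir()]
    return missing


if __name__ == "__main__":
    # The guide creates a Python 3.9 environment; other versions are only flagged.
    if sys.version_info[:2] != (3, 9):
        print(f"note: the guide uses Python 3.9, found {sys.version.split()[0]}")
    print("missing modules:", missing_modules(["openvoice", "melo", "unidic"]) or "none")
    print("missing checkpoints:", missing_checkpoints("checkpoints_v2") or "none")
```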
Cloud GPUs
For faster inference, or when no local GPU is available, consider using cloud GPU instances from providers such as AWS, Google Cloud, or Azure.
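A cloud GPU only helps if the code actually places the models on it. The OpenVoice demo notebooks appear to choose the device with a `torch.cuda.is_available()` check; the sketch below wraps that pattern in a hypothetical `pick_device` helper that also falls back to CPU when PyTorch is not installed. The returned string would then be passed as the `device` argument wherever the models are constructed.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda:0" when PyTorch can see a GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed: fall back to CPU
    import torch

    return "cuda:0" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"running on {pick_device()}")
```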
License
OpenVoice V2 is distributed under the MIT License, allowing for free commercial use.