CosyVoice-300M-Instruct
Introduction
CosyVoice-300M-Instruct is a text-to-speech model developed by FunAudioLLM that supports zero-shot, cross-lingual, SFT (supervised fine-tuning), and instruct inference modes. It is part of the CosyVoice suite, offering advanced features for generating and manipulating voice data.
Architecture
CosyVoice utilizes a multi-faceted architecture to support various inference methodologies, including zero-shot, cross-lingual, and SFT. It integrates components from several other projects, such as FunASR, FunCodec, and Matcha-TTS, to enhance its functionality and performance in voice processing tasks.
Training
The CosyVoice models, including CosyVoice-300M, CosyVoice-300M-SFT, and CosyVoice-300M-Instruct, are pretrained and available for download. Users interested in training from scratch are advised to follow the provided examples and scripts, which guide them through the training process using the provided resources and dependencies.
Guide: Running Locally
1. Clone the Repository

   ```shell
   git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git
   cd CosyVoice
   git submodule update --init --recursive
   ```
2. Set Up Environment
   - Install Conda: Miniconda Installation Guide
   - Create and activate the environment:

   ```shell
   conda create -n cosyvoice python=3.8
   conda activate cosyvoice
   conda install -y -c conda-forge pynini==2.1.5
   pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com
   ```
3. Install Additional Dependencies
   - For Ubuntu:

   ```shell
   sudo apt-get install sox libsox-dev
   ```

   - For CentOS:

   ```shell
   sudo yum install sox sox-devel
   ```
4. Download Pretrained Models

   ```python
   from modelscope import snapshot_download

   snapshot_download('iic/CosyVoice-300M', local_dir='pretrained_models/CosyVoice-300M')
   # Repeat for other models as needed
   ```
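The download step above can be factored into a small helper so that every checkpoint lands in a consistent `pretrained_models/<name>` path. This is a minimal sketch: `local_dir` and `download_all` are hypothetical helper names, while `snapshot_download` is the modelscope call used above.

```python
# Hypothetical helpers for fetching the released CosyVoice checkpoints.
MODELS = [
    'iic/CosyVoice-300M',
    'iic/CosyVoice-300M-SFT',
    'iic/CosyVoice-300M-Instruct',
]

def local_dir(model_id: str) -> str:
    # Mirror the layout used above: pretrained_models/<model name>.
    return 'pretrained_models/' + model_id.split('/')[-1]

def download_all() -> None:
    # Imported lazily so the path logic stays testable offline.
    from modelscope import snapshot_download
    for model_id in MODELS:
        snapshot_download(model_id, local_dir=local_dir(model_id))
```

Calling `download_all()` fetches each model into its own directory, matching the paths the inference examples expect.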
5. Run Basic Usage and Inference
   - Use the CosyVoice class for the different inference modes:

   ```python
   from cosyvoice.cli.cosyvoice import CosyVoice

   cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-SFT')  # example: SFT inference
   ```
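A fuller end-to-end sketch of SFT inference is given below, assuming the repository's CLI API (an `inference_sft` generator yielding chunks with a `tts_speech` tensor) and a 22050 Hz output rate; `synthesize` and `chunk_path` are hypothetical helper names, and the method names should be checked against your checkout.

```python
def chunk_path(base: str, i: int) -> str:
    # out.wav -> out_0.wav, out_1.wav, ... for successive audio chunks.
    return base.replace('.wav', f'_{i}.wav')

def synthesize(text: str, out_path: str = 'sft_output.wav') -> None:
    # Assumes the CosyVoice CLI API from the repository; the speaker-listing
    # method and the 22050 Hz sample rate are assumptions, not verified here.
    import torchaudio
    from cosyvoice.cli.cosyvoice import CosyVoice
    cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-SFT')
    spk = cosyvoice.list_avaliable_spks()[0]  # pick a built-in speaker
    for i, chunk in enumerate(cosyvoice.inference_sft(text, spk)):
        torchaudio.save(chunk_path(out_path, i), chunk['tts_speech'], 22050)
```

Each yielded chunk is saved to its own file; a real application might instead concatenate the tensors before writing a single WAV.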
Cloud GPUs
Consider using cloud GPU services like AWS, GCP, or Azure for intensive processing tasks to ensure efficient model training and inference.
License
This project is intended for academic purposes and technical demonstration. It includes code adapted from several open-source projects; users should adhere to those projects' respective licenses. If you believe any content infringes your rights, contact the project maintainers.