SepFormer trained on WSJ0-2Mix (SpeechBrain)
Introduction
The SepFormer model, implemented with SpeechBrain, performs audio source separation and is pretrained on the WSJ0-2Mix dataset, on whose test set it achieves 22.4 dB SI-SNRi. This repository provides tools to perform source separation, with example outputs available for listening.
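SI-SNRi (scale-invariant signal-to-noise ratio improvement) measures how much closer the separated output is to the clean source than the unprocessed mixture was. For reference, below is a minimal, illustrative sketch of how SI-SNR is commonly computed in PyTorch; the si_snr helper is an assumption for illustration and is not part of SpeechBrain or of this repository.

  import torch

  def si_snr(estimate: torch.Tensor, target: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
      # Scale-invariant SNR in dB for a pair of 1-D signals (illustrative helper).
      estimate = estimate - estimate.mean()  # remove any DC offset
      target = target - target.mean()
      # Project the estimate onto the target to isolate the target-aligned component.
      s_target = (torch.dot(estimate, target) / (torch.dot(target, target) + eps)) * target
      e_noise = estimate - s_target
      return 10 * torch.log10(s_target.pow(2).sum() / (e_noise.pow(2).sum() + eps))

  # SI-SNRi is the improvement of the separated source over the raw mixture:
  #   si_snr(separated, clean) - si_snr(mixture, clean)

The reported 22.4 dB corresponds to this improvement averaged over the test set.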
Architecture
The SepFormer uses a transformer-based architecture to perform speech separation. It is designed for audio-to-audio applications, separating the individual speakers in a mixed recording, and is trained on WSJ0-2Mix, a standard benchmark for two-speaker separation.
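For context (this detail comes from the SepFormer paper rather than from this card), the masking network follows a dual-path scheme: the encoded mixture is split into chunks, an intra-chunk transformer models short-term structure within each chunk, and an inter-chunk transformer models long-range structure across chunks. The sketch below only illustrates that dual-path idea; the class, dimensions, and layer counts are illustrative and do not reproduce the released model.

  import torch
  import torch.nn as nn

  class DualPathBlock(nn.Module):
      # Conceptual sketch of one dual-path block: intra-chunk then inter-chunk attention.
      def __init__(self, d_model: int = 256, nhead: int = 8):
          super().__init__()
          self.intra = nn.TransformerEncoder(
              nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers=1
          )
          self.inter = nn.TransformerEncoder(
              nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers=1
          )

      def forward(self, x: torch.Tensor) -> torch.Tensor:
          # x: (batch, num_chunks, chunk_len, d_model)
          b, s, k, d = x.shape
          # Intra-chunk pass: attend within each chunk (short-term dependencies).
          x = self.intra(x.reshape(b * s, k, d)).reshape(b, s, k, d)
          # Inter-chunk pass: attend across chunks at each position (long-term dependencies).
          x = x.transpose(1, 2).reshape(b * k, s, d)
          x = self.inter(x).reshape(b, k, s, d).transpose(1, 2)
          return x

  # Example: 1 utterance split into 10 chunks of 50 frames with 256 features each.
  chunks = torch.randn(1, 10, 50, 256)
  out = DualPathBlock()(chunks)  # same shape as the input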
Training
Training the SepFormer model involves using SpeechBrain, a general-purpose speech toolkit. The training steps are outlined as follows:
- Clone the SpeechBrain repository:
  git clone https://github.com/speechbrain/speechbrain/
- Install the necessary dependencies:
  cd speechbrain
  pip install -r requirements.txt
  pip install -e .
- Run the training script:
  cd recipes/WSJ0Mix/separation
  python train.py hparams/sepformer.yaml --data_folder=your_data_folder
Training results, including models and logs, can be accessed here.
Guide: Running Locally
- Install SpeechBrain:
  pip install speechbrain
- Perform source separation:
  from speechbrain.inference.separation import SepformerSeparation as separator
  import torchaudio

  # Download the pretrained model from Hugging Face and cache it locally.
  model = separator.from_hparams(source="speechbrain/sepformer-wsj02mix", savedir='pretrained_models/sepformer-wsj02mix')

  # Separate the example mixture that ships with the model card.
  est_sources = model.separate_file(path='speechbrain/sepformer-wsj02mix/test_mixture.wav')

  # Save each estimated source (indexed along the last dimension) as an 8 kHz wav file.
  torchaudio.save("source1hat.wav", est_sources[:, :, 0].detach().cpu(), 8000)
  torchaudio.save("source2hat.wav", est_sources[:, :, 1].detach().cpu(), 8000)

  Ensure your input recordings are sampled at 8 kHz (see the combined sketch after this list for resampling).
- Inference on GPU: to run on a GPU, pass run_opts={"device":"cuda"} when calling from_hparams (see the sketch after this list).
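The two notes above (8 kHz input and GPU inference) can be combined. The following is a minimal sketch, not taken from the original card: the file name my_mixture.wav is a placeholder, the mono downmix is an illustrative assumption, and the model's separate_batch method is used instead of separate_file so that an already-loaded tensor can be passed in.

  import torch
  import torchaudio
  from speechbrain.inference.separation import SepformerSeparation as separator

  # Place the model on a GPU when one is available; otherwise fall back to the CPU.
  device = "cuda" if torch.cuda.is_available() else "cpu"
  model = separator.from_hparams(
      source="speechbrain/sepformer-wsj02mix",
      savedir="pretrained_models/sepformer-wsj02mix",
      run_opts={"device": device},
  )

  # "my_mixture.wav" is a placeholder for your own recording.
  mix, fs = torchaudio.load("my_mixture.wav")
  if mix.shape[0] > 1:
      mix = mix.mean(dim=0, keepdim=True)  # downmix multi-channel audio to mono
  if fs != 8000:
      mix = torchaudio.functional.resample(mix, orig_freq=fs, new_freq=8000)  # the model expects 8 kHz

  est_sources = model.separate_batch(mix.to(device))  # (batch, time, n_sources)
  torchaudio.save("source1hat.wav", est_sources[:, :, 0].detach().cpu(), 8000)
  torchaudio.save("source2hat.wav", est_sources[:, :, 1].detach().cpu(), 8000)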
Suggested Cloud GPUs
For optimal performance, consider using cloud GPU services such as AWS EC2 with NVIDIA GPUs, Google Cloud's GPU offerings, or Azure's GPU instances.
License
The SepFormer model is licensed under the Apache-2.0 License, allowing for broad use and modification.