sepformer wham
speechbrain
Introduction
The SepFormer model, implemented with SpeechBrain, is designed for audio source separation. It is pretrained on the WHAM! dataset, a version of the WSJ0-Mix dataset with added environmental noise. The model achieves 16.3 dB SI-SNRi (scale-invariant signal-to-noise ratio improvement) on the WHAM! test set.
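To make the SI-SNRi figure concrete, here is a minimal NumPy sketch of the standard scale-invariant SNR definition and its improvement over the unprocessed mixture. It is not taken from the SpeechBrain recipe; the function names `si_snr` and `si_snr_improvement` are illustrative, and the signals are synthetic rather than WHAM! data.

```python
import numpy as np


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D signals."""
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    # Remove the mean so a constant offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to isolate the target component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    return 10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))


def si_snr_improvement(estimate, mixture, reference):
    """SI-SNRi: gain of the separated estimate over the unprocessed mixture."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)


# Toy example: one second of a 440 Hz tone at 8 kHz plus random interference.
rng = np.random.default_rng(0)
reference = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)
interference = rng.standard_normal(8000)
mixture = reference + interference           # what the separator receives
estimate = reference + 0.1 * interference    # a hypothetical separator output
print(f"SI-SNRi: {si_snr_improvement(estimate, mixture, reference):.1f} dB")
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score unchanged, which is why the metric is called scale-invariant.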
Architecture
SepFormer utilizes a transformer-based architecture for efficient speech separation. The model is trained to separate audio sources from a mixed input, leveraging the SpeechBrain toolkit.
Training
To train the SepFormer model from scratch, follow these steps:
- Clone the SpeechBrain Repository:
    git clone https://github.com/speechbrain/speechbrain/
- Install Dependencies:
    cd speechbrain
    pip install -r requirements.txt
    pip install -e .
- Run Training:
  Navigate to the training directory and execute the training script:
    cd recipes/WHAMandWHAMR/separation
    python train.py hparams/sepformer-wham.yaml --data_folder=your_data_folder
Training artifacts, including models and logs, are available here.
Guide: Running Locally
- Install SpeechBrain:
    pip install speechbrain
- Perform Source Separation:
  Use the following Python script:
    from speechbrain.inference.separation import SepformerSeparation as separator
    import torchaudio
    # Fetch the pretrained WHAM! model and cache it in savedir
    model = separator.from_hparams(source="speechbrain/sepformer-wham", savedir='pretrained_models/sepformer-wham')
    # Separate the mixture; the last dimension of est_sources indexes the sources
    est_sources = model.separate_file(path='speechbrain/sepformer-wsj02mix/test_mixture.wav')
    # Save each estimated source at 8 kHz
    torchaudio.save("source1hat.wav", est_sources[:, :, 0].detach().cpu(), 8000)
    torchaudio.save("source2hat.wav", est_sources[:, :, 1].detach().cpu(), 8000)
  Ensure the input audio is sampled at 8 kHz. Resample if necessary using torchaudio or sox.
- Inference on GPU:
  To use a GPU, specify the device in the from_hparams method:
    model = separator.from_hparams(source="speechbrain/sepformer-wham", savedir='pretrained_models/sepformer-wham', run_opts={"device":"cuda"})
- Cloud GPUs:
  Consider using cloud services like AWS, GCP, or Azure for GPU resources to speed up computation.
License
The SepFormer model is released under the Apache 2.0 License.