SepFormer-WHAMR (SpeechBrain)
Introduction
SepFormer-WHAMR is an audio source separation model implemented with the SpeechBrain toolkit. It is trained on the WHAMR! dataset, a version of the WSJ0-2Mix dataset extended with environmental noise and reverberation, and achieves 13.7 dB SI-SNRi on the WHAMR! test set.
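SI-SNRi is the improvement in scale-invariant signal-to-noise ratio of the separated signal over the unprocessed mixture, measured against the reference source. As a minimal sketch of the underlying SI-SNR computation (a standalone helper written for illustration here, not a SpeechBrain API):

import torch

def si_snr(estimate, target, eps=1e-8):
    # Remove means so the measure is invariant to DC offsets
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target; rescaling the estimate leaves SI-SNR unchanged
    s_target = (torch.dot(estimate, target) / (torch.dot(target, target) + eps)) * target
    e_noise = estimate - s_target
    return 10 * torch.log10(s_target.pow(2).sum() / (e_noise.pow(2).sum() + eps))

# SI-SNRi for one source: si_snr(separated, source) - si_snr(mixture, source)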
Architecture
SepFormer is a transformer-based masking network for speech separation. It follows the learned-encoder/masker/decoder design: the mixture waveform is encoded into a feature representation, a dual-path transformer uses self-attention to model short- and long-range dependencies and to estimate one mask per speaker, and a decoder reconstructs each source waveform. This lets the model handle complex, noisy, and reverberant mixtures.
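To make the data flow concrete, here is a heavily simplified, hypothetical PyTorch sketch of that encoder/masker/decoder pipeline. It is not the actual SpeechBrain implementation: the real SepFormer replaces the stand-in masking network below with a dual-path transformer and is far larger.

import torch
import torch.nn as nn

class TinySeparator(nn.Module):
    """Illustrative masking-based separator (hypothetical, simplified)."""
    def __init__(self, n_sources=2, n_filters=256, kernel_size=16):
        super().__init__()
        self.n_sources = n_sources
        # Learned analysis filterbank: time domain -> feature domain
        self.encoder = nn.Conv1d(1, n_filters, kernel_size, stride=kernel_size // 2)
        # Stand-in for SepFormer's dual-path transformer masking network
        layer = nn.TransformerEncoderLayer(d_model=n_filters, nhead=8, batch_first=True)
        self.masker = nn.TransformerEncoder(layer, num_layers=2)
        self.mask_out = nn.Linear(n_filters, n_filters * n_sources)
        # Learned synthesis filterbank: feature domain -> time domain
        self.decoder = nn.ConvTranspose1d(n_filters, 1, kernel_size, stride=kernel_size // 2)

    def forward(self, mix):                       # mix: (batch, time)
        feats = self.encoder(mix.unsqueeze(1))    # (batch, filters, frames)
        h = self.masker(feats.transpose(1, 2))    # (batch, frames, filters)
        masks = torch.relu(self.mask_out(h))      # one mask per source
        masks = masks.view(h.size(0), h.size(1), self.n_sources, -1)
        # Apply each mask to the encoded mixture and decode back to waveforms
        srcs = [self.decoder((feats.transpose(1, 2) * masks[:, :, s]).transpose(1, 2))
                for s in range(self.n_sources)]
        return torch.stack([s.squeeze(1) for s in srcs], dim=-1)  # (batch, time', sources)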
Training
The model was trained using the SpeechBrain framework. To train the model from scratch:
- Clone the SpeechBrain repository:
git clone https://github.com/speechbrain/speechbrain/
- Install the necessary dependencies:
cd speechbrain
pip install -r requirements.txt
pip install -e .
- Start the training process (a sketch of how these flags are consumed follows this list):
cd recipes/WHAMandWHAMR/separation
python train.py hparams/sepformer-whamr.yaml --data_folder=YOUR_DATA_FOLDER --rir_path=YOUR_ROOM_IMPULSE_SAVE_PATH
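For context, --data_folder and --rir_path are hyperparameter overrides: SpeechBrain recipes parse the command line and substitute such values into the YAML file via HyperPyYAML. A rough sketch of that boilerplate, patterned on standard SpeechBrain recipes (the actual train.py may differ in detail):

import sys
import speechbrain as sb
from hyperpyyaml import load_hyperpyyaml

# parse_arguments splits argv into the hparams file, runtime options,
# and a YAML overrides string built from --key=value flags
hparams_file, run_opts, overrides = sb.parse_arguments(sys.argv[1:])
with open(hparams_file) as f:
    hparams = load_hyperpyyaml(f, overrides)

print(hparams["data_folder"])  # reflects the value passed via --data_folder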
Guide: Running Locally
- Install SpeechBrain:
Use the command:
pip install speechbrain
- Perform source separation:
Use the following Python script to separate the example mixture (for local or in-memory audio, see the sketch after this list):
from speechbrain.inference.separation import SepformerSeparation as separator
import torchaudio

model = separator.from_hparams(source="speechbrain/sepformer-whamr", savedir='pretrained_models/sepformer-whamr')
est_sources = model.separate_file(path='speechbrain/sepformer-wsj02mix/test_mixture.wav')
torchaudio.save("source1hat.wav", est_sources[:, :, 0].detach().cpu(), 8000)
torchaudio.save("source2hat.wav", est_sources[:, :, 1].detach().cpu(), 8000)
- Inference on GPU:
To run inference on a GPU, pass run_opts when loading the model:
model = separator.from_hparams(source="speechbrain/sepformer-whamr", savedir='pretrained_models/sepformer-whamr', run_opts={"device":"cuda"})
- Suggestion for cloud GPUs:
Consider cloud-based GPU services such as AWS, Google Cloud, or Azure for faster processing.
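For separating a local or in-memory waveform rather than the example file, the pretrained interface also exposes separate_batch, which takes a (batch, time) tensor. A sketch assuming a local file my_mixture.wav (hypothetical path) and the model's 8 kHz sampling rate:

import torchaudio
from speechbrain.inference.separation import SepformerSeparation as separator

model = separator.from_hparams(
    source="speechbrain/sepformer-whamr",
    savedir="pretrained_models/sepformer-whamr",
)

# Load a local mixture (hypothetical path) and resample to the model's 8 kHz rate
mix, fs = torchaudio.load("my_mixture.wav")
if fs != 8000:
    mix = torchaudio.functional.resample(mix, fs, 8000)

# separate_batch expects (batch, time); a mono load gives (1, time), which works directly
est_sources = model.separate_batch(mix)  # (batch, time, n_sources)
torchaudio.save("source1hat.wav", est_sources[:, :, 0].detach().cpu(), 8000)
torchaudio.save("source2hat.wav", est_sources[:, :, 1].detach().cpu(), 8000)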
License
The SepFormer-WHAMR model is licensed under the Apache-2.0 License.