SepFormer-WHAMR

speechbrain

Introduction

SepFormer-WHAMR is a speech separation model implemented with the SpeechBrain toolkit. It is trained on WHAMR!, a version of the WSJ0-Mix dataset with added environmental noise and reverberation. On the WHAMR! test set, the model achieves 13.7 dB SI-SNRi (scale-invariant signal-to-noise ratio improvement).
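
To make the reported metric concrete, here is a minimal pure-Python sketch of how SI-SNR and its improvement (SI-SNRi) are commonly computed. It is illustrative only and is not the evaluation code used to produce the 13.7 dB figure; the function names `si_snr` and `si_snr_improvement` are assumptions chosen for this example.

```python
import math

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length sample sequences."""
    n = len(target)
    # Remove the mean so a constant offset does not affect the score.
    e_mean = sum(estimate) / n
    t_mean = sum(target) / n
    e = [x - e_mean for x in estimate]
    t = [x - t_mean for x in target]
    # Project the estimate onto the target to get the scaled target component.
    dot = sum(a * b for a, b in zip(e, t))
    t_energy = sum(b * b for b in t) + eps
    s_target = [dot / t_energy * b for b in t]
    # Whatever remains after the projection counts as error.
    e_noise = [a - s for a, s in zip(e, s_target)]
    num = sum(s * s for s in s_target)
    den = sum(x * x for x in e_noise) + eps
    return 10.0 * math.log10(num / den + eps)

def si_snr_improvement(estimate, mixture, target):
    """SI-SNRi: gain of the separated estimate over the unprocessed mixture."""
    return si_snr(estimate, target) - si_snr(mixture, target)
```

The improvement is measured relative to the unprocessed mixture, so a positive value means the separated signal is closer to the clean source than the mixture was.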

Architecture

SepFormer is a transformer-based architecture designed for speech separation. It uses attention mechanisms to separate the individual sources in a mixture, including in noisy and reverberant conditions.
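
As a rough illustration of the attention mechanism mentioned above, the following sketch implements generic scaled dot-product attention in plain Python. It is not SepFormer's actual layer code; the helper names `softmax` and `attention` and the list-of-vectors representation are assumptions made for this example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query frame to every key frame.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Self-attention over three toy "frames" of dimension 2.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = attention(frames, frames, frames)
```

Each output frame mixes information from every input frame, which is what lets a transformer relate distant parts of an audio sequence.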

Training

The model was trained using the SpeechBrain framework. To train the model from scratch:

  1. Clone the SpeechBrain repository:
    git clone https://github.com/speechbrain/speechbrain/
    
  2. Install the necessary dependencies:
    cd speechbrain
    pip install -r requirements.txt
    pip install -e .
    
  3. Start the training process:
    cd recipes/WHAMandWHAMR/separation
    python train.py hparams/sepformer-whamr.yaml --data_folder=YOUR_DATA_FOLDER --rir_path=YOUR_ROOM_IMPULSE_SAVE_PATH
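
The training command in step 3 can also be assembled from Python, which may help when scripting several runs. This is a small sketch under stated assumptions: `build_train_command` is a hypothetical helper (not part of SpeechBrain), and the two path arguments remain placeholders to be replaced with real locations.

```python
import sys

def build_train_command(data_folder, rir_path,
                        hparams="hparams/sepformer-whamr.yaml"):
    """Return the argv list for the recipe's train.py (hypothetical helper)."""
    return [
        sys.executable, "train.py", hparams,
        f"--data_folder={data_folder}",
        f"--rir_path={rir_path}",
    ]

# Placeholders copied from the command above; substitute real paths.
cmd = build_train_command("YOUR_DATA_FOLDER", "YOUR_ROOM_IMPULSE_SAVE_PATH")
print(" ".join(cmd[1:]))

# To launch from the repository root (not executed in this sketch):
# import subprocess
# subprocess.run(cmd, check=True, cwd="recipes/WHAMandWHAMR/separation")
```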
    

Guide: Running Locally

  1. Install SpeechBrain:
    Use the command:

    pip install speechbrain
    
  2. Perform Source Separation:
    Use the following Python script to separate audio sources:

    from speechbrain.inference.separation import SepformerSeparation as separator
    import torchaudio

    # Load the pretrained model, caching its files in savedir
    model = separator.from_hparams(source="speechbrain/sepformer-whamr", savedir='pretrained_models/sepformer-whamr')

    # Separate the mixture; est_sources has shape [batch, time, n_sources]
    est_sources = model.separate_file(path='speechbrain/sepformer-wsj02mix/test_mixture.wav')

    # Save each estimated source at 8 kHz
    torchaudio.save("source1hat.wav", est_sources[:, :, 0].detach().cpu(), 8000)
    torchaudio.save("source2hat.wav", est_sources[:, :, 1].detach().cpu(), 8000)
    
  3. Inference on GPU:
    To run inference on a GPU, pass run_opts when loading the model:

    model = separator.from_hparams(source="speechbrain/sepformer-whamr", savedir='pretrained_models/sepformer-whamr', run_opts={"device":"cuda"})
    
  4. Suggestion for Cloud GPUs:
    If no local GPU is available, cloud GPU services such as AWS, Google Cloud, or Azure can speed up processing.
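
The separation script in step 2 saves its outputs at 8000 Hz, which suggests the model works with 8 kHz audio. Under that assumption, it may be worth checking an input file's sample rate before separating it. The sketch below uses only the standard-library `wave` module, so it handles plain PCM WAV files only; `wav_sample_rate` and `needs_resampling` are hypothetical helper names.

```python
import wave

# Assumption: taken from the rate used when saving outputs in step 2.
EXPECTED_RATE = 8000

def wav_sample_rate(path):
    """Return the sample rate (Hz) of a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()

def needs_resampling(path, expected=EXPECTED_RATE):
    """True when the file's rate differs from the rate the model is assumed to use."""
    return wav_sample_rate(path) != expected
```

If `needs_resampling` returns True for a recording, resample it to 8 kHz with an audio tool of your choice before passing it to the model.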

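Hard-coding "cuda" as in step 3 fails on machines without a GPU. A minimal sketch of a fallback is shown here; `pick_device` is a hypothetical helper, and it assumes PyTorch's `torch.cuda.is_available()` is an adequate check for your setup.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch installed there is nothing to query; default to CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

run_opts = {"device": pick_device()}
print(run_opts)
```

The resulting dictionary can be passed as `run_opts=run_opts` to `separator.from_hparams(...)` in place of the literal shown in step 3.
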
License

The SepFormer-WHAMR model is licensed under the Apache-2.0 License.
