sepformer-wsj02mix

speechbrain

Introduction

The SepFormer model, implemented with SpeechBrain, performs audio source separation and is pretrained on the WSJ0-2Mix dataset. It achieves 22.4 dB SI-SNRi (scale-invariant signal-to-noise ratio improvement) on the WSJ0-2Mix test set. This repository provides the tools needed to run source separation, with examples available for listening.
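To make the SI-SNRi figure concrete, here is a minimal sketch of how the metric is commonly defined. It is written in pure Python on plain lists so it is self-contained. The function names and the zero-mean normalization step are assumptions for illustration; the source does not state exactly how the reported 22.4 dB was computed, and real evaluations operate on audio tensors rather than lists.

```python
import math


def _zero_mean(signal):
    """Remove the mean so the metric ignores any DC offset."""
    mean = sum(signal) / len(signal)
    return [v - mean for v in signal]


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between an estimate and a clean reference."""
    est = _zero_mean(estimate)
    ref = _zero_mean(reference)

    # Project the estimate onto the reference to get the "target" component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]

    # Everything not explained by the reference counts as noise.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise) + eps

    return 10 * math.log10(target_energy / noise_energy + eps)


def si_snr_improvement(estimate, mixture, reference):
    """SI-SNRi: gain of the separated estimate over the unprocessed mixture."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)


if __name__ == "__main__":
    # Toy signals (hypothetical values, not taken from WSJ0-2Mix).
    reference = [1.0, -1.0, 1.0, -1.0]
    mixture = [1.5, -0.5, 0.5, -1.5]
    estimate = [1.1, -0.9, 0.8, -1.0]
    print(f"SI-SNRi: {si_snr_improvement(estimate, mixture, reference):.2f} dB")
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score unchanged, which is what "scale-invariant" refers to.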

Architecture

SepFormer uses a transformer-based architecture for speech separation. It is an audio-to-audio model: it takes a mixed recording as input and outputs the separated sources. The model is trained on WSJ0-2Mix, a standard benchmark for two-speaker separation.

Training

The model is trained with SpeechBrain, a general-purpose speech toolkit. To train it from scratch:

  1. Clone the SpeechBrain repository:

    git clone https://github.com/speechbrain/speechbrain/
    
  2. Install the necessary dependencies:

    cd speechbrain
    pip install -r requirements.txt
    pip install -e .
    
  3. Run the training script:

    cd recipes/WSJ0Mix/separation
    python train.py hparams/sepformer.yaml --data_folder=your_data_folder
    

Training results, including models and logs, can be accessed here.

Guide: Running Locally

  1. Install SpeechBrain:

    pip install speechbrain
    
  2. Perform Source Separation:

    from speechbrain.inference.separation import SepformerSeparation as separator
    import torchaudio

    # Load the pretrained model, caching its files in savedir.
    model = separator.from_hparams(source="speechbrain/sepformer-wsj02mix", savedir='pretrained_models/sepformer-wsj02mix')

    # Separate the mixture; the last dimension of est_sources indexes the sources.
    est_sources = model.separate_file(path='speechbrain/sepformer-wsj02mix/test_mixture.wav')

    # Save each estimated source as an 8 kHz WAV file.
    torchaudio.save("source1hat.wav", est_sources[:, :, 0].detach().cpu(), 8000)
    torchaudio.save("source2hat.wav", est_sources[:, :, 1].detach().cpu(), 8000)
    

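    Ensure your input recordings are sampled at 8 kHz, the rate the model expects. Resample recordings with a different sample rate before separating them.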

  3. Inference on GPU: To run inference on a GPU, pass run_opts={"device": "cuda"} when calling from_hparams.
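The two practical checks in the steps above (8 kHz input and device selection) can be sketched as small helpers. This is an illustrative sketch, not part of SpeechBrain: the helper names `wav_sample_rate`, `needs_resampling`, and `build_run_opts` are hypothetical, and the sample-rate check uses Python's standard `wave` module, which only reads PCM WAV files.

```python
import wave

EXPECTED_SAMPLE_RATE = 8000  # the guide asks for 8 kHz input recordings


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, expected=EXPECTED_SAMPLE_RATE):
    """True if the recording's sample rate differs from the expected one."""
    return wav_sample_rate(path) != expected


def build_run_opts(prefer_gpu=True):
    """Build a run_opts dict, falling back to CPU when CUDA is unavailable.

    torch is imported lazily so this helper also works where it is missing.
    """
    try:
        import torch
        has_cuda = torch.cuda.is_available()
    except ImportError:
        has_cuda = False
    device = "cuda" if (prefer_gpu and has_cuda) else "cpu"
    return {"device": device}
```

The result of `build_run_opts()` can be passed as the `run_opts` argument of `from_hparams`. If `needs_resampling(path)` returns True, resample the file to 8 kHz first, for example with `torchaudio.functional.resample`.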

Suggested Cloud GPUs

For optimal performance, consider using cloud GPU services such as AWS EC2 with NVIDIA GPUs, Google Cloud's GPU offerings, or Azure's GPU instances.

License

The SepFormer model is licensed under the Apache-2.0 License, allowing for broad use and modification.
