SepFormer trained on WHAM!

SpeechBrain

Introduction

The SepFormer model, implemented with SpeechBrain, performs audio source separation. It is pretrained on the WHAM! dataset, a version of the WSJ0-2Mix dataset augmented with environmental noise, and achieves 16.3 dB SI-SNRi on the WHAM! test set.
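SI-SNRi measures how much the scale-invariant signal-to-noise ratio of a separated source improves over simply taking the unprocessed mixture. A minimal sketch of the metric follows; the function names are illustrative, not part of SpeechBrain's API:

```python
import numpy as np

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant signal-to-noise ratio, in dB."""
    est = est - est.mean()
    ref = ref - ref.mean()
    # Project the estimate onto the reference; scaling est leaves the ratio unchanged.
    s_target = np.dot(est, ref) / (np.dot(ref, ref) + eps) * ref
    e_noise = est - s_target
    return 10 * np.log10(np.dot(s_target, s_target) / (np.dot(e_noise, e_noise) + eps) + eps)

def si_snr_improvement(est, mix, ref):
    """SI-SNRi: separation gain relative to the raw mixture."""
    return si_snr(est, ref) - si_snr(mix, ref)
```

A 16.3 dB SI-SNRi thus means the separated sources are, on average, 16.3 dB cleaner (in the scale-invariant sense) than the input mixture.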

Architecture

SepFormer utilizes a transformer-based architecture for efficient speech separation. The model is trained to separate audio sources from a mixed input, leveraging the SpeechBrain toolkit.

Training

To train the SepFormer model from scratch, follow these steps:

  1. Clone the SpeechBrain Repository:

    git clone https://github.com/speechbrain/speechbrain/
    
  2. Install Dependencies:

    cd speechbrain
    pip install -r requirements.txt
    pip install -e .
    
  3. Run Training:

    Navigate to the training directory and execute the training script:

    cd recipes/WHAMandWHAMR/separation
    python train.py hparams/sepformer-wham.yaml --data_folder=your_data_folder
    

Training artifacts, including models and logs, are available here.

Guide: Running Locally

  1. Install SpeechBrain:

    pip install speechbrain
    
  2. Perform Source Separation:

    Use the following Python script:

    from speechbrain.inference.separation import SepformerSeparation as separator
    import torchaudio
    
    model = separator.from_hparams(source="speechbrain/sepformer-wham", savedir='pretrained_models/sepformer-wham')
    
    est_sources = model.separate_file(path='speechbrain/sepformer-wham/test_mixture.wav')
    
    torchaudio.save("source1hat.wav", est_sources[:, :, 0].detach().cpu(), 8000)
    torchaudio.save("source2hat.wav", est_sources[:, :, 1].detach().cpu(), 8000)
    

    Ensure the input audio is sampled at 8 kHz. Resample if necessary using torchaudio or sox.
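    The resampling step above can be sketched with torchaudio; here one second of synthetic 16 kHz audio stands in for a waveform loaded from a real file:

    ```python
    import math

    import torch
    import torchaudio

    # The pretrained model expects 8 kHz input; resample anything else first.
    sr_in, sr_out = 16000, 8000

    # One second of a 440 Hz tone at 16 kHz, shaped (channels, samples)
    # like the tensor returned by torchaudio.load.
    t = torch.arange(sr_in, dtype=torch.float32) / sr_in
    waveform = torch.sin(2 * math.pi * 440.0 * t).unsqueeze(0)

    # Downsample to the model's expected rate.
    resampled = torchaudio.functional.resample(waveform, orig_freq=sr_in, new_freq=sr_out)
    ```

    The resampled tensor can then be saved with torchaudio.save or fed to the model directly.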

  3. Inference on GPU:

    To run inference on a GPU, pass the device via the run_opts argument of from_hparams:

    model = separator.from_hparams(source="speechbrain/sepformer-wham", savedir='pretrained_models/sepformer-wham', run_opts={"device":"cuda"})
    
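    To keep the same script working on machines without a GPU, the device can be chosen at runtime; this is a small sketch, and the run_opts dictionary it builds is what from_hparams accepts above:

    ```python
    import torch

    # Prefer "cuda" when a GPU is visible, otherwise fall back to "cpu".
    device = "cuda" if torch.cuda.is_available() else "cpu"
    run_opts = {"device": device}

    # run_opts can then be passed to SepformerSeparation.from_hparams as shown above.
    ```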
  4. Cloud GPUs:

    If no local GPU is available, cloud services such as AWS, GCP, or Azure can provide GPU instances for faster training and inference.

License

The SepFormer model is released under the Apache 2.0 License.
