metricgan plus voicebank
speechbrain
Introduction
MetricGAN+ is a speech enhancement model built with the SpeechBrain toolkit, which runs on PyTorch. It improves speech quality by reducing background noise. The model is trained on the Voicebank and DEMAND datasets and is evaluated using the PESQ and STOI metrics.
Architecture
MetricGAN+ employs a GAN-based architecture tailored for speech enhancement tasks. The generator is trained adversarially against a discriminator that learns to predict a speech quality metric, and it enhances audio by estimating a spectral mask that is applied to the noisy spectrogram.
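As a rough illustration of spectral masking, the sketch below scales the magnitude of each frequency bin by a mask value and keeps the phase of the noisy input. It is not SpeechBrain's implementation: the function name `apply_spectral_mask`, the toy spectrum, and the mask values are all hypothetical, and reusing the noisy phase is an assumption based on common practice. A real system would operate on STFT tensors, with the mask predicted by the trained generator.

```python
import cmath


def apply_spectral_mask(noisy_bins, mask):
    """Scale each bin's magnitude by its mask value, keeping the noisy phase."""
    if len(noisy_bins) != len(mask):
        raise ValueError("mask and spectrum must have the same length")
    enhanced = []
    for value, gain in zip(noisy_bins, mask):
        magnitude, phase = cmath.polar(value)
        enhanced.append(cmath.rect(gain * magnitude, phase))
    return enhanced


# Toy spectrum for a single frame: four complex frequency bins (made-up values).
noisy_frame = [3 + 4j, 1 - 1j, -2 + 0j, 0.5j]
# Hypothetical mask in [0, 1]: keep the first bin, attenuate the others.
frame_mask = [1.0, 0.5, 0.25, 0.0]

enhanced_frame = apply_spectral_mask(noisy_frame, frame_mask)
print([round(abs(v), 3) for v in enhanced_frame])
```

A mask value of 1.0 leaves a bin untouched and 0.0 removes it entirely, so the quality of the result depends on how well the mask separates speech from noise.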
Training
The model was trained using the SpeechBrain toolkit. The training process involves cloning the SpeechBrain repository, installing the necessary dependencies, and running the training script with appropriate hyperparameter settings. The system is trained on recordings sampled at 16 kHz.
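Because the model expects 16 kHz audio, it can help to check a file's format before enhancing it. The following is a minimal sketch using only the Python standard library. The helper names (`describe_wav`, `matches_training_format`, `write_test_tone`) are hypothetical, and the mono requirement is an assumption that the text above does not state.

```python
import math
import os
import struct
import tempfile
import wave

EXPECTED_RATE = 16000  # sample rate stated above for the training recordings


def describe_wav(path):
    """Return (sample_rate, channels) read from a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate(), wav_file.getnchannels()


def matches_training_format(path, expected_rate=EXPECTED_RATE):
    """True if the file is mono (assumed) and sampled at the expected rate."""
    rate, channels = describe_wav(path)
    return rate == expected_rate and channels == 1


def write_test_tone(path, rate, seconds=0.1, freq=440.0):
    """Write a short mono 16-bit sine tone so the check has a file to read."""
    n_samples = int(rate * seconds)
    frames = b"".join(
        struct.pack("<h", int(12000 * math.sin(2 * math.pi * freq * i / rate)))
        for i in range(n_samples)
    )
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(rate)
        wav_file.writeframes(frames)


# Demonstrate the check on two generated files with different sample rates.
tmp_dir = tempfile.mkdtemp()
ok_path = os.path.join(tmp_dir, "tone_16k.wav")
bad_path = os.path.join(tmp_dir, "tone_44k.wav")
write_test_tone(ok_path, 16000)
write_test_tone(bad_path, 44100)
print(matches_training_format(ok_path), matches_training_format(bad_path))
```

A file that fails the check would need to be resampled before its samples are passed to the model directly.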
Guide: Running Locally
- Install SpeechBrain:

      pip install speechbrain
- Pretrained Model Usage:
  Load the pretrained model and enhance an audio file:

      import torch
      import torchaudio
      from speechbrain.inference.enhancement import SpectralMaskEnhancement

      # Download (or reuse) the pretrained model files.
      enhance_model = SpectralMaskEnhancement.from_hparams(
          source="speechbrain/metricgan-plus-voicebank",
          savedir="pretrained_models/metricgan-plus-voicebank",
      )

      # Load the noisy audio and add a batch dimension.
      noisy = enhance_model.load_audio("example.wav").unsqueeze(0)

      # Enhance the batch; lengths holds the relative length of each item.
      enhanced = enhance_model.enhance_batch(noisy, lengths=torch.tensor([1.]))

      # Save the enhanced signal at 16 kHz.
      torchaudio.save('enhanced.wav', enhanced.cpu(), 16000)
- Inference on GPU:
  Add the option run_opts={"device":"cuda"} when calling from_hparams to run inference on the GPU.
- Suggested Cloud GPUs:
  Consider cloud services such as AWS, Google Cloud, or Azure for access to powerful GPUs that speed up training and inference.
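The GPU option from the guide above can be combined with a CPU fallback, so the same script runs on machines with or without CUDA. This is a minimal sketch: `make_run_opts` and `load_enhancer` are hypothetical helper names, and the imports are done lazily so that the device selection can be tried without loading the model.

```python
def make_run_opts(prefer_gpu=True):
    """Build a run_opts dict for from_hparams, falling back to CPU."""
    device = "cpu"
    if prefer_gpu:
        try:
            import torch  # imported lazily; absent PyTorch means CPU

            if torch.cuda.is_available():
                device = "cuda"
        except ImportError:
            pass
    return {"device": device}


def load_enhancer(prefer_gpu=True):
    """Load the pretrained enhancer on the selected device (downloads files)."""
    from speechbrain.inference.enhancement import SpectralMaskEnhancement

    return SpectralMaskEnhancement.from_hparams(
        source="speechbrain/metricgan-plus-voicebank",
        savedir="pretrained_models/metricgan-plus-voicebank",
        run_opts=make_run_opts(prefer_gpu),
    )


# Only the device selection runs here; call load_enhancer() to fetch the model.
print(make_run_opts(prefer_gpu=False))
```

Calling `load_enhancer()` requires SpeechBrain to be installed and network access to download the model on first use.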
License
MetricGAN+ is licensed under the Apache 2.0 License.