brouhaha
pyannote
Introduction
Brouhaha is a model for joint voice activity detection, speech-to-noise ratio estimation, and C50 room acoustics estimation. It is part of the pyannote project and relies on pyannote.audio and brouhaha-vad. The model analyzes audio files and reports on voice activity and acoustic conditions.
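To make the two acoustic quantities concrete, the sketch below computes them from known signals using their standard textbook definitions. This is not code from Brouhaha, which estimates these values directly from audio without access to the clean speech or the room impulse response. The function names `snr_db` and `c50_db` are hypothetical, and the impulse response is assumed to start at the arrival of the direct sound.

```python
import math
from typing import Sequence


def snr_db(speech_power: float, noise_power: float) -> float:
    """Speech-to-noise ratio in decibels, from average signal powers."""
    return 10.0 * math.log10(speech_power / noise_power)


def c50_db(impulse_response: Sequence[float], sample_rate: int) -> float:
    """Clarity index C50 in decibels.

    Ratio of the energy in the first 50 ms of a room impulse response
    to the energy arriving after 50 ms. Assumes index 0 is the direct sound.
    """
    split = int(round(0.050 * sample_rate))  # samples in the first 50 ms
    early = sum(x * x for x in impulse_response[:split])
    late = sum(x * x for x in impulse_response[split:])
    return 10.0 * math.log10(early / late)
```

A higher C50 means more of the energy arrives early, which generally corresponds to a less reverberant recording.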
Architecture
Brouhaha is implemented as a PyTorch neural network that uses multi-task learning to address voice activity detection, speech-to-noise ratio estimation, and C50 room acoustics estimation simultaneously. The model is trained and evaluated on datasets such as LibriSpeech, AudioSet, EchoThief, and MIT-Acoustical-Reverberation-Scene.
Training
The model was trained using a multi-task training approach that allows it to handle different aspects of audio analysis simultaneously. This training method enhances the model's ability to provide detailed acoustic insights from audio data.
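As an illustration of what a multi-task objective can look like, the sketch below combines one classification loss and two regression losses into a single value. The text above does not state which loss functions or weights were used, so everything here is an assumption: binary cross-entropy for VAD, mean squared error for SNR and C50, and equal weights by default. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def binary_cross_entropy(probs: Sequence[float], labels: Sequence[int],
                         eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def mean_squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predictions and targets."""
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(pred)


def multitask_loss(vad_probs, vad_labels, snr_pred, snr_true, c50_pred, c50_true,
                   weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the three per-task losses (weights are an assumption)."""
    w_vad, w_snr, w_c50 = weights
    return (w_vad * binary_cross_entropy(vad_probs, vad_labels)
            + w_snr * mean_squared_error(snr_pred, snr_true)
            + w_c50 * mean_squared_error(c50_pred, c50_true))
```

Because all three terms are computed from the same input, a single shared network can be optimized for the three tasks at once, which is the general idea behind multi-task training.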
Guide: Running Locally
Basic Steps
- Installation
  - Install dependencies:
      pip install pyannote-audio
      pip install https://github.com/marianne-m/brouhaha-vad/archive/main.zip
- Setup
  - Visit hf.co/pyannote/brouhaha and accept the user conditions.
  - Create an access token at hf.co/settings/tokens.
- Model Initialization
  - Use the following Python code to instantiate and apply the model:
      from pyannote.audio import Model, Inference

      # Load the pretrained model (requires the access token created above)
      model = Model.from_pretrained("pyannote/brouhaha", use_auth_token="ACCESS_TOKEN_GOES_HERE")

      # Apply the model to an audio file
      inference = Inference(model)
      output = inference("audio.wav")

      # Print one line per frame: time, VAD score, SNR, and C50
      for frame, (vad, snr, c50) in output:
          t = frame.middle
          print(f"{t:8.3f} vad={100*vad:.0f}% snr={snr:.0f} c50={c50:.0f}")
- Analysis
  - Each frame's output provides a voice activity detection (VAD) score, a speech-to-noise ratio (SNR) estimate, and a C50 room acoustics estimate.
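The per-frame output from the steps above can be condensed into a short summary. The sketch below is one possible post-processing approach, not part of pyannote.audio or brouhaha-vad: the helper names and the 0.5 VAD threshold are assumptions. It expects a list of `(time, vad, snr, c50)` tuples, which could be built from the loop in the guide with `frames = [(frame.middle, vad, snr, c50) for frame, (vad, snr, c50) in output]`.

```python
from typing import List, Optional, Sequence, Tuple

Frame = Tuple[float, float, float, float]  # (time in seconds, VAD score, SNR, C50)


def speech_segments(frames: Sequence[Frame],
                    threshold: float = 0.5) -> List[Tuple[float, float]]:
    """Merge consecutive frames with VAD >= threshold into (start, end) spans.

    Span boundaries are frame midpoints, so they are approximate.
    """
    segments = []
    start = None
    last = None
    for t, vad, _snr, _c50 in frames:
        if vad >= threshold:
            if start is None:
                start = t  # a new speech span begins
            last = t
        elif start is not None:
            segments.append((start, last))  # the span ended at the previous frame
            start = None
    if start is not None:
        segments.append((start, last))  # close a span that runs to the end
    return segments


def mean_over_speech(frames: Sequence[Frame],
                     threshold: float = 0.5) -> Optional[Tuple[float, float]]:
    """Average SNR and C50 over speech frames, or None if no frame is speech."""
    speech = [(snr, c50) for _t, vad, snr, c50 in frames if vad >= threshold]
    if not speech:
        return None
    n = len(speech)
    return (sum(s for s, _ in speech) / n, sum(c for _, c in speech) / n)
```

Averaging only over speech frames is a design choice made here on the assumption that SNR and C50 estimates are most meaningful where speech is present.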
Cloud GPUs
Running Brouhaha on a cloud GPU platform such as AWS, Google Cloud, or Azure can speed up inference considerably, especially for large-scale audio processing tasks.
License
Brouhaha is released under the OpenRAIL license, allowing for both personal and commercial use with certain conditions.