Introduction

Brouhaha is a model that jointly performs voice activity detection (VAD), speech-to-noise ratio (SNR) estimation, and C50 room acoustics estimation. It is part of the pyannote project and is used through pyannote.audio together with the brouhaha-vad package. Given an audio file, it reports frame by frame whether speech is present and what the acoustic conditions are.

Architecture

Brouhaha is implemented in PyTorch and uses multi-task learning: a single model predicts voice activity, speech-to-noise ratio, and C50 together. Training and evaluation draw on datasets such as LibriSpeech, AudioSet, EchoThief, and MIT-Acoustical-Reverberation-Scene.

Training

The model was trained with a multi-task approach, so it learns all three estimates (VAD, SNR, and C50) at once instead of requiring a separate model for each.
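To make the idea of a multi-task objective concrete, here is a minimal sketch of how three per-task losses could be combined for a single frame. The source does not say which loss functions or weights Brouhaha uses, so everything here is an assumption for illustration: binary cross-entropy for VAD, squared error for SNR and C50, and the hypothetical weights `w_vad`, `w_snr`, and `w_c50`.

```python
import math


def multitask_loss(vad_prob, vad_target, snr_pred, snr_target,
                   c50_pred, c50_target, w_vad=1.0, w_snr=1.0, w_c50=1.0):
    """Weighted sum of three per-task losses for one frame.

    Illustrative only: the loss choices and weights are assumptions,
    not the losses used to train Brouhaha.
    """
    # Clamp the probability so log() never receives 0 or 1 exactly.
    eps = 1e-7
    p = min(max(vad_prob, eps), 1.0 - eps)

    # Classification task: binary cross-entropy on speech / non-speech.
    bce = -(vad_target * math.log(p) + (1.0 - vad_target) * math.log(1.0 - p))

    # Regression tasks: squared error on SNR and C50.
    snr_err = (snr_pred - snr_target) ** 2
    c50_err = (c50_pred - c50_target) ** 2

    return w_vad * bce + w_snr * snr_err + w_c50 * c50_err
```

Because the three terms are summed into one scalar, a single backward pass would update shared parameters for all tasks at once, which is the general point of multi-task training.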

Guide: Running Locally

Basic Steps

  1. Installation

    • Install dependencies:
      pip install pyannote-audio
      pip install https://github.com/marianne-m/brouhaha-vad/archive/main.zip
      
  2. Setup

    • Visit hf.co/pyannote/brouhaha and accept user conditions.
    • Create an access token at hf.co/settings/tokens.

  3. Model Initialization

    • Use the following Python code to instantiate and apply the model:
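      from pyannote.audio import Model, Inference
      
      # Load the pretrained model; replace the placeholder with your access token.
      model = Model.from_pretrained("pyannote/brouhaha", use_auth_token="ACCESS_TOKEN_GOES_HERE")
      # Wrap the model for inference and run it on an audio file.
      inference = Inference(model)
      output = inference("audio.wav")
      
      # Each frame carries three predictions: VAD, SNR, and C50.
      for frame, (vad, snr, c50) in output:
          t = frame.middle  # midpoint of the frame
          print(f"{t:8.3f} vad={100*vad:.0f}% snr={snr:.0f} c50={c50:.0f}")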
      
  4. Analysis

    • Each frame yields three values: a voice activity detection (VAD) score, a speech-to-noise ratio (SNR) estimate, and a C50 room acoustics estimate.
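The per-frame values from step 3 can be condensed into a file-level summary. The sketch below is one possible way to do that, not part of Brouhaha or pyannote.audio: `summarize_frames` is a hypothetical helper, the 0.5 VAD threshold is an assumed default, and averaging SNR and C50 over speech frames only is a design choice made here for illustration.

```python
def summarize_frames(frames, vad_threshold=0.5):
    """Summarize (time, vad, snr, c50) tuples for one recording.

    Hypothetical helper: the threshold and the choice to average only
    over frames classified as speech are assumptions.
    """
    frames = list(frames)
    # Keep SNR and C50 only for frames whose VAD score meets the threshold.
    speech = [(snr, c50) for _, vad, snr, c50 in frames if vad >= vad_threshold]

    if not frames or not speech:
        # Nothing to average: report no speech and leave the means undefined.
        return {"speech_ratio": 0.0, "mean_snr": None, "mean_c50": None}

    return {
        "speech_ratio": len(speech) / len(frames),
        "mean_snr": sum(snr for snr, _ in speech) / len(speech),
        "mean_c50": sum(c50 for _, c50 in speech) / len(speech),
    }
```

With the `output` object from step 3, the input could be built as `[(frame.middle, vad, snr, c50) for frame, (vad, snr, c50) in output]` and passed to `summarize_frames`.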

Cloud GPUs

Running Brouhaha on a cloud GPU platform such as AWS, Google Cloud, or Azure can speed up processing considerably, especially for large-scale audio workloads.

License

Brouhaha is released under the OpenRAIL license, which permits both personal and commercial use subject to certain conditions.
