spkrec xvect voxceleb

speechbrain

Introduction

SpeechBrain's SPKREC-XVECT-VOXCELEB is a pretrained model for speaker verification with x-vector embeddings. It uses a Time Delay Neural Network (TDNN) architecture and is trained on the VoxCeleb dataset, which comprises VoxCeleb1 and VoxCeleb2. The model achieves an Equal Error Rate (EER) of 3.2% on the VoxCeleb1 test set.
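
For readers unfamiliar with the metric, the EER is the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how it can be approximated from verification scores. The function name `equal_error_rate` and the toy scores are hypothetical, and this is not the evaluation code used to produce the figure above.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping a threshold over all observed scores."""
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(genuine) | set(impostor)):
        # False rejection: a same-speaker trial scored below the threshold.
        frr = sum(s < threshold for s in genuine) / len(genuine)
        # False acceptance: a different-speaker trial scored at or above it.
        far = sum(s >= threshold for s in impostor) / len(impostor)
        # Keep the threshold where the two error rates are closest.
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# Toy scores (made up for illustration): higher means "more likely same speaker".
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine_scores, impostor_scores))  # 0.25
```

In this toy example one of four genuine trials and one of four impostor trials are misclassified at the best threshold, giving an EER of 25%.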

Architecture

The model combines a TDNN with statistics pooling, which aggregates frame-level features over time into a fixed-length utterance-level representation. It is trained with Categorical Cross-Entropy Loss, the standard loss for classification tasks such as speaker identification. This architecture is well suited to extracting robust speaker embeddings from audio of varying length.
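
The pooling step can be illustrated with a short sketch. It assumes the common formulation in which the per-dimension mean and standard deviation over time are concatenated. The function `statistics_pooling` is hypothetical and written in plain Python for readability; the actual SpeechBrain layer operates on tensors and may differ in details such as how the standard deviation is normalized.

```python
import math


def statistics_pooling(frames):
    """Pool frame-level feature vectors into one fixed-length vector.

    Returns the per-dimension mean followed by the per-dimension
    (population) standard deviation, so the output length is 2 * dim
    regardless of how many frames are given.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds


# Two toy "utterances" of different length, each with 2-dimensional frames.
short = [[1.0, 2.0], [3.0, 6.0]]
longer = [[1.0, 2.0], [3.0, 6.0], [1.0, 2.0], [3.0, 6.0]]
print(statistics_pooling(short))        # [2.0, 4.0, 1.0, 2.0]
print(len(statistics_pooling(longer)))  # 4
```

Both inputs produce a vector of the same length, which is what lets the layers after pooling work on utterances of any duration.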

Training

The model was trained with the SpeechBrain toolkit on recordings sampled at 16 kHz. To train it from scratch, clone the SpeechBrain repository, install its dependencies, and run the training script. The training results, including models and logs, are available online.

Guide: Running Locally

  1. Install SpeechBrain:
    Run the following command:

    pip install speechbrain
    
  2. Compute Speaker Embeddings:
    Use the following Python code to load an audio file and extract embeddings:

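    import torchaudio
    from speechbrain.inference.speaker import EncoderClassifier

    # Download the pretrained model (cached in savedir for later runs).
    classifier = EncoderClassifier.from_hparams(source="speechbrain/spkrec-xvect-voxceleb", savedir="pretrained_models/spkrec-xvect-voxceleb")
    # Load the waveform. The model was trained on 16 kHz recordings, so the input should match.
    signal, fs = torchaudio.load('your_audio_file.wav')
    # Extract the speaker embedding for the loaded signal.
    embeddings = classifier.encode_batch(signal)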
    
  3. Inference on GPU:
    To run inference on a GPU, pass run_opts to the from_hparams call:

    classifier = EncoderClassifier.from_hparams(source="speechbrain/spkrec-xvect-voxceleb", savedir="pretrained_models/spkrec-xvect-voxceleb", run_opts={"device":"cuda"})
    
  4. Cloud GPUs:
    For large-scale or resource-intensive tasks, consider using cloud services such as AWS, Google Cloud, or Azure, which provide GPU instances.
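
Once embeddings have been extracted as in step 2, a common way to use them for verification is to compare two embeddings with cosine similarity. The sketch below assumes each embedding has been converted to a flat list of floats (for example with `embeddings.squeeze().tolist()`). The helper names and the threshold of 0.5 are hypothetical; in practice the threshold would be tuned on held-out trials.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def same_speaker(emb_a, emb_b, threshold=0.5):
    """Accept the trial if the similarity reaches the threshold.

    The default threshold is a placeholder, not a value from the model card.
    """
    return cosine_similarity(emb_a, emb_b) >= threshold


# Toy 3-dimensional "embeddings" (real x-vectors have many more dimensions).
enrolled = [1.0, 0.0, 1.0]
test_same = [2.0, 0.0, 2.0]    # same direction as enrolled
test_other = [0.0, 1.0, 0.0]   # orthogonal to enrolled
print(same_speaker(enrolled, test_same))   # True
print(same_speaker(enrolled, test_other))  # False
```

Cosine similarity ignores vector magnitude, so the decision depends only on the direction of the embeddings.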

License

The SPKREC-XVECT-VOXCELEB model is licensed under the Apache 2.0 License, which permits use, modification, and distribution provided that the license text and attribution notices are retained.
