spkrec xvect voxceleb
speechbrain
Introduction
SpeechBrain's SPKREC-XVECT-VOXCELEB is a pretrained speaker-verification model based on x-vector embeddings. It uses a Time Delay Neural Network (TDNN) architecture and is trained on the VoxCeleb dataset (VoxCeleb1 and VoxCeleb2). The model achieves an Equal Error Rate (EER) of 3.2% on the VoxCeleb1 test set.
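To make the EER figure concrete, here is a minimal sketch of how an equal error rate can be computed from verification scores. It is not SpeechBrain's evaluation code: the function name `equal_error_rate`, the threshold sweep, and the toy scores are assumptions made for illustration. The EER is the operating point where the false acceptance rate and the false rejection rate are equal (or as close as the available thresholds allow).

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Return (eer, threshold) for two lists of similarity scores.

    target_scores: scores of same-speaker trials (higher = more similar).
    nontarget_scores: scores of different-speaker trials.
    """
    # Only observed scores can change the error rates, so sweep over them.
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None
    for t in thresholds:
        # False rejection: a same-speaker trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: a different-speaker trial scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of the two rates where they are closest.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, made up for illustration (not output of the model).
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(targets, nontargets)
```

With these toy scores, one same-speaker trial is rejected and one different-speaker trial is accepted at the chosen threshold, so both error rates are 25%. An EER of 3.2% means the two error rates meet at 3.2% on the VoxCeleb1 test trials.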
Architecture
The model combines a TDNN with statistics pooling, which aggregates frame-level features into a single fixed-length, utterance-level vector. It is trained with categorical cross-entropy loss as a speaker classifier, and the embeddings produced by the trained network are then used for verification. This design is well suited to extracting robust speaker embeddings from audio of varying length.
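The pooling step can be sketched in a few lines. This is a plain-Python illustration of the general idea, not the model's actual layer: the function name `statistics_pooling` and the use of the population standard deviation are assumptions, and a real implementation would operate on tensors.

```python
import math


def statistics_pooling(frames):
    """Collapse a variable-length list of frame vectors into one fixed-size vector.

    frames: list of T frame-level feature vectors, each of dimension D.
    Returns a list of length 2 * D: per-dimension means, then standard deviations.
    """
    num_frames = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    # Population standard deviation (divide by T); an assumption of this sketch.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / num_frames)
        for d in range(dim)
    ]
    return means + stds


# Two toy "utterances" of different lengths, each with 2-dimensional frames.
short_utt = [[1.0, 2.0], [3.0, 2.0]]
long_utt = [[1.0, 2.0], [3.0, 2.0], [1.0, 2.0], [3.0, 2.0]]
pooled_short = statistics_pooling(short_utt)
pooled_long = statistics_pooling(long_utt)
```

Both inputs yield a vector of length 4 regardless of how many frames they contain. This is what lets the layers after pooling work on a fixed-size representation.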
Training
The model was trained with the SpeechBrain toolkit on recordings sampled at 16 kHz. To train it from scratch, clone the SpeechBrain repository, install its dependencies, and run the training script by following the repository's instructions. The training results, including models and logs, are available online for reference.
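Because the training data was sampled at 16 kHz, it can help to check the sampling rate of your own recordings before using them. The sketch below relies only on Python's standard `wave` module, which reads uncompressed PCM WAV files. The helper names `describe_wav` and `matches_training_rate` are hypothetical and not part of SpeechBrain.

```python
import wave

EXPECTED_RATE = 16000  # Hz, the sampling rate stated for the training data


def describe_wav(path):
    """Return (sample_rate, num_channels) read from a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate(), wav_file.getnchannels()


def matches_training_rate(path, expected_rate=EXPECTED_RATE):
    """Return True if the file is already sampled at the expected rate."""
    sample_rate, _ = describe_wav(path)
    return sample_rate == expected_rate
```

A file that fails this check would need to be resampled before its embeddings can be expected to behave like those of the training data.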
Guide: Running Locally
- Install SpeechBrain:
  Run the following command:

      pip install speechbrain
- Compute Speaker Embeddings:
  Use the following Python code to load an audio file and extract embeddings:

      import torchaudio
      from speechbrain.inference.speaker import EncoderClassifier

      # Download the pretrained model (cached in savedir) and load it.
      classifier = EncoderClassifier.from_hparams(
          source="speechbrain/spkrec-xvect-voxceleb",
          savedir="pretrained_models/spkrec-xvect-voxceleb",
      )

      # Load the waveform and its sampling rate, then compute the embedding.
      signal, fs = torchaudio.load("your_audio_file.wav")
      embeddings = classifier.encode_batch(signal)
- Inference on GPU:
  To enable GPU support, pass run_opts to the from_hparams call:

      classifier = EncoderClassifier.from_hparams(
          source="speechbrain/spkrec-xvect-voxceleb",
          savedir="pretrained_models/spkrec-xvect-voxceleb",
          run_opts={"device": "cuda"},
      )
- Cloud GPUs:
  For large-scale or resource-intensive tasks, consider using cloud services such as AWS, Google Cloud, or Azure, which provide GPU instances.
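Once embeddings have been extracted for two recordings, a common way to decide whether they come from the same speaker is to compare them with cosine similarity. The sketch below is an illustration under stated assumptions, not the model's official scoring code: the function names are hypothetical, the embeddings are plain lists of floats, and the default threshold of 0.5 is a placeholder that would need tuning on held-out trials.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def same_speaker(emb_a, emb_b, threshold=0.5):
    """Accept the trial if the similarity reaches the threshold.

    The threshold value is a placeholder (an assumption of this sketch).
    """
    return cosine_similarity(emb_a, emb_b) >= threshold


# Toy 2-dimensional "embeddings", made up for illustration.
enrolled = [1.0, 0.0]
same_direction = [2.0, 0.0]
orthogonal = [0.0, 3.0]
score_match = cosine_similarity(enrolled, same_direction)
score_mismatch = cosine_similarity(enrolled, orthogonal)
```

To use this with the tensor returned by `encode_batch` for a single utterance, one option is to flatten it first, for example with `embeddings.squeeze().tolist()`.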
License
The SPKREC-XVECT-VOXCELEB model is licensed under the Apache 2.0 License, which permits use, distribution, and modification under defined terms.