spkrec-ecapa-voxceleb (SpeechBrain)

Introduction
This repository provides tools for speaker verification using a pretrained ECAPA-TDNN model from SpeechBrain. The model is trained on the VoxCeleb dataset and is capable of extracting speaker embeddings and performing verification tasks. It achieves an Equal Error Rate (EER) of 0.80% on the VoxCeleb1-test set.
Architecture
The system utilizes an ECAPA-TDNN model, which combines convolutional and residual blocks. Embeddings are extracted through attentive statistical pooling, and the model is trained using Additive Margin Softmax Loss. Speaker verification is accomplished by computing the cosine distance between speaker embeddings.
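As a small sketch of that final scoring step, the snippet below compares two embedding tensors with cosine similarity. The embedding dimension (192) and the decision threshold (0.25) are illustrative assumptions for this example, not values quoted from the released model.

import torch
import torch.nn.functional as F

# Illustrative embeddings only: in practice each one would come from the model,
# one embedding per utterance. The 192-dim size is an assumption for this sketch.
emb_a = torch.randn(1, 192)
emb_b = torch.randn(1, 192)

# Cosine similarity between the two embeddings; higher means more likely the same speaker.
score = F.cosine_similarity(emb_a, emb_b, dim=-1).item()

# The threshold (0.25 here) is a placeholder; a real system tunes it on a development set.
same_speaker = score > 0.25
print(score, same_speaker)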
Training
The model was trained using SpeechBrain with recordings sampled at 16kHz. To train from scratch:
- Clone the SpeechBrain repository.
git clone https://github.com/speechbrain/speechbrain/
- Install the required packages.
cd speechbrain
pip install -r requirements.txt
pip install -e .
- Execute the training script.
cd recipes/VoxCeleb/SpeakerRec
python train_speaker_embeddings.py hparams/train_ecapa_tdnn.yaml --data_folder=your_data_folder
Guide: Running Locally
- Install SpeechBrain:
pip install git+https://github.com/speechbrain/speechbrain.git@develop
- Compute Speaker Embeddings:
import torchaudio
from speechbrain.inference.speaker import EncoderClassifier
classifier = EncoderClassifier.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb")
signal, fs = torchaudio.load('tests/samples/ASR/spk1_snt1.wav')
embeddings = classifier.encode_batch(signal)
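A quick sanity check, continuing the snippet above: encode_batch returns a tensor with one embedding per input waveform. The 192-dimensional embedding size is the usual ECAPA-TDNN configuration for this model, but confirm it on your own output.

# Continuing the snippet above: one embedding per waveform.
print(embeddings.shape)  # e.g. torch.Size([1, 1, 192]); confirm the last dimension on your setup
embedding = embeddings.squeeze()  # drop singleton dimensions for downstream use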
- Perform Speaker Verification:
from speechbrain.inference.speaker import SpeakerRecognition
verification = SpeakerRecognition.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb", savedir="pretrained_models/spkrec-ecapa-voxceleb")
score, prediction = verification.verify_files("tests/samples/ASR/spk1_snt1.wav", "tests/samples/ASR/spk2_snt1.wav")
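Continuing the snippet above: score is the similarity score between the two files' embeddings and prediction is the resulting same-speaker decision; both are returned as one-element tensors.

# Usage note for the call above: unpack the one-element tensors returned by verify_files.
print(float(score))      # similarity score between the two embeddings
print(bool(prediction))  # True if the files were judged to come from the same speaker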
- Inference on GPU: add run_opts={"device":"cuda"} when calling from_hparams, as shown below.
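For example, a minimal sketch reusing the verification loader from the previous step (the savedir is the same one used above):

from speechbrain.inference.speaker import SpeakerRecognition

# Same loader as above; run_opts places the model and its computations on the GPU.
verification = SpeakerRecognition.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec-ecapa-voxceleb",
    run_opts={"device": "cuda"},
)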
Cloud GPUs
For faster training and inference, consider running the model on cloud GPU services such as AWS, Google Cloud, or Azure.
License
This project is licensed under the Apache-2.0 License.