lang id voxlingua107 ecapa
speechbrain
Introduction
The VoxLingua107 ECAPA-TDNN model is a spoken language identification system trained on the VoxLingua107 dataset with the SpeechBrain toolkit. It classifies a speech utterance as one of 107 languages.
Architecture
The model uses the ECAPA-TDNN architecture, which is typically used for speaker recognition. It adds further fully connected hidden layers after the embedding layer to improve performance on language identification. The model is trained with cross-entropy loss on recordings sampled at 16 kHz.
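To make the cross-entropy objective concrete, the following is a minimal sketch in plain Python, not the SpeechBrain training code. It assumes the classifier ends in one raw score (logit) per language; the logit values and the class indices 3 and 4 are hypothetical and chosen only for illustration.

```python
import math

NUM_LANGUAGES = 107  # number of output classes stated in the text


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability that the model assigns to the true language."""
    probs = softmax(logits)
    return -math.log(probs[target_index])


# Hypothetical logits: every class scores 0 except a strong vote for class 3.
logits = [0.0] * NUM_LANGUAGES
logits[3] = 5.0

loss_correct = cross_entropy(logits, target_index=3)  # true class is the favoured one
loss_wrong = cross_entropy(logits, target_index=4)    # true class received a low score
```

The loss is small when the favoured class is the true language and large otherwise, which is the signal that training minimizes.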
Training
The model is trained on the VoxLingua107 dataset, which comprises 6,628 hours of speech from YouTube videos covering 107 languages. Speech segments were labeled according to the video titles and descriptions, and post-processing was applied to filter out false positives. Training follows the SpeechBrain recipe, and the model achieves a 6.7% error rate on the VoxLingua107 development dataset.
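The reported figure can be read as a plain classification error rate. The sketch below assumes it is the fraction of utterances whose predicted language differs from the reference label; the function name `error_rate` and the toy labels are hypothetical and are not taken from the VoxLingua107 evaluation.

```python
def error_rate(predicted, reference):
    """Fraction of utterances whose predicted language differs from the reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one utterance is required")
    errors = sum(p != r for p, r in zip(predicted, reference))
    return errors / len(reference)


# Toy labels for illustration only: one of four utterances is misclassified.
reference = ["th", "en", "et", "fi"]
predicted = ["th", "en", "et", "sv"]
rate = error_rate(predicted, reference)  # 1 error out of 4 utterances -> 0.25
```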
Guide: Running Locally
- Install SpeechBrain: Use the command:
    pip install git+https://github.com/speechbrain/speechbrain.git@develop
- Import Required Libraries:
    import torchaudio
    from speechbrain.inference.classifiers import EncoderClassifier
- Load and Use the Model:
    language_id = EncoderClassifier.from_hparams(source="speechbrain/lang-id-voxlingua107-ecapa", savedir="tmp")
    signal = language_id.load_audio("speechbrain/lang-id-voxlingua107-ecapa/udhr_th.wav")
    prediction = language_id.classify_batch(signal)
    print(prediction)
- Perform Inference with GPU: Add run_opts={"device":"cuda"} to the from_hparams method for GPU support.
- Cloud GPU Suggestion: Consider using cloud services like AWS or Google Cloud for access to GPUs, which can speed up processing.
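The steps above can be combined into one small script. This is a sketch under stated assumptions, not an official recipe. The helper names `build_run_opts`, `load_classifier`, `summarize_prediction`, and `identify_language` are hypothetical. The sketch also assumes that `classify_batch` returns a 4-tuple of (per-class log-probabilities, best score, best index, text labels) and that the best score is a log-likelihood, which is why it is exponentiated.

```python
import math

MODEL_SOURCE = "speechbrain/lang-id-voxlingua107-ecapa"  # model ID from the guide


def build_run_opts(use_gpu):
    """Build the run_opts dict that is passed to from_hparams."""
    return {"device": "cuda" if use_gpu else "cpu"}


def load_classifier(use_gpu=False, savedir="tmp"):
    """Download (if needed) and load the classifier on the chosen device."""
    # Imported lazily so the pure helpers work without SpeechBrain installed.
    from speechbrain.inference.classifiers import EncoderClassifier

    return EncoderClassifier.from_hparams(
        source=MODEL_SOURCE,
        savedir=savedir,
        run_opts=build_run_opts(use_gpu),
    )


def summarize_prediction(prediction):
    """Extract the top label, its class index, and its probability.

    Assumption: prediction is (log_probs, best_score, best_index, labels)
    and best_score is a log-likelihood.
    """
    _, score, index, labels = prediction
    return {
        "label": labels[0],
        "index": int(index[0]),
        "probability": math.exp(float(score[0])),
    }


def identify_language(audio_path, use_gpu=False):
    """Run the full pipeline on one audio file and return a summary dict."""
    language_id = load_classifier(use_gpu=use_gpu)
    signal = language_id.load_audio(audio_path)
    return summarize_prediction(language_id.classify_batch(signal))
```

Calling `identify_language("speechbrain/lang-id-voxlingua107-ecapa/udhr_th.wav")` would run the sample file from the guide on the CPU; passing `use_gpu=True` requires a CUDA-capable device.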
License
The model and associated content are licensed under the Apache-2.0 license.