ASR-WAV2VEC2-DVOICE-WOLOF (SpeechBrain)

Introduction
The ASR-WAV2VEC2-DVOICE-WOLOF model is an automatic speech recognition (ASR) system built with SpeechBrain, pairing a wav2vec 2.0 acoustic model with a subword tokenizer and trained on the DVoice Wolof dataset. It performs end-to-end speech recognition, converting spoken Wolof into text via CTC decoding.
Architecture
The ASR system comprises two main components:
- Tokenizer: Uses a unigram model to convert words into subword units based on training transcriptions.
- Acoustic Model: Combines a pretrained wav2vec 2.0 model with two DNN layers, fine-tuned on the DVoice Wolof dataset. The final acoustic representation is passed to a CTC greedy decoder. The system expects audio sampled at 16 kHz (single channel); see the resampling sketch below.
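If you prepare waveforms yourself (for example, for the batch example later in this guide) rather than letting transcribe_file load the audio, resample them to 16 kHz mono first. A minimal sketch using torchaudio, which is an assumed extra dependency not mentioned in this card:

import torchaudio

# Convert an arbitrary recording to the 16 kHz mono input the model expects.
waveform, sample_rate = torchaudio.load("my_recording.wav")  # placeholder path
if sample_rate != 16000:
    waveform = torchaudio.transforms.Resample(sample_rate, 16000)(waveform)
if waveform.shape[0] > 1:
    waveform = waveform.mean(dim=0, keepdim=True)  # downmix stereo to mono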
Training
The model is trained with SpeechBrain. The training pipeline involves:
- Cloning the SpeechBrain repository.
- Installing the necessary dependencies.
- Executing the training script with the specified hyperparameters and data directories, as sketched below.
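A sketch of those steps as shell commands. The recipe path and the hyperparameter file name are assumptions based on the DVoice recipes in the SpeechBrain repository; verify them against the current repository layout:

git clone https://github.com/speechbrain/speechbrain/
cd speechbrain
pip install -r requirements.txt
pip install -e .
# Assumed recipe location and YAML name; check the repository for the exact files.
cd recipes/DVoice/ASR/CTC
python train_with_wav2vec2.py hparams/train_wol_with_wav2vec.yaml --data_folder=/path/to/dvoice/wolof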
Guide: Running Locally
- Install Dependencies:
pip install speechbrain transformers
- Transcribe Audio Files:
Use the following Python code to transcribe an audio file:

from speechbrain.inference.ASR import EncoderASR

asr_model = EncoderASR.from_hparams(
    source="speechbrain/asr-wav2vec2-dvoice-wolof",
    savedir="pretrained_models/asr-wav2vec2-dvoice-wolof",
)
asr_model.transcribe_file("speechbrain/asr-wav2vec2-dvoice-wolof/example_wolof.wav")
- Inference on GPU:
To use a GPU for inference, pass run_opts when loading the model:

asr_model = EncoderASR.from_hparams(
    source="speechbrain/asr-wav2vec2-dvoice-wolof",
    savedir="pretrained_models/asr-wav2vec2-dvoice-wolof",
    run_opts={"device": "cuda"},
)
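Once loaded, the model can also transcribe waveforms that are already in memory via transcribe_batch, which pairs naturally with the resampling sketch above. A minimal example; the relative-length tensor marks how much of each (possibly padded) row is real audio:

import torch

# `waveform` is a [1, time] 16 kHz mono tensor, e.g. from the resampling
# sketch in the Architecture section; stack more rows for a larger batch.
rel_lens = torch.tensor([1.0])  # fraction of each row that is real audio
predicted_words, predicted_tokens = asr_model.transcribe_batch(waveform, rel_lens)
print(predicted_words)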
Cloud GPUs
For enhanced performance, consider using cloud-based GPUs from providers like AWS, Google Cloud, or Azure.
License
The ASR-WAV2VEC2-DVOICE-WOLOF model is released under the Apache-2.0 license, allowing for broad usage and modification with attribution.