asr wav2vec2 dvoice wolof

speechbrain

Introduction

The ASR-WAV2VEC2-DVOICE-WOLOF model is an automatic speech recognition (ASR) system built with SpeechBrain. It pairs a wav2vec 2.0 acoustic model with a subword tokenizer and is trained on the DVoice Wolof dataset. The system is end-to-end: it converts spoken Wolof directly into text, with CTC used to decode the acoustic model's output.

Architecture

The ASR system comprises two main components:

  • Tokenizer: A unigram model, trained on the training transcriptions, that splits words into subword units.
  • Acoustic Model: A pretrained wav2vec 2.0 model combined with two DNN layers and fine-tuned on the DVoice Wolof dataset. A CTC greedy decoder turns the final acoustic representation into text. The system expects audio sampled at 16 kHz.
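
The CTC greedy decoding step mentioned above can be illustrated with a minimal sketch. It is not SpeechBrain's implementation. The function name `ctc_greedy_decode`, the example frame sequence, and the choice of 0 as the blank index are all assumptions made for illustration. The idea is that the decoder takes the most likely token for each audio frame, merges consecutive repeats, and then removes the blank symbol.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, blank_id=0):
    """Collapse per-frame best token ids into an output token sequence.

    frame_ids: the argmax token id for each frame (hypothetical input).
    blank_id:  id of the CTC blank symbol (assumed to be 0 here).
    """
    # groupby merges runs of identical consecutive ids; blanks are then dropped.
    return [token for token, _ in groupby(frame_ids) if token != blank_id]


# Hypothetical frame-level predictions; the blank between the two runs of 5
# is what lets the same token appear twice in a row in the output.
frames = [0, 3, 3, 0, 0, 5, 5, 5, 0, 5, 2, 2, 0]
print(ctc_greedy_decode(frames))  # [3, 5, 5, 2]
```

In the real system the resulting ids would then be mapped back to subword units by the tokenizer and joined into words.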

Training

The model is trained with SpeechBrain. The training pipeline involves:

  1. Cloning the SpeechBrain repository.
  2. Installing the necessary dependencies.
  3. Executing the training script with specified hyperparameters and data directories.
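
As a rough sketch of the three steps above, the snippet below assembles the corresponding commands and prints them as a dry run. The recipe folder, script name, hyperparameter file, and data path are assumptions that should be checked against the SpeechBrain repository. The helper `training_steps` is hypothetical and not part of SpeechBrain.

```python
import shlex

# Assumed locations; verify them against the SpeechBrain repository before use.
REPO_URL = "https://github.com/speechbrain/speechbrain"
RECIPE_DIR = "speechbrain/recipes/DVoice/ASR/CTC"   # assumed recipe folder
TRAIN_SCRIPT = "train_with_wav2vec2.py"             # assumed script name
HPARAMS = "hparams/train_wol_with_wav2vec.yaml"     # assumed Wolof hyperparameter file


def training_steps(data_folder):
    """Return (working_directory, command) pairs for clone, install, and train."""
    return [
        (".", ["git", "clone", REPO_URL]),
        ("speechbrain", ["pip", "install", "-r", "requirements.txt"]),
        ("speechbrain", ["pip", "install", "-e", "."]),
        (RECIPE_DIR, ["python", TRAIN_SCRIPT, HPARAMS, f"--data_folder={data_folder}"]),
    ]


# Dry run: print the commands instead of executing them.
for cwd, cmd in training_steps("/path/to/dvoice_wolof"):
    print(f"(in {cwd}) {shlex.join(cmd)}")
```

To execute the steps rather than print them, each pair could be passed to `subprocess.run(cmd, cwd=cwd, check=True)`.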

Guide: Running Locally

  1. Install Dependencies:
    pip install speechbrain transformers
    
  2. Transcribe Audio Files:
    Use the following Python code to transcribe an audio file:
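    from speechbrain.inference.ASR import EncoderASR

    # Download the pretrained model into savedir, or reuse it if already present.
    asr_model = EncoderASR.from_hparams(source="speechbrain/asr-wav2vec2-dvoice-wolof", savedir="pretrained_models/asr-wav2vec2-dvoice-wolof")
    # transcribe_file returns the transcription; print it to see the result.
    print(asr_model.transcribe_file('speechbrain/asr-wav2vec2-dvoice-wolof/example_wolof.wav'))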
    
  3. Inference on GPU:
    To run inference on a GPU, pass run_opts={"device":"cuda"} when loading the model:
    asr_model = EncoderASR.from_hparams(
        source="speechbrain/asr-wav2vec2-dvoice-wolof",
        savedir="pretrained_models/asr-wav2vec2-dvoice-wolof",
        run_opts={"device":"cuda"}
    )
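
The steps above can be combined into one small helper that inspects the input file and falls back to the CPU when no GPU is available. This is a sketch under stated assumptions: the helper names `wav_info`, `pick_device`, and `transcribe` are hypothetical, and the file check only handles WAV input. SpeechBrain's `transcribe_file` is documented to resample and downmix audio as needed, so the sample-rate message here is informational rather than a required conversion.

```python
import wave

TARGET_RATE = 16_000  # sampling rate the model expects


def wav_info(path):
    """Return (sample_rate, channels) read from a WAV file header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels()


def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def transcribe(path):
    """Load the pretrained model on the best available device and transcribe path."""
    # Imported lazily so the helpers above work without SpeechBrain installed.
    from speechbrain.inference.ASR import EncoderASR

    rate, channels = wav_info(path)
    if rate != TARGET_RATE or channels != 1:
        print(f"note: {path} is {rate} Hz with {channels} channel(s)")

    asr_model = EncoderASR.from_hparams(
        source="speechbrain/asr-wav2vec2-dvoice-wolof",
        savedir="pretrained_models/asr-wav2vec2-dvoice-wolof",
        run_opts={"device": pick_device()},
    )
    return asr_model.transcribe_file(path)
```

Calling `transcribe("my_recording.wav")`, where the file name is a placeholder, would download the model on first use and return the transcription.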
    

Cloud GPUs

If no local GPU is available, a cloud GPU instance from a provider such as AWS, Google Cloud, or Azure can run the same code and speed up inference.

License

The ASR-WAV2VEC2-DVOICE-WOLOF model is released under the Apache-2.0 license, which permits use, modification, and redistribution provided the license text and existing copyright notices are retained.
