ASR wav2vec2 DVoice Darija

SpeechBrain

Introduction

The ASR-WAV2VEC2-DVOICE-DARIJA project provides an automatic speech recognition (ASR) system for Darija, a Moroccan Arabic dialect. The model is built on the SpeechBrain toolkit and combines the wav2vec 2.0 architecture with Connectionist Temporal Classification (CTC) for end-to-end ASR.

Architecture

The ASR system utilizes a two-block pipeline:

  • Tokenizer: A unigram tokenizer, trained on the training transcriptions, that converts words into subword units.
  • Acoustic Model: A pretrained wav2vec 2.0 model, extended with two deep neural network (DNN) layers and fine-tuned on the Darija dataset. The final acoustic representation is decoded by a greedy CTC decoder. The system expects audio sampled at 16 kHz, and normalization is applied automatically during transcription.
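To make the last step of the pipeline concrete, here is a minimal sketch of greedy CTC decoding: take the per-frame argmax token IDs, collapse consecutive repeats, and drop blanks. The blank ID of 0 and the integer token IDs are illustrative assumptions; the real model decodes into its own subword vocabulary.

```python
# Minimal sketch of greedy CTC decoding, assuming per-frame argmax token
# IDs and a blank ID of 0 (illustrative; the actual model uses the
# unigram tokenizer's subword vocabulary).
def ctc_greedy_decode(frame_ids, blank_id=0):
    tokens = []
    prev = None
    for tid in frame_ids:
        # Keep a token only if it is not blank and not a repeat of the
        # previous frame's token (CTC collapse rule).
        if tid != blank_id and tid != prev:
            tokens.append(tid)
        prev = tid
    return tokens

# Frames "h h _ e _ l l _ l o" collapse to "h e l l o"
# (integers stand in for subword IDs, 0 is the blank).
print(ctc_greedy_decode([5, 5, 0, 2, 0, 7, 7, 0, 7, 4]))  # [5, 2, 7, 7, 4]
```

Note that a repeated token separated by a blank (the second 7 above) is kept, which is exactly why CTC uses a blank symbol.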

Training

Training involves:

  1. Cloning the SpeechBrain repository.
  2. Installing dependencies and SpeechBrain.
  3. Running the training script using a specified YAML configuration and data folder.
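As a sketch, the steps above look like the following shell session. The repository URL is SpeechBrain's official one; the recipe path and YAML filename are assumptions based on SpeechBrain's DVoice recipe layout and should be checked against the repository.

```shell
# 1. Clone the SpeechBrain repository
git clone https://github.com/speechbrain/speechbrain/
cd speechbrain

# 2. Install dependencies and SpeechBrain itself
pip install -r requirements.txt
pip install -e .

# 3. Run the training script with a YAML configuration and data folder
#    (recipe path and hparams filename are illustrative)
cd recipes/DVoice/ASR/CTC
python train_with_wav2vec2.py hparams/train_dar_with_wav2vec.yaml --data_folder=/path/to/data
```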

Training results, including models and logs, are available online.

Guide: Running Locally

  1. Install Dependencies:
    Run pip install speechbrain transformers to install necessary packages.

  2. Transcribe Audio:
    Use the following code to transcribe an audio file:

    from speechbrain.inference.ASR import EncoderASR

    # Download the pretrained model from the Hugging Face Hub
    # and cache it under savedir
    asr_model = EncoderASR.from_hparams(
        source="speechbrain/asr-wav2vec2-dvoice-darija",
        savedir="pretrained_models/asr-wav2vec2-dvoice-darija",
    )

    # Transcribe the bundled example audio and print the result
    print(asr_model.transcribe_file("speechbrain/asr-wav2vec2-dvoice-darija/example_darija.wav"))
    
  3. Inference on GPU:
    Add run_opts={"device":"cuda"} to from_hparams for GPU support.

  4. Cloud GPUs:
    Consider cloud services such as AWS or Google Cloud for GPU resources if faster inference is needed.
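Steps 3 and 4 can be combined by choosing the device at runtime and passing it through run_opts. The helper pick_device below is hypothetical (not part of SpeechBrain); the commented lines show how it would plug into the loading code from step 2.

```python
def pick_device(cuda_available: bool) -> dict:
    # Build the run_opts mapping accepted by EncoderASR.from_hparams:
    # "cuda" assumes an NVIDIA GPU with a working CUDA setup.
    return {"device": "cuda" if cuda_available else "cpu"}

# Typical use (requires torch and speechbrain to be installed):
#   import torch
#   from speechbrain.inference.ASR import EncoderASR
#   asr_model = EncoderASR.from_hparams(
#       source="speechbrain/asr-wav2vec2-dvoice-darija",
#       savedir="pretrained_models/asr-wav2vec2-dvoice-darija",
#       run_opts=pick_device(torch.cuda.is_available()),
#   )

print(pick_device(True))   # {'device': 'cuda'}
print(pick_device(False))  # {'device': 'cpu'}
```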

License

The project is released under the Apache 2.0 License, permitting open usage and modification with attribution.
