asr-wav2vec2-dvoice-darija (SpeechBrain)

Introduction
The ASR-WAV2VEC2-DVOICE-DARIJA project provides an automatic speech recognition (ASR) system for Darija, a Moroccan Arabic dialect. This model is built upon the SpeechBrain toolkit and leverages the wav2vec 2.0 model architecture combined with Connectionist Temporal Classification (CTC) for end-to-end ASR tasks.
Architecture
The ASR system utilizes a two-block pipeline:
- Tokenizer: A unigram tokenizer converts words into subword units using training transcriptions.
- Acoustic Model: Employs a pretrained wav2vec 2.0 model, extended with two deep neural network layers and fine-tuned on the Darija dataset. The final acoustic representation is decoded with a CTC greedy decoder. The system operates on audio sampled at 16 kHz; normalization is applied automatically during transcription.
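The CTC greedy decoding step mentioned above can be illustrated with a minimal sketch. This is a toy example with an invented label set, not the model's actual unigram vocabulary: greedy CTC decoding picks the most probable label per frame, collapses consecutive repeats, and then drops the blank symbol.

```python
# Minimal sketch of CTC greedy decoding (toy label set; the real model
# emits subword units produced by its unigram tokenizer).

def ctc_greedy_decode(frame_ids, blank_id=0):
    """Collapse repeated per-frame labels, then remove CTC blanks."""
    collapsed = []
    prev = None
    for i in frame_ids:
        if i != prev:          # keep only label changes between frames
            collapsed.append(i)
        prev = i
    return [i for i in collapsed if i != blank_id]

# Toy vocabulary for illustration: 0 = blank, 1 = "sa", 2 = "lam"
frames = [0, 1, 1, 0, 2, 2, 2, 0]
print(ctc_greedy_decode(frames))  # [1, 2]
```

In practice this per-frame argmax decoding runs over the acoustic model's output probabilities; the blank symbol lets CTC align variable-length audio with shorter transcriptions.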
Training
Training involves:
- Cloning the SpeechBrain repository.
- Installing dependencies and SpeechBrain.
- Running the training script using a specified YAML configuration and data folder.
Training results, including models and logs, are available online.
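The training steps above can be sketched as shell commands. The recipe directory and hyperparameter file names below are assumptions based on SpeechBrain's usual recipe layout; check the repository for the exact paths before running.

```shell
# Clone the SpeechBrain toolkit and install it with its dependencies.
git clone https://github.com/speechbrain/speechbrain
cd speechbrain
pip install -r requirements.txt
pip install -e .

# Run the training script with a YAML configuration and your data folder.
# NOTE: the recipe path and file names are illustrative; consult the
# DVoice recipe in the repository for the actual locations.
cd recipes/DVoice/ASR/CTC
python train_with_wav2vec2.py hparams/train_with_wav2vec2.yaml --data_folder=/path/to/data
```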
Guide: Running Locally
- Install dependencies: run the following to install the necessary packages.

```
pip install speechbrain transformers
```

- Transcribe audio: use the following code to transcribe an audio file.

```python
from speechbrain.inference.ASR import EncoderASR

asr_model = EncoderASR.from_hparams(
    source="speechbrain/asr-wav2vec2-dvoice-darija",
    savedir="pretrained_models/asr-wav2vec2-dvoice-darija",
)
asr_model.transcribe_file("speechbrain/asr-wav2vec2-dvoice-darija/example_darija.wav")
```

- Inference on GPU: add `run_opts={"device":"cuda"}` to the `from_hparams` call for GPU support.
- Cloud GPUs: consider cloud services such as AWS or Google Cloud for GPU resources to enhance performance.
License
The project is released under the Apache 2.0 License, permitting open usage and modification with attribution.