ASR-WAV2VEC2-DVOICE-WOLOF (SpeechBrain)

Introduction
The ASR-WAV2VEC2-DVOICE-WOLOF model is an automatic speech recognition (ASR) system built with SpeechBrain, pairing a wav2vec 2.0 acoustic model with a subword tokenizer and trained on the DVoice Wolof dataset. It performs end-to-end speech recognition, converting spoken Wolof into text via CTC decoding.
Architecture
The ASR system comprises two main components:
- Tokenizer: Uses a unigram model to convert words into subword units based on training transcriptions.
- Acoustic Model: Combines a pretrained wav2vec 2.0 model with two DNN layers, fine-tuned on the DVoice Wolof dataset. The final acoustic representation is passed to a CTC greedy decoder. The system expects audio sampled at 16 kHz (single channel); see the resampling sketch below.
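If you prepare waveforms yourself (for example, for the batch example later in this guide) rather than letting transcribe_file load the audio, resample them to 16 kHz mono first. A minimal sketch using torchaudio, which is an assumed extra dependency not mentioned in this card:

import torchaudio

# Convert an arbitrary recording to the 16 kHz mono input the model expects.
waveform, sample_rate = torchaudio.load("my_recording.wav")  # placeholder path
if sample_rate != 16000:
    waveform = torchaudio.transforms.Resample(sample_rate, 16000)(waveform)
if waveform.shape[0] > 1:
    waveform = waveform.mean(dim=0, keepdim=True)  # downmix stereo to mono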
Training
The model is trained with SpeechBrain. The training pipeline involves:
- Cloning the SpeechBrain repository.
- Installing the necessary dependencies.
- Executing the training script with the specified hyperparameters and data directories, as sketched below.
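A sketch of those steps as shell commands. The recipe path and the hyperparameter file name are assumptions based on the DVoice recipes in the SpeechBrain repository; verify them against the current repository layout:

git clone https://github.com/speechbrain/speechbrain/
cd speechbrain
pip install -r requirements.txt
pip install -e .
# Assumed recipe location and YAML name; check the repository for the exact files.
cd recipes/DVoice/ASR/CTC
python train_with_wav2vec2.py hparams/train_wol_with_wav2vec.yaml --data_folder=/path/to/dvoice/wolof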
Guide: Running Locally
- Install Dependencies:
pip install speechbrain transformers
- Transcribe Audio Files:
Use the following Python code to transcribe an audio file:

from speechbrain.inference.ASR import EncoderASR

asr_model = EncoderASR.from_hparams(
    source="speechbrain/asr-wav2vec2-dvoice-wolof",
    savedir="pretrained_models/asr-wav2vec2-dvoice-wolof",
)
asr_model.transcribe_file("speechbrain/asr-wav2vec2-dvoice-wolof/example_wolof.wav")
- Inference on GPU:
To use a GPU for inference, pass run_opts when loading the model:

asr_model = EncoderASR.from_hparams(
    source="speechbrain/asr-wav2vec2-dvoice-wolof",
    savedir="pretrained_models/asr-wav2vec2-dvoice-wolof",
    run_opts={"device": "cuda"},
)
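Once loaded, the model can also transcribe waveforms that are already in memory via transcribe_batch, which pairs naturally with the resampling sketch above. A minimal example; the relative-length tensor marks how much of each (possibly padded) row is real audio:

import torch

# `waveform` is a [1, time] 16 kHz mono tensor, e.g. from the resampling
# sketch in the Architecture section; stack more rows for a larger batch.
rel_lens = torch.tensor([1.0])  # fraction of each row that is real audio
predicted_words, predicted_tokens = asr_model.transcribe_batch(waveform, rel_lens)
print(predicted_words)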
Cloud GPUs
For enhanced performance, consider using cloud-based GPUs from providers like AWS, Google Cloud, or Azure.
License
The ASR-WAV2VEC2-DVOICE-WOLOF model is released under the Apache-2.0 license, allowing for broad usage and modification with attribution.