dvoice kabyle
aioxlabs
Introduction
The DVOICE-Kabyle model, developed by AIOX Labs, provides automatic speech recognition (ASR) for the Kabyle language. It is built on the CommonVoice dataset with the SpeechBrain framework. The model is intended to make voice technology easier to use for low-resource languages.
Architecture
The ASR system employs two main components:
- Tokenizer: A unigram tokenizer, trained on the transcriptions, splits words into subword units.
- Acoustic Model: A pretrained wav2vec 2.0 model (facebook/wav2vec2-large-xlsr-53) is fine-tuned with CTC (Connectionist Temporal Classification) for acoustic representation and decoding. The system processes recordings sampled at 16 kHz and normalizes audio inputs automatically.
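To make the CTC step above more concrete, here is a minimal sketch of how a greedy CTC decoder turns per-frame labels into text: it merges runs of repeated labels and then drops the blank symbol. This card does not say which decoding strategy the model uses, so greedy decoding is an assumption here, and the blank symbol "_" and the frame sequence are made up for illustration.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol, chosen only for this illustration


def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Collapse a per-frame label sequence the way greedy CTC decoding does."""
    # Step 1: merge each run of identical consecutive labels into one label.
    merged = [label for label, _ in groupby(frame_labels)]
    # Step 2: remove the blank symbol that separates real labels.
    return "".join(label for label in merged if label != blank)


# Toy frame-level predictions (not real model output).
frames = list("__aa_zz__uu_l__")
print(ctc_greedy_collapse(frames))  # prints "azul"
```

A blank between two identical labels keeps both of them, which is how CTC can emit doubled letters: the frames "a_a" collapse to "aa", while "aa" collapses to "a".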
Training
This card does not describe the training procedure in detail; instructions for training from scratch are available on the project's GitHub page. On the validation set, the model achieves a CER (Character Error Rate) of 6.67 and a WER (Word Error Rate) of 25.22.
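For readers unfamiliar with the two metrics, the sketch below shows how CER and WER are commonly computed: the edit distance between reference and hypothesis, divided by the reference length, over characters or words respectively. It assumes the scores are reported as percentages, which this card does not state explicitly. The function names and the two example strings are illustrative, not taken from the project's evaluation code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent, relative to the reference word count."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent, relative to the reference length."""
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


# Toy strings for illustration only (not from the validation set).
reference = "azul fell awen"
hypothesis = "azul fel awen"
print(f"WER: {wer(reference, hypothesis):.2f}")  # 1 wrong word out of 3
print(f"CER: {cer(reference, hypothesis):.2f}")  # 1 missing character out of 14
```

One misrecognized word counts as a full word error but only a small character error, which is why WER is typically much higher than CER for the same system.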
Guide: Running Locally
- Install Dependencies:
    pip install speechbrain transformers
- Transcribing Audio:
  Use the following code to transcribe audio files:
    from speechbrain.pretrained import EncoderASR
    asr_model = EncoderASR.from_hparams(source="aioxlabs/dvoice-kabyle", savedir="pretrained_models/asr-wav2vec2-dvoice-wol")
    asr_model.transcribe_file('./the_path_to_your_audio_file')
- Inference on GPU:
  Add run_opts={"device":"cuda"} to leverage GPU resources:
    asr_model = EncoderASR.from_hparams(source="aioxlabs/dvoice-kabyle", savedir="pretrained_models/asr-wav2vec2-dvoice-wol", run_opts={"device":"cuda"})
- Cloud GPUs:
  For faster inference, consider using cloud GPU services such as AWS, Google Cloud, or Azure.
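The transcription and GPU steps above can be combined into one small script. The sketch below picks "cuda" only when a usable GPU is detected and otherwise falls back to "cpu", then transcribes a list of files. The helper names pick_run_opts, transcribe_many, and run_demo are hypothetical and not part of SpeechBrain; only EncoderASR.from_hparams and transcribe_file come from the guide above, and the audio path is the same placeholder.

```python
def pick_run_opts():
    """Return run_opts for from_hparams: "cuda" if a GPU is usable, else "cpu"."""
    try:
        import torch  # normally installed alongside SpeechBrain
        device = "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        device = "cpu"
    return {"device": device}


def transcribe_many(paths, transcribe):
    """Map each audio path to its transcript using the given callable."""
    return {path: transcribe(path) for path in paths}


def run_demo():
    """Load the model and transcribe the placeholder file (downloads weights)."""
    from speechbrain.pretrained import EncoderASR

    asr_model = EncoderASR.from_hparams(
        source="aioxlabs/dvoice-kabyle",
        savedir="pretrained_models/asr-wav2vec2-dvoice-wol",
        run_opts=pick_run_opts(),
    )
    results = transcribe_many(
        ["./the_path_to_your_audio_file"],  # placeholder path from the guide
        asr_model.transcribe_file,
    )
    for path, text in results.items():
        print(path, "->", text)
```

Calling run_demo() performs the actual download and inference. Passing the transcriber in as a callable keeps transcribe_many independent of the model, so it can be tried with a stand-in function before loading any weights.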
License
The DVOICE-Kabyle model is distributed under the Apache-2.0 license, permitting free use, modification, and distribution.