DVOICE-Kabyle

AIOX Labs

Introduction

The DVOICE-Kabyle model, developed by AIOX Labs, provides automatic speech recognition (ASR) for the Kabyle language. It is trained on the CommonVoice dataset and built with the SpeechBrain framework, and is intended to make voice technology more accessible for low-resource languages.

Architecture

The ASR system employs two main components:

  • Tokenizer: A unigram tokenizer, trained on the training transcriptions, splits words into subword units (a toy example follows this list).
  • Acoustic Model: A pretrained wav2vec 2.0 model (facebook/wav2vec2-large-xlsr-53) is fine-tuned with CTC (Connectionist Temporal Classification) to produce acoustic representations and decode them into text. Recordings are processed at 16 kHz, and audio inputs are normalized automatically.
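
The actual tokenizer files ship with the pretrained model, but the idea of unigram subword tokenization can be illustrated with SentencePiece, a library commonly used to train unigram models. The snippet below is only a toy sketch: "transcriptions.txt" is a stand-in path, not a file from this project, and the vocabulary size is arbitrary.

    import sentencepiece as spm

    # Train a toy unigram subword model on a plain-text file of transcriptions.
    # "transcriptions.txt" is a placeholder, not part of the DVoice project.
    spm.SentencePieceTrainer.train(
        input="transcriptions.txt",
        model_prefix="kab_unigram",
        vocab_size=500,
        model_type="unigram",
    )

    sp = spm.SentencePieceProcessor(model_file="kab_unigram.model")
    # Words are split into subword units learned from the transcriptions.
    print(sp.encode("azul fell-awen", out_type=str))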

Training

Training details are not covered in depth on the model card, but instructions for training the model from scratch are available on the project's GitHub page. The model achieves a validation Character Error Rate (CER) of 6.67 and a Word Error Rate (WER) of 25.22.
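
The card reports these metrics without showing how they are computed. As a rough illustration of what WER and CER measure, the sketch below scores a toy hypothesis against a toy reference using the jiwer library; jiwer is not a dependency of this project and the example strings are made up.

    # pip install jiwer  (independent scoring library, used here only for illustration)
    from jiwer import cer, wer

    reference = "azul fell-awen a medden"   # toy ground-truth transcription
    hypothesis = "azul fellawen a meden"    # toy ASR output

    # WER: word-level edits / number of reference words; CER: the same at character level.
    print(f"WER: {100 * wer(reference, hypothesis):.2f}%")
    print(f"CER: {100 * cer(reference, hypothesis):.2f}%")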

Guide: Running Locally

  1. Install Dependencies:

    pip install speechbrain transformers
    
  2. Transcribing Audio:
    Use the following code to transcribe an audio file (a sketch for transcribing pre-loaded signals follows after this list):

    from speechbrain.pretrained import EncoderASR

    # Download the pretrained model from the Hugging Face Hub and cache it in savedir.
    asr_model = EncoderASR.from_hparams(source="aioxlabs/dvoice-kabyle", savedir="pretrained_models/asr-wav2vec2-dvoice-wol")
    # transcribe_file returns the transcription as a string.
    print(asr_model.transcribe_file('./the_path_to_your_audio_file'))
    
  3. Inference on GPU:
    Add run_opts={"device":"cuda"} to leverage GPU resources:

    asr_model = EncoderASR.from_hparams(source="aioxlabs/dvoice-kabyle", savedir="pretrained_models/asr-wav2vec2-dvoice-wol", run_opts={"device":"cuda"})
    
  4. Cloud GPUs:
    For enhanced performance, consider using cloud GPU services like AWS, Google Cloud, or Azure.
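
As mentioned in step 2, transcribe_file covers the common case of a file on disk. For signals that are already loaded in memory, a minimal sketch using the model's own audio normalization and transcribe_batch could look like the following; the file path is the same placeholder as above, and the device is picked automatically.

    import torch
    from speechbrain.pretrained import EncoderASR

    # Use the GPU when available (see step 3), otherwise fall back to CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    asr_model = EncoderASR.from_hparams(
        source="aioxlabs/dvoice-kabyle",
        savedir="pretrained_models/asr-wav2vec2-dvoice-wol",
        run_opts={"device": device},
    )

    # load_audio applies the model's audio normalizer (mono channel, 16 kHz).
    signal = asr_model.load_audio("./the_path_to_your_audio_file")

    # Build a batch of one utterance; wav_lens holds relative lengths in [0, 1].
    wavs = signal.unsqueeze(0)
    wav_lens = torch.tensor([1.0])
    predicted_words, predicted_tokens = asr_model.transcribe_batch(wavs, wav_lens)
    print(predicted_words[0])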

License

The DVOICE-Kabyle model is distributed under the Apache-2.0 license, permitting free use, modification, and distribution.
