hubert large arabic egyptian

omarxadel

Introduction

The Arabic Hubert-Large model is a fine-tuned automatic speech recognition model optimized for Egyptian Arabic. It was derived from the original Hubert-Large model, which was initially pre-trained on 2,000 hours of Arabic speech. Fine-tuning was performed using the MGB-3 and Egyptian Arabic Conversational Speech Corpus datasets.

Architecture

The model employs a Transformer architecture with CTC (Connectionist Temporal Classification) and Attention mechanisms. It is built using the PyTorch library and is compatible with the Safetensors format.
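To make the CTC part concrete, here is a minimal sketch of greedy CTC decoding: take the most likely token per audio frame, merge consecutive repeats, then drop blanks. The card does not say which decoding strategy the model uses, so this only illustrates the standard rule. The function name and the blank index of 0 are assumptions.

```python
def ctc_greedy_collapse(frame_ids, blank_id=0):
    """Collapse per-frame token ids into a label sequence (greedy CTC rule).

    frame_ids: the argmax token id for each audio frame.
    blank_id:  id of the CTC blank token (0 here is an assumption; the real
               value depends on the model's vocabulary).
    """
    labels = []
    previous = None
    for token_id in frame_ids:
        # Keep a token only when it differs from the previous frame
        # and is not the blank symbol.
        if token_id != previous and token_id != blank_id:
            labels.append(token_id)
        previous = token_id
    return labels


if __name__ == "__main__":
    # The blank between the two runs of 1 keeps them as separate labels.
    print(ctc_greedy_collapse([0, 1, 1, 0, 1, 2, 2, 0]))
```

The blank token is what lets CTC emit the same label twice in a row: repeats separated by a blank survive the merge step, while repeats in adjacent frames do not.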

Training

The model was fine-tuned on the MGB-3 and Egyptian Arabic Conversational Speech Corpus datasets. It achieved a Word Error Rate (WER) of 25.9% on the test set and 23.5% on the validation set, which is reported as state-of-the-art performance for Egyptian Arabic speech recognition.
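For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the model output, divided by the number of reference words. The sketch below is a plain-Python illustration of that definition. It is not the evaluation script behind the numbers above, and any text normalization applied there is unknown. The function name is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Return WER = word-level edit distance / number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous_row[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current_row[j] = min(
                previous_row[j] + 1,                      # deletion
                current_row[j - 1] + 1,                   # insertion
                previous_row[j - 1] + substitution_cost,  # match or substitution
            )
        previous_row = current_row
    return previous_row[-1] / len(ref_words)


if __name__ == "__main__":
    # One substitution (b -> x) and one deletion (d) over four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.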

Guide: Running Locally

  1. Prerequisites: Ensure you have Python and PyTorch installed.
  2. Install Transformers Library:
    pip install transformers
  3. Load the Model: Use Hugging Face’s Transformers library to load the model.
  4. Prepare Input: Ensure the speech input is mono audio sampled at 16 kHz; resample it first if it was recorded at a different rate.
  5. Inference: Run the model on your local data for automatic speech recognition tasks.
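Steps 3 to 5 can be sketched as follows, using the Transformers `pipeline` helper. The Hub identifier `omarxadel/hubert-large-arabic-egyptian` is an assumption inferred from the model name and author above, and the sketch assumes the checkpoint ships a processor configuration that the automatic-speech-recognition pipeline can load. The helper names `require_16khz` and `transcribe` are hypothetical.

```python
# Assumed Hub identifier, inferred from the title and author; verify before use.
MODEL_ID = "omarxadel/hubert-large-arabic-egyptian"
EXPECTED_SAMPLE_RATE = 16_000  # the model expects 16 kHz input


def require_16khz(sampling_rate):
    """Return the rate unchanged if it is 16 kHz, otherwise raise ValueError."""
    if sampling_rate != EXPECTED_SAMPLE_RATE:
        raise ValueError(
            f"expected {EXPECTED_SAMPLE_RATE} Hz audio, got {sampling_rate} Hz; "
            "resample the recording first"
        )
    return sampling_rate


def transcribe(waveform, sampling_rate, model_id=MODEL_ID):
    """Transcribe a mono waveform given as a 1-D sequence of float samples."""
    require_16khz(sampling_rate)

    # Heavy imports are deferred so the rate check above works without them.
    import numpy as np
    from transformers import pipeline

    # The first call downloads the model weights from the Hub.
    asr = pipeline("automatic-speech-recognition", model=model_id)
    audio = np.asarray(waveform, dtype=np.float32)
    result = asr({"raw": audio, "sampling_rate": sampling_rate})
    return result["text"]
```

A call such as `transcribe(samples, 16000)` would return the recognized text as a string. Loading the audio file into `samples` is left to whichever audio library you already use.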

Inference runs on a CPU, but a GPU is considerably faster for a model of this size. If you do not have one locally, consider a cloud GPU instance from a provider such as AWS, Google Cloud, or Azure.

License

The model is released under the CC-BY-NC-4.0 license, allowing for non-commercial use with attribution.
