wav2vec2 xls r 300m cv8 turkish

mpoyraz

WAV2VEC2-XLS-R-300M-CV8-TURKISH

Introduction

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m for Turkish automatic speech recognition (ASR). It leverages the Common Voice 8.0 dataset and is evaluated on various sets to ensure robustness in ASR tasks.

Architecture

The model architecture is based on the wav2vec2 framework, which is designed for robust ASR. It is optimized for processing and recognizing Turkish language audio data.

Training

Training and Evaluation Data

  • Datasets Used: The model is fine-tuned on the Common Voice 8.0 Turkish dataset, using all validated splits except the test split.
  • Training Procedure: Custom preprocessing and dataset loading were performed using the wav2vec2-turkish repository.
  • Training Hyperparameters:
    • Learning Rate: 2.5e-4
    • Number of Training Epochs: 20
    • Warmup Steps: 500
    • Batch Size: 8 (for both training and evaluation)
    • Gradient Accumulation Steps: 8
    • Various Dropout Rates: 0.05 to 0.1
  • Framework Versions:
    • Transformers: 4.17.0.dev0
    • PyTorch: 1.10.1
    • Datasets: 1.17.0
    • Tokenizers: 0.10.3

Language Model

An N-gram language model was trained on Turkish Wikipedia articles using KenLM, with resources from the ngram-lm-wiki repository to generate and convert the language model.

Evaluation

  • Commands: Install unicode_tr for Turkish text processing before evaluation.
  • Evaluation Results:
    • Common Voice 8 TR test split: WER 10.61, CER 2.67
    • Speech Recognition Community dev data: WER 36.46, CER 12.38
    • Speech Recognition Community test data: WER 40.91

Guide: Running Locally

  1. Setup Environment: Ensure all dependencies are installed, including PyTorch and Transformers.
  2. Install unicode_tr: Necessary for Turkish text processing.
  3. Run Evaluation:
    • For Common Voice 8.0:
      python eval.py --model_id mpoyraz/wav2vec2-xls-r-300m-cv8-turkish --dataset mozilla-foundation/common_voice_8_0 --config tr --split test
      
    • For Speech Recognition Community dev data:
      python eval.py --model_id mpoyraz/wav2vec2-xls-r-300m-cv8-turkish --dataset speech-recognition-community-v2/dev_data --config tr --split validation --chunk_length_s 5.0 --stride_length_s 1.0
      
  4. Cloud GPUs: Consider using cloud-based GPU services like AWS EC2, Google Cloud, or Azure for accelerated processing.

License

This model is licensed under the Apache 2.0 License.

More Related APIs in Automatic Speech Recognition