wav2vec2-xls-r-300m-cv7-turkish

mpoyraz

Introduction

wav2vec2-xls-r-300m-cv7-turkish is an Automatic Speech Recognition (ASR) model for Turkish. It is a fine-tuned version of Facebook's wav2vec2-xls-r-300m, trained on Common Voice 7.0 and MediaSpeech.

Architecture

The model uses the wav2vec2 architecture. A convolutional feature encoder turns the raw audio waveform into a shorter sequence of latent frames, and a Transformer network contextualizes those frames. For speech recognition, an output layer on top is fine-tuned with a CTC objective to predict characters.
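To make the downsampling done by the feature encoder concrete, the sketch below counts how many frames it emits for a given number of audio samples. The kernel sizes and strides are the default Wav2Vec2 configuration values; that this checkpoint uses the same values and 16 kHz input is an assumption, not something stated in this card.

```python
# Default Wav2Vec2 convolutional feature encoder layout (assumed to apply here).
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def num_frames(num_samples: int) -> int:
    """Return the number of frames produced for `num_samples` raw audio samples."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        # Output length of an unpadded 1-D convolution.
        length = (length - kernel) // stride + 1
    return length


# One second of 16 kHz audio.
frames_per_second = num_frames(16_000)
print(frames_per_second)
```

With these values the strides multiply to 320 samples, so each frame covers roughly 20 ms of audio at 16 kHz.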

Training

Training and Evaluation Data

  • Datasets Used:
    • Common Voice 7.0 TR (all validated splits except the test split)
    • MediaSpeech
  • Training required custom pre-processing and data loading, both provided by the wav2vec2-turkish repository.
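The card does not list the individual pre-processing steps. One step that Turkish transcripts commonly need is Turkish-aware lowercasing, because the default Unicode rules map "I" to "i" instead of "ı". The helper below is a hypothetical illustration of that idea using only the standard library; it is not taken from the wav2vec2-turkish repository.

```python
def turkish_lower(text: str) -> str:
    """Lowercase text using Turkish casing rules for dotted and dotless i.

    Hypothetical helper for illustration only.
    """
    # Handle the two Turkish-specific capitals before the generic lower().
    return text.replace("İ", "i").replace("I", "ı").lower()


print(turkish_lower("IŞIK"))
print(turkish_lower("İSTANBUL"))
```

Plain `str.lower()` would turn "IŞIK" into "işik" and would leave a combining dot after the "i" in "İSTANBUL", both of which would distort character-level training targets.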

Training Procedure and Hyperparameters

  • Learning Rate: 2e-4
  • Epochs: 10
  • Warmup Steps: 500
  • Batch Size: 8 (both for training and evaluation)
  • Gradient Accumulation Steps: 8
  • Dropout Rates:
    • Feature Projection: 0.05
    • Attention: 0.05
    • Final: 0.05
    • Activation: 0.05
  • Time and Feature Masking Probability: 0.1 and 0.05, respectively
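As a rough sketch of what these settings imply, the snippet below derives the effective batch size per device and the learning rate during warmup. The dictionary keys are illustrative names, and linear warmup is an assumption: the card gives the number of warmup steps but does not name the schedule or the number of devices.

```python
# Values copied from the list above; key names are illustrative.
hparams = {
    "learning_rate": 2e-4,
    "num_epochs": 10,
    "warmup_steps": 500,
    "batch_size": 8,
    "gradient_accumulation_steps": 8,
}

# Gradients from 8 batches of 8 samples are accumulated before each update.
effective_batch_size = hparams["batch_size"] * hparams["gradient_accumulation_steps"]


def warmup_lr(step: int) -> float:
    """Learning rate at an optimizer step, assuming linear warmup from zero."""
    fraction = min(step / hparams["warmup_steps"], 1.0)
    return hparams["learning_rate"] * fraction


print(effective_batch_size)
print(warmup_lr(250), warmup_lr(500))
```

The effective batch size applies per device; with several GPUs it would be multiplied by the device count.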

Framework Versions

  • Transformers: 4.16.0.dev0
  • PyTorch: 1.10.1
  • Datasets: 1.17.0
  • Tokenizers: 0.10.3

Language Model

An N-gram language model was trained on Turkish Wikipedia articles using KenLM, with resources from the ngram-lm-wiki repository.
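For readers unfamiliar with N-gram models, the toy example below estimates bigram probabilities from raw counts. It only illustrates the idea: KenLM uses modified Kneser-Ney smoothing and a far more efficient representation, and the two-sentence corpus here is made up.

```python
from collections import Counter


def train_bigram(sentences):
    """Count contexts and bigrams over whitespace-tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    return contexts, bigrams


def bigram_prob(contexts, bigrams, prev, word):
    """Unsmoothed maximum-likelihood estimate of P(word | prev)."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]


corpus = ["bu bir test", "bu bir deneme"]  # made-up example sentences
contexts, bigrams = train_bigram(corpus)
print(bigram_prob(contexts, bigrams, "bu", "bir"))
print(bigram_prob(contexts, bigrams, "bir", "test"))
```

During decoding, scores of this kind are combined with the acoustic model's output so that likely word sequences are preferred over acoustically similar but unlikely ones.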

Evaluation

The model was evaluated on two datasets. WER (word error rate) and CER (character error rate) are given in percent:

  • Common Voice 7 TR Test Split:
    • WER: 8.62
    • CER: 2.26
  • Speech Recognition Community Dev Data:
    • WER: 30.87
    • CER: 10.69
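The metrics above are edit-distance ratios. The following is a minimal standard-library sketch of how WER and CER can be computed for a single utterance; it is not the evaluation script used for the reported numbers, which may normalize text differently. Here CER counts spaces as characters, which is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate as a fraction of the reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate as a fraction of the reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


print(wer("bugün hava çok güzel", "bugün hava güzel"))
print(cer("güzel", "guzel"))
```

Multiply the results by 100 to compare them with the percentages listed above. Corpus-level scores are normally computed by summing edit distances and reference lengths over all utterances rather than averaging per-utterance ratios.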

Guide: Running Locally

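  1. Setup: Install Python along with PyTorch, Transformers, and Datasets.
  2. Install Dependencies: The versions below follow the Framework Versions section. Transformers 4.16.0.dev0 was a development build and is not published on PyPI, so the 4.16.0 release is pinned here as the assumed closest match.
    pip install transformers==4.16.0 torch==1.10.1 datasets==1.17.0 tokenizers==0.10.3
    pip install unicode_tr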
  3. Clone the Repository: Clone the wav2vec2-turkish repository for custom data processing.
  4. Evaluation: Use the provided evaluation commands to assess model performance on the desired dataset.
  5. Hardware: A GPU is recommended for evaluation. If none is available locally, a cloud GPU instance on AWS, GCP, or Azure can be used.
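Beyond the evaluation workflow, a single file can be transcribed with the Transformers `pipeline` API. The sketch below is an assumption-laden example rather than the author's documented usage: the Hub id is inferred from the author and model name on this card, the `transcribe` helper is hypothetical, and decoding a file path requires ffmpeg to be installed. If the checkpoint ships a language model decoder, extra packages such as pyctcdecode and kenlm may also be needed; this is not verified here.

```python
# Hub id inferred from the author and model name (assumption).
MODEL_ID = "mpoyraz/wav2vec2-xls-r-300m-cv7-turkish"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file and return the recognized text.

    Hypothetical helper; downloads the model on first use.
    """
    # Imported lazily so this module loads even without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    return asr(audio_path)["text"]


# Example call (needs network access and a real audio file):
# print(transcribe("sample_tr.wav"))
```

Building the pipeline inside the function keeps the example short; for batches of files, create the pipeline once and reuse it.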

License

This model and its associated resources are licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
