wav2vec2 xls r 300m cv8 turkish
mpoyraz/wav2vec2-xls-r-300m-cv8-turkish
Introduction
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m for Turkish automatic speech recognition (ASR). It is fine-tuned on the Common Voice 8.0 dataset and evaluated on the Common Voice 8.0 Turkish test split and on the Speech Recognition Community dev and test data.
Architecture
The model is based on the wav2vec2 architecture, which is designed for robust ASR. Fine-tuning adapts it to recognizing Turkish speech.
Training
Training and Evaluation Data
- Datasets Used: The model is fine-tuned on the Common Voice 8.0 Turkish dataset, using all validated splits except the test split.
- Training Procedure: Custom preprocessing and dataset loading were performed using the wav2vec2-turkish repository.
- Training Hyperparameters:
  - Learning Rate: 2.5e-4
  - Number of Training Epochs: 20
  - Warmup Steps: 500
  - Batch Size: 8 (for both training and evaluation)
  - Gradient Accumulation Steps: 8
  - Dropout Rates: various, from 0.05 to 0.1
- Framework Versions:
  - Transformers: 4.17.0.dev0
  - PyTorch: 1.10.1
  - Datasets: 1.17.0
  - Tokenizers: 0.10.3
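The batch size and gradient accumulation values listed above combine into a larger effective batch per optimizer update. The following is a minimal sketch of that arithmetic, not the training script itself. The dictionary layout, the function names, the single-device default, and the linear shape of the warmup schedule are all assumptions made for illustration; the source only states the hyperparameter values.

```python
# Hyperparameter values as listed above; the dictionary keys are illustrative.
hparams = {
    "learning_rate": 2.5e-4,
    "num_train_epochs": 20,
    "warmup_steps": 500,
    "per_device_batch_size": 8,
    "gradient_accumulation_steps": 8,
}


def effective_batch_size(per_device, accumulation_steps, num_devices=1):
    """Samples contributing to each optimizer update.

    num_devices=1 is an assumption; the device count is not stated.
    """
    return per_device * accumulation_steps * num_devices


def lr_during_warmup(step, peak_lr, warmup_steps):
    """Learning rate under an ASSUMED linear warmup from 0 to peak_lr."""
    return peak_lr * min(1.0, step / warmup_steps)


batch = effective_batch_size(
    hparams["per_device_batch_size"],
    hparams["gradient_accumulation_steps"],
)
print(batch)  # 64 samples per update on a single device

halfway = lr_during_warmup(250, hparams["learning_rate"], hparams["warmup_steps"])
print(halfway)  # half of the peak learning rate under the assumed schedule
```

With these values, each optimizer step on one device would see 8 x 8 = 64 samples; on more devices the figure scales accordingly.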
Language Model
An n-gram language model was trained on Turkish Wikipedia articles using KenLM. The ngram-lm-wiki repository was used to generate and convert the language model.
Evaluation
- Commands: Install unicode_tr for Turkish text processing before evaluation.
- Evaluation Results:
  - Common Voice 8 TR test split: WER 10.61, CER 2.67
  - Speech Recognition Community dev data: WER 36.46, CER 12.38
  - Speech Recognition Community test data: WER 40.91
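To make the metrics above concrete, here is a minimal sketch of Turkish-aware lowercasing followed by word error rate (WER) and character error rate (CER), both as percentages. This is not the evaluation script and not the unicode_tr implementation: turkish_lower, edit_distance, wer, and cer are hypothetical helpers written for illustration, and the example sentences are made up. Counting spaces as characters in CER is also an assumption; evaluation tools differ on this.

```python
def turkish_lower(text):
    """Lowercase using Turkish casing rules: I -> ı and İ -> i.

    Plain str.lower() maps "I" to "i", which is wrong for Turkish,
    so the two special letters are replaced first.
    """
    return text.replace("I", "ı").replace("İ", "i").lower()


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent (spaces counted as characters)."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


# Made-up example: the hypothesis gets one character wrong in one word.
reference = turkish_lower("ILIK bir İstanbul akşamı")
hypothesis = "ılık bir istanbul aksamı"
print(reference)                   # ılık bir istanbul akşamı
print(wer(reference, hypothesis))  # 25.0 (1 of 4 words differs)
print(cer(reference, hypothesis))  # 1 of 24 characters differs
```

A single wrong character costs a full word in WER but only one character in CER, which is why the CER figures above are much lower than the WER figures.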
Guide: Running Locally
- Setup Environment: Ensure all dependencies are installed, including PyTorch and Transformers.
- Install unicode_tr: Necessary for Turkish text processing.
- Run Evaluation:
  - For Common Voice 8.0:
    python eval.py --model_id mpoyraz/wav2vec2-xls-r-300m-cv8-turkish --dataset mozilla-foundation/common_voice_8_0 --config tr --split test
  - For Speech Recognition Community dev data:
    python eval.py --model_id mpoyraz/wav2vec2-xls-r-300m-cv8-turkish --dataset speech-recognition-community-v2/dev_data --config tr --split validation --chunk_length_s 5.0 --stride_length_s 1.0
- Cloud GPUs: Consider using cloud-based GPU services like AWS EC2, Google Cloud, or Azure for accelerated processing.
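The dev data command above passes --chunk_length_s 5.0 and --stride_length_s 1.0, which suggests long recordings are split into overlapping windows before decoding. The sketch below shows one simplified way such windows could be laid out. It assumes the flags mean window length and per-side overlap, as the parameters of the same name do in the Transformers ASR pipeline; how eval.py actually uses them is not shown in the source, and chunk_windows is a hypothetical helper.

```python
def chunk_windows(audio_length_s, chunk_length_s=5.0, stride_length_s=1.0):
    """Return (start, end) times in seconds of overlapping chunks.

    Simplified illustration: each chunk is chunk_length_s long, and
    consecutive chunks advance by the chunk length minus the overlap
    on both sides. A real implementation may differ in edge handling.
    """
    step = chunk_length_s - 2 * stride_length_s
    if step <= 0:
        raise ValueError("chunk_length_s must exceed twice stride_length_s")

    windows = []
    start = 0.0
    while start < audio_length_s:
        end = min(start + chunk_length_s, audio_length_s)
        windows.append((start, end))
        if end >= audio_length_s:
            break  # the last chunk reached the end of the audio
        start += step
    return windows


# A 12-second recording with 5 s chunks and 1 s of overlap per side.
windows = chunk_windows(12.0)
print(windows)  # [(0.0, 5.0), (3.0, 8.0), (6.0, 11.0), (9.0, 12.0)]
```

Overlapping windows give the model acoustic context at chunk edges, so the predictions in the overlap regions can be discarded when chunks are merged.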
License
This model is licensed under the Apache 2.0 License.