wav2vec2 large xlsr turkish demo colab
patrickvonplaten
Introduction
The wav2vec2-large-xlsr-turkish-demo-colab model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53, trained on the Common Voice dataset for Automatic Speech Recognition (ASR) in Turkish. It converts spoken audio into text and reaches a Word Error Rate (WER) of 0.4800 on the evaluation set.
Architecture
The model is based on the Wav2Vec 2.0 architecture, which learns speech representations directly from raw audio waveforms. It starts from the multilingual facebook/wav2vec2-large-xlsr-53 checkpoint, pretrained on speech in 53 languages, and is fine-tuned specifically for Turkish speech.
Training
Training Procedure
The model was fine-tuned using the Common Voice dataset. Key hyperparameters included:
- Learning rate: 0.0003
- Train batch size: 16
- Eval batch size: 8
- Seed: 42
- Gradient accumulation steps: 2
- Total train batch size: 32
- Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- LR scheduler type: Linear
- LR scheduler warmup steps: 500
- Num epochs: 30
- Mixed precision training: Native AMP
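The total train batch size follows from the per-device batch size and the gradient accumulation steps (16 × 2 = 32), and a linear scheduler with warmup ramps the learning rate up over the first 500 steps before decaying it to zero. The sketch below works through that arithmetic in plain Python. It assumes a single device, the function names are illustrative rather than taken from any library, and `total_steps` is a hypothetical value because the number of optimizer steps is not stated here.

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Number of samples that contribute to one optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


def linear_lr(step, base_lr=3e-4, warmup_steps=500, total_steps=3000):
    """Linear warmup to base_lr, then linear decay to zero.

    total_steps=3000 is a hypothetical placeholder, not a reported value.
    """
    if step < warmup_steps:
        # Warmup phase: scale the learning rate up from 0 to base_lr
        return base_lr * (step / warmup_steps)
    # Decay phase: scale the learning rate down from base_lr to 0
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / (total_steps - warmup_steps))


print(effective_batch_size(16, 2))  # 32, matching the total train batch size
for step in (0, 250, 500, 1750, 3000):
    print(step, linear_lr(step))
```

With these assumptions, the learning rate peaks at 0.0003 exactly when warmup ends at step 500.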
Training Results
Training ended with a loss of 0.4055 and a WER of 0.4800. Both training and validation loss improved gradually across epochs.
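WER is the word-level edit distance between the reference transcript and the model output, divided by the number of words in the reference, so a WER of 0.4800 corresponds to roughly 48 word errors per 100 reference words. The following is a minimal sketch of that metric in plain Python. The function name and the example sentences are illustrative, and it is not necessarily the implementation used to produce the reported score.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion of a reference word
                current[j - 1] + 1,      # insertion of a hypothesis word
                previous[j - 1] + cost,  # substitution or exact match
            ))
        previous = current
    return previous[-1] / len(ref)


# One of four reference words is missing from the hypothesis
print(word_error_rate("bugün hava çok güzel", "bugün hava güzel"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.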
Framework Versions
- Transformers: 4.11.3
- PyTorch: 1.9.1+cu102
- Datasets: 1.13.3
- Tokenizers: 0.10.3
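To compare a local environment against these versions, something like the following standard-library sketch can help. The helper name `check_versions` is hypothetical, and the distribution names (`torch` for PyTorch, and so on) are assumed to match the packages as installed with pip.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed above, keyed by the assumed pip distribution name
EXPECTED = {
    "transformers": "4.11.3",
    "torch": "1.9.1+cu102",
    "datasets": "1.13.3",
    "tokenizers": "0.10.3",
}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        report[package] = (installed, installed == wanted)
    return report


if __name__ == "__main__":
    for package, (installed, ok) in check_versions().items():
        status = "ok" if ok else "differs"
        print(f"{package}: installed={installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; newer releases may work, but results are only documented for the versions listed.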
Guide: Running Locally
To run this model locally, follow these steps:
- Setup Environment: Install the required libraries, including PyTorch, Transformers, and Datasets.
    pip install torch transformers datasets
- Load the Model: Use the Transformers library to load the model and its processor.
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
    # Acoustic model with a CTC head
    model = Wav2Vec2ForCTC.from_pretrained("patrickvonplaten/wav2vec2-large-xlsr-turkish-demo-colab")
    # Wav2Vec2Processor is used here in place of the deprecated Wav2Vec2Tokenizer; it wraps the feature extractor and the tokenizer
    processor = Wav2Vec2Processor.from_pretrained("patrickvonplaten/wav2vec2-large-xlsr-turkish-demo-colab")
- Inference: Prepare the audio input (the XLSR-53 base model expects speech sampled at 16 kHz) and run the model to obtain transcriptions.
- Consider Cloud GPUs: For faster processing of large audio files, consider a cloud GPU service such as AWS, Google Cloud, or Azure.
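The inference step can be sketched roughly as follows. This is an illustration, not the author's reference code. It assumes the model repository ships both a feature extractor configuration and a tokenizer vocabulary, so that `Wav2Vec2Processor` can be loaded from it, and it assumes the caller supplies a mono waveform as a 1-D float array already sampled at 16 kHz. The `transcribe` function and the two constants are hypothetical names. The heavy imports are deferred into the function so that defining it does not load PyTorch.

```python
MODEL_ID = "patrickvonplaten/wav2vec2-large-xlsr-turkish-demo-colab"
TARGET_SAMPLE_RATE = 16_000  # assumed input rate, matching the XLSR-53 base model


def transcribe(waveform, sample_rate=TARGET_SAMPLE_RATE):
    """Greedy CTC transcription of a 1-D mono float waveform."""
    # Deferred imports: torch and transformers load only when the function runs
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    if sample_rate != TARGET_SAMPLE_RATE:
        raise ValueError(f"resample the audio to {TARGET_SAMPLE_RATE} Hz first")

    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
    model.eval()

    # Normalize the waveform and convert it to PyTorch tensors
    inputs = processor(
        waveform,
        sampling_rate=sample_rate,
        return_tensors="pt",
        padding=True,
    )

    # Forward pass without gradient tracking
    with torch.no_grad():
        logits = model(**inputs).logits

    # Greedy decoding: most likely token per frame, then CTC collapsing
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Loading the audio file itself is left to the reader; any library that returns a float array and its sample rate should work. For repeated calls, loading the model and processor once outside the function avoids reinitializing them on every transcription.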
License
This model is licensed under the Apache-2.0 License, allowing for both personal and commercial use with proper attribution.