wav2vec2-large-xlsr-turkish-demo-colab ASR Model

Introduction

The wav2vec2-large-xlsr-turkish-demo-colab model is a version of facebook/wav2vec2-large-xlsr-53 fine-tuned on the Common Voice dataset for automatic speech recognition (ASR) in Turkish. It transcribes Turkish speech into text and achieves a word error rate (WER) of 0.4800 on the evaluation set, roughly 48 word errors per 100 reference words.

Architecture

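The model is based on the wav2vec 2.0 architecture: a convolutional feature encoder turns raw audio into a sequence of latent frames, and a Transformer encoder builds contextual representations of those frames. The starting checkpoint, XLSR-53, was pretrained on unlabeled speech in 53 languages. For this model, a linear output layer was added on top and the network was fine-tuned on Turkish transcriptions with the Connectionist Temporal Classification (CTC) loss.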
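Wav2Vec2ForCTC, the class used to load this checkpoint, outputs one score vector over a character vocabulary for every audio frame. Under CTC, a frame may also predict a special blank token, and text is recovered by merging consecutive repeated predictions and then dropping blanks. The sketch below shows that greedy decoding step in plain Python. It assumes a hypothetical five-token vocabulary in which ID 0 is the blank (Transformers' Wav2Vec2 tokenizer uses its pad token as the blank) and "|" marks word boundaries; the real vocabulary ships with the checkpoint.

```python
from itertools import groupby

# Hypothetical toy vocabulary for illustration; the real one is stored with the model.
TOY_VOCAB = {0: "<pad>", 1: "|", 2: "e", 3: "l", 4: "i"}


def ctc_greedy_decode(frame_ids, id_to_token, blank_id, word_delimiter="|"):
    """Turn per-frame argmax token IDs into text using CTC greedy decoding."""
    # 1. Merge consecutive duplicates, e.g. [2, 2, 3, 3, 0, 3] -> [2, 3, 0, 3]
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Drop blanks; a blank between two identical tokens keeps a real double letter
    tokens = [id_to_token[t] for t in collapsed if t != blank_id]
    # 3. Replace the word delimiter with spaces
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Frames for the Turkish word "elli": the blank (0) separates the two l's
print(ctc_greedy_decode([2, 2, 3, 3, 0, 3, 4, 0], TOY_VOCAB, blank_id=0))  # elli
```

Without the blank between the two runs of 3, the repeats would merge and the output would be "eli", which is why CTC needs the extra blank symbol.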

Training

Training Procedure

The model was fine-tuned using the Common Voice dataset. Key hyperparameters included:

Learning rate: 0.0003
Train batch size: 16
Eval batch size: 8
Seed: 42
Gradient accumulation steps: 2
Total train batch size: 32 (16 per step × 2 gradient accumulation steps)
Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
LR scheduler type: Linear
LR scheduler warmup steps: 500
Num epochs: 30
Mixed precision training: Native AMP
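The learning-rate settings above (peak 0.0003, linear schedule, 500 warmup steps) describe a ramp from 0 up to the peak followed by a linear decay back to 0 by the last optimizer update. The sketch below mirrors the shape of Transformers' linear schedule with warmup. The total number of updates depends on the training-set size, which the card does not state, so the 3,000-clip figure is purely hypothetical and the step count is an approximation.

```python
import math

BASE_LR = 3e-4            # "Learning rate: 0.0003"
WARMUP_STEPS = 500        # "LR scheduler warmup steps: 500"
EPOCHS = 30               # "Num epochs: 30"
EFFECTIVE_BATCH = 16 * 2  # train batch size x gradient accumulation steps


def approx_total_steps(num_examples: int, epochs: int = EPOCHS,
                       batch: int = EFFECTIVE_BATCH) -> int:
    """Approximate number of optimizer updates for a given dataset size."""
    return math.ceil(num_examples / batch) * epochs


def linear_warmup_lr(step: int, total_steps: int, base_lr: float = BASE_LR,
                     warmup_steps: int = WARMUP_STEPS) -> float:
    """Learning rate after `step` updates: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr at the end of warmup to 0 at total_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


total = approx_total_steps(3_000)  # hypothetical dataset size -> 2820 updates
for step in (0, 250, 500, 1660, total):
    print(step, linear_warmup_lr(step, total))
```

With these numbers the rate peaks at step 500 and is back to half the peak at step 1660, midway through the decay phase.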

Training Results

On the evaluation set, the final model reached a loss of 0.4055 and a WER of 0.4800. Training and validation losses both decreased gradually across epochs.
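WER counts the word substitutions, deletions, and insertions needed to turn the model's output into the reference transcript, divided by the number of reference words; a WER of 0.4800 therefore means those edits add up to about 48% of the reference word count. A minimal sketch using word-level Levenshtein distance follows. It assumes plain whitespace tokenization and no text normalization (casing, punctuation), which a real evaluation may apply first. Over a whole evaluation set, edits and reference words are typically summed across utterances before dividing, as `corpus_wer` does.

```python
def word_edits(reference: str, hypothesis: str) -> int:
    """Minimum substitutions + deletions + insertions between two word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the first i-1 reference words and first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)]


def corpus_wer(references: list[str], hypotheses: list[str]) -> float:
    """Total word edits divided by total reference words across all utterances."""
    edits = sum(word_edits(r, h) for r, h in zip(references, hypotheses))
    ref_words = sum(len(r.split()) for r in references)
    return edits / max(1, ref_words)


# One deleted word out of four reference words -> 0.25
print(corpus_wer(["bugün hava çok güzel"], ["bugün hava güzel"]))
```

Because insertions are counted, WER can exceed 1.0 when the model outputs many spurious words.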

Framework Versions

Transformers: 4.11.3
PyTorch: 1.9.1+cu102
Datasets: 1.13.3
Tokenizers: 0.10.3

Guide: Running Locally

To run this model locally, follow these steps:

Set Up the Environment: Install the required libraries, including PyTorch, Transformers, and Datasets.
```
pip install torch transformers datasets
```

Load the Model: Use the Transformers library to load the model and its processor.

```
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "patrickvonplaten/wav2vec2-large-xlsr-turkish-demo-colab"

# Wav2Vec2Processor bundles the feature extractor (raw audio -> input values)
# and the CTC tokenizer (token IDs -> text); Wav2Vec2Tokenizer is deprecated.
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
```

Inference: Load the audio as a 16 kHz mono waveform, run it through the processor and model, and decode the most likely token at each frame to obtain the transcription.
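A minimal end-to-end transcription sketch follows. It assumes `librosa` is installed for loading and resampling audio, in addition to the packages from the setup step, and that the checkpoint includes the processor files. It uses Wav2Vec2Processor, which bundles the feature extractor and the CTC tokenizer, since Wav2Vec2Tokenizer is deprecated in recent Transformers releases. The heavy imports sit inside the loader so the module can be imported without them, and `lru_cache` keeps the model in memory across calls.

```python
from functools import lru_cache

MODEL_ID = "patrickvonplaten/wav2vec2-large-xlsr-turkish-demo-colab"
TARGET_SR = 16_000  # XLSR-53 models expect 16 kHz mono audio


@lru_cache(maxsize=1)
def load_model(model_id: str = MODEL_ID):
    """Load processor and model once, on GPU if available."""
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = Wav2Vec2ForCTC.from_pretrained(model_id).to(device).eval()
    return processor, model, device


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with greedy CTC decoding."""
    import librosa  # assumption: used here only for loading and resampling
    import torch

    processor, model, device = load_model(model_id)
    # Decode the file to mono float32 and resample it to 16 kHz
    speech, _ = librosa.load(audio_path, sr=TARGET_SR, mono=True)
    inputs = processor(speech, sampling_rate=TARGET_SR, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values.to(device)).logits
    # Most likely token per frame; batch_decode merges repeats and drops blanks
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Usage: `transcribe("clip.wav")`, where `clip.wav` is a placeholder path to a Turkish recording. Expect transcriptions to contain errors; the reported WER on the evaluation set is 0.4800.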
Consider Cloud GPUs: For faster processing of long or numerous audio files, consider running inference on a GPU from a cloud provider such as AWS, Google Cloud, or Azure.

License

This model is licensed under the Apache-2.0 License, allowing for both personal and commercial use with proper attribution.
