wav2vec2-urdu by kingabzpro
Introduction
WAV2VEC2-URDU is an Automatic Speech Recognition (ASR) model that transcribes Urdu-language audio into text. It was fine-tuned from the base model Harveenchadha/vakyansh-wav2vec2-urdu-urm-60 on the Common Voice dataset, is built with the Transformers library, and is compatible with PyTorch.
Architecture
The model is a variant of the Wav2Vec2 architecture, specifically fine-tuned for the Urdu language. It was developed as part of the Robust Speech Event and is evaluated using Word Error Rate (WER) and Character Error Rate (CER).
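Wav2Vec2ForCTC models emit one logit vector per audio frame, and the final transcript comes from CTC decoding of those frames. A minimal greedy-decoding sketch, with a made-up toy vocabulary and frame IDs for illustration (a real model's vocabulary and frame rate differ):

```python
# Greedy CTC decoding sketch: collapse consecutive repeats, drop blanks.
# The vocabulary and per-frame argmax IDs below are hypothetical.

BLANK = 0  # assumption: the CTC blank token sits at index 0
VOCAB = {0: "", 1: "س", 2: "ل", 3: "ا", 4: "م"}  # toy Urdu vocabulary

def ctc_greedy_decode(frame_ids):
    """Collapse consecutive repeated IDs, then remove blanks."""
    out = []
    prev = None
    for i in frame_ids:
        if i != prev and i != BLANK:
            out.append(VOCAB[i])
        prev = i
    return "".join(out)

# Per-frame argmax IDs for the word "سلام", with repeats and blanks:
frames = [1, 1, 0, 2, 2, 3, 0, 0, 4, 4]
print(ctc_greedy_decode(frames))  # → سلام
```

This collapse-then-drop rule is why CTC models need no frame-level alignment labels during training.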
Training
The model was trained on 0.58 hours of Common Voice Urdu audio, starting from the vakyansh-wav2vec2-urdu-urm-60 checkpoint because of the limited number of available samples. Key hyperparameters used during training include:
- Learning Rate: 0.0003
- Train Batch Size: 64
- Eval Batch Size: 8
- Optimizer: Adam with betas (0.9, 0.999) and epsilon 1e-08
- Gradient Accumulation Steps: 2
- Total Train Batch Size: 128
- Number of Epochs: 100
- Mixed Precision Training: Native AMP
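The listed values are internally consistent: the total train batch size is the per-device batch size times the gradient accumulation steps (assuming a single GPU, which the model card does not state explicitly):

```python
# Sanity check on the hyperparameters above.
per_device_batch = 64   # Train Batch Size
grad_accum_steps = 2    # Gradient Accumulation Steps
num_devices = 1         # assumption: a single GPU

total_batch = per_device_batch * grad_accum_steps * num_devices
print(total_batch)  # → 128, matching "Total Train Batch Size: 128"
```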
Training results show a WER of 0.5747 and a CER of 0.3268.
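Both metrics are normalized edit distances: WER over words, CER over characters. A dependency-free sketch with hypothetical example strings (model cards typically compute these with packages such as `jiwer` or the `datasets` metrics):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (r != h))  # substitution
    return dp[-1]

def word_error_rate(reference, hypothesis):
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def char_error_rate(reference, hypothesis):
    return edit_distance(reference, hypothesis) / len(reference)

ref = "speech recognition in urdu"   # hypothetical reference transcript
hyp = "speech recognition on urdu"   # hypothetical model output
print(word_error_rate(ref, hyp))  # → 0.25 (1 substituted word out of 4)
```

By this definition a WER of 0.5747 means roughly 57% of reference words required an edit, which is plausible given only 0.58 hours of training audio.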
Guide: Running Locally
To run the model locally, follow these steps:
- Set Up Environment
  Ensure you have Python installed, and set up a virtual environment.

- Install Dependencies
  Install the necessary libraries:

  ```bash
  pip install transformers==4.16.0.dev0
  pip install torch==1.10.1+cu102
  pip install datasets==1.17.1.dev0
  pip install tokenizers==0.11.0
  ```

- Download Model
  Use the Hugging Face Transformers library to download and load the model (Wav2Vec2Processor replaces the deprecated Wav2Vec2Tokenizer):

  ```python
  from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

  model = Wav2Vec2ForCTC.from_pretrained("kingabzpro/wav2vec2-urdu")
  processor = Wav2Vec2Processor.from_pretrained("kingabzpro/wav2vec2-urdu")
  ```

- Inference
  Run inference on your audio files to transcribe the speech into text.
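The inference step can be sketched as follows, assuming a mono WAV file at a hypothetical path `audio.wav` and using torchaudio for loading; Wav2Vec2 models expect 16 kHz input:

```python
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model = Wav2Vec2ForCTC.from_pretrained("kingabzpro/wav2vec2-urdu")
processor = Wav2Vec2Processor.from_pretrained("kingabzpro/wav2vec2-urdu")

# Load the audio and resample to the 16 kHz rate Wav2Vec2 expects.
waveform, sample_rate = torchaudio.load("audio.wav")  # hypothetical path
if sample_rate != 16_000:
    waveform = torchaudio.transforms.Resample(sample_rate, 16_000)(waveform)

inputs = processor(waveform.squeeze().numpy(), sampling_rate=16_000,
                   return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: argmax per frame, collapsed by the processor.
pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids)[0])
```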
For enhanced performance, it is recommended to use cloud GPUs such as AWS EC2 instances with NVIDIA GPUs or Google Cloud Platform's AI Platform.
License
The model is licensed under the Apache-2.0 license, allowing for both personal and commercial use with attribution.