wav2vec2-large-xlsr-persian-v3 (by m3hrdadfi)
Introduction
This document provides an overview of the wav2vec2-large-xlsr-persian-v3 model, which is designed for Automatic Speech Recognition (ASR) in Persian using the Wav2Vec2 architecture. It is fine-tuned from Facebook's wav2vec2-large-xlsr-53 model on the Common Voice dataset.
Architecture
The model leverages the Wav2Vec2 architecture, specifically the large variant wav2vec2-large-xlsr-53, which is well-suited for cross-lingual speech recognition tasks. It is built using the Transformers library and supports integration with both PyTorch and TensorFlow.
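As a quick illustration of that Transformers integration, the checkpoint can be loaded through the library's automatic-speech-recognition pipeline. This is a minimal sketch rather than the card's official usage example: it assumes a transformers version with audio pipeline support, and "sample.wav" is a placeholder path to a Persian recording.

```python
# Minimal sketch: transcribe one placeholder file with the ASR pipeline.
# librosa loads and resamples the audio to 16 kHz, the rate XLSR models expect.
import librosa
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="m3hrdadfi/wav2vec2-large-xlsr-persian-v3",
)

speech, _ = librosa.load("sample.wav", sr=16_000)
print(asr(speech)["text"])
```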
Training
The model is fine-tuned on Persian speech data from the Common Voice dataset. Fine-tuning adapts the pre-trained wav2vec2-large-xlsr-53 checkpoint to recognize and transcribe Persian speech, achieving a Word Error Rate (WER) of 10.36% on the Common Voice Persian test set.
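For context on the reported metric, WER is the number of word-level substitutions, deletions, and insertions needed to turn a hypothesis into the reference transcript, divided by the number of reference words. A toy check with the jiwer package (listed in the install step below); the sentences are invented, not real transcripts or model output:

```python
# Toy WER illustration with jiwer; the strings are made up for demonstration.
from jiwer import wer

reference = "this is a test sentence"
hypothesis = "this is a test phrase"

# One substituted word out of five reference words -> WER = 0.2 (20%)
print(wer(reference, hypothesis))
```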
Guide: Running Locally
To run the model locally, follow these steps:
- Install Required Packages:
  pip install git+https://github.com/huggingface/datasets.git
  pip install git+https://github.com/huggingface/transformers.git
  pip install torchaudio librosa jiwer parsivar num2fawords
- Download and Prepare Data:
  Download the Common Voice dataset for Persian and extract it:
  wget https://voice-prod-bundler-ee1969a6ce8178826482b88e843c335139bd3fb4.s3.amazonaws.com/cv-corpus-6.1-2020-12-11/fa.tar.gz
  tar -xzf fa.tar.gz
  rm -rf fa.tar.gz
- Data Cleaning:
  Use the normalizer script provided in the model repository to clean the transcripts:
  from normalizer import normalizer
  # Define a cleaning function and apply it to your dataset
- Load and Prepare the Model:
  import torch
  from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
  device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
  model_name_or_path = "m3hrdadfi/wav2vec2-large-xlsr-persian-v3"
  processor = Wav2Vec2Processor.from_pretrained(model_name_or_path)
  model = Wav2Vec2ForCTC.from_pretrained(model_name_or_path).to(device)
- Make Predictions:
  Use the model to predict transcriptions from audio files (see the prediction sketch after this list).
- Evaluate the Model:
  Calculate the WER on the test set to evaluate the performance of the model (see the evaluation sketch after this list).
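The prediction step can be sketched roughly as follows. This is not the exact script from the model repository: it reuses the processor, model, and device objects from the "Load and Prepare the Model" step, assumes 16 kHz input as expected by XLSR models, and uses "sample.wav" as a placeholder path.

```python
# Sketch of "Make Predictions", assuming `processor`, `model`, and `device`
# exist as created in the "Load and Prepare the Model" step.
# "sample.wav" is a placeholder; librosa resamples it to 16 kHz.
import librosa
import torch

speech, _ = librosa.load("sample.wav", sr=16_000)

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(inputs.input_values.to(device)).logits

predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```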
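The evaluation step can be sketched in a similar way: transcribe the Common Voice Persian test split and score it with jiwer. Loading the split through the datasets library, the "sentence"/"path" column names, and the helper function names below are assumptions for illustration; the guide's manually downloaded files and the repository's own evaluation script may differ, and transcripts are assumed to have already been cleaned with the normalizer from the Data Cleaning step.

```python
# Sketch of "Evaluate the Model": transcribe the test split and compute WER.
# Assumes `processor`, `model`, and `device` from the model-loading step.
import torch
import torchaudio
from datasets import load_dataset
from jiwer import wer

# Illustrative: load the Persian test split via the datasets library.
test_set = load_dataset("common_voice", "fa", split="test")

def speech_to_array(batch):
    # Resample each clip to the 16 kHz rate expected by the XLSR feature extractor.
    speech, sampling_rate = torchaudio.load(batch["path"])
    batch["speech"] = torchaudio.transforms.Resample(sampling_rate, 16_000)(speech).squeeze().numpy()
    return batch

def predict(batch):
    inputs = processor(batch["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(inputs.input_values.to(device)).logits
    batch["predicted"] = processor.batch_decode(torch.argmax(logits, dim=-1))[0]
    return batch

results = test_set.map(speech_to_array).map(predict)
print("WER:", wer(results["sentence"], results["predicted"]))
```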
Cloud GPUs: Consider using cloud-based services like AWS, Google Cloud, or Azure for access to powerful GPUs suitable for model inference and training.
License
The license for the model and its associated code should be checked directly on the repository page for the specific terms and conditions.