wav2vec2-large-xlsr-53-arabic
jonatasgrosman

Introduction
The wav2vec2-large-xlsr-53-arabic model is a fine-tuned version of Facebook's wav2vec2-large-xlsr-53 for Arabic speech recognition. It was trained and validated on the Common Voice 6.1 and Arabic Speech Corpus datasets. The model expects speech input sampled at 16 kHz and is designed for automatic speech recognition (ASR) tasks. It was developed with support from OVHcloud's GPU credits.
Architecture
This model is built on the Wav2Vec 2.0 architecture, specifically the wav2vec2-large-xlsr-53 variant, which is tailored for multilingual speech recognition. Pre-trained on a large corpus of diverse languages and fine-tuned for Arabic, it takes raw audio as input and outputs character-level transcriptions through a CTC head.
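For orientation, here is a minimal sketch (an illustration, not part of the original card) that inspects the CTC output shape for one second of 16 kHz audio; the model emits one frame of character logits roughly every 20 ms:

```python
# Sketch: inspect the CTC head's output shape for 1 s of silent 16 kHz audio.
import torch
from transformers import Wav2Vec2ForCTC

model = Wav2Vec2ForCTC.from_pretrained("jonatasgrosman/wav2vec2-large-xlsr-53-arabic")

dummy = torch.zeros(1, 16_000)  # 1 second of silence at 16 kHz
with torch.no_grad():
    logits = model(dummy).logits

# Shape is (batch, frames, vocab); ~49 frames here, i.e. one per ~20 ms.
print(logits.shape)
```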
Training
The model was fine-tuned on the Arabic language using the Common Voice and Arabic Speech Corpus datasets. The training script is available on GitHub, and the process involved adjusting the model to recognize and transcribe spoken Arabic with improved accuracy. Performance metrics include a Word Error Rate (WER) of 39.59% and a Character Error Rate (CER) of 18.18%.
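For reference, WER and CER can be computed with the jiwer library; the exact evaluation script and text normalization behind the reported figures are not specified here, so treat this as a hedged sketch with placeholder transcripts:

```python
# Sketch: computing WER/CER with jiwer (placeholder transcripts; the
# normalization used for the reported 39.59% WER / 18.18% CER is assumed).
import jiwer

reference = "مرحبا بكم في القاهرة"   # ground-truth transcript (example)
hypothesis = "مرحبا بكم في قاهرة"    # model output (example)

print("WER:", jiwer.wer(reference, hypothesis))
print("CER:", jiwer.cer(reference, hypothesis))
```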
Guide: Running Locally
Basic Steps
- Install Required Libraries: Ensure you have librosa, torch, transformers, and datasets installed:

  ```bash
  pip install librosa torch transformers datasets
  ```
- Load Pre-trained Model:

  ```python
  from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

  processor = Wav2Vec2Processor.from_pretrained("jonatasgrosman/wav2vec2-large-xlsr-53-arabic")
  model = Wav2Vec2ForCTC.from_pretrained("jonatasgrosman/wav2vec2-large-xlsr-53-arabic")
  ```
- Prepare Audio Data: Load your audio files and ensure they are sampled at 16 kHz.

  ```python
  import librosa

  # librosa resamples the file to the rate given by sr
  speech_array, sampling_rate = librosa.load("path/to/audio.wav", sr=16_000)
  ```
- Transcribe Audio:

  ```python
  import torch

  inputs = processor(speech_array, sampling_rate=16_000, return_tensors="pt", padding=True)
  with torch.no_grad():
      logits = model(inputs.input_values).logits
  predicted_ids = torch.argmax(logits, dim=-1)
  transcription = processor.batch_decode(predicted_ids)
  ```
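As an alternative to the step-by-step flow above, the Transformers pipeline API wraps loading, resampling, and decoding in a single call; a minimal sketch (the audio path is a placeholder):

```python
# Convenience alternative: the ASR pipeline handles preprocessing and decoding.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition",
               model="jonatasgrosman/wav2vec2-large-xlsr-53-arabic")
result = asr("path/to/audio.wav")  # placeholder path
print(result["text"])
```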
Suggested Cloud GPUs
For efficient processing, consider using cloud-based GPU services such as AWS EC2, Google Cloud Platform, or Azure. These platforms provide scalable resources ideal for handling large datasets and complex model inference tasks.
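On any of these platforms, inference follows the standard PyTorch device pattern; a self-contained sketch, assuming a CUDA-capable GPU and a placeholder audio path:

```python
# Sketch: run the transcription steps above on a GPU when one is available.
import librosa
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = Wav2Vec2Processor.from_pretrained("jonatasgrosman/wav2vec2-large-xlsr-53-arabic")
model = Wav2Vec2ForCTC.from_pretrained("jonatasgrosman/wav2vec2-large-xlsr-53-arabic").to(device)

speech_array, _ = librosa.load("path/to/audio.wav", sr=16_000)  # placeholder path
inputs = processor(speech_array, sampling_rate=16_000, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(inputs.input_values.to(device)).logits
print(processor.batch_decode(torch.argmax(logits, dim=-1))[0])
```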
License
This model is licensed under the Apache 2.0 License, which permits use, distribution, and modification under defined conditions. Ensure compliance with the license terms when using the model in your projects.