lucio/XLS-R-UYGHUR-CV7 Model
Introduction
XLS-R-UYGHUR-CV7 is an automatic speech recognition (ASR) model fine-tuned for the Uyghur language. It is based on the facebook/wav2vec2-xls-r-300m model and was fine-tuned on the Mozilla Common Voice 7.0 dataset. The model transcribes Uyghur speech into the Perso-Arabic script.
Architecture
This model is derived from the facebook/wav2vec2-xls-r-300m architecture. Its featurization layers are frozen, and a trainable CTC/LM layer is tailored for Uyghur speech recognition. The vocabulary consists of the alphabetic characters of the Perso-Arabic script, without punctuation.
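To illustrate how a CTC output layer over a character vocabulary turns per-frame predictions into text, here is a minimal sketch of greedy CTC decoding: consecutive repeats are merged, then the blank token is dropped. The toy vocabulary, the blank id, and the frame sequence are hypothetical and chosen only for illustration; the model's actual vocabulary and any language-model rescoring are not shown.

```python
from itertools import groupby


def ctc_greedy_collapse(frame_ids, id_to_char, blank_id=0):
    """Merge consecutive repeated ids, then remove the CTC blank token."""
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    return "".join(id_to_char[i] for i in collapsed if i != blank_id)


# Hypothetical toy vocabulary (NOT the model's real vocabulary).
toy_vocab = {0: "<pad>", 1: "س", 2: "ا", 3: "ل", 4: "م"}

# Hypothetical per-frame argmax ids; 0 is the blank separating repeats.
frames = [1, 1, 0, 2, 2, 2, 0, 3, 0, 0, 2, 4, 4]

print(ctc_greedy_collapse(frames, toy_vocab))
```

The blank token is what lets CTC emit the same character twice in a row: a blank between two identical ids keeps them from being merged.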
Training
The model was trained on the training and development splits of the Mozilla Common Voice 7.0 dataset, with the test split used for both validation and evaluation. Training froze the featurization layers and fine-tuned the CTC/LM layer for 100 epochs with a learning rate that peaks at 0.0001. Key hyperparameters include a batch size of 8, an Adam optimizer, and a linear learning rate scheduler. The model achieved a Word Error Rate (WER) of 25.845% and a Character Error Rate (CER) of 4.795% on the test set.
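For readers unfamiliar with the two metrics, the following is a minimal pure-Python sketch of how WER and CER are commonly computed: the edit distance between reference and hypothesis, divided by the reference length, over words or characters respectively. It is not the evaluation script used for this model. The function names are illustrative, the example strings are made up, and the sketch returns percentages on the assumption that the reported figures are percentages. Conventions such as whether spaces count as characters vary between tools.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, unit="word"):
    """Corpus-level error rate in percent; unit is 'word' (WER) or 'char' (CER)."""
    errors = total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_units = ref.split() if unit == "word" else list(ref)
        hyp_units = hyp.split() if unit == "word" else list(hyp)
        errors += edit_distance(ref_units, hyp_units)
        total += len(ref_units)
    return 100.0 * errors / total


# Made-up example strings, for illustration only.
refs = ["the cat sat"]
hyps = ["the cat sit"]
print(f"WER: {error_rate(refs, hyps, 'word'):.2f}")
print(f"CER: {error_rate(refs, hyps, 'char'):.2f}")
```

Because a single wrong character makes the whole word wrong, WER is typically much higher than CER for the same output, which matches the gap between the two figures reported above.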
Guide: Running Locally
To run the model locally, follow these steps:
- Install Required Libraries: Ensure you have transformers, PyTorch (torch), datasets, and tokenizers installed. Use the following command: pip install transformers torch datasets tokenizers
- Download the Model: Use the Hugging Face model hub to download xls-r-uyghur-cv7.
- Setup and Inference: Load the model and tokenizer in your Python environment to perform inference on Uyghur audio data.
- Cloud GPU Recommendation: For faster processing, consider using cloud-based GPU services such as AWS EC2 with GPU instances or Google Cloud's AI Platform.
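The download and inference steps above can be sketched roughly as follows, using the Wav2Vec2ForCTC and Wav2Vec2Processor classes from transformers. Several things here are assumptions rather than facts from this document: the hub id lucio/xls-r-uyghur-cv7, the 16 kHz mono input requirement (typical for wav2vec2 XLS-R checkpoints), and the helper name transcribe. The sketch uses plain greedy CTC decoding without a language model, so its output may differ from the reported scores.

```python
MODEL_ID = "lucio/xls-r-uyghur-cv7"  # assumed hub id; adjust if it differs
TARGET_SAMPLE_RATE = 16_000          # assumed: XLS-R models expect 16 kHz mono audio


def transcribe(waveform, sampling_rate, model_id=MODEL_ID):
    """Transcribe a 1-D float waveform with greedy CTC decoding (no LM)."""
    if sampling_rate != TARGET_SAMPLE_RATE:
        raise ValueError(
            f"Expected {TARGET_SAMPLE_RATE} Hz audio, got {sampling_rate} Hz; "
            "resample before calling transcribe()."
        )

    # Heavy imports are kept local so the sample-rate check stays cheap.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    # Downloads (and caches) the files from the Hugging Face hub on first use.
    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)

    # Use a GPU when one is available, otherwise fall back to CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device).eval()

    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    inputs = {name: tensor.to(device) for name, tensor in inputs.items()}

    with torch.no_grad():
        logits = model(**inputs).logits

    # Pick the most likely token per frame; batch_decode collapses CTC output.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

To use it, load an audio file into a 1-D float array with the library of your choice, resample to 16 kHz if needed, and call transcribe(waveform, 16_000).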
License
The model is licensed under the Apache 2.0 License, allowing for wide usage and modification under specified conditions.