lucio

XLS-R-UYGHUR-CV7 Model

Introduction

XLS-R-UYGHUR-CV7 is an automatic speech recognition (ASR) model for the Uyghur language. It is fine-tuned from the facebook/wav2vec2-xls-r-300m model on the Mozilla Common Voice 7.0 dataset, and it transcribes Uyghur speech into the Perso-Arabic script.

Architecture

This model is derived from the facebook/wav2vec2-xls-r-300m architecture. The featurization layers are kept frozen, and only the final CTC/LM layer is trained for Uyghur speech recognition. The vocabulary consists of the alphabetic characters of the Perso-Arabic script; punctuation is excluded.
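To make the role of the CTC layer concrete, here is a minimal sketch of greedy CTC decoding over a character vocabulary. The six-entry vocabulary and the frame ids are made up for illustration and are not the model's real vocabulary; the use of a pad token as the CTC blank and of "|" as the word delimiter follows the usual wav2vec2 tokenizer convention and is an assumption here. The model's actual CTC/LM decoding may differ from this plain greedy rule.

```python
from itertools import groupby

# Toy vocabulary (an assumption for illustration, not the model's real vocab).
BLANK = "<pad>"   # assumed CTC blank token
WORD_SEP = "|"    # assumed word delimiter token
VOCAB = [BLANK, WORD_SEP, "س", "ا", "ل", "م"]


def ctc_greedy_decode(frame_ids, vocab=VOCAB, blank=BLANK, word_sep=WORD_SEP):
    """Collapse repeated frame labels, drop blanks, map word separators to spaces."""
    tokens = [vocab[i] for i in frame_ids]
    # Step 1: merge runs of identical consecutive labels into one label.
    collapsed = [tok for tok, _ in groupby(tokens)]
    # Step 2: remove blanks; a blank between two equal labels keeps both of them.
    chars = [" " if tok == word_sep else tok for tok in collapsed if tok != blank]
    return "".join(chars).strip()


# Per-frame argmax ids for a short clip (made-up values).
frames = [0, 2, 2, 0, 3, 3, 4, 0, 3, 5, 5, 1, 0]
print(ctc_greedy_decode(frames))
```

Because the vocabulary holds only alphabetic characters plus the two special tokens, the decoded text never contains punctuation.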

Training

The model was trained on the training and development splits of the Mozilla Common Voice 7.0 dataset, with the test split used for both validation and evaluation. The featurization layers were frozen and the CTC/LM layer was fine-tuned for 100 epochs with a learning rate that peaks at 0.0001. Key hyperparameters include a batch size of 8, an Adam optimizer, and a linear learning rate scheduler. The model achieved a Word Error Rate (WER) of 25.845 and a Character Error Rate (CER) of 4.795 on the test set.
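For readers unfamiliar with the two metrics, the following is a minimal sketch of how WER and CER are commonly computed: the total edit distance between hypotheses and references, divided by the total reference length, at word or character granularity. It assumes the reported figures are percentages and that spaces count as characters for CER; the function names and the example sentences are illustrative and do not come from the model's evaluation code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitute, insert, delete)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="word"):
    """Corpus-level error rate in percent: total edits / total reference units."""
    split = (lambda s: s.split()) if unit == "word" else list
    edits = sum(edit_distance(split(r), split(h)) for r, h in zip(refs, hyps))
    total = sum(len(split(r)) for r in refs)
    return 100.0 * edits / total


# Placeholder sentences (not Common Voice data).
refs = ["the cat sat"]
hyps = ["the cat sit"]
print(f"WER: {error_rate(refs, hyps, 'word'):.2f}")
print(f"CER: {error_rate(refs, hyps, 'char'):.2f}")
```

A single wrong character turns a whole word into an error, which is why WER is usually several times larger than CER.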

Guide: Running Locally

To run the model locally, follow these steps:

  1. Install Required Libraries: Ensure you have transformers, PyTorch (the torch package), datasets, and tokenizers installed. Use the following command:

    pip install transformers torch datasets tokenizers
    
  2. Download the Model: Use the Hugging Face model hub to download xls-r-uyghur-cv7.

  3. Setup and Inference: Load the model and its processor (feature extractor and tokenizer) in your Python environment to perform inference on Uyghur audio data.

  4. Cloud GPU Recommendation: For faster processing, consider using cloud-based GPU services such as AWS EC2 with GPU instances or Google Cloud's AI Platform.
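The steps above can be sketched as follows. This is a minimal example under stated assumptions, not the author's reference code: the Hub ID lucio/xls-r-uyghur-cv7 is inferred from the author and model name, the transcribe helper is hypothetical, and decoding is plain greedy CTC without a language model, so results may not match the reported error rates. The audio is expected as a one-dimensional sequence of float samples at 16 kHz, the sampling rate wav2vec2-style models are trained on.

```python
MODEL_ID = "lucio/xls-r-uyghur-cv7"  # assumed Hub ID (author name + model name)
TARGET_SR = 16_000                   # wav2vec2 / XLS-R models expect 16 kHz audio


def transcribe(waveform, sampling_rate, model_id=MODEL_ID):
    """Greedy CTC transcription of a 1-D float waveform (hypothetical helper)."""
    if sampling_rate != TARGET_SR:
        raise ValueError(
            f"expected {TARGET_SR} Hz audio, got {sampling_rate} Hz; resample first"
        )

    # Imported lazily so the sampling-rate check runs without the heavy libraries.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    # Downloads the files from the Hugging Face Hub on first use, then caches them.
    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)

    # Use a GPU when one is available, otherwise fall back to the CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device).eval()

    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values.to(device)).logits

    # Greedy decoding: pick the most likely token per frame, then collapse.
    pred_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(pred_ids)[0]
```

Calling transcribe(samples, 16000) with samples loaded by any audio library returns the transcription as a string. Audio recorded at another rate, such as 48 kHz Common Voice clips, has to be resampled to 16 kHz first.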

License

The model is licensed under the Apache 2.0 License, which permits use, modification, and redistribution provided the license's conditions, such as retaining the license and copyright notices, are met.
