wav2vec2 bloom speech bam

sil-ai

Introduction

The WAV2VEC2-BLOOM-SPEECH-BAM model is a fine-tuned version of facebook/wav2vec2-xls-r-300m, adapted to Bambara using the Bambara subset of the SIL-AI/bloom-speech dataset. It is designed for automatic speech recognition (ASR). The model was developed as a proof of concept and is released under the SIL RAIL-M License for non-commercial use.

Architecture

This model uses the wav2vec2 architecture, which learns speech representations directly from raw audio waveforms and is widely used for speech recognition. It was fine-tuned on the Bloom Speech BAM dataset to recognize and transcribe Bambara speech. The model is implemented with the Hugging Face Transformers library, with PyTorch as the underlying deep learning framework.
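As a rough illustration of how such a model turns audio into text: XLS-R checkpoints fine-tuned for ASR in Transformers usually carry a CTC head (`Wav2Vec2ForCTC`) that predicts one token per audio frame of about 20 ms. Assuming this model follows that convention, greedy CTC decoding takes the most likely token per frame, merges consecutive repeats, and drops the blank token. The sketch below shows that step in pure Python; the toy vocabulary, the blank id of 0, and the `|` word delimiter are illustrative assumptions, not this model's actual vocabulary.

```python
from itertools import groupby
from typing import Dict, Sequence


def ctc_greedy_decode(
    frame_ids: Sequence[int],
    id_to_token: Dict[int, str],
    blank_id: int = 0,
    word_delimiter: str = "|",
) -> str:
    """Greedy CTC decoding of per-frame argmax token ids (a sketch)."""
    # 1. Merge runs of identical ids: frames [n, n, n] emit a single "n".
    merged = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Drop the blank token. A blank between two equal ids keeps them
    #    separate, which is how CTC spells double letters.
    tokens = [id_to_token[token_id] for token_id in merged if token_id != blank_id]
    # 3. Turn the word-delimiter token into spaces and tidy the whitespace.
    text = "".join(" " if tok == word_delimiter else tok for tok in tokens)
    return " ".join(text.split())


# Hypothetical toy vocabulary; id 0 plays the role of the CTC blank.
TOY_VOCAB = {0: "<pad>", 1: "|", 2: "i", 3: "n", 4: "c", 5: "e"}

if __name__ == "__main__":
    frames = [2, 2, 0, 1, 3, 0, 2, 2, 1, 1, 4, 0, 5, 5, 0]
    print(ctc_greedy_decode(frames, TOY_VOCAB))  # prints: i ni ce
```

For a real checkpoint, calling the processor's `batch_decode` on the argmax ids performs the equivalent merge-and-drop step using the model's own vocabulary.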

Training

The model was trained using the standard fine-tuning procedure for XLS-R models. Key hyperparameters were a learning rate of 0.0003 (3e-4), a training batch size of 16, and an evaluation batch size of 8. Training used the Adam optimizer with specific beta and epsilon values, ran for 1000 epochs, and had mixed-precision training enabled. On the test set, the model achieved a word error rate (WER) of 35.9% and a character error rate (CER) of 10.93%.
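WER and CER are usually defined as the minimum number of substitutions, deletions, and insertions needed to turn the model output into the reference transcript, divided by the number of reference words (for WER) or characters (for CER). The sketch below computes both at corpus level in pure Python. The card does not say which tool or text normalization produced the reported figures, so treat this as the standard definition rather than the exact evaluation script; the sample transcripts are hypothetical.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def edit_distance(ref: Sequence[T], hyp: Sequence[T]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (free when r == h)
            )
        prev = curr
    return prev[-1]


def _error_rate(refs: Sequence[str], hyps: Sequence[str], split_words: bool) -> float:
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    errors = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        # Words are whitespace-separated tokens; characters include spaces.
        ref_units = ref.split() if split_words else list(ref)
        hyp_units = hyp.split() if split_words else list(hyp)
        errors += edit_distance(ref_units, hyp_units)
        total += len(ref_units)
    if total == 0:
        raise ValueError("references are empty")
    return errors / total


def wer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus word error rate: total word edits / total reference words."""
    return _error_rate(refs, hyps, split_words=True)


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus character error rate: total character edits / total reference characters."""
    return _error_rate(refs, hyps, split_words=False)


if __name__ == "__main__":
    refs = ["a bɛ taa"]  # hypothetical reference transcript
    hyps = ["a be taa"]  # hypothetical model output with "ɛ" misread as "e"
    print(f"WER = {wer(refs, hyps):.3f}, CER = {cer(refs, hyps):.3f}")
```

Running it prints `WER = 0.333, CER = 0.125`: one wrong word out of three, one wrong character out of eight. By the same definition, the reported WER of 35.9% corresponds to roughly 36 word-level edits per 100 reference words. Libraries such as jiwer compute these quantities with faster implementations.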

Guide: Running Locally

  1. Install dependencies: Make sure Python is installed, along with the required libraries such as Transformers, PyTorch, and Datasets.
  2. Get the model: Download or clone the model repository from the Hugging Face Hub. Transformers can also download it automatically the first time the model is loaded.
  3. Load the model: Use the Transformers library to load the model and its processor, which bundles the audio feature extractor and the tokenizer.
  4. Run inference: Pass audio to the model to obtain transcriptions. The audio should be sampled at 16 kHz, the rate XLS-R models are pretrained on.
  5. Hardware: A GPU is recommended for efficient processing, for example a cloud GPU from AWS, GCP, or Azure.
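The steps above can be sketched in Python with Transformers. This sketch assumes the Hub id `sil-ai/wav2vec2-bloom-speech-bam` (inferred from the model and author names), a repository with a standard processor and CTC head loadable through `AutoProcessor` and `AutoModelForCTC`, and input that is already a mono waveform; the `transcribe` helper itself is hypothetical. If the repository instead ships a language-model decoder, the decoding call would differ. The heavy imports are deferred so the sampling-rate check works without them.

```python
from typing import Sequence

# Assumed Hugging Face Hub id, inferred from the model and author names.
MODEL_ID = "sil-ai/wav2vec2-bloom-speech-bam"
# XLS-R models are pretrained on 16 kHz audio, so inputs must match.
TARGET_SAMPLING_RATE = 16_000


def transcribe(speech: Sequence[float], sampling_rate: int, device: str = "cpu") -> str:
    """Hypothetical helper: transcribe one mono waveform with greedy CTC decoding."""
    if sampling_rate != TARGET_SAMPLING_RATE:
        raise ValueError(
            f"expected {TARGET_SAMPLING_RATE} Hz audio, got {sampling_rate} Hz; resample first"
        )

    # Imported lazily so the input check above runs without these packages.
    import torch
    from transformers import AutoModelForCTC, AutoProcessor

    processor = AutoProcessor.from_pretrained(MODEL_ID)  # feature extractor + tokenizer
    model = AutoModelForCTC.from_pretrained(MODEL_ID).to(device)
    model.eval()

    # Normalize the waveform and pack it into a batch of one.
    inputs = processor(speech, sampling_rate=sampling_rate, return_tensors="pt")
    inputs = {name: tensor.to(device) for name, tensor in inputs.items()}

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (batch, frames, vocab_size)

    # Most likely token per frame; batch_decode then merges repeats and drops blanks.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

For example, with librosa installed, `waveform, sr = librosa.load("recording.wav", sr=16_000)` loads and resamples a hypothetical file, and `transcribe(waveform, sr, device="cuda")` runs it on a GPU.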

License

The WAV2VEC2-BLOOM-SPEECH-BAM model is released under the SIL RAIL-M License, which permits non-commercial use. Anyone who redistributes the model must apply the same license terms and share the license with their users. The license prohibits harmful uses, particularly uses that harm Indigenous Peoples. For commercial inquiries, contact SIL AI.
