wav2vec2 bloom speech snk

sil-ai

Introduction

The wav2vec2-bloom-speech-snk model is a fine-tuned version of facebook/wav2vec2-xls-r-300m, trained on the sil-ai/bloom-speech dataset for the Soninke language. It is designed for automatic speech recognition and achieves a Word Error Rate (WER) of 28.88% and a Character Error Rate (CER) of 5.76% on its evaluation set.

Architecture

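The model uses the Wav2Vec 2.0 architecture. Its starting point is the XLS-R 300M checkpoint, which was pretrained on unlabeled multilingual speech; that checkpoint is then fine-tuned with a CTC output layer for speech recognition. This specific version was fine-tuned for the Soninke language using the Bloom Speech dataset.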

Training

Training Data

The training, validation, and test sets were all drawn from the same corpus, ensuring that no file is duplicated across them.
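The model card does not describe how the splits were produced, so the following is only a minimal sketch of how one might verify that no file is shared between them. The function name and the file names in the usage note are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def find_split_overlaps(
    splits: Dict[str, Iterable[str]]
) -> Dict[Tuple[str, str], Set[str]]:
    """Return the files shared by each pair of splits (empty dict if disjoint)."""
    names = list(splits)
    as_sets = {name: set(splits[name]) for name in names}
    overlaps: Dict[Tuple[str, str], Set[str]] = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            shared = as_sets[first] & as_sets[second]
            if shared:
                overlaps[(first, second)] = shared
    return overlaps
```

Calling find_split_overlaps({"train": ["a.mp3", "b.mp3"], "validation": ["c.mp3"], "test": ["d.mp3"]}) returns an empty dict when the splits are disjoint; any shared file shows up under the pair of split names that contain it.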

Training Procedure

The model underwent standard fine-tuning using the Hugging Face Transformers library. The procedure included the following hyperparameters:

  • Learning Rate: 0.0003
  • Training Batch Size: 16
  • Evaluation Batch Size: 8
  • Seed: 42
  • Gradient Accumulation Steps: 2
  • Optimizer: Adam (betas=(0.9, 0.999), epsilon=1e-08)
  • Learning Rate Scheduler: Linear, with 250 warmup steps
  • Number of Epochs: 1000
  • Mixed Precision Training: Native AMP
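Two of the hyperparameters above interact in ways that are easy to miss. With a batch size of 16 and 2 gradient accumulation steps, each optimizer update sees 32 examples, assuming a single device (the device count is not stated here). The linear schedule with warmup raises the learning rate from 0 to 0.0003 over the first 250 steps and then decays it linearly to 0. The sketch below illustrates that shape in plain Python; the total_steps value is a hypothetical placeholder, since the number of optimizer steps is not given.

```python
def linear_schedule_lr(
    step: int,
    total_steps: int,          # hypothetical: not stated in the model card
    base_lr: float = 3e-4,     # Learning Rate from the list above
    warmup_steps: int = 250,   # warmup steps from the list above
) -> float:
    """Learning rate at a given optimizer step for linear warmup then linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: ramp linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Examples per optimizer update, assuming a single device.
effective_batch_size = 16 * 2
```

For example, with total_steps=10000 the rate is half of 0.0003 at step 125, peaks at step 250, and reaches 0 at step 10000.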

Training Results

Over the course of training, the model achieved the following:

  • Final Training Loss: 0.3255
  • Validation Loss: 0.2888
  • WER: 28.88%
  • CER: 5.76%
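WER and CER are both edit-distance ratios: the number of substitutions, insertions, and deletions needed to turn the model output into the reference, divided by the reference length in words (WER) or characters (CER). The following is a minimal sketch of that definition in plain Python, not the evaluation code used for this model; the English example strings are placeholders.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate; assumes a non-empty reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate (spaces count as characters); assumes a non-empty reference."""
    return edit_distance(reference, hypothesis) / len(reference)
```

A single wrong letter shows why CER is usually much lower than WER: for the reference "the cat sat" and the hypothesis "the cat sit", one of three words is wrong (WER of 1/3) but only one of eleven characters is wrong (CER of 1/11). Corpus-level scores are normally computed by summing errors and reference lengths over all utterances rather than averaging per-utterance ratios.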

Guide: Running Locally

To run the sil-ai/wav2vec2-bloom-speech-snk model locally, follow these steps:

  1. Install Dependencies: Ensure you have the following libraries installed:

    • Transformers
    • PyTorch
    • Datasets
    • Tokenizers
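  2. Set Up Environment: The pinned Transformers version 4.21.0.dev0 is a development snapshot that is not published on PyPI, so the command below substitutes the 4.21.0 release. The CUDA 11.1 build of PyTorch is served from the PyTorch wheel index rather than PyPI, so it is installed separately.

    pip install transformers==4.21.0 datasets==2.2.2 tokenizers==0.12.1
    pip install torch==1.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html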
    
  3. Load the Model:

    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
    model = Wav2Vec2ForCTC.from_pretrained('sil-ai/wav2vec2-bloom-speech-snk')
    processor = Wav2Vec2Processor.from_pretrained('sil-ai/wav2vec2-bloom-speech-snk')
    
  4. Inference: Prepare your audio files and run inference using the processor and model.
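The inference step can be sketched as follows. This is a minimal example under stated assumptions rather than the authors' own script: librosa is an extra dependency (not in the list above) used here to load and resample audio to the 16 kHz rate that XLS-R models expect, and the function names are hypothetical. The small collapse_ctc helper illustrates what greedy CTC decoding does conceptually: merge repeated frame predictions, then drop the blank token. In practice processor.batch_decode performs this step.

```python
from typing import List, Sequence


def collapse_ctc(ids: Sequence[int], blank_id: int) -> List[int]:
    """Greedy CTC post-processing: merge consecutive repeats, then drop blanks."""
    out: List[int] = []
    prev = None
    for token_id in ids:
        if token_id != prev and token_id != blank_id:
            out.append(token_id)
        prev = token_id
    return out


def transcribe(audio_path: str,
               model_id: str = "sil-ai/wav2vec2-bloom-speech-snk") -> str:
    """Transcribe one audio file with greedy CTC decoding."""
    # Heavy imports live inside the function so the helper above
    # can be used without them.
    import librosa  # assumption: used only for loading and resampling
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    # Load as mono and resample to 16 kHz.
    speech, _ = librosa.load(audio_path, sr=16_000)
    inputs = processor(speech, sampling_rate=16_000,
                       return_tensors="pt", padding=True)

    with torch.no_grad():
        logits = model(**inputs).logits

    # Most likely token per frame; batch_decode collapses repeats and blanks.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling transcribe("recording.wav") (a hypothetical file name) downloads the model on first use and returns the transcription as a string.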

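Inference works on a CPU, but a CUDA-capable GPU is considerably faster for long recordings or batch transcription. If no local GPU is available, cloud GPU instances such as those offered by AWS, Google Cloud, or Azure are an option.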

License

The sil-ai/wav2vec2-bloom-speech-snk model is available under the SIL International AI & NLP RAIL-M license. This license allows for non-commercial use only and prohibits the generation or sharing of illegal or harmful content. Redistribution of the model requires adherence to the same license terms. For commercial inquiries, contact the model authors via email. The full license can be reviewed here.
