wav2vec2-bloom-speech-snk Model

Introduction

The WAV2VEC2-BLOOM-SPEECH-SNK model is a fine-tuned version of facebook/wav2vec2-xls-r-300m, trained on the Soninke portion of the SIL-AI/bloom-speech dataset. It is designed for automatic speech recognition and achieves a Word Error Rate (WER) of 28.88% and a Character Error Rate (CER) of 5.76% on its evaluation set.

Architecture

The model is based on the Wav2Vec 2.0 architecture. The XLS-R 300M base checkpoint was pretrained on unlabeled multilingual speech, and this version was then fine-tuned for speech recognition on labeled Soninke recordings from the Bloom Speech dataset.
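The loading code later in this guide uses Wav2Vec2ForCTC, which indicates a CTC output head: the network emits one token prediction per audio frame, and the transcript is obtained by merging repeated predictions and dropping blank tokens. The following is a minimal sketch of that greedy decoding step. The toy vocabulary and the blank id of 0 are assumptions for illustration only and are not taken from the model's actual tokenizer.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse per-frame CTC predictions into a string.

    frame_ids: the most likely token id for each audio frame.
    id_to_char: hypothetical toy vocabulary mapping ids to characters.
    blank_id: id of the CTC blank token (assumed to be 0 here).
    """
    # Step 1: merge consecutive repeats, e.g. [1, 1, 0, 1] -> [1, 0, 1].
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # Step 2: drop blanks; a blank between two equal ids keeps both copies.
    return "".join(id_to_char[i] for i in collapsed if i != blank_id)
```

In practice the processor's decode methods perform this step, so the sketch is only meant to show what happens between the model's logits and the final text.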

Training

Training Data

The training, validation, and test datasets were derived from a single corpus, ensuring there were no duplicate files.

Training Procedure

The model underwent standard fine-tuning using the Hugging Face Transformers library. The procedure included the following hyperparameters:

Learning Rate: 0.0003
Training Batch Size: 16
Evaluation Batch Size: 8
Seed: 42
Gradient Accumulation Steps: 2
Optimizer: Adam (betas=(0.9, 0.999), epsilon=1e-08)
Learning Rate Scheduler: Linear with warmup steps of 250
Number of Epochs: 1000
Mixed Precision Training: Native AMP
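
Two of these settings interact in ways that are easy to miss: gradient accumulation multiplies the per-step batch size, and the linear scheduler ramps the learning rate up over the first 250 steps before decaying it to zero. The sketch below works through both. It assumes a single training device, and the total step count passed to the function is hypothetical because the source does not report it.

```python
# Values taken from the hyperparameter list above.
LEARNING_RATE = 3e-4
TRAIN_BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 250

# Samples contributing to each optimizer update (assumes one device).
EFFECTIVE_BATCH_SIZE = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS


def linear_warmup_lr(step: int, total_steps: int) -> float:
    """Learning rate at optimizer step `step`: linear warmup, then linear decay.

    `total_steps` is a hypothetical input; the source does not state it.
    """
    if step < WARMUP_STEPS:
        # Ramp from 0 up to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Decay linearly from the peak down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return LEARNING_RATE * remaining / max(1, total_steps - WARMUP_STEPS)
```

Under these assumptions each optimizer update sees 32 samples, and the learning rate peaks at 0.0003 exactly at step 250.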

Training Results

Over the course of training, the model achieved the following:

Final Training Loss: 0.3255
Validation Loss: 0.2888
WER: 28.88%
CER: 5.76%
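
WER and CER are both edit-distance ratios: the number of substitutions, insertions, and deletions needed to turn the prediction into the reference, divided by the reference length in words or characters. The sketch below shows that definition in plain Python. The source does not say which tool produced the reported figures, so this is an illustration of the metric and not the evaluation script; it also assumes a non-empty reference.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

A single wrong character makes the whole word count as an error, which is why CER is typically much lower than WER for the same transcripts.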

Guide: Running Locally

To run the WAV2VEC2-BLOOM-SPEECH-SNK model locally, follow these steps:

Install Dependencies: Ensure you have the following libraries installed:
- Transformers
- PyTorch
- Datasets
- Tokenizers

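Set Up Environment: The versions listed for this model are Transformers 4.21.0.dev0, PyTorch 1.9.0+cu111, Datasets 2.2.2, and Tokenizers 0.12.1. Development builds of Transformers are not published on PyPI, so the commands below pin the 4.21.0 release as the closest substitute. The CUDA 11.1 build of PyTorch is installed from the PyTorch wheel index, since it is not available on PyPI.

pip install transformers==4.21.0 datasets==2.2.2 tokenizers==0.12.1
pip install torch==1.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html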

Load the Model:

from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
model = Wav2Vec2ForCTC.from_pretrained('sil-ai/wav2vec2-bloom-speech-snk')
processor = Wav2Vec2Processor.from_pretrained('sil-ai/wav2vec2-bloom-speech-snk')

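Inference: Load your audio as 16 kHz mono, pass it through the processor to obtain model inputs, run the model to get per-frame logits, and decode the highest-scoring token ids back to text with the processor.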
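
The inference step can be sketched as follows. This is a minimal example under stated assumptions, not the authors' script: librosa is used only to load and resample the audio and is not among the dependencies listed above, and the function name transcribe is hypothetical.

```python
MODEL_ID = "sil-ai/wav2vec2-bloom-speech-snk"
TARGET_SAMPLE_RATE = 16_000  # XLS-R checkpoints are pretrained on 16 kHz audio


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with greedy CTC decoding."""
    # Heavy dependencies are imported here so that importing this module stays cheap.
    import librosa  # assumption: extra dependency, used only for audio loading
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    # Load as mono float32 and resample to the rate the model expects.
    speech, _ = librosa.load(audio_path, sr=TARGET_SAMPLE_RATE, mono=True)

    inputs = processor(speech, sampling_rate=TARGET_SAMPLE_RATE, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Pick the most likely token per frame; the processor collapses repeats and blanks.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling transcribe with the path to a recording returns the predicted Soninke text. For many files, load the processor and model once outside the function instead of on every call.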

A GPU speeds up inference considerably. If none is available locally, cloud GPU instances such as those offered by AWS, Google Cloud, or Azure are an option.

License

The WAV2VEC2-BLOOM-SPEECH-SNK model is available under the SIL International AI & NLP RAIL-M license. This license allows non-commercial use only and prohibits generating or sharing illegal or harmful content. Redistribution of the model requires adherence to the same license terms. For commercial inquiries, contact the model authors via email. Consult the full license text for the complete terms.
