wav2vec2-xls-r-300m-cs-250 Model

Introduction

Wav2Vec2-XLS-R-300M-CS-250 is a fine-tuned version of the facebook/wav2vec2-xls-r-300m model for Czech automatic speech recognition (ASR). It was fine-tuned on Common Voice 8.0 together with other Czech speech datasets.

Architecture

This model uses the Wav2Vec2 architecture and starts from the XLS-R checkpoint with roughly 300 million parameters, which was pretrained on multilingual speech. It was fine-tuned for Czech on the following datasets: Common Voice 8.0, OVM, PSCR, and Vystadial 2016.

Training

The model was trained with the Adam optimizer, a learning rate of 0.0001, a training batch size of 32, and an evaluation batch size of 8. On the Common Voice 8.0 test set it achieves a word error rate (WER) of 7.3% and a character error rate (CER) of 2.1%.
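To make the two metrics concrete, the sketch below computes WER and CER for a single utterance as edit distance divided by reference length. It is an illustration only: the function names are hypothetical, the example sentence is made up, and evaluation scripts usually normalize the text and aggregate errors over the whole test set, so their numbers can differ from per-utterance values.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Made-up example: one missing diacritic is one wrong word out of five,
# but only one wrong character out of twenty-one.
reference = "dobrý den jak se máte"
hypothesis = "dobrý den jak se mate"
print(f"WER = {wer(reference, hypothesis):.3f}, CER = {cer(reference, hypothesis):.3f}")
```

This also shows why CER is normally much lower than WER: a single wrong character makes the whole word count as an error.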

Guide: Running Locally

Install Dependencies: Ensure that Python and the required libraries are installed:
```
pip install torch torchaudio transformers datasets
```

Load the Model and Processor:

```
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Download (or load from the local cache) the processor and the fine-tuned model.
processor = Wav2Vec2Processor.from_pretrained("comodoro/wav2vec2-xls-r-300m-cs-250")
model = Wav2Vec2ForCTC.from_pretrained("comodoro/wav2vec2-xls-r-300m-cs-250")
```

Preprocess and Inference:
- Load Czech speech audio sampled at 16 kHz, resampling it first if it was recorded at a different rate.
- Use the processor to convert the waveform into model inputs.
- Run the model and decode the predicted token IDs into text.
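The steps above can be sketched roughly as follows. This is a minimal example, not the author's reference code: the `transcribe` helper and the `TARGET_SAMPLE_RATE` constant are hypothetical names, the waveform is assumed to be a mono 1-D array of floats, and decoding is plain greedy (argmax) CTC decoding without a language model.

```python
MODEL_ID = "comodoro/wav2vec2-xls-r-300m-cs-250"
TARGET_SAMPLE_RATE = 16_000  # the model expects 16 kHz audio


def transcribe(waveform, sampling_rate, model_id=MODEL_ID):
    """Return the greedy CTC transcription of one mono waveform."""
    # Fail early on a wrong sampling rate instead of producing garbage text.
    if sampling_rate != TARGET_SAMPLE_RATE:
        raise ValueError(
            f"expected {TARGET_SAMPLE_RATE} Hz audio, got {sampling_rate} Hz; "
            "resample the waveform first"
        )

    # Heavy imports are deferred so that the check above stays cheap.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    # Step 2: turn the raw waveform into padded tensors.
    inputs = processor(
        waveform, sampling_rate=sampling_rate, return_tensors="pt", padding=True
    )

    # Step 3: forward pass, then pick the most likely token for every frame.
    with torch.no_grad():
        logits = model(**inputs).logits
    predicted_ids = torch.argmax(logits, dim=-1)

    # batch_decode collapses repeated tokens and removes CTC blank tokens.
    return processor.batch_decode(predicted_ids)[0]
```

A call would look like `text = transcribe(waveform, 16_000)`, where `waveform` comes from whichever audio loader you use. For many files, load the processor and model once outside the function rather than on every call.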

Evaluation:

Use eval.py to evaluate the model's performance on your dataset.

Command:

```
python eval.py --model_id comodoro/wav2vec2-xls-r-300m-cs-250 --dataset mozilla-foundation/common_voice_8_0 --split test --config cs
```

Suggested Cloud GPUs: If no local GPU is available, a GPU instance from a cloud provider such as AWS EC2, Google Cloud, or Azure can speed up inference and evaluation.
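Whether the GPU is local or rented, the code has to place the model and its inputs on that device explicitly. The snippet below is a small sketch assuming PyTorch; `pick_device` is a hypothetical helper name, and it falls back to the CPU when no CUDA device (or no PyTorch installation) is found.

```python
def pick_device():
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to move to a GPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running on: {device}")
```

With the model and processor loaded as shown earlier, `model.to(device)` moves the weights, and the processor output can be moved the same way with `inputs = inputs.to(device)` before the forward pass.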

License

The model is available under the Apache 2.0 License, allowing for both personal and commercial use with appropriate attribution.
