wav2vec2-xls-r-300m-cs-250
comodoro
Introduction
Wav2Vec2-XLS-R-300M-CS-250 is a version of facebook/wav2vec2-xls-r-300m fine-tuned for Czech automatic speech recognition. It was trained on the Czech portion of Common Voice 8.0 together with other Czech speech datasets.
Architecture
The model uses the Wav2Vec2 architecture and starts from the 300-million-parameter XLS-R checkpoint, which was pretrained on multilingual speech. It was fine-tuned for Czech on the following datasets: Common Voice 8.0, OVM, PSCR, and Vystadial 2016.
Training
The model was trained with the Adam optimizer, a learning rate of 0.0001, a train batch size of 32, and an evaluation batch size of 8. On the Common Voice 8.0 test set, the fine-tuned model reaches a Word Error Rate (WER) of 7.3 and a Character Error Rate (CER) of 2.1.
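To make the two reported metrics concrete, the sketch below shows how WER and CER are conventionally computed from an edit distance. It is a minimal illustration, not the evaluation code used for this model: the function names and the sample sentences are hypothetical, and no text normalization (casing, punctuation) is applied. The reported 7.3 and 2.1 are presumably percentages, whereas these functions return fractions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: one wrong word out of five, one wrong character out of 21.
reference = "dobrý den jak se máte"
hypothesis = "dobrý den jak se mate"
print(wer(reference, hypothesis), cer(reference, hypothesis))
```

When scoring a whole test set, the usual convention is to sum the edits and the reference lengths over all utterances before dividing, rather than averaging per-utterance rates.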
Guide: Running Locally
- Install Dependencies: Ensure that Python and the required libraries are installed:
pip install torch torchaudio transformers datasets
- Load the Model and Processor:
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
processor = Wav2Vec2Processor.from_pretrained("comodoro/wav2vec2-xls-r-300m-cs-250")
model = Wav2Vec2ForCTC.from_pretrained("comodoro/wav2vec2-xls-r-300m-cs-250")
- Preprocess and Inference:
  - Load a Czech speech dataset sampled at 16 kHz.
  - Use the processor to prepare the model inputs.
  - Run the model and decode its predictions.
- Evaluation: Use eval.py to evaluate the model, for example on the Czech test split of Common Voice 8.0:
python eval.py --model_id comodoro/wav2vec2-xls-r-300m-cs-250 --dataset mozilla-foundation/common_voice_8_0 --split test --config cs
- Suggested Cloud GPUs: Consider a cloud GPU instance, for example from AWS EC2, Google Cloud, or Azure, to speed up processing.
License
The model is released under the Apache 2.0 License, which permits both personal and commercial use provided the license and attribution notices are retained.