wav2vec2 large xlsr 53 dutch
jonatasgrosman
Introduction
The wav2vec2-large-xlsr-53-dutch model is a fine-tuned version of the wav2vec2-large-xlsr-53 model for automatic speech recognition (ASR) in Dutch, developed by Jonatas Grosman. It was fine-tuned on Dutch speech from Common Voice 6.1 and CSS10 and is intended for transcribing Dutch audio.
Architecture
This model is based on the wav2vec2 architecture, which learns speech representations from raw audio through self-supervised pretraining. The checkpoint is a PyTorch model and can be run for inference through the Hugging Face Transformers library. Fine-tuning adapted the pretrained model to Dutch speech.
Training
The model was fine-tuned on the Dutch subsets of the Common Voice 6.1 and CSS10 datasets, using GPU resources provided by OVHcloud. Speech input is expected at a sampling rate of 16 kHz. The training script is publicly available on GitHub.
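To illustrate the 16 kHz requirement, here is a minimal sketch of bringing a mono waveform to that rate with NumPy alone. The function name `resample_linear` is hypothetical and not part of the model's tooling. Plain linear interpolation applies no anti-aliasing filter, so this is only meant to show the idea; a real pipeline would more likely rely on an audio library that resamples properly.

```python
import numpy as np

TARGET_SAMPLE_RATE = 16_000  # rate the model expects, in Hz


def resample_linear(waveform, orig_rate, target_rate=TARGET_SAMPLE_RATE):
    """Resample a mono waveform by linear interpolation (illustrative only)."""
    waveform = np.asarray(waveform, dtype=np.float32)
    if orig_rate == target_rate or waveform.size == 0:
        return waveform

    # Keep the duration constant and change only the number of samples.
    duration = waveform.shape[0] / orig_rate
    n_out = int(round(duration * target_rate))

    # Timestamps (in seconds) of the original and the resampled grids.
    old_times = np.arange(waveform.shape[0]) / orig_rate
    new_times = np.arange(n_out) / target_rate

    return np.interp(new_times, old_times, waveform).astype(np.float32)
```

For example, one second of audio recorded at 48 kHz (48,000 samples) comes out as 16,000 samples.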
Guide: Running Locally
To run this model locally, you can use the HuggingSound library or write your own inference script. Below are general steps for using this model:
- Installation: Ensure you have Python and the necessary packages installed. Use pip to install Hugging Face Transformers and other dependencies.
- Model Loading: Load the model using the SpeechRecognitionModel class from the HuggingSound library or directly with Hugging Face Transformers.
- Preprocessing: Convert audio files to the required format and sampling rate (16 kHz).
- Inference: Transcribe audio files into text using the model.
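The steps above can be sketched in Python as follows. This is a minimal sketch rather than the author's reference script. It assumes the checkpoint is published on the Hugging Face Hub as `jonatasgrosman/wav2vec2-large-xlsr-53-dutch`, that the `huggingsound`, `transformers`, `torch`, and `librosa` packages have been installed with pip, and that `librosa` is an acceptable choice for loading and resampling audio. The function names are illustrative. The heavy imports sit inside the functions so that each route only needs its own dependencies.

```python
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-dutch"  # assumed Hub identifier
TARGET_SAMPLE_RATE = 16_000  # Hz


def transcribe_with_huggingsound(audio_paths, model_id=MODEL_ID):
    """Transcribe audio files through the HuggingSound wrapper."""
    from huggingsound import SpeechRecognitionModel

    model = SpeechRecognitionModel(model_id)
    results = model.transcribe(audio_paths)
    # Each result is a dict; the text is stored under "transcription".
    return [result["transcription"] for result in results]


def transcribe_with_transformers(audio_paths, model_id=MODEL_ID):
    """Transcribe audio files with Hugging Face Transformers directly."""
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    # Model loading
    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    # Preprocessing: librosa.load decodes to mono and resamples to 16 kHz.
    waveforms = [librosa.load(path, sr=TARGET_SAMPLE_RATE)[0] for path in audio_paths]
    inputs = processor(
        waveforms,
        sampling_rate=TARGET_SAMPLE_RATE,
        return_tensors="pt",
        padding=True,
    )

    # Inference: greedy CTC decoding of the most likely token per frame.
    with torch.no_grad():
        logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)
```

Either function takes a list of file paths, for example `transcribe_with_transformers(["/path/to/file.wav"])` with a placeholder path, and returns one transcription string per file. The first call downloads the model weights, so it needs network access and some disk space.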
Cloud GPUs
Transcribing large amounts of audio is computationally intensive. If no local GPU is available, consider a cloud GPU service such as OVHcloud or another provider.
License
The wav2vec2-large-xlsr-53-dutch model is licensed under the Apache 2.0 License, allowing for both personal and commercial use with proper attribution.