wav2vec2 large xlsr 53 dutch

jonatasgrosman

Introduction

The wav2vec2-large-xlsr-53-dutch model is a version of wav2vec2-large-xlsr-53 fine-tuned for automatic speech recognition (ASR) in Dutch, developed by Jonatas Grosman. It was fine-tuned on Dutch speech from the Common Voice 6.1 and CSS10 datasets.

Architecture

This model is based on wav2vec 2.0, a self-supervised learning framework for speech processing. The XLSR-53 base checkpoint was pretrained on unlabeled speech in 53 languages and then fine-tuned here on labeled Dutch speech. The model is implemented in PyTorch and supports inference through the Hugging Face Transformers library.

Training

The model was fine-tuned on the Dutch subsets of the Common Voice 6.1 and CSS10 datasets, using GPU resources provided by OVHcloud. The model expects speech input sampled at 16 kHz. The training script is publicly available on GitHub.
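Because the model expects 16 kHz input, it can help to check a recording's sample rate before transcribing it. The following is a minimal sketch using only Python's standard wave module; it assumes uncompressed WAV files, and the helper names wav_sample_rate and needs_resampling are hypothetical, not part of the model's tooling.

```python
import wave

REQUIRED_RATE = 16_000  # Hz, the input rate the model expects


def wav_sample_rate(path):
    """Return the sample rate (in Hz) stored in a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, required_rate=REQUIRED_RATE):
    """Return True when the file's sample rate differs from the required one."""
    return wav_sample_rate(path) != required_rate
```

Files for which needs_resampling returns True should be resampled to 16 kHz before they are passed to the model.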

Guide: Running Locally

To run this model locally, you can use the HuggingSound library or write your own inference script. Below are general steps for using this model:

  1. Installation: Ensure you have Python installed, then use pip to install either HuggingSound or Hugging Face Transformers together with PyTorch and an audio loading library.
  2. Model Loading: Load the model using the SpeechRecognitionModel class from the HuggingSound library or directly with Hugging Face Transformers.
  3. Preprocessing: Load each audio file as a waveform and resample it to 16 kHz, the sampling rate the model expects.
  4. Inference: Transcribe audio files into text using the model.
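
The steps above can be sketched with Hugging Face Transformers as follows. This is an illustrative sketch rather than the author's official script. It assumes that transformers, torch, and librosa are installed, that librosa is used to load and resample audio (other loaders would work), and that the model's Hub identifier is the author name joined with the model name. The constant names and the transcribe function are hypothetical.

```python
import sys

# Assumed Hub identifier: author name plus model name.
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-dutch"
SAMPLE_RATE = 16_000  # Hz, the input rate the model expects


def transcribe(paths):
    """Return one transcription string per audio file path."""
    # Heavy dependencies are imported here so the module loads without them.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    # Step 2: load the processor (feature extractor + tokenizer) and the model.
    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
    model.eval()

    # Step 3: load each file as a mono waveform resampled to 16 kHz.
    speech = [librosa.load(path, sr=SAMPLE_RATE)[0] for path in paths]
    inputs = processor(
        speech, sampling_rate=SAMPLE_RATE, return_tensors="pt", padding=True
    )

    # Step 4: run the model and greedily decode the most likely token per frame.
    with torch.no_grad():
        logits = model(
            inputs.input_values, attention_mask=inputs.attention_mask
        ).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python transcribe.py first.wav second.mp3
    for path, text in zip(sys.argv[1:], transcribe(sys.argv[1:])):
        print(f"{path}: {text}")
```

The first call downloads the model weights from the Hugging Face Hub, so it needs network access and may take some time.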

Cloud GPUs

Inference works on a CPU but is considerably faster on a GPU. For large volumes of audio, consider a cloud GPU service such as OVHcloud or a comparable provider.
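
Whether a GPU is actually available can be checked from Python before running inference. The sketch below assumes PyTorch as the backend and falls back to the CPU when PyTorch is missing or no CUDA device is visible; pick_device is a hypothetical helper name.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to run on a GPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(pick_device())
```

In a PyTorch inference script, the returned string can be passed to the model's to() method, with the input tensors moved to the same device.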

License

The wav2vec2-large-xlsr-53-dutch model is licensed under the Apache 2.0 License, which permits both personal and commercial use provided that the license text and attribution notices are retained.
