wav2vec2 large xlsr 53 spanish

jonatasgrosman

Introduction

The wav2vec2-large-xlsr-53-spanish model is a fine-tuned version of Facebook's wav2vec2-large-xlsr-53 model for automatic speech recognition (ASR) in Spanish. It was fine-tuned on the Spanish subset of Common Voice 6.1 and expects speech input sampled at 16 kHz.

Architecture

This model is based on the Wav2Vec2 architecture, specifically the multilingual wav2vec2-large-xlsr-53 checkpoint, which was pretrained on unlabeled speech in 53 languages. Fine-tuning on the Spanish data from the Common Voice project adapts that checkpoint to Spanish ASR.

Training

The model was fine-tuned on Spanish using the train and validation splits of Common Voice 6.1. Training was supported by GPU credits from OVHcloud. The training script is available on GitHub in the wav2vec2-sprint repository. Performance is reported as Word Error Rate (WER) and Character Error Rate (CER) on the test and development sets; lower values are better for both metrics.
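
To make the two metrics concrete, here is a minimal pure-Python sketch of how WER and CER can be computed from a reference and a predicted transcription. It is an illustration only, not the evaluation code used for this model: the function names `edit_distance`, `wer`, and `cer` are hypothetical, and text normalization (casing, punctuation, whitespace handling), which affects reported scores, is left out.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, ref_item in enumerate(ref, start=1):
        curr = [i]  # distance to an empty hypothesis prefix
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `wer("hola como estas", "hola como esta")` counts one wrong word out of three, while `cer` on the same pair counts one missing character out of fifteen, which is why CER is usually much lower than WER for near-correct transcriptions.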

Guide: Running Locally

Basic Steps

  1. Install Required Libraries: Ensure the transformers, datasets, and librosa libraries are installed, along with PyTorch (torch), which Wav2Vec2ForCTC requires.
  2. Load the Model and Processor:
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
    processor = Wav2Vec2Processor.from_pretrained("jonatasgrosman/wav2vec2-large-xlsr-53-spanish")
    model = Wav2Vec2ForCTC.from_pretrained("jonatasgrosman/wav2vec2-large-xlsr-53-spanish")
    
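  3. Prepare Audio Data: Load your audio files as waveforms and resample them to 16 kHz if they were recorded at a different rate.
  4. Transcribe Audio: Pass the waveforms through the processor and the model, take the most likely token at each time step, and decode the resulting token IDs back to text with the processor.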
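
Steps 3 and 4 can be sketched as follows. This is a minimal example under stated assumptions rather than a definitive implementation: the helper name `transcribe` and the placeholder path `sample.wav` are hypothetical, decoding is plain greedy argmax without a language model, and the heavy dependencies are imported inside the function only so that the module itself stays importable without them.

```python
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-spanish"
TARGET_SAMPLE_RATE = 16_000  # the model expects 16 kHz input


def transcribe(audio_paths):
    """Return one transcription string per audio file path (hypothetical helper)."""
    # Imported lazily: these packages are large and only needed at call time.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
    model.eval()

    # Step 3: librosa.load resamples to the requested rate and returns mono audio.
    speech = [librosa.load(path, sr=TARGET_SAMPLE_RATE)[0] for path in audio_paths]

    # Pad the batch to a common length and convert it to PyTorch tensors.
    inputs = processor(
        speech,
        sampling_rate=TARGET_SAMPLE_RATE,
        return_tensors="pt",
        padding=True,
    )

    # Step 4: forward pass without gradients, then greedy decoding.
    with torch.no_grad():
        logits = model(**inputs).logits
    predicted_ids = torch.argmax(logits, dim=-1)

    # batch_decode collapses repeated tokens and removes CTC blanks.
    return processor.batch_decode(predicted_ids)
```

Calling `transcribe(["sample.wav"])` downloads the model weights on first use and returns a list with one string per input file. Loading the processor and model inside the function keeps the sketch short; for repeated calls it would be more efficient to load them once and reuse them.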

Cloud GPUs

For efficient processing and training, consider using cloud GPU services such as those offered by AWS, Google Cloud, or OVHcloud.

License

This model is licensed under the Apache 2.0 License, which allows for both personal and commercial use.
