wav2vec2 large xlsr 53 spanish
jonatasgrosman
Introduction
The wav2vec2-large-xlsr-53-spanish model is a fine-tuned version of Facebook's Wav2Vec2 model for automatic speech recognition (ASR) in Spanish. It was fine-tuned on the Spanish data of Common Voice 6.1 and expects speech input sampled at 16 kHz.
Architecture
This model is based on the Wav2Vec2 architecture, specifically the wav2vec2-large-xlsr-53 variant, a cross-lingual checkpoint pretrained on speech in 53 languages. Fine-tuning on data from the Common Voice project adapts it to ASR for Spanish.
Training
The model was fine-tuned on Spanish using the train and validation splits of Common Voice 6.1. Training was supported by GPU credits from OVHcloud. The training script is available on GitHub: wav2vec2-sprint. Performance is reported as Word Error Rate (WER) and Character Error Rate (CER) on both the test and development sets.
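To make the two metrics concrete, here is a minimal sketch of how WER and CER can be computed from a reference transcript and a model hypothesis. It is not the evaluation script used for this model; the function names (edit_distance, wer, cer) are hypothetical, and it assumes whitespace tokenization for WER and counts spaces as characters for CER, which is only one of several conventions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Illustrative sentences only, not taken from Common Voice.
    reference = "el gato duerme en la casa"
    hypothesis = "el gato duerme en casa"
    print(f"WER: {wer(reference, hypothesis):.3f}")
    print(f"CER: {cer(reference, hypothesis):.3f}")
```

Both metrics are ratios of edits to reference length, so lower is better, and a value above 1.0 is possible when the hypothesis contains many insertions.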
Guide: Running Locally
Basic Steps
- Install Required Libraries: Ensure you have the transformers, datasets, and librosa libraries installed.
- Load the Model and Processor:
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
    processor = Wav2Vec2Processor.from_pretrained("jonatasgrosman/wav2vec2-large-xlsr-53-spanish")
    model = Wav2Vec2ForCTC.from_pretrained("jonatasgrosman/wav2vec2-large-xlsr-53-spanish")
- Prepare Audio Data: Load your audio files and ensure they are sampled at 16 kHz.
- Transcribe Audio: Use the model to predict text from audio inputs.
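The steps above can be combined into a short end-to-end sketch. It assumes torch is installed alongside transformers and librosa, and that greedy (argmax) CTC decoding is acceptable. The transcribe helper and the constants MODEL_ID and TARGET_SAMPLE_RATE are names introduced here for illustration, and the audio path in the final comment is a placeholder.

```python
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-spanish"
TARGET_SAMPLE_RATE = 16_000  # the model expects 16 kHz input


def transcribe(audio_paths):
    """Return one transcription string per audio file path (hypothetical helper)."""
    if not audio_paths:
        return []

    # Heavy dependencies are imported here so that importing this module stays cheap.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
    model.eval()

    # librosa.load resamples to the requested rate and returns (waveform, rate).
    waveforms = [librosa.load(path, sr=TARGET_SAMPLE_RATE)[0] for path in audio_paths]

    # Pad the batch to a common length and convert it to PyTorch tensors.
    inputs = processor(
        waveforms,
        sampling_rate=TARGET_SAMPLE_RATE,
        return_tensors="pt",
        padding=True,
    )

    # Inference only, so gradient tracking is disabled.
    with torch.no_grad():
        logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits

    # Greedy decoding: take the most likely token per frame, then let the
    # processor collapse repeats and remove CTC blank tokens.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)


# Example call (placeholder path, downloads the model on first use):
# print(transcribe(["sample.wav"]))
```

Loading the model inside the function keeps the sketch short; for repeated calls it would be cheaper to load the processor and model once and reuse them.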
Cloud GPUs
For efficient processing and training, consider using cloud GPU services such as those offered by AWS, Google Cloud, or OVHcloud.
License
This model is licensed under the Apache 2.0 License, which allows for both personal and commercial use.