wav2vec2-large-xlsr-53-spanish
Introduction
The wav2vec2-large-xlsr-53-spanish model is a pretrained automatic speech recognition (ASR) model from Facebook AI, designed for transcribing Spanish-language audio. It employs the Wav2Vec 2.0 architecture and is fine-tuned on the Common Voice dataset.
Architecture
The model uses the Wav2Vec 2.0 architecture, which processes raw audio waveforms directly and can predict transcriptions without requiring large amounts of labeled data. This version is fine-tuned for Spanish on top of XLSR (Cross-Lingual Speech Representations), a variant of the model pretrained on speech from 53 languages.
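As a rough illustration of the input/output contract (not stated in the model card itself): the convolutional feature encoder downsamples 16 kHz audio by a factor of 320, so one second of audio becomes about 49 frames, each scored against the character vocabulary for CTC decoding.

  # Illustration: one second of 16 kHz audio in, ~49 frames of character logits out.
  import torch
  from transformers import Wav2Vec2ForCTC

  model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-large-xlsr-53-spanish")
  with torch.no_grad():
      logits = model(torch.zeros(1, 16_000)).logits  # one second of silence as dummy input
  print(logits.shape)  # (batch=1, frames=49, vocab_size)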
Training
The model was fine-tuned on the Spanish subset of the Common Voice dataset. Training involved resampling the audio to the model's expected 16 kHz input rate and normalizing the transcripts by removing punctuation and lowercasing them. Performance was evaluated with the word error rate (WER) metric, achieving 17.6% on the test set.
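A minimal sketch of that transcript normalization; the exact character set removed in the original fine-tuning script is an assumption here:

  import re

  # Punctuation to strip; the precise set used during training is assumed.
  CHARS_TO_REMOVE = r'[\,\?\.\!\-\;\:\"\¿\¡]'

  def normalize_transcript(text: str) -> str:
      """Remove punctuation and lowercase, per the preprocessing described above."""
      return re.sub(CHARS_TO_REMOVE, "", text).lower().strip()

  print(normalize_transcript("¿Cómo estás? ¡Muy bien!"))  # -> "cómo estás muy bien"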
Guide: Running Locally
To run the model locally, follow these steps:
- Environment Setup: Install the necessary libraries:

  pip install torch torchaudio transformers datasets

- Load the Model and Processor:

  from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

  model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-large-xlsr-53-spanish").to("cuda")
  processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-large-xlsr-53-spanish")

- Dataset Preparation:

  from datasets import load_dataset

  ds = load_dataset("common_voice", "es", split="test")

- Audio Resampling: Common Voice audio is 48 kHz, while the model expects 16 kHz:

  import torchaudio

  resampler = torchaudio.transforms.Resample(orig_freq=48_000, new_freq=16_000)

- Processing and Prediction: Use the model to transcribe the speech data, then compute the word error rate (WER); a combined sketch follows this list.
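The following is a minimal end-to-end sketch of that last step, combining the snippets above. A few specifics are assumptions rather than part of the model card: the jiwer package for WER (pip install jiwer), greedy CTC decoding, and the path and sentence fields exposed by the common_voice loader.

import torch
import torchaudio
from datasets import load_dataset
from jiwer import wer
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-large-xlsr-53-spanish").to(device)
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-large-xlsr-53-spanish")

# A small slice keeps the sanity check fast; use the full split for a real evaluation.
ds = load_dataset("common_voice", "es", split="test[:10]")
resampler = torchaudio.transforms.Resample(orig_freq=48_000, new_freq=16_000)

predictions, references = [], []
for sample in ds:
    speech, _ = torchaudio.load(sample["path"])   # raw 48 kHz clip
    speech = resampler(speech).squeeze().numpy()  # downsample to 16 kHz
    inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values.to(device)).logits
    pred_ids = torch.argmax(logits, dim=-1)       # greedy CTC decoding
    predictions.append(processor.batch_decode(pred_ids)[0])
    # A full evaluation would also strip punctuation, as described under Training.
    references.append(sample["sentence"].lower())

print(f"WER: {wer(references, predictions):.1%}")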
For optimal performance, using a cloud GPU, such as those available on AWS, Google Cloud, or Azure, is recommended.
License
The model is released under the Apache 2.0 License, which permits free use, modification, and distribution, provided the license text and notices are retained in redistributed copies.