wav2vec2-large-xlsr-open-brazilian-portuguese-v2

Introduction
The wav2vec2-large-xlsr-open-brazilian-portuguese-v2 model, published by lgris, is a fine-tuned Wav2Vec 2.0 model for automatic speech recognition (ASR) in Brazilian Portuguese. It combines several datasets for training, with the goal of providing an open-source ASR solution for the language.
Architecture
The model is based on the Wav2Vec 2.0 architecture, which uses a deep learning approach to process raw audio signals for speech recognition tasks. It is implemented in PyTorch and distributed through the transformers library.
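To illustrate the raw-audio interface this architecture implies, here is a minimal sketch: unlike spectrogram-based models, Wav2Vec 2.0 consumes the waveform directly. The feature extractor settings shown are transformers defaults, and the one-second silent waveform is synthetic, used only to show the expected input shape.

```python
import numpy as np
from transformers import Wav2Vec2FeatureExtractor

# Wav2Vec 2.0 consumes raw waveforms rather than precomputed spectrogram
# features. The default extractor expects mono float audio at 16 kHz.
feature_extractor = Wav2Vec2FeatureExtractor(sampling_rate=16_000)

waveform = np.zeros(16_000, dtype=np.float32)  # one second of synthetic silence
batch = feature_extractor(waveform, sampling_rate=16_000, return_tensors="np")

# The output is just the (normalized) waveform, batched: shape (1, 16000)
print(batch.input_values.shape)
```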
Training
The model was fine-tuned on a combination of datasets to build a comprehensive Brazilian Portuguese dataset:
- CETUC: Contains 145 hours of speech from 100 speakers.
- Multilingual Librispeech (MLS): Provides 284 hours of transcribed Brazilian Portuguese from audiobooks.
- VoxForge: Includes 4,130 utterances from 100 speakers.
- Common Voice 6.1: Offers 50 validated hours from 1,120 speakers.
- Lapsbm: Comprises 700 utterances from 35 speakers.
The model was trained with the fairseq library and then converted for use with the transformers library for easier deployment.
Guide: Running Locally
- Install dependencies:

  ```shell
  pip install datasets jiwer torchaudio transformers soundfile
  ```
- Set up the model:

  ```python
  import torch
  from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

  device = "cuda"
  model_name = 'lgris/wav2vec2-large-xlsr-open-brazilian-portuguese-v2'
  model = Wav2Vec2ForCTC.from_pretrained(model_name).to(device)
  processor = Wav2Vec2Processor.from_pretrained(model_name)
  ```
- Prepare and test:
  - Load a dataset and preprocess the data.
  - Run predictions and compute the word error rate (WER) on test sets such as Common Voice and TEDx.
- Suggested cloud GPUs: Consider cloud platforms such as AWS, Google Cloud, or Azure for access to powerful GPUs, which can significantly speed up inference and fine-tuning.
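The prediction-and-WER step above can be sketched as follows, reusing the model and processor from the setup step. The helper names transcribe and wer are illustrative, not part of the model's API; the jiwer package installed earlier provides an equivalent metric via jiwer.wer, and the inline version below only spells out what it computes.

```python
def transcribe(path, model, processor, device="cpu"):
    """Greedy CTC transcription of one audio file (illustrative helper)."""
    import torch       # heavy dependencies kept local to this helper
    import torchaudio

    speech, sr = torchaudio.load(path)
    # The model expects mono audio sampled at 16 kHz
    speech = torchaudio.functional.resample(speech, sr, 16_000).squeeze()
    inputs = processor(speech.numpy(), sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values.to(device)).logits
    # Greedy decoding: most likely token per frame, then CTC collapse
    return processor.batch_decode(torch.argmax(logits, dim=-1))[0]

def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

Usage would look like `hypothesis = transcribe("sample.wav", model, processor, device)` followed by `wer("texto de referência", hypothesis)`, where `"sample.wav"` and the reference string are placeholders for your own test audio and transcript.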
License
This model is released under the Apache 2.0 License, allowing for free use, distribution, and modification with appropriate attribution.