wav2vec2 large xlsr open brazilian portuguese v2

lgris

Introduction

The wav2vec2-large-xlsr-open-brazilian-portuguese-v2 model is a Wav2Vec 2.0 (XLSR) model fine-tuned for automatic speech recognition (ASR) in Brazilian Portuguese. It is trained on several openly available datasets, with the goal of providing an open-source ASR solution for the language.

Architecture

The model is based on the Wav2Vec 2.0 architecture, which processes raw audio waveforms directly rather than hand-crafted acoustic features. It is loaded through the transformers library as a Wav2Vec2ForCTC model, that is, with a CTC (Connectionist Temporal Classification) output head, and runs on PyTorch.
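
To illustrate what a CTC output head implies for decoding, here is a minimal sketch of greedy CTC decoding in plain Python. The model emits one token prediction per audio frame; decoding collapses consecutive repeats and then drops the blank token. The toy vocabulary, the frame sequence, and the function name `ctc_greedy_decode` are assumptions made for illustration. The real vocabulary ships with the model's processor, which also performs this step for you when decoding.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse consecutive repeated frame predictions, then drop blanks."""
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    return "".join(id_to_char[i] for i in collapsed if i != blank_id)


# Hypothetical toy vocabulary (id 0 plays the role of the CTC blank).
vocab = {0: "<pad>", 1: "o", 2: "l", 3: "a", 4: "á"}

# One predicted token id per audio frame (made-up values).
frames = [0, 1, 1, 0, 2, 2, 2, 4, 4, 0]
print(ctc_greedy_decode(frames, vocab))  # olá
```

Note that a genuinely doubled letter survives only when a blank separates the two runs: `[2, 2, 0, 2]` decodes to "ll", while `[2, 2, 2]` decodes to "l".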

Training

The model was fine-tuned on a combination of Brazilian Portuguese speech datasets:

  • CETUC: Contains 145 hours of speech from 100 speakers.
  • Multilingual Librispeech (MLS): Provides 284 hours of transcribed Brazilian Portuguese from audiobooks.
  • VoxForge: Includes 4,130 utterances from 100 speakers.
  • Common Voice 6.1: Offers 50 validated hours from 1,120 speakers.
  • Lapsbm: Comprises 700 utterances from 35 speakers.

The model was trained using the fairseq library and then converted for use with the transformers library, which makes deployment easier.

Guide: Running Locally

  1. Install Dependencies:

    pip install datasets jiwer torchaudio transformers soundfile
    
  2. Set Up the Model:

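    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    # Use the GPU when one is available, otherwise fall back to the CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model_name = 'lgris/wav2vec2-large-xlsr-open-brazilian-portuguese-v2'
    # Load the fine-tuned CTC model and its processor (feature extractor + tokenizer).
    model = Wav2Vec2ForCTC.from_pretrained(model_name).to(device)
    processor = Wav2Vec2Processor.from_pretrained(model_name)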
    
  3. Prepare and Test:

    • Load a test dataset and preprocess the data, resampling the audio to the 16 kHz sampling rate that Wav2Vec 2.0 models expect.
    • Run predictions and compute the Word Error Rate (WER) on test sets such as Common Voice and TEDx.

  4. Suggested Cloud GPUs: Consider using cloud platforms like AWS, Google Cloud, or Azure for access to powerful GPUs, which can significantly speed up inference.
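
To make the WER metric from step 3 concrete, the following is a minimal, dependency-free sketch: WER is the word-level edit distance (substitutions, deletions, insertions) between the reference transcript and the model's prediction, divided by the number of reference words. In practice the jiwer package installed in step 1 provides this metric. The function names, the normalization rule (lowercasing and stripping punctuation), and the sample sentences below are assumptions made for illustration, not the evaluation script used for this model.

```python
import re


def normalize(text):
    """Lowercase, strip punctuation, and split into words."""
    return re.sub(r"[^\w\s]", "", text.lower()).split()


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substituted word out of five reference words.
print(word_error_rate("O gato dorme no sofá.", "o gato come no sofá"))  # 0.2
```

Normalizing both sides the same way matters: without it, differences in casing or punctuation between the dataset transcripts and the model output would be counted as word errors.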

License

This model is released under the Apache 2.0 License, allowing for free use, distribution, and modification with appropriate attribution.
