wav2vec2 large robust

facebook

Introduction

Wav2Vec2-Large-Robust is a speech model developed by Facebook and pre-trained on speech audio sampled at 16 kHz. Its pre-training data comes from several domains (Libri-Light, CommonVoice, Switchboard, and Fisher), which makes the model more robust across different types of audio. It is designed for scenarios where the domain of the unlabeled pre-training data differs from that of the labeled fine-tuning data.

Architecture

The model learns speech representations from raw audio through self-supervised learning. Pre-training on data from diverse domains improves its performance on domains that were not seen during training. Because it is pre-trained on audio alone, the model does not include a tokenizer.
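As a rough illustration of what "speech representations from raw audio" means in practice, the sketch below maps a 16 kHz waveform to a sequence of frame-level vectors with the Transformers library. The model ID `facebook/wav2vec2-large-robust`, the helper names, and the kernel and stride values (taken from the default wav2vec 2.0 feature encoder configuration) are assumptions that this document does not state. The feature extractor is built by hand because no tokenizer or processor is assumed to ship with the checkpoint.

```python
# Assumed values: default wav2vec 2.0 convolutional feature encoder settings.
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def expected_num_frames(num_samples):
    """Estimate how many output frames the encoder yields for a waveform."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        # Standard output-length formula for an unpadded 1-D convolution.
        length = (length - kernel) // stride + 1
    return length


def extract_representations(waveform, model_id="facebook/wav2vec2-large-robust"):
    """Return frame-level hidden states for a 1-D float waveform at 16 kHz.

    Hypothetical helper: downloads the checkpoint on first use.
    """
    # Imported lazily so the arithmetic helper above works without PyTorch.
    import torch
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

    feature_extractor = Wav2Vec2FeatureExtractor(
        feature_size=1,
        sampling_rate=16000,
        padding_value=0.0,
        do_normalize=True,
        return_attention_mask=True,
    )
    model = Wav2Vec2Model.from_pretrained(model_id)
    model.eval()

    inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Shape: (batch, frames, hidden_size)
    return outputs.last_hidden_state
```

Under these assumptions, one second of audio (16,000 samples) corresponds to `expected_num_frames(16000)` frames, so each frame covers roughly 20 ms of speech.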

Training

Wav2Vec2-Large-Robust is pre-trained on unlabeled data from multiple speech datasets, which helps it generalize across domains. Including unlabeled in-domain data in pre-training significantly reduces the performance gap between models fine-tuned on in-domain labeled data and models fine-tuned on out-of-domain labeled data. Because the model has no tokenizer, fine-tuning it for speech recognition requires creating one and supplying labeled text data.
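One common way to create the tokenizer for fine-tuning is to build a character-level vocabulary from the labeled transcripts and pass it to `Wav2Vec2CTCTokenizer` from Transformers. The sketch below assumes CTC-style fine-tuning, which this document does not specify. The helper names, the `vocab.json` file name, and the special tokens `[UNK]`, `[PAD]`, and `|` are illustrative choices.

```python
import json
from pathlib import Path


def build_char_vocab(transcripts):
    """Build a character-to-index vocabulary from labeled transcripts."""
    chars = sorted(set("".join(transcripts)))
    vocab = {ch: idx for idx, ch in enumerate(chars)}
    # Use a visible word delimiter instead of the space character.
    if " " in vocab:
        vocab["|"] = vocab.pop(" ")
    # Special tokens: unknown character and padding (also the CTC blank).
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab


def save_vocab(vocab, path="vocab.json"):
    """Write the vocabulary to disk as JSON and return the path."""
    path = Path(path)
    path.write_text(json.dumps(vocab, ensure_ascii=False), encoding="utf-8")
    return path


def make_tokenizer(vocab_path):
    """Create a CTC tokenizer from a saved vocabulary file."""
    # Imported lazily so the vocabulary helpers work without Transformers.
    from transformers import Wav2Vec2CTCTokenizer

    return Wav2Vec2CTCTokenizer(
        str(vocab_path),
        unk_token="[UNK]",
        pad_token="[PAD]",
        word_delimiter_token="|",
    )
```

Transcripts are usually normalized first (for example lowercased, with punctuation removed) so that the vocabulary stays small.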

Guide: Running Locally

  1. Prerequisites:

    • Ensure your speech input is sampled at 16 kHz.
    • Set up a Python environment with the PyTorch and Transformers libraries installed.
  2. Basic Steps:

    • Clone the repository: git clone https://github.com/pytorch/fairseq.git
    • Navigate to the directory: cd fairseq/examples/wav2vec
    • Follow the instructions for setting up Wav2Vec2 from the provided notebook.
  3. Fine-tuning:

    • Create a tokenizer and prepare labeled text data.
    • Fine-tune the model as explained in the Hugging Face blog.
  4. Suggested Cloud GPUs:

    • AWS EC2 instances with NVIDIA GPUs.
    • Google Cloud Platform’s AI Platform with NVIDIA Tesla GPUs.
    • Azure Machine Learning with NVIDIA VMs.
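
The 16 kHz prerequisite above can be checked before any audio reaches the model. The following is a minimal sketch, assuming PCM WAV input: it reads the sampling rate with Python's standard `wave` module and, if needed, resamples with `torchaudio`. The helper names are hypothetical, and `torchaudio` is an assumed extra dependency that this guide does not list.

```python
import wave

TARGET_RATE = 16_000  # sampling rate the model was pre-trained on


def wav_sample_rate(path):
    """Read the sampling rate from a PCM WAV file header."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, target_rate=TARGET_RATE):
    """Return True if the file is not already at the target rate."""
    return wav_sample_rate(path) != target_rate


def load_at_16khz(path):
    """Load audio as a mono 1-D tensor at 16 kHz, resampling if necessary."""
    # Imported lazily so the header check works without torchaudio installed.
    import torchaudio

    waveform, rate = torchaudio.load(str(path))
    if rate != TARGET_RATE:
        waveform = torchaudio.functional.resample(
            waveform, orig_freq=rate, new_freq=TARGET_RATE
        )
    # Average the channels to obtain a mono signal.
    return waveform.mean(dim=0)
```

Telephone corpora are often distributed at 8 kHz, so a check like `needs_resampling` helps catch mismatched input early.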

License

The Wav2Vec2-Large-Robust model is licensed under the Apache-2.0 License, allowing for wide use and modification in both academic and commercial settings.
