HuBERT-Base-LS960

Facebook

Introduction

HuBERT-Base-LS960 is a self-supervised speech representation model developed by Facebook that benefits tasks such as speech recognition, generation, and compression. It is the base-sized model pretrained on 960 hours of 16 kHz speech audio, and it requires speech input sampled at the same frequency. The model does not ship with a tokenizer, so a tokenizer must be built and the model fine-tuned before it can be used for speech recognition.

Architecture

The HuBERT model adopts a self-supervised learning approach, addressing challenges such as multiple sound units in utterances, lack of a lexicon during pre-training, and variable sound unit lengths. It employs an offline clustering step to align target labels for a BERT-like prediction loss, applied only on masked regions. This method helps the model learn a combined acoustic and language model over continuous inputs, relying on the consistency of the unsupervised clustering step.
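The offline clustering step can be sketched as follows: frame-level acoustic features are clustered with k-means, and the resulting cluster IDs become discrete pseudo-labels that the transformer learns to predict on masked frames. The sketch below is illustrative only — it uses random features and a minimal NumPy k-means rather than HuBERT's actual MFCC/representation pipeline, and the function name is mine:

```python
import numpy as np

def kmeans_pseudo_labels(features, k=100, iters=10, seed=0):
    """Assign each frame a discrete pseudo-label via a minimal k-means.

    features: (num_frames, dim) array of frame-level acoustic features.
    Returns an int array of cluster IDs, one per frame.
    """
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct frames.
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # Assign each frame to its nearest center.
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster goes empty.
        for c in range(k):
            if np.any(labels == c):
                centers[c] = features[labels == c].mean(axis=0)
    return labels

# Stand-in for MFCC frames: 500 frames of 39-dim features.
frames = np.random.default_rng(1).normal(size=(500, 39))
pseudo_labels = kmeans_pseudo_labels(frames, k=100)  # HuBERT's first iteration uses 100 clusters
```

During pretraining, these cluster IDs serve as the prediction targets for the BERT-like loss on masked frames.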

Training

HuBERT is trained with an iterative clustering-and-prediction procedure. The first iteration runs k-means with 100 clusters over MFCC features to generate pseudo-labels; later iterations re-cluster the model's own learned representations to produce better targets. For speech recognition, the pretrained model is fine-tuned on subsets of the LibriSpeech and Libri-light benchmarks ranging from 10 minutes to 960 hours of labeled data, and it has demonstrated significant reductions in word error rate (WER), particularly on the more challenging evaluation subsets.
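Word error rate, the metric cited above, is the word-level edit distance (substitutions, deletions, and insertions) between the hypothesis and the reference, divided by the reference length. A minimal sketch (the function name is mine, not part of any library):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words (Levenshtein).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))  # one substitution out of six words
```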

Guide: Running Locally

To run the HuBERT model locally, follow these steps:

  1. Install Dependencies: Ensure you have Python and PyTorch installed. Use pip to install the Hugging Face Transformers library.

    pip install transformers torch
    
  2. Download the Model: Access the HuBERT model from the Hugging Face Model Hub.

    from transformers import HubertModel, Wav2Vec2FeatureExtractor

    # This checkpoint ships no tokenizer, so use a feature extractor
    # rather than Wav2Vec2Processor (which requires one).
    model = HubertModel.from_pretrained("facebook/hubert-base-ls960")
    feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/hubert-base-ls960")
    
  3. Prepare Input Data: Ensure your audio input is sampled at 16kHz.

  4. Inference: Pass the 16kHz waveform through the feature extractor and model to obtain frame-level hidden-state representations, or fine-tune the model (e.g. with a CTC head) for speech recognition.
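Because the model expects 16kHz input (step 3), audio at other sample rates must be resampled first. In practice you would use a proper resampler such as torchaudio's Resample transform or librosa; the NumPy sketch below uses simple linear interpolation purely to illustrate the idea, and the function name is mine:

```python
import numpy as np

def resample_to_16k(audio: np.ndarray, orig_sr: int, target_sr: int = 16_000) -> np.ndarray:
    """Linearly interpolate a mono waveform from orig_sr to target_sr.

    Illustrative only: real pipelines should use a low-pass polyphase
    resampler (e.g. torchaudio or librosa) to avoid aliasing.
    """
    duration = len(audio) / orig_sr
    n_out = int(round(duration * target_sr))
    old_t = np.arange(len(audio)) / orig_sr
    new_t = np.arange(n_out) / target_sr
    return np.interp(new_t, old_t, audio)

# One second of a 440 Hz tone at 44.1 kHz, resampled for the model.
sr = 44_100
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
wave_16k = resample_to_16k(tone, sr)
```

The resampled waveform can then be passed to the feature extractor from step 2.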

Cloud GPUs

For intensive tasks or large datasets, consider using cloud-based GPUs from providers like AWS, Google Cloud, or Azure to accelerate processing.

License

The HuBERT model is released under the Apache-2.0 license, allowing for wide usage and modification in compliance with the license terms.
