hubert-base-ls960
Introduction
HuBERT-Base-LS960 is a self-supervised speech representation model developed by Facebook, useful for tasks such as speech recognition, generation, and compression. It is a base-sized model pretrained on 960 hours of 16kHz speech audio, so input speech must also be sampled at 16kHz. The model ships without a tokenizer, so it must be fine-tuned on labeled data and paired with a tokenizer before it can be used for speech recognition.
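Because the model expects 16kHz input, audio recorded at other rates must be resampled first. A minimal sketch using scipy's polyphase resampler (torchaudio or librosa are common alternatives; the 44.1kHz source rate is just an example):

```python
import numpy as np
from scipy.signal import resample_poly

def to_16k(audio: np.ndarray, source_rate: int) -> np.ndarray:
    """Resample a mono waveform to the 16kHz rate HuBERT expects."""
    if source_rate == 16_000:
        return audio
    # Polyphase resampling by the rational factor 16000 / source_rate.
    g = np.gcd(source_rate, 16_000)
    return resample_poly(audio, 16_000 // g, source_rate // g)

# Example: one second of 44.1kHz audio becomes 16000 samples.
clip = np.random.randn(44_100)
resampled = to_16k(clip, 44_100)
print(resampled.shape)  # (16000,)
```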
Architecture
The HuBERT model adopts a self-supervised learning approach that addresses three challenges of speech: each utterance contains multiple sound units, there is no lexicon of sound units during pre-training, and sound units have variable lengths with no explicit segmentation. It employs an offline clustering step to provide aligned target labels for a BERT-like prediction loss, applied only over masked regions. This forces the model to learn a combined acoustic and language model over continuous inputs, relying primarily on the consistency of the unsupervised clustering step rather than the intrinsic quality of the cluster labels.
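The masked-prediction objective can be illustrated with a toy example: frame-level cluster IDs from the offline clustering step serve as targets, and cross-entropy is computed only at masked frame positions. A simplified numpy sketch (the real model scores frames against learned codeword embeddings inside a transformer; the shapes and names here are illustrative):

```python
import numpy as np

def masked_prediction_loss(logits, cluster_ids, mask):
    """Cross-entropy over cluster targets, restricted to masked frames.

    logits:      (T, K) frame-level scores over K cluster units
    cluster_ids: (T,)   offline k-means assignment per frame
    mask:        (T,)   True where the input frame was masked
    """
    # Log-softmax over the K cluster units.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    # Negative log-likelihood of the target cluster at each frame.
    nll = -log_probs[np.arange(len(cluster_ids)), cluster_ids]
    # HuBERT applies the loss only where the input was masked.
    return nll[mask].mean()

rng = np.random.default_rng(0)
T, K = 50, 100
logits = rng.normal(size=(T, K))
targets = rng.integers(0, K, size=T)
mask = rng.random(T) < 0.5
loss = masked_prediction_loss(logits, targets, mask)
print(float(loss))
```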
Training
Training alternates between clustering and masked prediction: the first iteration runs k-means with 100 clusters over acoustic features, and later iterations re-cluster the model's own learned representations to refine the targets. The pretrained model is then fine-tuned on labeled subsets of the Librispeech (960 hours) and Libri-light benchmarks ranging from 10 minutes to 960 hours of data, and it has demonstrated significant reductions in word error rate (WER), particularly on the more challenging evaluation subsets.
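The first clustering iteration is ordinary k-means over frame-level acoustic features. A minimal numpy version of Lloyd's algorithm, with the feature dimension and data purely illustrative:

```python
import numpy as np

def kmeans(features, k=100, iters=20, seed=0):
    """Lloyd's algorithm: returns (centroids, per-frame cluster IDs)."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # Assign each frame to its nearest centroid.
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned frames.
        for j in range(k):
            members = features[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# e.g. 2000 frames of 39-dim MFCC-like features, clustered into 100 units
frames = np.random.default_rng(1).normal(size=(2000, 39))
centroids, labels = kmeans(frames, k=100)
print(labels.shape)  # (2000,)
```

These per-frame cluster IDs are exactly the targets consumed by the masked-prediction loss described above.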
Guide: Running Locally
To run the HuBERT model locally, follow these steps:
- Install Dependencies: Ensure you have Python and PyTorch installed, then use pip to install the Hugging Face Transformers library:
  pip install transformers torch
- Download the Model: Load the HuBERT model and feature extractor from the Hugging Face Model Hub. Since the base model has no tokenizer, use a feature extractor rather than a full processor:
  from transformers import HubertModel, Wav2Vec2FeatureExtractor
  model = HubertModel.from_pretrained("facebook/hubert-base-ls960")
  feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/hubert-base-ls960")
- Prepare Input Data: Ensure your audio input is sampled at 16kHz.
- Inference: Use the model and feature extractor to extract features, or fine-tune the model for speech recognition.
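When sizing downstream buffers it helps to know how many frames the model emits. HuBERT shares wav2vec 2.0's convolutional front end (kernel widths 10,3,3,3,3,2,2 with strides 5,2,2,2,2,2,2), which downsamples 16kHz audio by a factor of 320, roughly 49 frames per second, each a 768-dimensional vector for the base model. A quick sketch of the output-length calculation:

```python
# Conv feature-encoder layers shared with wav2vec 2.0: (kernel, stride) pairs.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]

def output_frames(num_samples: int) -> int:
    """Number of hidden-state frames HuBERT produces for a raw waveform."""
    length = num_samples
    for kernel, stride in CONV_LAYERS:
        length = (length - kernel) // stride + 1
    return length

# One second of 16kHz audio -> 49 frames, i.e. hidden states of shape (49, 768).
print(output_frames(16_000))  # 49
```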
Cloud GPUs
For intensive tasks or large datasets, consider using cloud-based GPUs from providers like AWS, Google Cloud, or Azure to accelerate processing.
License
The HuBERT model is released under the Apache-2.0 license, allowing for wide usage and modification in compliance with the license terms.