facebook/wav2vec2-large-robust-ft-libri-960h

Introduction
The wav2vec2-large-robust-ft-libri-960h model is an automatic speech recognition (ASR) model fine-tuned by Facebook on the Librispeech dataset. It transcribes audio input into text and is based on the Wav2Vec2 architecture.
Architecture
This model is a fine-tuned version of Wav2Vec2, initially pre-trained on diverse datasets, including Libri-Light, CommonVoice, Switchboard, and Fisher. It has been refined using 960 hours of Librispeech data. Wav2Vec2 uses a self-supervised learning approach to understand speech structures from raw audio.
Training
The model was pre-trained on a variety of audio datasets to enhance its robustness across different audio domains. The pre-training involved unlabeled audio data, and fine-tuning was performed on labeled data from the Librispeech dataset. This approach allows the model to generalize well across various domains and improve its performance on the target domain.
Guide: Running Locally

- Install Dependencies: Ensure you have Python and PyTorch installed, then install the required libraries via pip:

      pip install transformers datasets soundfile torch

- Load Model and Processor:

      from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

      processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-large-robust-ft-libri-960h")
      model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-large-robust-ft-libri-960h")

- Prepare Data: Load your audio files and convert them to arrays using a library like soundfile.

- Tokenize and Infer: Convert the audio data into tensors and pass them through the model to obtain transcriptions:

      import torch

      input_values = processor(your_audio_data, sampling_rate=16000, return_tensors="pt", padding="longest").input_values
      logits = model(input_values).logits
      predicted_ids = torch.argmax(logits, dim=-1)
      transcription = processor.batch_decode(predicted_ids)

- Cloud GPUs: Consider cloud GPUs from providers such as AWS, Google Cloud, or Azure for faster inference, especially when processing large datasets.
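For the Prepare Data step, the model expects mono audio sampled at 16 kHz as a float array. In practice `soundfile.read` returns this directly; as a standard-library-only sketch of what that loading amounts to (the file path is hypothetical), a 16-bit PCM WAV can be read like this:

```python
import struct
import wave

def read_wav_as_floats(path):
    """Read a 16-bit PCM mono WAV into a list of floats in [-1, 1]."""
    with wave.open(path, "rb") as wf:
        assert wf.getsampwidth() == 2 and wf.getnchannels() == 1
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    # Unpack little-endian signed 16-bit samples and normalize to [-1, 1].
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return [s / 32768.0 for s in samples], rate

# Usage (path is hypothetical):
#     audio, rate = read_wav_as_floats("speech.wav")
# Wav2Vec2 was trained on 16 kHz audio; resample first if rate != 16000.
```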
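In the final step, `processor.batch_decode` turns the frame-level argmax predictions into text using CTC rules: consecutive repeated tokens are merged, then blank tokens are dropped. A minimal sketch of this greedy CTC decoding, using a hypothetical toy vocabulary (the real model's vocabulary and blank index come from its tokenizer):

```python
# Hypothetical toy vocabulary; the real one comes from the model's tokenizer.
BLANK = 0
VOCAB = {0: "", 1: "C", 2: "A", 3: "T"}

def ctc_greedy_decode(frame_ids):
    """Collapse repeated frame predictions, then drop blanks (CTC rules)."""
    decoded = []
    prev = None
    for idx in frame_ids:
        if idx != prev:          # merge consecutive repeats
            if idx != BLANK:     # drop blank tokens
                decoded.append(VOCAB[idx])
            prev = idx
    return "".join(decoded)

# Per-frame argmax ids, i.e. one row of torch.argmax(logits, dim=-1):
print(ctc_greedy_decode([1, 1, 0, 2, 2, 2, 0, 0, 3, 3]))  # CAT
```

Note that the blank between the two runs of token 1 in a sequence like `[1, 0, 1]` is what lets CTC emit genuinely doubled characters ("CC") rather than collapsing them.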
License
This model is licensed under the Apache-2.0 License, allowing for wide use and distribution in both private and commercial applications.