WAV2VEC2-XLS-R-300M

Introduction

WAV2VEC2-XLS-R-300M is a large-scale multilingual pretrained speech model developed by Facebook AI. It belongs to the XLS-R series of models for cross-lingual speech representation learning and is pretrained with the wav2vec 2.0 objective. The model has 300 million parameters and was pretrained on 436,000 hours of unlabeled speech in 128 languages, drawn from VoxPopuli, MLS, CommonVoice, BABEL, and VoxLingua107.

Architecture

XLS-R uses the wav2vec 2.0 architecture, which learns speech representations directly from raw audio. It expects speech input sampled at 16 kHz. XLS-R has been evaluated on speech translation, automatic speech recognition (ASR), and language identification. The XLS-R paper reports state-of-the-art results at the time of release on benchmarks including CoVoST-2 (translation), BABEL, MLS, CommonVoice, and VoxPopuli (ASR), and VoxLingua107 (language identification).
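To make the 16 kHz requirement concrete, the sketch below estimates how many latent frames the convolutional feature encoder produces for a given number of audio samples. The kernel and stride values are an assumption: they are the standard wav2vec 2.0 layout (also the defaults of `Wav2Vec2Config` in Transformers), and the names `CONV_KERNELS`, `CONV_STRIDES`, and `num_frames` are illustrative only.

```python
import math

# Assumed feature-encoder layout: standard wav2vec 2.0 values,
# not read from this checkpoint's config.
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)
SAMPLE_RATE = 16_000  # Hz, the input rate the model expects

# Samples between consecutive output frames (product of all strides).
HOP_SAMPLES = math.prod(CONV_STRIDES)


def num_frames(num_samples):
    """Number of latent frames emitted for `num_samples` of raw audio."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        # Output length of an unpadded 1-D convolution.
        length = (length - kernel) // stride + 1
    return length


if __name__ == "__main__":
    print(HOP_SAMPLES / SAMPLE_RATE * 1000)  # frame hop in milliseconds
    print(num_frames(SAMPLE_RATE))           # frames for one second of audio
```

Under these assumptions the encoder emits one frame roughly every 20 ms. Audio recorded at another sample rate would shift that timing, which is one reason to resample to 16 kHz first.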

Training

The XLS-R family scales up to 2 billion parameters; this checkpoint is the 300-million-parameter variant. All XLS-R models are pretrained cross-lingually on a large and diverse collection of speech data. Cross-lingual pretraining has been shown to outperform English-only pretraining on some tasks, for example when translating English speech into other languages, given sufficient model size. The models perform particularly well on low-resource languages.

Guide: Running Locally

To run the WAV2VEC2-XLS-R-300M model locally, follow these steps:

  1. Set up your environment: Ensure you have Python and PyTorch installed.
  2. Install Hugging Face Transformers: Use pip to install the library.
    pip install transformers
    
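  3. Load the model: Use the Transformers library to load the model and its feature extractor. This checkpoint is pretrained only and does not ship a tokenizer vocabulary, so it needs fine-tuning on labeled data (for example, with a CTC head) before it can transcribe speech.
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model
    # Pretrained encoder only: it returns hidden states, not text
    model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-xls-r-300m")
    feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-xls-r-300m")
    
  4. Prepare your audio input: Ensure your audio data is sampled at 16 kHz.
  5. Run inference: Convert the audio into model inputs with the feature extractor and pass them to the model.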

For optimal performance, consider using cloud GPUs such as those available on AWS, Google Cloud, or Azure.
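The steps above can be sketched end to end as follows. This is a minimal sketch, not a definitive recipe: it assumes a mono, 16-bit PCM WAV file already sampled at 16 kHz, and it loads `Wav2Vec2Model` with `Wav2Vec2FeatureExtractor` on the understanding that the pretrained checkpoint has no tokenizer vocabulary and returns hidden states rather than text. The helper names `load_wav_16k_mono` and `encode` and the file name `speech.wav` are hypothetical.

```python
import array
import sys
import wave

TARGET_RATE = 16_000  # Hz, the sample rate the model expects


def load_wav_16k_mono(path):
    """Read a mono 16-bit PCM WAV file and return floats in [-1.0, 1.0)."""
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != TARGET_RATE:
            raise ValueError(
                f"expected {TARGET_RATE} Hz, got {wav.getframerate()} Hz; resample first"
            )
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return [s / 32768.0 for s in samples]


def encode(waveform, model_id="facebook/wav2vec2-xls-r-300m"):
    """Return encoder hidden states for one waveform (downloads the checkpoint)."""
    # Imported lazily so the WAV helper works without the heavy dependencies.
    import torch
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

    feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_id)
    model = Wav2Vec2Model.from_pretrained(model_id)
    model.eval()
    inputs = feature_extractor(
        waveform, sampling_rate=TARGET_RATE, return_tensors="pt"
    )
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state  # shape: (1, frames, hidden_size)
```

A call such as `encode(load_wav_16k_mono("speech.wav"))` would then yield one embedding per audio frame. Producing transcriptions additionally requires a checkpoint fine-tuned for ASR together with its own vocabulary.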

License

The WAV2VEC2-XLS-R-300M model is licensed under the Apache 2.0 License, allowing for both personal and commercial use.
