wav2vec2 large xlsr bahasa indonesia

Bagus

Introduction
The wav2vec2-large-xlsr-bahasa-indonesia model is a large automatic speech recognition (ASR) model for the Indonesian language. It is implemented in PyTorch and loaded through the Hugging Face Transformers library.

Architecture
This model is based on the wav2vec2 architecture, which learns speech representations directly from raw audio waveforms. As its name indicates, it builds on the large XLSR (cross-lingual) variant of wav2vec2, and it is trained on the Indonesian portion of the Common Voice dataset, version 6.1.

Training
The model was trained on the Common Voice Indonesian dataset (version 6.1) and achieves a word error rate (WER) of 19.3%. A newer version of the model, with a smaller architecture and a lower WER of 5.9%, is also available.
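
To make the WER figures above concrete, here is a minimal sketch of how the metric is commonly computed: the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The function name `word_error_rate` and the sample sentences are illustrative assumptions; this is not the evaluation script used for the reported numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One of four reference words is dropped, so the WER is 1 / 4.
print(word_error_rate("saya suka makan nasi", "saya suka nasi"))  # 0.25
```

Under this definition, a WER of 19.3% corresponds to roughly one word-level error for every five reference words.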

Guide: Running Locally
To run the model locally, follow these steps:

  1. Clone the repository from GitHub: wav2vec2-indonesian.
  2. Install the required dependencies using pip:
    pip install torch transformers datasets
    
  3. Load the model and processor using the Transformers library in Python:
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
    # Wav2Vec2Processor bundles the feature extractor and the CTC tokenizer;
    # the standalone Wav2Vec2Tokenizer class is deprecated in Transformers.
    model = Wav2Vec2ForCTC.from_pretrained("Bagus/wav2vec2-large-xlsr-bahasa-indonesia")
    processor = Wav2Vec2Processor.from_pretrained("Bagus/wav2vec2-large-xlsr-bahasa-indonesia")
    
  4. Optionally, use a cloud GPU service for faster inference and training, such as AWS, Google Cloud, or Azure.
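
The steps above can be combined into a short transcription sketch. It is an illustration under several assumptions, not the author's reference script: the helper name `transcribe` is hypothetical, the model repository is assumed to ship a `Wav2Vec2Processor` configuration, and the input is assumed to be mono audio already sampled at 16 kHz, the rate wav2vec2 XLSR checkpoints expect. Decoding is plain greedy (argmax) CTC decoding without a language model.

```python
MODEL_ID = "Bagus/wav2vec2-large-xlsr-bahasa-indonesia"
EXPECTED_SAMPLING_RATE = 16_000  # wav2vec2 XLSR checkpoints expect 16 kHz audio


def transcribe(waveform, sampling_rate=EXPECTED_SAMPLING_RATE):
    """Greedy CTC transcription of a mono waveform (1-D sequence of floats).

    Hypothetical helper for illustration; resample the audio beforehand
    if it was not recorded at 16 kHz.
    """
    if sampling_rate != EXPECTED_SAMPLING_RATE:
        raise ValueError(
            f"expected {EXPECTED_SAMPLING_RATE} Hz audio, got {sampling_rate} Hz"
        )

    # Imported lazily so the sampling-rate check works without the heavy
    # dependencies installed.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    # Downloads the weights on first use, then reads them from the local cache.
    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
    model.eval()

    inputs = processor(
        waveform,
        sampling_rate=sampling_rate,
        return_tensors="pt",
        padding=True,
    )
    with torch.no_grad():
        logits = model(**inputs).logits

    # Pick the most likely token per frame; batch_decode collapses repeated
    # tokens and removes CTC blanks.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling `transcribe(samples)` with a list or NumPy array of audio samples returns the predicted text as a string. For repeated calls, loading the processor and model once outside the function avoids reloading the weights each time.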

License
This model is licensed under the Apache 2.0 license.
