wav2vec2 xlsr 1b finnish lm v2

Finnish-NLP

Introduction

wav2vec2-xlsr-1b-finnish-lm-v2 is a fine-tuned version of Facebook's Wav2Vec2 XLS-R model for Finnish automatic speech recognition (ASR). It builds on a large multilingual pretrained checkpoint and was fine-tuned on 275.6 hours of transcribed Finnish speech.

Architecture

The Wav2Vec2 XLS-R model, developed by Facebook AI, is a multilingual pretrained model for speech processing. It is pretrained with the self-supervised wav2vec 2.0 objective on speech from 128 languages. The checkpoint used here has 1 billion parameters, and fine-tuning adapts it to recognizing and transcribing Finnish speech.

Training

The model was fine-tuned on several datasets, predominantly the Aalto Finnish Parliament ASR Corpus. Training took place during the Robust Speech Challenge Event on a Tesla V100 GPU. On the Common Voice 7.0 Finnish test split, the model achieved a word error rate (WER) of 4.09% and a character error rate (CER) of 0.88%. Key training hyperparameters included a learning rate of 5e-05, a train batch size of 32, and the 8-bit Adam optimizer.
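To make the two reported metrics concrete, the following is a minimal sketch of how WER and CER can be computed for a single utterance as an edit distance divided by the reference length. It is not the scoring code from the model's eval.py; the function names are illustrative, and the sketch assumes no text normalization and per-utterance scoring, whereas evaluation scripts typically normalize text and aggregate errors over the whole test split.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent for one reference/hypothesis pair."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent for one reference/hypothesis pair."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


# Illustrative sentence pair (not taken from any dataset):
# one dropped letter makes one of three words wrong.
reference = "hyvää huomenta kaikille"
hypothesis = "hyvää huomenta kaikile"
print(f"WER: {wer(reference, hypothesis):.2f}%")
print(f"CER: {cer(reference, hypothesis):.2f}%")
```

Because a single wrong character makes the whole word count as an error, WER is usually well above CER for the same output, which matches the relationship between the 4.09% and 0.88% figures above.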

Guide: Running Locally

  1. Set Up Environment: Ensure Python is installed, along with the necessary packages such as transformers, datasets, and torch (PyTorch).
  2. Clone Repository: Clone the model repository to your local machine.
  3. Install Dependencies: Use pip to install the required dependencies from requirements.txt.
  4. Inference: Run the provided eval.py script to evaluate the model on the Common Voice 7.0 Finnish test split:
    python3 eval.py --model_id Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2 --dataset mozilla-foundation/common_voice_7_0 --config fi --split test
  5. GPU Recommendation: For efficient processing, run on a GPU; cloud GPUs from providers such as AWS, Google Cloud, or Azure are one option.
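Besides running eval.py, a single recording can be transcribed directly from Python. The sketch below is one possible way to do this with the generic `pipeline` API from transformers; it is not taken from the model repository. It assumes that transformers and torch are installed, that ffmpeg is available so the pipeline can decode an audio file given by path, and that the helper name `transcribe` and the file name in the usage note are hypothetical. Decoding with the bundled language model is expected to additionally require the pyctcdecode and kenlm packages.

```python
MODEL_ID = "Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2"


def transcribe(audio_path, model_id=MODEL_ID):
    """Return the transcript of one audio file as a string.

    Hypothetical helper for illustration; the first call downloads
    the model weights, which are several gigabytes for a 1B model.
    """
    # Imported lazily so that defining this helper does not itself
    # require transformers to be installed.
    from transformers import pipeline

    # Build a speech recognition pipeline around the fine-tuned checkpoint.
    asr = pipeline("automatic-speech-recognition", model=model_id)

    # The pipeline returns a dict whose "text" entry holds the transcript.
    result = asr(audio_path)
    return result["text"]
```

A call such as `transcribe("sample_fi.wav")` would then return the Finnish transcript of that file, where sample_fi.wav stands in for any local recording.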

License

The model is released under the Apache-2.0 License, which permits personal and commercial use, modification, and distribution, provided the license and attribution notices are retained.
