wav2vec2 xlsr 1b finnish lm v2
Finnish-NLP
Introduction
wav2vec2-xlsr-1b-finnish-lm-v2 is a fine-tuned version of Facebook's Wav2Vec2 XLS-R model for Finnish automatic speech recognition (ASR). It builds on a large-scale multilingual pretrained model and was fine-tuned on 275.6 hours of transcribed Finnish speech.
Architecture
The Wav2Vec2 XLS-R model, developed by Facebook AI, is a multilingual pretrained model for speech processing. It is pretrained with the self-supervised wav2vec 2.0 objective on speech in 128 languages. This model uses the 1-billion-parameter XLS-R variant, fine-tuned to recognize and transcribe Finnish speech.
Training
The model was fine-tuned on several datasets, predominantly the Aalto Finnish Parliament ASR Corpus. Training was conducted during the Robust Speech Challenge Event on a Tesla V100 GPU. The model achieved a Word Error Rate (WER) of 4.09% and a Character Error Rate (CER) of 0.88% on the Common Voice 7.0 Finnish test split. Key training hyperparameters included a learning rate of 5e-05, a train batch size of 32, and an 8-bit Adam optimizer.
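To make the two reported metrics concrete, here is a minimal sketch of how WER and CER can be computed for a single utterance. Both are an edit distance divided by the reference length, counted over words for WER and over characters for CER. The function names and the example sentence are illustrative assumptions, and this is not the code used by `eval.py`, which may normalize text and aggregate over the whole test split differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference token missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra token in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Illustrative example (not taken from the test set): one misspelled word.
reference = "hyvää huomenta kaikille"
hypothesis = "hyvää huomenta kaikile"
print(f"WER: {100 * wer(reference, hypothesis):.2f}%")  # WER: 33.33%
print(f"CER: {100 * cer(reference, hypothesis):.2f}%")  # CER: 4.35%
```

A single wrong word costs a full word error but only one character error, which is why CER is typically much lower than WER for the same transcripts.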
Guide: Running Locally
- Setup Environment: Ensure you have Python installed, along with the necessary packages such as `transformers`, `datasets`, and PyTorch (`torch`).
- Clone Repository: Clone the model repository to your local machine.
- Install Dependencies: Use pip to install the required dependencies from `requirements.txt`.
- Inference: Use the provided `eval.py` script to run an evaluation with the command: `python3 eval.py --model_id Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2 --dataset mozilla-foundation/common_voice_7_0 --config fi --split test`
- GPU Recommendation: For efficient processing, consider using cloud GPUs such as those provided by AWS, Google Cloud, or Azure.
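Besides running the evaluation script, a single recording can be transcribed directly from Python. The following is a minimal sketch using the `pipeline` API from `transformers`, not a procedure taken from the model repository. It assumes `transformers` and `torch` are installed and that `ffmpeg` is available for decoding audio files. Language-model decoding is assumed to additionally require `pyctcdecode` and `kenlm`. The file name in the usage comment is hypothetical.

```python
MODEL_ID = "Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2"


def transcribe(audio_path: str) -> str:
    """Transcribe one audio file with the fine-tuned Finnish model."""
    # Imported lazily because loading transformers is slow and only
    # needed once a transcription is actually requested.
    from transformers import pipeline

    # The first call downloads the model weights, which are several gigabytes.
    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    result = asr(audio_path)
    return result["text"]


# Usage (hypothetical file name):
# print(transcribe("sample_fi.wav"))
```

For more than a handful of files, creating the pipeline once and reusing it avoids reloading the model on every call.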
License
The model is released under the Apache-2.0 License, which permits personal and commercial use, modification, and distribution, provided the license and attribution notices are retained.