wav2vec2 large xlsr 53 polish
jonatasgrosman
Introduction
The wav2vec2-large-xlsr-53-polish model is a fine-tuned version of the facebook/wav2vec2-large-xlsr-53 model for Automatic Speech Recognition (ASR) in Polish. It was fine-tuned on the Polish subset of the Common Voice dataset so that it transcribes Polish speech to text.
Architecture
This model uses the Wav2Vec2 architecture, which learns speech representations directly from raw audio waveforms. It runs on PyTorch and is loaded through the transformers library with a CTC (Connectionist Temporal Classification) head. The model expects audio input sampled at 16 kHz.
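To make the 16 kHz requirement concrete, the sketch below estimates how many output frames the model produces for a clip of a given length. It assumes the standard Wav2Vec2 convolutional feature encoder (kernel sizes 10, 3, 3, 3, 3, 2, 2 with strides 5, 2, 2, 2, 2, 2, 2), which this card does not state. The function name num_output_frames is illustrative.

```python
# Assumed feature-encoder layout (standard Wav2Vec2 configuration; not stated in this card)
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)
SAMPLE_RATE = 16_000


def num_output_frames(num_samples: int) -> int:
    """Apply the unpadded 1-D convolution length formula once per layer."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        length = (length - kernel) // stride + 1
    return length


one_second = num_output_frames(SAMPLE_RATE)
print(one_second)  # 49 frames, i.e. roughly one frame every 20 ms
```

Under these assumptions, audio at a different sampling rate would change the time span each frame covers, which is one reason inputs are resampled to 16 kHz before inference.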
Training
The model was fine-tuned on the Polish dataset from Common Voice 6.1. Training ran on GPU credits provided by OVHcloud. The training script is available on GitHub, so the fine-tuning can be reproduced or adapted for further experiments.
Guide: Running Locally
Basic Steps
- Installation: Make sure Python is installed, then install the transformers, datasets, and librosa libraries with pip. The inference step also imports PyTorch, which is installed as torch:
    pip install transformers datasets librosa torch
- Download and Load Model:
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    # Hugging Face Hub ID of the fine-tuned Polish checkpoint
    MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-polish"

    # The processor bundles the feature extractor and the tokenizer
    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
- Preprocess Audio Data: Use librosa to load each audio file and resample it to 16 kHz.
    import librosa

    def speech_file_to_array_fn(path):
        # librosa.load resamples to the requested rate and returns mono float audio
        speech_array, _ = librosa.load(path, sr=16_000)
        return speech_array
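- Inference: Transcribe audio files.
    import torch

    audio_paths = ["/path/to/file.mp3", "/path/to/another_file.wav"]

    # The processor expects waveforms, not file paths, so load each file first
    speech_arrays = [speech_file_to_array_fn(path) for path in audio_paths]
    inputs = processor(speech_arrays, sampling_rate=16_000, return_tensors="pt", padding=True)

    # Run the forward pass without tracking gradients
    with torch.no_grad():
        logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits

    # Take the most likely token per frame, then decode the IDs to text
    predicted_ids = torch.argmax(logits, dim=-1)
    transcriptions = processor.batch_decode(predicted_ids)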
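The argmax and batch_decode calls in the inference step amount to greedy CTC decoding: pick the most likely token per frame, merge consecutive repeats, then drop the blank token. The pure-Python sketch below illustrates that idea on a toy vocabulary. The vocabulary, the blank ID of 0, and the name greedy_ctc_decode are assumptions made for illustration and do not reflect the model's real tokenizer.

```python
from itertools import groupby

# Toy vocabulary for illustration only; the real tokenizer's vocabulary differs
VOCAB = {0: "<pad>", 1: "k", 2: "o", 3: "t", 4: " "}
BLANK_ID = 0  # assumed blank token ID


def greedy_ctc_decode(frame_ids: list[int]) -> str:
    """Collapse repeated frame predictions, then remove CTC blanks."""
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    return "".join(VOCAB[i] for i in collapsed if i != BLANK_ID)


# Per-frame argmax IDs for a short clip: "k k <pad> o o <pad> t"
frame_ids = [1, 1, 0, 2, 2, 0, 3]
print(greedy_ctc_decode(frame_ids))  # kot
```

The blank token is what lets CTC emit a doubled letter: the frames 2, 0, 2 decode to "oo", whereas 2, 2 alone would collapse to a single "o".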
Cloud GPUs
For faster training and inference, consider cloud GPU services such as AWS EC2, Google Cloud Platform, or OVHcloud, which offer dedicated hardware for compute-intensive workloads.
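Whether the GPU is local or rented, the model and its inputs have to be moved onto it explicitly. The following is a minimal sketch that falls back to the CPU when no CUDA device (or no PyTorch installation) is found. The helper name pick_device is hypothetical.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch sees a CUDA GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
```

With the model and inputs from the steps above, calling model.to(device) and inputs.input_values.to(device) (and likewise for the attention mask) places everything on the selected device before the forward pass.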
License
This model is released under the Apache 2.0 License, allowing for both personal and commercial use, modification, and distribution. Ensure compliance with the license terms when using the model.