wav2vec2-finetuned-pronunciation-correction
Introduction
The wav2vec2-finetuned-pronunciation-correction model is a fine-tuned version of Wav2Vec2 designed for phoneme-level pronunciation correction. It transcribes speech into phonetic notation with a Character Error Rate (CER) of 0.1.
Architecture
The model is based on facebook/wav2vec2-large-xlsr-53, a cross-lingual Wav2Vec2 checkpoint pretrained on speech from 53 languages that provides a robust foundation for speech recognition and analysis. Fine-tuning directs these capabilities toward phoneme-level transcription, making the model suitable for pronunciation assessment.
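Because the checkpoint pairs this encoder with a CTC head over a phoneme vocabulary, you can inspect the output vocabulary directly. A minimal sketch, assuming the checkpoint ships a Wav2Vec2Processor as the guide below uses (the exact token set depends on the published files):

```python
from transformers import Wav2Vec2Processor

# The processor's tokenizer holds the phoneme-level output vocabulary
processor = Wav2Vec2Processor.from_pretrained(
    "moxeeeem/wav2vec2-finetuned-pronunciation-correction"
)

# List every token the CTC head can emit (phonemes plus special tokens)
print(sorted(processor.tokenizer.get_vocab().keys()))
```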
Training
The fine-tuning process adapted the pretrained checkpoint to specialize in phonetic transcription, with the Character Error Rate as the evaluation metric. The final model achieves a CER of 0.1, indicating high accuracy in phoneme transcription.
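For context, CER is the character-level Levenshtein (edit) distance between a predicted transcription and the reference, divided by the reference length, so a CER of 0.1 means roughly one character error per ten reference characters. A self-contained sketch using hypothetical phoneme strings:

```python
def cer(prediction: str, reference: str) -> float:
    """Character Error Rate: Levenshtein distance / reference length."""
    # Dynamic-programming edit distance over characters
    prev = list(range(len(reference) + 1))
    for i, p in enumerate(prediction, 1):
        curr = [i]
        for j, r in enumerate(reference, 1):
            cost = 0 if p == r else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / len(reference)

# Hypothetical strings: one substituted phoneme out of eight characters
print(cer("h ə l oʊ", "h ɛ l oʊ"))  # 0.125
```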
Guide: Running Locally
To run this model locally, you need Python with the `transformers`, `librosa`, and `torch` libraries installed. Here are the basic steps:
- Install Required Libraries:

```bash
pip install transformers librosa torch
```
- Load the Model and Processor:

```python
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import librosa
import torch

model = Wav2Vec2ForCTC.from_pretrained("moxeeeem/wav2vec2-finetuned-pronunciation-correction")
processor = Wav2Vec2Processor.from_pretrained("moxeeeem/wav2vec2-finetuned-pronunciation-correction")
```
- Transcribe Audio:

```python
def transcribe_audio(speech, sampling_rate):
    # Convert the raw waveform into model inputs
    inputs = processor(speech, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    # Greedy CTC decoding: take the most likely token at each frame
    pred_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(pred_ids)[0]

# wav2vec2 expects 16 kHz mono audio
speech, sample_rate = librosa.load("example_audio.wav", sr=16000)
transcription = transcribe_audio(speech, sample_rate)
print("Transcription:", transcription)
```
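The steps above only produce a phonetic transcription; the model card does not describe how to score it against a target pronunciation. One simple approach (a sketch, not the author's method) is to diff the predicted phonemes against an expected phoneme sequence with Python's standard difflib:

```python
import difflib

def diff_pronunciation(predicted: str, expected: str):
    """Print phoneme-level mismatches between prediction and reference."""
    exp, pred = expected.split(), predicted.split()
    matcher = difflib.SequenceMatcher(None, exp, pred)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            print(f"{op}: expected {exp[i1:i2]}, got {pred[j1:j2]}")

# Hypothetical example: the speaker produces /ə/ where /ɛ/ is expected
diff_pronunciation("h ə l oʊ", "h ɛ l oʊ")  # replace: expected ['ɛ'], got ['ə']
```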
For better performance, consider using cloud GPUs from providers such as AWS, Google Cloud, or Azure.
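If a local CUDA GPU is available instead, a small change to the snippet above moves inference onto it (assuming the model and processor from the guide are already loaded):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

def transcribe_audio(speech, sampling_rate):
    inputs = processor(speech, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        # Run the forward pass on the selected device
        logits = model(inputs.input_values.to(device)).logits
    pred_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(pred_ids)[0]
```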
License
The model is licensed under the Apache 2.0 License, allowing for both personal and commercial use, as well as modification and distribution.