wav2vec2-finetuned-pronunciation-correction
Introduction
The wav2vec2-finetuned-pronunciation-correction model is a fine-tuned version of Wav2Vec2 designed for phoneme-level pronunciation correction. It transcribes speech into phonetic notation with a Character Error Rate (CER) of 0.1.
Architecture
The model is based on facebook/wav2vec2-large-xlsr-53, a cross-lingual Wav2Vec2 checkpoint pretrained on speech from 53 languages that provides a robust foundation for speech recognition and analysis. Fine-tuning directs these capabilities toward phoneme-level transcription, making the model suitable for pronunciation assessment.
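Because the checkpoint pairs this encoder with a CTC head over a phoneme vocabulary, you can inspect the output vocabulary directly. A minimal sketch, assuming the checkpoint ships a Wav2Vec2Processor as the guide below uses (the exact token set depends on the published files):

```python
from transformers import Wav2Vec2Processor

# The processor's tokenizer holds the phoneme-level output vocabulary
processor = Wav2Vec2Processor.from_pretrained(
    "moxeeeem/wav2vec2-finetuned-pronunciation-correction"
)

# List every token the CTC head can emit (phonemes plus special tokens)
print(sorted(processor.tokenizer.get_vocab().keys()))
```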
Training
The fine-tuning process adapted the pretrained checkpoint to specialize in phonetic transcription, with the Character Error Rate as the evaluation metric. The final model achieves a CER of 0.1, indicating high accuracy in phoneme transcription.
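For context, CER is the character-level Levenshtein (edit) distance between a predicted transcription and the reference, divided by the reference length, so a CER of 0.1 means roughly one character error per ten reference characters. A self-contained sketch using hypothetical phoneme strings:

```python
def cer(prediction: str, reference: str) -> float:
    """Character Error Rate: Levenshtein distance / reference length."""
    # Dynamic-programming edit distance over characters
    prev = list(range(len(reference) + 1))
    for i, p in enumerate(prediction, 1):
        curr = [i]
        for j, r in enumerate(reference, 1):
            cost = 0 if p == r else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / len(reference)

# Hypothetical strings: one substituted phoneme out of eight characters
print(cer("h ə l oʊ", "h ɛ l oʊ"))  # 0.125
```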
Guide: Running Locally
To run this model locally, you need Python with the `transformers`, `librosa`, and `torch` libraries installed. Here are the basic steps:
- Install Required Libraries:

```bash
pip install transformers librosa torch
```
- Load the Model and Processor:

```python
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import librosa
import torch

model = Wav2Vec2ForCTC.from_pretrained("moxeeeem/wav2vec2-finetuned-pronunciation-correction")
processor = Wav2Vec2Processor.from_pretrained("moxeeeem/wav2vec2-finetuned-pronunciation-correction")
```
- Transcribe Audio:

```python
def transcribe_audio(speech, sampling_rate):
    # Convert the raw waveform into model inputs
    inputs = processor(speech, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    # Greedy CTC decoding: take the most likely token at each frame
    pred_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(pred_ids)[0]

# wav2vec2 expects 16 kHz mono audio
speech, sample_rate = librosa.load("example_audio.wav", sr=16000)
transcription = transcribe_audio(speech, sample_rate)
print("Transcription:", transcription)
```
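The steps above only produce a phonetic transcription; the model card does not describe how to score it against a target pronunciation. One simple approach (a sketch, not the author's method) is to diff the predicted phonemes against an expected phoneme sequence with Python's standard difflib:

```python
import difflib

def diff_pronunciation(predicted: str, expected: str):
    """Print phoneme-level mismatches between prediction and reference."""
    exp, pred = expected.split(), predicted.split()
    matcher = difflib.SequenceMatcher(None, exp, pred)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            print(f"{op}: expected {exp[i1:i2]}, got {pred[j1:j2]}")

# Hypothetical example: the speaker produces /ə/ where /ɛ/ is expected
diff_pronunciation("h ə l oʊ", "h ɛ l oʊ")  # replace: expected ['ɛ'], got ['ə']
```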
For better performance, consider using cloud GPUs from providers such as AWS, Google Cloud, or Azure.
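If a local CUDA GPU is available instead, a small change to the snippet above moves inference onto it (assuming the model and processor from the guide are already loaded):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

def transcribe_audio(speech, sampling_rate):
    inputs = processor(speech, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        # Run the forward pass on the selected device
        logits = model(inputs.input_values.to(device)).logits
    pred_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(pred_ids)[0]
```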
License
The model is licensed under the Apache 2.0 License, allowing for both personal and commercial use, as well as modification and distribution.