wav2vec2-large-xlsr-53-french Model

Introduction

The wav2vec2-large-xlsr-53-french model is a fine-tuned version of Facebook's Wav2Vec2 XLSR-53 model for Automatic Speech Recognition (ASR) in French. It was developed by Jonatas Grosman, fine-tuned on the Common Voice 6.1 dataset, and published in the Hugging Face model repository.

Architecture

The model uses the Wav2Vec2 architecture, which is designed for speech recognition. It starts from a large transformer model pre-trained on multilingual speech (XLSR-53, where 53 is the number of languages covered) and is then fine-tuned on French data to improve accuracy for French.

Training

The model was fine-tuned on the French portion of the Common Voice dataset. Fine-tuning adapts the pre-trained Wav2Vec2 model to transcribe French audio, and input audio must be sampled at 16 kHz. Performance is reported as Word Error Rate (WER) and Character Error Rate (CER), with additional evaluations that use a language model (LM) during decoding.
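The two metrics named above can be illustrated with a short, dependency-free sketch. It assumes the usual definitions (edit distance between reference and hypothesis, divided by the reference length, counted in words for WER and in characters for CER). The function names and the example sentences are illustrative only and are not taken from the model's evaluation scripts.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Illustrative sentences, not real model output.
reference = "le chat dort sur le canapé"
hypothesis = "le chat dort sous le canapé"
print(f"WER: {wer(reference, hypothesis):.3f}")
print(f"CER: {cer(reference, hypothesis):.3f}")
```

One wrong word out of six gives a WER of about 0.167, while the same error costs only two character edits out of 26, so the CER is much lower. This is why the two metrics are usually reported together.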

Guide: Running Locally

Set Up the Environment:
- Ensure Python is installed.
- Install the required libraries: torch, librosa, transformers, and datasets. The HuggingSound example below also requires huggingsound.
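As a quick sanity check before loading the model, the sketch below reports which of the libraries are not yet importable. It uses only the standard library. The helper name `missing_packages` is hypothetical, and the suggested `pip install` line assumes the packages are installed from PyPI under the same names.

```python
import importlib.util

# Libraries named in the setup step; huggingsound is only needed
# for the HuggingSound route.
REQUIRED = ["torch", "librosa", "transformers", "datasets", "huggingsound"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages; try: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```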

Load the Model:

Use the HuggingSound library:

from huggingsound import SpeechRecognitionModel

model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-large-xlsr-53-french")

Then pass a list of audio file paths to model.transcribe to obtain the transcriptions.
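A minimal sketch of the full HuggingSound route follows. It assumes that `transcribe` returns one dictionary per file with a "transcription" key, as described in the HuggingSound README. The wrapper `transcribe_files` and the file paths in the usage note are placeholders, not part of the library.

```python
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-french"


def transcribe_files(audio_paths):
    """Return one transcription string per audio file path."""
    if not audio_paths:
        return []

    # Imported lazily: huggingsound pulls in torch and transformers,
    # and the model weights are downloaded on first use.
    from huggingsound import SpeechRecognitionModel

    model = SpeechRecognitionModel(MODEL_ID)
    results = model.transcribe(audio_paths)

    # Keep only the text; each result also carries timing information.
    return [result["transcription"] for result in results]
```

For example, `transcribe_files(["/path/to/file.mp3", "/path/to/another_file.wav"])` would return a list of two strings, in the same order as the input paths.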

Inference Script:

Alternatively, load the processor and model directly with PyTorch and Transformers:

import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("jonatasgrosman/wav2vec2-large-xlsr-53-french")
model = Wav2Vec2ForCTC.from_pretrained("jonatasgrosman/wav2vec2-large-xlsr-53-french")
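Loading the processor and model is only the first half of inference. The sketch below shows one way the remaining steps might look: load each file at 16 kHz with librosa, run the model, and decode the most likely token at each frame (greedy CTC decoding, with no language model). The function name `transcribe`, the constants, and the device handling are assumptions made for illustration.

```python
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-french"
SAMPLE_RATE = 16_000  # the model expects 16 kHz input


def transcribe(audio_paths):
    """Greedy (no language model) transcription of a list of audio files."""
    if not audio_paths:
        return []

    # Imported lazily so the heavy dependencies load only when needed.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    # Use a GPU when one is available, otherwise fall back to the CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID).to(device)
    model.eval()

    # librosa resamples to the requested rate while loading.
    waveforms = [librosa.load(path, sr=SAMPLE_RATE)[0] for path in audio_paths]

    # Pad the batch to a common length and build the attention mask.
    inputs = processor(
        waveforms, sampling_rate=SAMPLE_RATE, return_tensors="pt", padding=True
    )

    with torch.no_grad():
        logits = model(
            inputs.input_values.to(device),
            attention_mask=inputs.attention_mask.to(device),
        ).logits

    # Pick the most likely token per frame; batch_decode collapses
    # repeated tokens and removes CTC blanks.
    predicted_ids = torch.argmax(logits, dim=-1).cpu()
    return processor.batch_decode(predicted_ids)
```

Calling `transcribe(["/path/to/file.wav"])` with a real file path returns a list containing one transcription string.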

Cloud GPUs:
- For faster inference, especially when transcribing large datasets, consider running the model on a cloud GPU, such as those offered by OVHcloud.

License

The model is released under the Apache 2.0 License, which permits both personal and commercial use provided the license and attribution notices are retained.
