wav2vec2 large xlsr 53 french
jonatasgrosman
Introduction
The wav2vec2-large-xlsr-53-french model is a fine-tuned version of Facebook's Wav2Vec2, tailored for Automatic Speech Recognition (ASR) in French. It was fine-tuned on the Common Voice 6.1 dataset by Jonatas Grosman and is available from the Hugging Face model repository.
Architecture
The model uses the Wav2Vec2 architecture, which is designed for speech recognition tasks. It starts from a large transformer pre-trained on multilingual speech covering 53 languages (XLSR-53) and is then fine-tuned on French data to improve accuracy for the French language.
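To make the architecture a little more concrete, the sketch below estimates how many feature frames the convolutional front end hands to the transformer for a given number of audio samples. The kernel and stride values are an assumption: they are the default Wav2Vec2 feature-encoder settings, not figures stated in this document, and `num_output_frames` is a hypothetical helper name.

```python
import math

# Assumed values: default Wav2Vec2 feature-encoder kernel sizes and strides.
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)

# Overall downsampling factor of the convolutional stack.
TOTAL_STRIDE = math.prod(CONV_STRIDES)


def num_output_frames(num_samples: int) -> int:
    """Number of frames left after the (unpadded) convolutional stack."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        length = (length - kernel) // stride + 1
    return length


if __name__ == "__main__":
    sample_rate = 16_000  # the model expects 16 kHz audio
    print(num_output_frames(sample_rate))       # frames for one second of audio
    print(1000 * TOTAL_STRIDE / sample_rate)    # milliseconds between frames
```

Under these assumptions, one second of 16 kHz audio yields 49 frames, roughly one every 20 ms.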
Training
The model was fine-tuned on the French portion of the Common Voice dataset. Fine-tuning adapts the pre-trained Wav2Vec2 model to transcribe French speech, and the model expects audio inputs sampled at 16 kHz. Performance is evaluated using Word Error Rate (WER) and Character Error Rate (CER), with additional tests conducted using language models (LM).
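Both metrics are edit distances normalized by the length of the reference: WER counts word-level substitutions, insertions, and deletions, while CER does the same at character level. The following is a minimal, dependency-free sketch of that calculation. The function names and the example sentences are illustrative assumptions, not the evaluation script used for this model.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    reference = "le chat dort sur le canapé"
    hypothesis = "le chat dors sur canapé"
    print(f"WER: {wer(reference, hypothesis):.3f}")
    print(f"CER: {cer(reference, hypothesis):.3f}")
```

In the example, one substituted word and one dropped word out of six give a WER of 2/6, while the same errors cost only 4 of 26 characters, which is why CER is usually the lower of the two numbers.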
Guide: Running Locally
- Setup Environment:
  - Ensure Python is installed.
  - Install required libraries: torch, librosa, transformers, and datasets.
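- Load the Model:
  - Use the HuggingSound library (installed separately as the huggingsound package):
    from huggingsound import SpeechRecognitionModel
    # Downloads the fine-tuned French checkpoint on first use
    model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-large-xlsr-53-french")
  - Load audio files for transcription by passing a list of file paths to model.transcribe().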
- Inference Script:
  - Utilize PyTorch and Transformers for transcription:
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
    # The processor prepares the audio input; the model produces CTC outputs
    processor = Wav2Vec2Processor.from_pretrained("jonatasgrosman/wav2vec2-large-xlsr-53-french")
    model = Wav2Vec2ForCTC.from_pretrained("jonatasgrosman/wav2vec2-large-xlsr-53-french")
- Cloud GPUs:
  - For enhanced performance, especially with large datasets, consider using cloud-based GPUs such as those offered by OVHcloud.
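The Inference Script step stops once the processor and the CTC model are loaded. In a typical CTC setup, the model then scores every token for each audio frame, the best-scoring token id is picked per frame, and the resulting id sequence is collapsed into text. The sketch below illustrates only that final collapsing step in plain Python. The toy vocabulary, the `blank_id` value, and the `ctc_greedy_decode` name are hypothetical and chosen for illustration; the real vocabulary ships with the model's processor.

```python
def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Collapse repeated ids, drop blanks, and join the remaining tokens."""
    tokens = []
    prev = None
    for idx in frame_ids:
        # Emit a token only when it differs from the previous frame
        # and is not the CTC blank symbol.
        if idx != prev and idx != blank_id:
            tokens.append(id_to_token[idx])
        prev = idx
    # The word delimiter token marks spaces between words.
    return "".join(tokens).replace(word_delimiter, " ").strip()


if __name__ == "__main__":
    # Hypothetical toy vocabulary: id 0 plays the role of the CTC blank.
    vocab = {0: "<pad>", 1: "|", 2: "o", 3: "u", 4: "i", 5: "n"}
    # One predicted id per audio frame (what an argmax over scores would give).
    frames = [2, 2, 3, 0, 4, 4, 1, 1, 5, 0, 2, 5, 5]
    print(ctc_greedy_decode(frames, vocab))
```

The blank symbol is what lets CTC represent doubled letters: two identical ids separated by a blank are kept as two characters, while adjacent identical ids are merged into one.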
License
The model is released under the Apache 2.0 License, allowing for both personal and commercial use with proper attribution.