WAV2VEC2-XLS-R-300M-PHONEME
vitouphy/wav2vec2-xls-r-300m-phoneme
Introduction
WAV2VEC2-XLS-R-300M-PHONEME is a fine-tuned version of the facebook/wav2vec2-xls-r-300m
model, designed for automatic speech recognition tasks. On the evaluation set, the model achieves a loss of 0.3327 and a character error rate (CER) of 0.1332.
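CER here is the character-level edit (Levenshtein) distance between the predicted and the reference transcription, divided by the reference length. The snippet below is a minimal illustration of that computation; it is not taken from the model card.

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance / reference length."""
    # Dynamic-programming Levenshtein distance over characters.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / len(reference)

print(cer("hello world", "helo world"))  # 1 edit / 11 characters ≈ 0.0909
```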
Architecture
The model uses the wav2vec2 architecture, implemented in the Transformers library and compatible with PyTorch. It supports the Safetensors format and can be deployed to Inference Endpoints.
Training
Training Procedure
The model was fine-tuned using the following hyperparameters:
- Learning Rate: 3e-05
- Train Batch Size: 8
- Evaluation Batch Size: 8
- Gradient Accumulation Steps: 4
- Total Train Batch Size: 32
- Optimizer: Adam with betas (0.9, 0.999) and epsilon 1e-08
- Learning Rate Scheduler Type: Linear
- Warmup Steps: 2000
- Training Steps: 7000
- Mixed Precision Training: Native AMP
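For reference, here is a sketch of how these hyperparameters map onto the Transformers `TrainingArguments` API. This is an illustration rather than the author's original training script, and `output_dir` is a placeholder.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-xls-r-300m-phoneme",  # placeholder output path
    learning_rate=3e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,   # effective train batch size: 8 * 4 = 32
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=2000,
    max_steps=7000,
    fp16=True,                       # native AMP mixed-precision training
)
```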
Training Results
Training progress is documented as follows:
Epoch | Step | Training Loss | Validation Loss | CER |
---|---|---|---|---|
1.32 | 1000 | 3.4324 | 3.3693 | 0.9091 |
2.65 | 2000 | 2.1751 | 1.1382 | 0.2397 |
3.97 | 3000 | 1.3986 | 0.4886 | 0.1452 |
5.3 | 4000 | 1.2285 | 0.3842 | 0.1351 |
6.62 | 5000 | 1.142 | 0.3505 | 0.1349 |
7.95 | 6000 | 1.1075 | 0.3323 | 0.1317 |
9.27 | 7000 | 1.0867 | 0.3265 | 0.1315 |
Framework Versions
- Transformers: 4.17.0.dev0
- PyTorch: 1.10.2+cu102
- Datasets: 1.18.2.dev0
- Tokenizers: 0.11.0
Guide: Running Locally
To run this model locally, follow these steps:
- Install Dependencies: Ensure Python and PyTorch are installed, then use pip to install Transformers and the other required libraries.

```bash
pip install torch transformers datasets
```
- Load the Model: Use the Transformers library to load the model and its processor.

```python
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model = Wav2Vec2ForCTC.from_pretrained("vitouphy/wav2vec2-xls-r-300m-phoneme")
processor = Wav2Vec2Processor.from_pretrained("vitouphy/wav2vec2-xls-r-300m-phoneme")
```
- Inference: Prepare the audio data, process it with the processor, and run inference; a sketch is shown below.
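The following is a minimal inference sketch, not code from the model card. It assumes a local 16 kHz audio file (the path "sample.wav" is a placeholder) and uses librosa to load it; greedy CTC decoding turns the logits into a transcription.

```python
import librosa
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model = Wav2Vec2ForCTC.from_pretrained("vitouphy/wav2vec2-xls-r-300m-phoneme")
processor = Wav2Vec2Processor.from_pretrained("vitouphy/wav2vec2-xls-r-300m-phoneme")

# Load the audio and resample it to the 16 kHz rate the model expects.
# "sample.wav" is a placeholder path.
speech, sample_rate = librosa.load("sample.wav", sr=16_000)

# Convert the raw waveform into model inputs.
inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt", padding=True)

# Forward pass, then take the most likely token at each frame (greedy CTC decoding).
with torch.no_grad():
    logits = model(inputs.input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)

# Map the token IDs back to the phoneme transcription.
transcription = processor.batch_decode(predicted_ids)
print(transcription)
```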
Suggestion: Cloud GPUs
For optimal performance, consider using cloud services such as AWS, Google Cloud Platform, or Azure for GPU support.
License
The model is licensed under the Apache 2.0 License.