wav2vec2-lg-xlsr-en-speech-emotion-recognition
Introduction
The wav2vec2-lg-xlsr-en-speech-emotion-recognition model is a fine-tuned version of the wav2vec2-large-xlsr-53-english model, specifically adapted for Speech Emotion Recognition (SER). It was fine-tuned on the RAVDESS dataset, which contains recordings of actors expressing eight different emotions in English: angry, calm, disgust, fearful, happy, neutral, sad, and surprised. The model achieves a loss of 0.5023 and an accuracy of 82.23% on the evaluation set.
Architecture
The model builds on the wav2vec 2.0 architecture, a self-supervised framework that learns speech representations directly from raw audio and is widely used for speech recognition and classification tasks. For SER, a sequence-classification head on top of the pretrained encoder is fine-tuned to recognize emotional states from speech inputs.
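For instance, the classification head's label set can be inspected straight from the published checkpoint; this small sketch only reads the hosted config via the standard Transformers API.

from transformers import AutoConfig

config = AutoConfig.from_pretrained("ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition")
print(config.num_labels)  # expected: 8 emotion classes
print(config.id2label)    # index-to-emotion mapping used by the classification head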
Training
The model was trained using the following hyperparameters (see the sketch after this list):
- Learning Rate: 0.0001
- Train Batch Size: 4
- Eval Batch Size: 4
- Seed: 42
- Gradient Accumulation Steps: 2
- Total Train Batch Size: 8
- Optimizer: Adam (betas=(0.9,0.999), epsilon=1e-08)
- Learning Rate Scheduler: Linear
- Num Epochs: 3
- Mixed Precision Training: Native AMP
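As a concrete illustration, here is how these values map onto Hugging Face TrainingArguments in a minimal, hypothetical training script. Only the numeric values come from the list above; the output directory is a placeholder, and the Adam betas and epsilon match the TrainingArguments defaults, so they need no explicit setting.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-ser",        # placeholder output path
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=2,    # effective train batch size: 4 * 2 = 8
    num_train_epochs=3,
    lr_scheduler_type="linear",
    fp16=True,                        # mixed precision training (native AMP)
)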
Training results showed a gradual improvement in loss and accuracy over the epochs, reaching an accuracy of 82.23% at the final step.
Guide: Running Locally
To run the model locally, follow these steps:
- Environment Setup:
  - Ensure you have Python installed.
  - Install the necessary libraries with:
    pip install transformers==4.8.2 torch==1.9.0+cu102 datasets==1.9.0 tokenizers==0.10.3
- Download Model:
  - Use the Hugging Face Transformers library to load the model from the hub:
    from transformers import Wav2Vec2ForSequenceClassification
    model = Wav2Vec2ForSequenceClassification.from_pretrained("ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition")
- Prepare Data:
  - Organize your audio data to match the format expected by the model (16 kHz mono audio); see the sketch below.
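A minimal preprocessing sketch: torchaudio is assumed here (it is not in the pinned list above), and "speech.wav" is a placeholder path; any loader that yields a 16 kHz mono float array works.

import torchaudio

waveform, sample_rate = torchaudio.load("speech.wav")  # placeholder path; shape (channels, samples)
if waveform.shape[0] > 1:
    waveform = waveform.mean(dim=0, keepdim=True)      # downmix stereo to mono
if sample_rate != 16_000:
    waveform = torchaudio.transforms.Resample(sample_rate, 16_000)(waveform)  # resample to 16 kHz
speech = waveform.squeeze().numpy()                    # 1-D float array for the feature extractor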
- Inference:
  - Use the model to predict emotions from your audio data, as shown in the sketch below.
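A minimal end-to-end inference sketch, assuming the speech array from the preparation step above and that a feature extractor configuration ships with the checkpoint (if it does not, the base wav2vec2-large-xlsr-53-english extractor can be substituted):

import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification

model_id = "ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition"
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_id)
model.eval()

inputs = feature_extractor(speech, sampling_rate=16_000, return_tensors="pt")  # speech: 1-D 16 kHz float array
with torch.no_grad():
    logits = model(**inputs).logits
predicted_id = int(logits.argmax(dim=-1))
print(model.config.id2label[predicted_id])  # one of the eight emotion labels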
Cloud GPUs
For improved performance, especially with large datasets, consider using cloud-based GPU services such as AWS EC2, Google Cloud Platform, or Azure.
License
This project is licensed under the Apache-2.0 License, allowing for wide use and distribution with certain conditions.