Emotion Recognition with wav2vec2 on IEMOCAP (SpeechBrain)

Introduction
The Emotion Recognition with wav2vec2 on IEMOCAP repository offers tools for performing emotion recognition using a fine-tuned wav2vec2 (base) model, integrated with the SpeechBrain toolkit. The model is trained on the IEMOCAP dataset and achieves an accuracy of 78.7% on the test set.
Architecture
The system employs a wav2vec2 model that combines convolutional and residual blocks. For embedding extraction, it uses attentive statistical pooling, and the training employs Additive Margin Softmax Loss. The system facilitates speaker verification through cosine distance between speaker embeddings. It processes audio recordings sampled at 16kHz and normalizes them as needed.
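The attentive statistical pooling step mentioned above can be illustrated with a toy numpy sketch. This is a simplified, illustrative version: the real system learns an attention network over frame features, whereas here a single attention vector `attn_w` stands in for it, and all names are assumptions rather than SpeechBrain API.

```python
import numpy as np

def attentive_stat_pool(frames, attn_w):
    """Toy attentive statistical pooling.

    frames: (T, D) frame-level features; attn_w: (D,) stand-in attention
    parameters. Returns a (2*D,) utterance-level embedding made of the
    attention-weighted mean and standard deviation over time.
    """
    scores = frames @ attn_w                      # (T,) attention scores
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                          # softmax over the time axis
    mean = (alpha[:, None] * frames).sum(axis=0)  # weighted mean
    var = (alpha[:, None] * (frames - mean) ** 2).sum(axis=0)
    std = np.sqrt(var + 1e-9)                     # weighted standard deviation
    return np.concatenate([mean, std])            # (2*D,) embedding
```

Concatenating the weighted mean and standard deviation is what makes the pooling "statistical": the embedding captures both where the frame features sit and how much they vary, with the attention weights emphasizing the most informative frames.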
Training
Training is conducted with SpeechBrain. To train the model from scratch:
- Clone the SpeechBrain repository:

```shell
git clone https://github.com/speechbrain/speechbrain/
```

- Install dependencies:

```shell
cd speechbrain
pip install -r requirements.txt
pip install -e .
```

- Execute training:

```shell
cd recipes/IEMOCAP/emotion_recognition
python train_with_wav2vec2.py hparams/train_with_wav2vec2.yaml --data_folder=your_data_folder
```
Training results, including models and logs, are available here.
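The Additive Margin Softmax loss noted in the Architecture section can be sketched in numpy. This is an illustrative re-implementation, not the recipe's code; the scale `s` and margin `m` defaults are example values, not the actual training hyperparameters.

```python
import numpy as np

def am_softmax_loss(embeddings, class_weights, labels, s=30.0, m=0.2):
    """Additive Margin Softmax: cosine logits with a margin m subtracted
    from the target class, scaled by s, followed by cross-entropy."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = e @ w.T                                  # (batch, n_classes) cosines
    rows = np.arange(len(labels))
    margin = np.zeros_like(cos)
    margin[rows, labels] = m                       # penalize only the true class
    logits = s * (cos - margin)
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[rows, labels].mean()         # cross-entropy
```

Subtracting the margin from the target-class cosine makes the objective strictly harder than plain softmax, which pushes embeddings of the same class to cluster more tightly around their class weight vector.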
Guide: Running Locally
To run the model locally:
- Install the development version of SpeechBrain:

```shell
pip install git+https://github.com/speechbrain/speechbrain.git@develop
```

- Perform emotion recognition using the custom interface:

```python
from speechbrain.inference.interfaces import foreign_class

classifier = foreign_class(
    source="speechbrain/emotion-recognition-wav2vec2-IEMOCAP",
    pymodule_file="custom_interface.py",
    classname="CustomEncoderWav2vec2Classifier",
)
out_prob, score, index, text_lab = classifier.classify_file(
    "speechbrain/emotion-recognition-wav2vec2-IEMOCAP/anger.wav"
)
print(text_lab)
```
- For GPU inference, include the option `run_opts={"device":"cuda"}` when calling `foreign_class`.
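The tuple returned by `classify_file` above pairs class probabilities with the winning class. Conceptually, the last three values follow from the first, as in this numpy sketch; the label list and its ordering are an assumption for illustration, not the model's actual label encoder.

```python
import numpy as np

# Hypothetical label list for a four-class IEMOCAP setup (order illustrative)
LABELS = ["neu", "ang", "hap", "sad"]

def decode(out_prob):
    """Mirror the shape of classify_file's output: from a probability
    vector, recover the best score, its index, and the text label."""
    index = int(np.argmax(out_prob))   # winning class index
    score = float(out_prob[index])     # its probability
    return score, index, LABELS[index]

score, index, text_lab = decode(np.array([0.1, 0.7, 0.1, 0.1]))
# → score 0.7, index 1, text_lab "ang"
```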
Consider using cloud GPUs for more efficient processing, such as those offered by AWS, Google Cloud, or Azure.
License
This project is licensed under the Apache-2.0 License.