Emotion Recognition with wav2vec2 on IEMOCAP

SpeechBrain

Introduction

The Emotion Recognition with wav2vec2 on IEMOCAP repository provides a wav2vec2 (base) model fine-tuned for emotion recognition, integrated with the SpeechBrain toolkit. The model is trained on the IEMOCAP dataset and reaches 78.7% accuracy on the test set.

Architecture

The system fine-tunes a wav2vec2 (base) encoder, whose convolutional feature extractor and transformer layers produce frame-level representations. These representations are pooled over time and passed to a classification head that predicts the emotion label. The system expects audio recordings sampled at 16kHz (single channel) and normalizes them as needed.
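The pooling-and-classification stage can be sketched as follows. This is a minimal illustration with hypothetical shapes and randomly initialized weights, not the trained model: in the real system the frame embeddings come from the pretrained wav2vec2 encoder and the head weights are learned during fine-tuning, and the label set and its order are assumptions based on the common four-class IEMOCAP setup.

```python
import numpy as np

# Hypothetical dimensions: wav2vec2-base emits 768-d frame embeddings.
T, D, N_CLASSES = 120, 768, 4                    # frames, embedding dim, emotion classes
LABELS = ["neutral", "angry", "happy", "sad"]    # assumed 4-class IEMOCAP label order

rng = np.random.default_rng(0)
frames = rng.standard_normal((T, D))             # stand-in for wav2vec2 frame outputs

# Temporal pooling: collapse the time axis into one utterance-level embedding.
pooled = frames.mean(axis=0)                     # shape (D,)

# Linear classification head (randomly initialized here; trained in practice).
W = rng.standard_normal((N_CLASSES, D)) * 0.01
b = np.zeros(N_CLASSES)
logits = W @ pooled + b

# Softmax over the logits gives per-emotion probabilities.
probs = np.exp(logits - logits.max())
probs /= probs.sum()
pred = LABELS[int(np.argmax(probs))]
print(pred, probs.round(3))
```

The pooled embedding summarizes the whole utterance, so variable-length recordings map to a fixed-size input for the classifier.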

Training

Training is conducted with SpeechBrain. To train the model from scratch:

  1. Clone the SpeechBrain repository:

    git clone https://github.com/speechbrain/speechbrain/
    
  2. Install dependencies:

    cd speechbrain
    pip install -r requirements.txt
    pip install -e .
    
  3. Execute training:

    cd recipes/IEMOCAP/emotion_recognition
    python train_with_wav2vec2.py hparams/train_with_wav2vec2.yaml --data_folder=your_data_folder
    

Training results, including models and logs, are available here.

Guide: Running Locally

To run the model locally:

  1. Install the development version of SpeechBrain:

    pip install git+https://github.com/speechbrain/speechbrain.git@develop
    
  2. Perform emotion recognition using the custom interface:

    from speechbrain.inference.interfaces import foreign_class

    classifier = foreign_class(
        source="speechbrain/emotion-recognition-wav2vec2-IEMOCAP",
        pymodule_file="custom_interface.py",
        classname="CustomEncoderWav2vec2Classifier",
    )
    out_prob, score, index, text_lab = classifier.classify_file(
        "speechbrain/emotion-recognition-wav2vec2-IEMOCAP/anger.wav"
    )
    print(text_lab)
    
  3. For GPU inference, include the option run_opts={"device":"cuda"} when calling foreign_class.
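A small helper for building that option can look like the sketch below. It assumes torch is installed (SpeechBrain itself depends on it), and the make_run_opts name is hypothetical, not part of the SpeechBrain API.

```python
import torch

def make_run_opts() -> dict:
    """Pick CUDA when a GPU is visible, otherwise fall back to CPU."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return {"device": device}

# Pass the result as run_opts=make_run_opts() when calling foreign_class.
print(make_run_opts())
```

Selecting the device at runtime keeps the same script usable on both CPU-only and GPU machines.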

Consider using cloud GPUs for more efficient processing, such as those offered by AWS, Google Cloud, or Azure.

License

This project is licensed under the Apache-2.0 License.
