# wav2vec2-large-robust-12-ft-emotion-msp-dim

## Introduction

The wav2vec2-large-robust-12-ft-emotion-msp-dim model by audEERING is a fine-tuned version of Wav2Vec2 for dimensional speech emotion recognition. It predicts the emotional dimensions arousal, dominance, and valence from raw audio signals. The model is intended for research purposes; commercial licenses for models trained on larger datasets are available from audEERING.
## Architecture
This model is based on the Wav2Vec2 architecture, specifically fine-tuned from the Wav2Vec2-Large-Robust variant. It has been pruned to 12 transformer layers from the original 24 before fine-tuning. The model processes raw audio inputs and outputs predictions for emotional dimensions as well as the pooled states of the last transformer layer. An ONNX export is available for deployment purposes.
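The architecture described above can be sketched in code. The following is a minimal sketch, not audEERING's published implementation: it assumes a `RegressionHead` on top of mean-pooled encoder states, consistent with the card's description of a model that returns both dimensional predictions and the pooled states of the last transformer layer.

```python
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model, Wav2Vec2PreTrainedModel


class RegressionHead(nn.Module):
    """Maps pooled hidden states to the emotional dimensions (sketch)."""

    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.dropout = nn.Dropout(config.final_dropout)
        self.out_proj = nn.Linear(config.hidden_size, config.num_labels)

    def forward(self, x):
        x = self.dropout(x)
        x = torch.tanh(self.dense(x))
        x = self.dropout(x)
        return self.out_proj(x)


class EmotionModel(Wav2Vec2PreTrainedModel):
    """Wav2Vec2 backbone plus regression head (sketch).

    Returns (pooled hidden states, dimensional predictions), matching
    the two outputs the card says the model produces.
    """

    def __init__(self, config):
        super().__init__(config)
        self.wav2vec2 = Wav2Vec2Model(config)
        self.classifier = RegressionHead(config)
        self.init_weights()

    def forward(self, input_values):
        hidden_states = self.wav2vec2(input_values)[0]
        pooled = torch.mean(hidden_states, dim=1)  # average over time frames
        return pooled, self.classifier(pooled)
```

With `num_labels=3`, the second output holds one value per emotional dimension (arousal, dominance, valence) for each input waveform in the batch.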
## Training
The model was trained using the MSP-Podcast dataset (version 1.7) and fine-tuned to predict emotional dimensions from audio. The training involved reducing the number of transformer layers and adapting the model for audio classification tasks.
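The layer reduction mentioned above can be illustrated as follows. This is a hypothetical helper (`prune_encoder_layers` is not part of `transformers` or audEERING's published code); truncating the transformer stack like this is one common way to go from 24 layers to 12 before fine-tuning, but the exact recipe used for this model is not documented here.

```python
import torch
from transformers import Wav2Vec2Model


def prune_encoder_layers(model: Wav2Vec2Model, keep: int = 12) -> Wav2Vec2Model:
    """Keep only the first `keep` transformer layers of a Wav2Vec2 encoder.

    Hypothetical helper for illustration only; the pruning procedure
    actually used for this model is not published in the card.
    """
    model.encoder.layers = torch.nn.ModuleList(model.encoder.layers[:keep])
    model.config.num_hidden_layers = keep
    return model
```

The pruned model still accepts raw waveforms; only the depth of the transformer stack changes.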
## Guide: Running Locally
To run the model locally, follow these steps:
- **Install Dependencies**: Ensure you have Python and PyTorch installed, then install the `transformers` library from Hugging Face:

  ```bash
  pip install transformers torch
  ```
- **Load the Model**: Use the `Wav2Vec2Processor` and `EmotionModel` classes to load the processor and model:

  ```python
  from transformers import Wav2Vec2Processor
  from your_custom_module import EmotionModel  # ensure this class is defined as shown in the model usage

  model_name = 'audeering/wav2vec2-large-robust-12-ft-emotion-msp-dim'
  processor = Wav2Vec2Processor.from_pretrained(model_name)
  model = EmotionModel.from_pretrained(model_name).to('cpu')
  ```
- **Prepare Audio Input**: Provide your audio signal as a NumPy array sampled at 16,000 Hz.
- **Process the Audio**: Use the provided `process_func` to predict emotions or extract embeddings.
- **Cloud GPUs**: For faster processing, consider using cloud GPU services such as AWS, Google Cloud, or Azure.
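The steps above can be combined into a single helper along the lines of the card's `process_func`. The sketch below is a small adaptation: it takes the processor and model as arguments instead of module-level globals, so it is self-contained; it assumes the model returns `(pooled_states, predictions)` as described in the usage example.

```python
import numpy as np
import torch


def process_func(signal, sampling_rate, processor, model, embeddings=False):
    """Predict emotional dimensions from raw audio, or return the pooled
    embeddings when `embeddings=True`.

    `signal` is a 1-D NumPy float array sampled at `sampling_rate` Hz
    (16,000 Hz for this model); `model` is expected to return the tuple
    (pooled_states, predictions).
    """
    y = processor(signal, sampling_rate=sampling_rate)
    y = np.asarray(y["input_values"][0]).reshape(1, -1)  # batch of one
    with torch.no_grad():
        y = model(torch.from_numpy(y))[0 if embeddings else 1]
    return y.detach().cpu().numpy()
```

With the real processor and model loaded as shown above, `process_func(signal, 16000, processor, model)` yields one value per emotional dimension, and passing `embeddings=True` returns the pooled states of the last transformer layer instead.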
## License
This model is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). It can be used for non-commercial research with appropriate attribution. Commercial use requires a separate license from audEERING.