urdu audio emotions
Talha
Introduction
The Urdu Audio Emotions model is a fine-tuned version of the Facebook wav2vec2-large-xlsr-53 model, designed for audio classification tasks. It categorizes Urdu audio into four distinct emotional states: Angry, Happy, Neutral, and Sad. The model achieves a high accuracy of 97.5% on the evaluation set.
Architecture
The model uses the wav2vec2 architecture as implemented in the Transformers library. It operates on audio data and is trained with the PyTorch framework. Training metrics can be tracked with TensorBoard.
Training
- Dataset: The training and evaluation data is sourced from the Urdu Emotion Dataset, available on Kaggle.
- Training Code: The training code is accessible on Kaggle, which provides the necessary scripts and configurations.
- Hyperparameters:
  - Learning Rate: 5e-05
  - Train Batch Size: 32
  - Eval Batch Size: 32
  - Seed: 42
  - Optimizer: Adam, with betas=(0.9, 0.999) and epsilon=1e-08
  - Learning Rate Scheduler: Linear
  - Number of Epochs: 50
  - Mixed Precision Training: Native AMP
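To make the linear scheduler concrete, the following is a minimal sketch that collects the hyperparameters listed above and computes the learning rate at a given optimizer step. It assumes zero warmup steps, which the source does not state, and the names `TrainConfig`, `linear_lr`, and `steps_per_epoch` are illustrative only and do not come from the original training code.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as listed in the model card (class name is hypothetical)."""
    learning_rate: float = 5e-5
    train_batch_size: int = 32
    eval_batch_size: int = 32
    seed: int = 42
    adam_betas: Tuple[float, float] = (0.9, 0.999)
    adam_epsilon: float = 1e-8
    num_epochs: int = 50


def linear_lr(step: int, total_steps: int, base_lr: float, warmup_steps: int = 0) -> float:
    """Linear schedule: optional linear warmup, then linear decay to zero.

    warmup_steps=0 is an assumption; the model card only says "Linear".
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Decay from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    cfg = TrainConfig()
    steps_per_epoch = 10  # hypothetical; depends on dataset size and batch size
    total = cfg.num_epochs * steps_per_epoch
    for step in (0, total // 2, total):
        print(step, linear_lr(step, total, cfg.learning_rate))
```

Under this schedule the learning rate starts at 5e-05 and falls in a straight line to zero by the last step of epoch 50. The actual number of steps per epoch depends on the dataset split, which is not given here.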
Guide: Running Locally
- Setup Environment:
  - Install the necessary Python packages, including transformers, torch, datasets, and tokenizers.
- Download Model and Dataset:
  - Clone the model repository and download the dataset from Kaggle.
- Run Training Script:
  - Ensure all dependencies are installed and run the training script available on Kaggle.
- Inference:
  - Use the model for inference by feeding it Urdu audio samples for emotion classification.
- Recommendation:
  - For optimal performance, especially during training, consider using cloud GPUs such as those provided by AWS, Google Cloud, or Azure.
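The inference step above could look roughly like the sketch below, which uses the Transformers `pipeline` API for audio classification. The function names `top_emotion` and `classify_file` are hypothetical, and `model_id` is left as a parameter because the exact repository identifier is not given here. Reading an audio file by path through the pipeline typically also requires ffmpeg to be installed.

```python
from typing import Dict, List, Tuple


def top_emotion(scores: List[Dict]) -> Tuple[str, float]:
    """Pick the highest-scoring label from pipeline-style output.

    Expects a list of {"label": str, "score": float} dicts, which is the
    shape returned by the audio-classification pipeline.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    best = max(scores, key=lambda item: item["score"])
    return best["label"], float(best["score"])


def classify_file(audio_path: str, model_id: str) -> Tuple[str, float]:
    """Classify one audio file; model_id must be supplied by the caller.

    The import is done lazily so the helper above can be used without
    transformers installed.
    """
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=model_id)
    # The pipeline loads and resamples the file, then returns label scores.
    return top_emotion(classifier(audio_path))
```

Calling `classify_file` with a local recording and the model's repository identifier should return one of the four emotion labels together with its score, assuming the label names in the model's configuration match those listed in the introduction.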
License
The model is released under the Apache 2.0 License, allowing for wide usage and distribution with minimal restrictions.