Urdu Audio Emotions

Talha

Introduction

The Urdu Audio Emotions model is a fine-tuned version of Facebook's wav2vec2-large-xlsr-53, adapted for audio classification. It classifies Urdu speech into four emotions: Angry, Happy, Neutral, and Sad. The model reaches 97.5% accuracy on the evaluation set.

Architecture

The model uses the wav2vec2 architecture as implemented in the Transformers library. It takes audio as input and was trained with the PyTorch framework. Training metrics can be tracked using TensorBoard.
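As a rough illustration of how a four-class head can be attached to the base checkpoint, the sketch below uses Wav2Vec2ForSequenceClassification from Transformers. The label order and the helper names (EMOTIONS, build_label_maps, build_classifier) are assumptions made for this example; the source lists the four emotions but not their index mapping or the exact model class used.

```python
# Assumed label order; the model card names the classes but not their indices.
EMOTIONS = ["Angry", "Happy", "Neutral", "Sad"]


def build_label_maps(labels):
    """Return (label2id, id2label) dictionaries for a list of class names."""
    label2id = {name: idx for idx, name in enumerate(labels)}
    id2label = {idx: name for name, idx in label2id.items()}
    return label2id, id2label


def build_classifier(base_checkpoint="facebook/wav2vec2-large-xlsr-53"):
    """Load the pretrained encoder and attach a new, untrained 4-way head."""
    # Imported lazily so the label helpers work without transformers installed.
    from transformers import Wav2Vec2ForSequenceClassification

    label2id, id2label = build_label_maps(EMOTIONS)
    return Wav2Vec2ForSequenceClassification.from_pretrained(
        base_checkpoint,
        num_labels=len(EMOTIONS),
        label2id=label2id,
        id2label=id2label,
    )
```

Calling build_classifier() downloads the base weights, so it is best run on a machine with enough disk space and memory for a large checkpoint. The classification head starts untrained and only becomes useful after fine-tuning.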

Training

  • Dataset: The training and evaluation data is sourced from the Urdu Emotion Dataset, available on Kaggle.
  • Training Code: The training scripts and configuration are available on Kaggle.
  • Hyperparameters:
    • Learning Rate: 5e-05
    • Train Batch Size: 32
    • Eval Batch Size: 32
    • Seed: 42
    • Optimizer: Adam, with betas=(0.9, 0.999) and epsilon=1e-08
    • Learning Rate Scheduler: Linear
    • Number of Epochs: 50
    • Mixed Precision Training: Native AMP
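The hyperparameters above can be written as keyword arguments for transformers.TrainingArguments, as in the sketch below. It assumes the listed batch sizes are per-device values and that the linear scheduler runs with no warmup, since the source mentions neither. TRAINING_KWARGS, linear_lr, and build_training_arguments are names invented for this example, and total_steps is left as a parameter because the dataset size is not stated.

```python
# Hyperparameters from the list above, keyed by TrainingArguments field names.
TRAINING_KWARGS = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 32,  # assumes a single device
    "per_device_eval_batch_size": 32,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 50,
    "fp16": True,  # native AMP mixed precision; requires a CUDA GPU
}


def linear_lr(step, total_steps, base_lr=5e-5, warmup_steps=0):
    """Learning rate at a given step under a linear schedule.

    Ramps up linearly during warmup (none by default here, an assumption),
    then decays linearly to zero at total_steps.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


def build_training_arguments(output_dir):
    """Create TrainingArguments from the settings above."""
    # Imported lazily so linear_lr can be used without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

With these assumptions the learning rate starts at 5e-05 and falls in a straight line, reaching half its initial value at the midpoint of training and zero at the final step.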

Guide: Running Locally

  1. Setup Environment:

    • Install the necessary Python packages, including transformers, torch, datasets, and tokenizers.
  2. Download Model and Dataset:

    • Clone the model repository and download the dataset from Kaggle.
  3. Run Training Script:

    • Ensure all dependencies are installed and run the training script available on Kaggle.
  4. Inference:

    • Run inference by passing Urdu audio samples to the model, which assigns each sample one of the four emotion labels.
  5. Recommendation:

    • For optimal performance, especially during training, consider using cloud GPUs such as those provided by AWS, Google Cloud, or Azure.
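The inference step can be sketched with the Transformers audio-classification pipeline, as shown below. MODEL_ID is a hypothetical placeholder, not the real repository name, and should be replaced with the actual model identifier or a local path to the cloned repository. The helper names top_emotion and classify_file are also assumptions for this example.

```python
# Hypothetical placeholder; substitute the real repository id or a local path.
MODEL_ID = "your-namespace/urdu-audio-emotions"


def top_emotion(predictions):
    """Return the (label, score) pair with the highest score.

    `predictions` is a list of {"label": ..., "score": ...} dictionaries,
    the format returned by the audio-classification pipeline.
    """
    best = max(predictions, key=lambda item: item["score"])
    return best["label"], best["score"]


def classify_file(audio_path, model_id=MODEL_ID):
    """Classify one audio file and return its most likely emotion."""
    # Imported lazily so top_emotion works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=model_id)
    # Passing a file path relies on ffmpeg being available for decoding.
    return top_emotion(classifier(audio_path))
```

A call such as classify_file("sample.wav") would return a label and its score, where "sample.wav" stands in for any Urdu speech recording. wav2vec2 models expect 16 kHz audio, and the pipeline resamples decoded files to the rate the model's feature extractor requires.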

License

The model is released under the Apache 2.0 License, allowing for wide usage and distribution with minimal restrictions.
