distil-wav2vec2-adult-child-cls-37m

bookbot

Introduction

DistilWav2Vec2 Adult/Child Speech Classifier is an audio classification model based on the wav2vec 2.0 architecture. It is a distilled version of the wav2vec2-adult-child-cls model and classifies a speech clip as either adult or child speech. It was trained on a private dataset and reports 95.89% accuracy with 37 million parameters, which makes it a compact option for this task.

Architecture

The model uses the wav2vec 2.0 architecture and has 37 million parameters. It was trained on the Adult/Child Speech Classification Dataset to distinguish adult speech from child speech.

Training

The model was trained with PyTorch on a Tesla P100 GPU provided by Kaggle, using the following hyperparameters:

  • Learning rate: 3e-05
  • Batch size: 32 for both training and evaluation
  • Seed: 42
  • Gradient accumulation steps: 4
  • Total train batch size: 128 (batch size of 32 × 4 gradient accumulation steps)
  • Optimizer: Adam with betas (0.9, 0.999) and epsilon 1e-08
  • Learning rate scheduler: linear with a warmup ratio of 0.1
  • Epochs: 5
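
The hyperparameters above imply an effective batch size and a learning-rate curve that are easy to check by hand. The following is a minimal sketch in plain Python, not the original training code. It assumes the common linear warmup-then-decay shape and that the number of warmup steps is the warmup ratio times the total steps, rounded up. `total_steps` is a hypothetical value, since the dataset size is not published.

```python
import math

# Values taken from the hyperparameter list above.
LEARNING_RATE = 3e-5
PER_DEVICE_BATCH = 32
GRAD_ACCUM_STEPS = 4
WARMUP_RATIO = 0.1


def effective_batch_size(per_device=PER_DEVICE_BATCH, accum=GRAD_ACCUM_STEPS, n_devices=1):
    """Samples contributing to each optimizer update."""
    return per_device * accum * n_devices


def linear_lr(step, total_steps, peak_lr=LEARNING_RATE, warmup_ratio=WARMUP_RATIO):
    """Learning rate at `step`: linear ramp up to peak_lr, then linear decay to 0."""
    # Assumption: warmup steps are rounded up from the ratio.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


if __name__ == "__main__":
    total_steps = 1000  # hypothetical; depends on the (private) dataset size
    print("effective batch size:", effective_batch_size())
    for step in (0, 50, 100, 550, 1000):
        print(step, linear_lr(step, total_steps))
```

With one GPU, 32 samples per step and 4 accumulation steps give the listed total train batch size of 128.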

The model achieved a training loss of 0.1179 and a validation loss of 0.1431 by the end of training, with an accuracy of 95.89% and an F1 score of 0.9624.
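
For readers who want to reproduce these metrics on their own data, the sketch below shows how accuracy and F1 are computed from predicted and true labels. It assumes a binary F1 with one class treated as positive; the source does not state which averaging was used. The label lists are made-up illustration data, not results from this model.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, treating `positive` as the positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there are no positives at all.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


if __name__ == "__main__":
    # Hypothetical labels for illustration only (1 = child, 0 = adult is an assumed encoding).
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    print(accuracy_and_f1(y_true, y_pred))
```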

Guide: Running Locally

To run this model locally, follow these steps:

  1. Install Dependencies: Ensure you have Python installed. Install PyTorch, Transformers, Datasets, and Tokenizers using pip:

    pip install torch transformers datasets tokenizers
    
  2. Load Model: Use the Hugging Face Transformers library to load the model:

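    from transformers import AutoFeatureExtractor, Wav2Vec2ForSequenceClassification
    model_id = "bookbot/distil-wav2vec2-adult-child-cls-37m"
    model = Wav2Vec2ForSequenceClassification.from_pretrained(model_id)
    # Audio classification checkpoints typically include a feature extractor but no
    # tokenizer, so load the feature extractor rather than a full Wav2Vec2Processor.
    processor = AutoFeatureExtractor.from_pretrained(model_id)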
    
  3. Inference: Use the processor and model to classify audio files:

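    import torch
    # `audio` is a 1-D array of mono samples recorded at 16 kHz
    inputs = processor(audio, sampling_rate=16000, return_tensors="pt", padding=True)
    with torch.no_grad():  # inference only, no gradients needed
        logits = model(**inputs).logits
    predicted_ids = logits.argmax(dim=-1)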
    
  4. Cloud GPU Recommendations: For faster inference on large batches of audio, consider cloud GPUs such as AWS EC2 instances with Tesla V100 or A100 GPUs, or Google Cloud Platform's AI Platform.
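
The inference step above expects 16 kHz mono samples and returns raw logits. The sketch below, which uses only the Python standard library, shows one way to prepare the input and interpret the output. It assumes a 16-bit PCM WAV file that is already sampled at 16 kHz. The helper names `load_wav_mono_16k` and `label_probabilities` are hypothetical and not part of any library.

```python
import array
import math
import sys
import wave

TARGET_SAMPLE_RATE = 16000  # matches sampling_rate=16000 in the inference step


def load_wav_mono_16k(path):
    """Read a 16-bit PCM WAV file and return mono samples as floats in [-1, 1]."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        if wav.getframerate() != TARGET_SAMPLE_RATE:
            raise ValueError("expected 16 kHz audio; resample the file first")
        channels = wav.getnchannels()
        pcm = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    # Average the interleaved channels down to mono, then scale to [-1, 1].
    frames = (pcm[i:i + channels] for i in range(0, len(pcm), channels))
    return [sum(frame) / (channels * 32768.0) for frame in frames]


def label_probabilities(logits, id2label):
    """Convert one row of logits into a {label: probability} dict via softmax."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {id2label[i]: e / total for i, e in enumerate(exps)}
```

A possible way to combine these with the steps above, where "sample.wav" is a placeholder path: set `audio = load_wav_mono_16k("sample.wav")` before calling the processor, then call `label_probabilities(logits[0].tolist(), model.config.id2label)` to get a probability per class. Reading the label names from `model.config.id2label` avoids guessing which index means adult and which means child.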

License

The model is licensed under the Apache-2.0 License, allowing for both personal and commercial use, modification, and distribution of the model and its derivatives.
