distil wav2vec2 adult child cls 37m
bookbot
Introduction
DistilWav2Vec2 Adult/Child Speech Classifier is an audio classification model based on the wav2vec 2.0 architecture. It is a distilled version of the wav2vec2-adult-child-cls model and classifies a speech recording as either adult or child speech. It was trained on a private dataset. With 37 million parameters, it reaches 95.89% accuracy, which makes it a compact option for this task.
Architecture
The model uses the wav2vec 2.0 architecture and has 37 million parameters. It was trained on the Adult/Child Speech Classification Dataset to distinguish adult speech from child speech.
Training
The model was trained using the PyTorch framework, leveraging a Tesla P100 GPU provided by Kaggle. Training details include:
- Learning rate: 3e-05
- Batch size: 32 for both training and evaluation
- Seed: 42
- Gradient accumulation steps: 4
- Total train batch size: 128 (batch size of 32 × 4 gradient accumulation steps)
- Optimizer: Adam with betas (0.9, 0.999) and epsilon 1e-08
- Learning rate scheduler: linear with a warmup ratio of 0.1
- Epochs: 5
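As a rough illustration of two of the settings above, the sketch below computes the total train batch size and the learning rate produced by a linear schedule with a warmup ratio of 0.1. It is a plain-Python approximation of the schedule's shape, not the training code itself. The function name `linear_warmup_lr` and the value of `total_steps` are hypothetical, since the source does not state the dataset size or the number of optimizer steps.

```python
import math


def linear_warmup_lr(step: int, total_steps: int, peak_lr: float = 3e-5,
                     warmup_ratio: float = 0.1) -> float:
    """Learning rate at a given optimizer step for a linear schedule with warmup."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from the peak learning rate to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Total train batch size: per-step batch size times gradient accumulation steps.
batch_size = 32
gradient_accumulation_steps = 4
total_train_batch_size = batch_size * gradient_accumulation_steps

# HYPOTHETICAL: the real number of optimizer steps depends on the dataset size,
# which the source does not give.
total_steps = 1000

print("total train batch size:", total_train_batch_size)
for step in (0, 50, 100, 550, 1000):
    print(step, linear_warmup_lr(step, total_steps))
```

With these assumed numbers, the rate peaks at 3e-05 once the first 10% of steps are done and returns to zero at the final step.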
The model achieved a training loss of 0.1179 and a validation loss of 0.1431 by the end of training, with an accuracy of 95.89% and an F1 score of 0.9624.
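For readers unfamiliar with the two reported metrics, here is a minimal sketch of how accuracy and a binary F1 score are computed from predictions. The labels below are made up for illustration and do not reproduce the reported figures. The source does not say which class was treated as positive or how F1 was averaged, so the `positive=1` default is an assumption.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, binary F1) for two equal-length label sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, f1


# HYPOTHETICAL labels: which integer stands for which class is assumed here.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

accuracy, f1 = accuracy_and_f1(y_true, y_pred)
print(accuracy, f1)
```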
Guide: Running Locally
To run this model locally, follow these steps:
- Install Dependencies: Ensure you have Python installed. Install PyTorch, Transformers, Datasets, and Tokenizers using pip:
  pip install torch transformers datasets tokenizers
- Load Model: Use the Hugging Face Transformers library to load the model and its feature extractor:
  from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification
  model_id = "bookbot/distil-wav2vec2-adult-child-cls-37m"
  model = Wav2Vec2ForSequenceClassification.from_pretrained(model_id)
  # An audio classifier needs only a feature extractor, not a tokenizer-backed processor.
  processor = Wav2Vec2FeatureExtractor.from_pretrained(model_id)
- Inference: Use the feature extractor and model to classify audio:
  import torch
  # `audio` is a 1-D float array holding the waveform, sampled at 16 kHz.
  inputs = processor(audio, sampling_rate=16000, return_tensors="pt", padding=True)
  with torch.no_grad():
      logits = model(**inputs).logits
  predicted_ids = logits.argmax(dim=-1)
- Cloud GPU Recommendations: For faster inference, consider cloud GPUs such as AWS EC2 instances with NVIDIA V100 or A100 GPUs, or GPU-backed resources on Google Cloud Platform's AI Platform.
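The inference step ends with raw logits and a predicted class index. A small post-processing sketch like the one below can turn them into a readable label and a confidence value. It uses only the standard library and made-up logits. The `ID2LABEL` mapping and the helper name `decode_prediction` are hypothetical; with the real model, the mapping should be read from `model.config.id2label` instead.

```python
import math

# HYPOTHETICAL mapping for illustration; the real one is in model.config.id2label.
ID2LABEL = {0: "adult", 1: "child"}


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(logits, id2label=ID2LABEL):
    """Return the most likely label and its probability for one clip."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Made-up logits for a single clip, not real model output.
label, confidence = decode_prediction([2.0, -1.0])
print(label, round(confidence, 4))
```

With a loaded model, the logits for the first clip in a batch could be passed in as `logits[0].tolist()`.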
License
The model is licensed under the Apache-2.0 License, allowing for both personal and commercial use, modification, and distribution of the model and its derivatives.