vit_spectrogram
Introduction
The VIT_SPECTROGRAM model is a fine-tuned version of google/vit-base-patch16-224-in21k, adapted to classify Mel spectrogram images into 'Male' and 'Female' classes. The model is still being fine-tuned and evaluated, and currently reaches a validation accuracy of 93.66%.
Architecture
The model is built on the Vision Transformer (ViT) architecture, using the pre-trained vit-base-patch16-224-in21k checkpoint as a base. ViT handles image classification by splitting an image into fixed-size patches and treating the resulting patch sequence like a token sequence in an NLP model; for a 224x224 input with 16x16 patches, this yields 196 patch tokens.
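As a rough illustration of the patch step only (not the model's actual preprocessing pipeline), the following TensorFlow sketch splits a 224x224 image into 16x16 patches and flattens them into a sequence:

```python
import tensorflow as tf

# A single 224x224 RGB image, the input size used by ViT-Base/16.
image = tf.random.uniform((1, 224, 224, 3))

# Cut the image into non-overlapping 16x16 patches.
patches = tf.image.extract_patches(
    images=image,
    sizes=[1, 16, 16, 1],
    strides=[1, 16, 16, 1],
    rates=[1, 1, 1, 1],
    padding="VALID",
)

# (224 / 16) ** 2 = 196 patches, each flattened to 16 * 16 * 3 = 768 values.
patches = tf.reshape(patches, (1, -1, 16 * 16 * 3))
print(patches.shape)  # (1, 196, 768)
```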
Training
The model uses the AdamWeightDecay optimizer with a learning rate schedule based on PolynomialDecay. The key hyperparameters from the training setup are listed below, followed by a configuration sketch:
- Optimizer: AdamWeightDecay with a learning rate of 3e-05.
- Decay Steps: 3032.
- End Learning Rate: 0.0.
- Training Precision: Mixed float16.
- Frameworks Used:
  - Transformers 4.18.0
  - TensorFlow 2.4.0
  - Datasets 2.0.0
  - Tokenizers 0.11.6
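A minimal TensorFlow sketch of this optimizer setup is shown below. The learning rate, decay steps, end learning rate, and mixed float16 policy come from the values above; the weight_decay_rate value is an assumption, since the card does not list it:

```python
import tensorflow as tf
from transformers import AdamWeightDecay

# Mixed float16 training precision, as listed in the card.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# Polynomial decay from 3e-05 down to 0.0 over 3032 steps.
lr_schedule = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate=3e-05,
    decay_steps=3032,
    end_learning_rate=0.0,
)

# weight_decay_rate is assumed; the card does not specify it.
optimizer = AdamWeightDecay(learning_rate=lr_schedule, weight_decay_rate=0.01)
```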
Guide: Running Locally
To run the VIT_SPECTROGRAM model locally, follow these steps:
- Clone the Repository: Start by cloning the repository from Hugging Face.
- Install Dependencies: Ensure that Python, TensorFlow, and the Transformers library are installed. Use:
pip install tensorflow transformers datasets
- Load the Model: Use the Transformers library to load the model.
- Run Inference: Feed the Mel spectrogram data to the model and retrieve predictions (see the sketch after this list).
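The following sketch shows one way to load the model and classify a single spectrogram image with TensorFlow. The repository id and file name are assumptions; substitute the actual Hugging Face repo path and your own spectrogram file:

```python
import tensorflow as tf
from PIL import Image
from transformers import ViTFeatureExtractor, TFViTForImageClassification

# Assumed repository id; replace with the actual Hugging Face repo path.
model_id = "prashanth0205/vit_spectrogram"

feature_extractor = ViTFeatureExtractor.from_pretrained(model_id)
model = TFViTForImageClassification.from_pretrained(model_id)

# Load a Mel spectrogram saved as an image file (hypothetical file name).
spectrogram = Image.open("mel_spectrogram.png").convert("RGB")
inputs = feature_extractor(images=spectrogram, return_tensors="tf")

# Forward pass and predicted label ('Male' or 'Female').
logits = model(**inputs).logits
predicted_class = int(tf.argmax(logits, axis=-1)[0])
print(model.config.id2label[predicted_class])
```

If the fine-tuned weights are only published as a PyTorch checkpoint, passing from_pt=True to from_pretrained lets the TensorFlow class load them.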
For optimal performance, especially during training or large-scale inference, using a cloud GPU such as those available on AWS, Google Cloud, or Azure is recommended.
License
The project is licensed under the Apache 2.0 License, allowing for both commercial and private use, modification, and distribution.