pianos
Introduction
This project uses deep learning to build an 8-class piano timbre discriminator. It distinguishes piano brands and types such as Kawai, Steinway, and Yamaha by converting audio into Mel spectrograms and classifying them with a model fine-tuned via supervised learning. Trained on a large annotated audio dataset, it has potential applications in music assessment and audio engineering.
Architecture
The model draws on classical backbone network structures from computer vision for its audio classification. By converting audio inputs into Mel spectrograms, it lets the deep learning framework extract the features needed to classify piano sounds. SqueezeNet serves as the fine-tuned backbone, as reflected in the training results.
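For illustration, a minimal sketch of this pipeline is shown below, assuming librosa for feature extraction and torchvision's SqueezeNet. The audio file name, sample rate, and Mel-band count are illustrative placeholders, not the project's documented preprocessing settings.

```python
# Minimal sketch: audio -> log-Mel spectrogram -> SqueezeNet with an 8-class head.
# The file path, sample rate, and n_mels below are illustrative assumptions.
import librosa
import numpy as np
import torch
import torch.nn as nn
from torchvision.models import squeezenet1_1

NUM_CLASSES = 8  # eight piano timbre classes

# 1. Audio -> log-Mel spectrogram (a 2-D "image" the CNN backbone can consume)
y, sr = librosa.load("example_piano.wav", sr=22050)           # hypothetical file
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
log_mel = librosa.power_to_db(mel, ref=np.max)

# 2. Tile the single channel to the 3 channels an ImageNet backbone expects
x = torch.from_numpy(log_mel).float().unsqueeze(0).repeat(3, 1, 1).unsqueeze(0)

# 3. SqueezeNet backbone with its classifier head swapped for 8 classes
model = squeezenet1_1(weights="IMAGENET1K_V1")
model.classifier[1] = nn.Conv2d(512, NUM_CLASSES, kernel_size=1)

logits = model(x)          # shape: (1, 8)
pred = logits.argmax(dim=1)
```

Tiling the spectrogram to three channels simply matches the input shape an ImageNet-pretrained backbone expects; SqueezeNet's adaptive pooling then handles the variable time dimension.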
Training
Training uses a large-scale annotated dataset to refine the model's feature extraction abilities. Performance is evaluated with a loss curve, accuracy metrics, and a confusion matrix. Supervised learning is pivotal during the fine-tuning phase, enabling the model to achieve high accuracy in practical scenarios.
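A hedged sketch of such a supervised fine-tuning loop and its evaluation artifacts follows. Here `train_loader`, `val_loader`, and `model` are placeholders (the SqueezeNet from the previous sketch and generic PyTorch data loaders), not objects provided by this repository.

```python
# Sketch of supervised fine-tuning plus the evaluation artifacts named above
# (loss curve, accuracy, confusion matrix). Loaders and epoch count are placeholders.
import torch
import torch.nn as nn
from sklearn.metrics import confusion_matrix

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

loss_history = []                      # points for the loss curve
for epoch in range(10):                # illustrative epoch count
    model.train()
    for specs, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(specs), labels)
        loss.backward()
        optimizer.step()
        loss_history.append(loss.item())

# Evaluation: accuracy and confusion matrix over the validation split
model.eval()
preds, targets = [], []
with torch.no_grad():
    for specs, labels in val_loader:
        preds.extend(model(specs).argmax(dim=1).tolist())
        targets.extend(labels.tolist())

accuracy = sum(p == t for p, t in zip(preds, targets)) / len(targets)
cm = confusion_matrix(targets, preds)  # 8x8 matrix of class confusions
```

The recorded `loss_history` supplies the points for the loss curve, while the 8x8 confusion matrix reveals which piano timbres the model confuses with one another.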
Guide: Running Locally
To run the model locally, follow these steps:
- Clone the Repository:

  ```sh
  GIT_LFS_SKIP_SMUDGE=1 git clone git@hf.co:ccmusic-database/pianos
  cd pianos
  ```
- Download the Model (a hedged loading sketch follows this list):

  ```python
  from modelscope import snapshot_download

  model_dir = snapshot_download('ccmusic-database/pianos')
  ```
- Hardware Recommendations: a cloud GPU service, such as AWS, Google Cloud, or Azure, is suggested to handle the intensive computations efficiently.
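As referenced in the download step above, the sketch below inspects the downloaded snapshot and loads a checkpoint from it. The file name `pianos.pt` and the state-dict format are assumptions, so check the contents of `model_dir` for the actual artifact names.

```python
# Hypothetical loading sketch: "pianos.pt" and the state-dict format are
# assumptions; inspect `model_dir` for the snapshot's real artifact names.
import os
import torch

print(os.listdir(model_dir))                      # see what was downloaded

ckpt_path = os.path.join(model_dir, "pianos.pt")  # hypothetical filename
state = torch.load(ckpt_path, map_location="cpu")
model.load_state_dict(state)                      # `model`: SqueezeNet from the earlier sketch
model.eval()
```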
License
The project is licensed under the MIT License, allowing for flexibility in usage and distribution.