Introduction

This project uses deep learning to build an 8-class piano timbre discriminator. By converting audio recordings into Mel spectrograms, it distinguishes among piano brands and types such as Kawai, Steinway, and Yamaha. The model is fine-tuned with supervised learning on a large annotated audio dataset and has potential applications in music assessment and audio engineering.
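
As a rough illustration of the preprocessing described above, the sketch below converts an audio file into a log-scaled Mel spectrogram with librosa. The sample rate and n_mels values here are illustrative assumptions, not parameters taken from this project.

    import librosa
    import numpy as np

    def audio_to_mel(path: str, sr: int = 22050, n_mels: int = 128) -> np.ndarray:
        """Load an audio file and return a log-scaled Mel spectrogram."""
        y, sr = librosa.load(path, sr=sr)            # resample to a fixed rate
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
        return librosa.power_to_db(mel, ref=np.max)  # log scale for model input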

Architecture

The model borrows classical backbone architectures from computer vision for audio classification. Audio inputs are converted into Mel spectrograms, which the network treats as images and from which it extracts the features used to classify piano sounds. SqueezeNet serves as the backbone for fine-tuning, as reflected in the reported training results.
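
The document names SqueezeNet as the fine-tuning backbone; the sketch below shows one common way to adapt torchvision's SqueezeNet to 8 output classes and single-channel spectrogram input. The specific variant (squeezenet1_1), the input size, and the channel-repeat trick are assumptions for illustration, not details confirmed by the project.

    import torch
    import torch.nn as nn
    from torchvision import models

    NUM_CLASSES = 8  # eight piano timbre classes

    # Start from an ImageNet-pretrained SqueezeNet backbone.
    model = models.squeezenet1_1(weights=models.SqueezeNet1_1_Weights.DEFAULT)

    # SqueezeNet classifies via a final 1x1 convolution; swap it out
    # so the network emits 8 logits instead of 1000.
    model.classifier[1] = nn.Conv2d(512, NUM_CLASSES, kernel_size=1)
    model.num_classes = NUM_CLASSES

    # Mel spectrograms are single-channel; a simple workaround is to
    # repeat the channel three times to match the RGB input stem.
    spec = torch.randn(1, 1, 128, 128)       # (batch, 1, mels, frames)
    logits = model(spec.repeat(1, 3, 1, 1))  # -> shape (1, 8)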

Training

Training uses the large-scale annotated dataset to refine the model's feature extraction. Performance is tracked with a loss curve, accuracy metrics, and a confusion matrix. Supervised learning drives the fine-tuning phase, enabling the model to reach high accuracy in practical scenarios.
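
A minimal sketch of what such a supervised fine-tuning and evaluation loop might look like, assuming PyTorch and scikit-learn; the project's actual training script, hyperparameters, and data loaders are not documented here.

    import torch
    import torch.nn as nn
    from sklearn.metrics import confusion_matrix

    def train_one_epoch(model, loader, optimizer, device="cuda"):
        """One pass of standard supervised fine-tuning with cross-entropy."""
        model.train()
        criterion = nn.CrossEntropyLoss()
        for specs, labels in loader:  # batches of Mel spectrograms
            specs, labels = specs.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(specs), labels)
            loss.backward()
            optimizer.step()

    @torch.no_grad()
    def evaluate(model, loader, device="cuda"):
        """Accuracy plus an 8x8 confusion matrix over the validation set."""
        model.eval()
        preds, truths = [], []
        for specs, labels in loader:
            out = model(specs.to(device)).argmax(dim=1).cpu()
            preds.extend(out.tolist())
            truths.extend(labels.tolist())
        acc = sum(p == t for p, t in zip(preds, truths)) / len(truths)
        return acc, confusion_matrix(truths, preds)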

Guide: Running Locally

To run the model locally, follow these steps:

  1. Clone the Repository:

    GIT_LFS_SKIP_SMUDGE=1 git clone git@hf.co:ccmusic-database/pianos
    cd pianos
    
  2. Download the Model (a hedged inference sketch follows this guide):

    from modelscope import snapshot_download
    model_dir = snapshot_download('ccmusic-database/pianos')
    
  3. Hardware Recommendations:

    • A cloud GPU service such as AWS, Google Cloud, or Azure is recommended for handling the intensive computation efficiently.
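
Once the files from step 2 are in place, inference follows the usual preprocess-then-classify pattern. The sketch below is purely illustrative: the checkpoint filename, serialization format, and input layout are assumptions, since the repository's exact file layout is not documented in this section.

    import os
    import torch

    # "pianos.pt" is a hypothetical filename: inspect model_dir (from step 2)
    # for the file the repository actually ships, and adapt accordingly.
    ckpt = os.path.join(model_dir, "pianos.pt")
    model = torch.jit.load(ckpt, map_location="cpu")  # assumes a TorchScript export
    model.eval()

    with torch.no_grad():
        spec = torch.randn(1, 3, 128, 128)  # stand-in for a real Mel spectrogram
        pred = model(spec).argmax(dim=1).item()
        print("Predicted timbre class index:", pred)  # 0..7 for the 8 classes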

License

The project is licensed under the MIT License, allowing for flexibility in usage and distribution.
