pop2piano
Introduction
Pop2Piano is a Transformer network designed to generate piano covers from the audio waveforms of pop music. It allows users to create piano renditions directly from a song's audio without the need for melody and chord extraction modules.
Architecture
Pop2Piano is an encoder-decoder Transformer model based on T5. The input audio is converted to a waveform representation and passed to the encoder, which produces a latent representation. The decoder then generates token IDs autoregressively, where each token represents time, velocity, note, or a special type. Finally, these token IDs are decoded into a MIDI file.
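As a rough illustration of that flow (not the internal implementation), the sketch below maps each architectural stage to the transformers API used in the guide further down; `model`, `processor`, and `audio` are assumed to be set up as in that guide.

```python
# Illustrative sketch of the pipeline stages described above; `model`,
# `processor`, and `audio` are created as in the guide below.
inputs = processor(audio=audio, sampling_rate=44100, return_tensors="pt")

# generate() runs the encoder over the audio features and then decodes
# token IDs autoregressively (time / velocity / note / special).
token_ids = model.generate(input_features=inputs["input_features"], composer="composer1")

# batch_decode maps the token IDs back to a pretty_midi.PrettyMIDI object.
midi = processor.batch_decode(token_ids=token_ids, feature_extractor_output=inputs)[
    "pretty_midi_objects"
][0]
```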
Training
Pop2Piano was trained primarily on Korean pop music, but it also performs well on Western pop and hip hop songs. Results can be varied by passing different composer settings during generation, which condition the style of the piano arrangement.
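For example, the same input can be rendered with different composer tokens and the resulting MIDI files compared. This is a minimal sketch assuming `model`, `processor`, and `inputs` are prepared as in the guide below; the exact composer names, and the `composer_to_feature_token` mapping used to list them, depend on the checkpoint's generation config, so treat both as assumptions and inspect `model.generation_config` yourself.

```python
# Minimal sketch: condition generation on different composers (assumed names).
# The mapping attribute below exists in the current transformers implementation,
# but inspect model.generation_config to confirm it for your checkpoint.
print(list(model.generation_config.composer_to_feature_token))

for composer in ["composer1", "composer2"]:
    output = model.generate(input_features=inputs["input_features"], composer=composer)
    midi = processor.batch_decode(token_ids=output, feature_extractor_output=inputs)[
        "pretty_midi_objects"
    ][0]
    midi.write(f"./Outputs/midi_{composer}.mid")
```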
Guide: Running Locally
Installation
Install the necessary libraries:

```
pip install git+https://github.com/huggingface/transformers.git
pip install pretty-midi==0.2.9 essentia==2.1b6.dev1034 librosa scipy
```
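A quick, optional import check can confirm the environment is ready (a minimal sketch; version numbers will vary):

```python
# Optional sanity check that the key packages import cleanly.
import librosa
import pretty_midi
import transformers

print("transformers", transformers.__version__)
print("librosa", librosa.__version__)
print("pretty_midi imported OK")
```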
Using Your Own Audio
Load and process your own audio file:

```python
import librosa
from transformers import Pop2PianoForConditionalGeneration, Pop2PianoProcessor

# Load the audio at the 44.1 kHz sampling rate expected by the model.
audio, sr = librosa.load("<your_audio_file_here>", sr=44100)

model = Pop2PianoForConditionalGeneration.from_pretrained("sweetcocoa/pop2piano")
processor = Pop2PianoProcessor.from_pretrained("sweetcocoa/pop2piano")

inputs = processor(audio=audio, sampling_rate=sr, return_tensors="pt")
model_output = model.generate(input_features=inputs["input_features"], composer="composer1")

# Decode the generated token IDs into a pretty_midi.PrettyMIDI object and save it.
tokenizer_output = processor.batch_decode(
    token_ids=model_output, feature_extractor_output=inputs
)["pretty_midi_objects"][0]
tokenizer_output.write("./Outputs/midi_output.mid")
```
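Since `tokenizer_output` is a `pretty_midi.PrettyMIDI` object, it can also be rendered to audio for a quick listen. A minimal sketch using PrettyMIDI's built-in sine-wave synthesis (scipy is already installed above; the `Outputs` directory is assumed to exist):

```python
# Optional: render the generated MIDI to a WAV file for a quick preview.
# PrettyMIDI.synthesize uses simple sine waves, so the sound is only a rough preview.
import scipy.io.wavfile

audio_rendition = tokenizer_output.synthesize(fs=44100)
scipy.io.wavfile.write("./Outputs/midi_output.wav", 44100, audio_rendition)
```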
Using Audio from the Hugging Face Hub
Process audio from a dataset:

```python
from datasets import load_dataset
from transformers import Pop2PianoForConditionalGeneration, Pop2PianoProcessor

model = Pop2PianoForConditionalGeneration.from_pretrained("sweetcocoa/pop2piano")
processor = Pop2PianoProcessor.from_pretrained("sweetcocoa/pop2piano")

# Load a test example from the model's companion dataset.
ds = load_dataset("sweetcocoa/pop2piano_ci", split="test")
inputs = processor(
    audio=ds["audio"][0]["array"],
    sampling_rate=ds["audio"][0]["sampling_rate"],
    return_tensors="pt",
)
model_output = model.generate(input_features=inputs["input_features"], composer="composer1")

# Decode the token IDs into MIDI and save, as in the previous step.
tokenizer_output = processor.batch_decode(
    token_ids=model_output, feature_extractor_output=inputs
)["pretty_midi_objects"][0]
tokenizer_output.write("./Outputs/midi_output.mid")
```
Cloud GPU Recommendation
For faster processing, consider using cloud services with GPU support, such as AWS, Google Cloud, or Azure.
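On a machine with a CUDA GPU, the examples above follow the standard transformers pattern of moving the model and input features to the device. A minimal sketch, with `model`, `processor`, and `inputs` prepared as in the steps above:

```python
import torch

# Move the model and input features to the GPU when one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

model_output = model.generate(
    input_features=inputs["input_features"].to(device), composer="composer1"
)

# Decode on the CPU side as before.
tokenizer_output = processor.batch_decode(
    token_ids=model_output.cpu(), feature_extractor_output=inputs
)["pretty_midi_objects"][0]
```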
License
The Pop2Piano model is distributed under the license published in the original repository; see that repository for details.