text2midi
amaai-lab
Introduction
Text2midi is an end-to-end model designed to generate MIDI files from textual descriptions. This model utilizes pretrained large language models and an autoregressive transformer decoder to produce symbolic music based on detailed textual prompts, incorporating musical elements such as chords, tempo, and style.
Architecture
Text2midi combines a pretrained language model with a transformer-based decoder, which lets it generate MIDI files that follow complex textual prompts. The architecture has two parts: an encoder that processes the textual input, and a decoder that turns the encoded text into a symbolic music representation.
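The encode-then-decode flow described above can be illustrated with a toy sketch. Everything in it is a stand-in chosen for illustration and is not Text2midi's actual code: `toy_encode` takes the place of the pretrained language model, `toy_next_token` takes the place of the transformer decoder, and the five-token vocabulary is hypothetical. Only the control flow mirrors the architecture: the prompt is encoded once, then symbolic tokens are generated one at a time, each conditioned on the prompt and on the tokens produced so far.

```python
from typing import List

# Hypothetical vocabulary of symbolic-music tokens (illustration only).
VOCAB = ["<bos>", "<eos>", "NOTE_C4", "NOTE_E4", "NOTE_G4"]
BOS, EOS = 0, 1


def toy_encode(prompt: str) -> List[int]:
    """Stand-in for the pretrained text encoder: one integer 'state' per word."""
    return [sum(ord(ch) for ch in word) for word in prompt.lower().split()]


def toy_next_token(encoded: List[int], prefix: List[int]) -> int:
    """Stand-in for one decoder step: picks a token from the prompt and the prefix."""
    step = len(prefix) - 1  # tokens generated so far (prefix starts with BOS)
    if step >= len(encoded):
        return EOS  # nothing left to condition on, so stop
    # Deterministic choice among the three note tokens (ids 2..4).
    return 2 + (encoded[step] + prefix[-1]) % 3


def generate(prompt: str, max_len: int = 16) -> List[str]:
    """Autoregressive loop: encode once, then append one token at a time."""
    encoded = toy_encode(prompt)
    tokens = [BOS]
    while len(tokens) < max_len:
        nxt = toy_next_token(encoded, tokens)
        tokens.append(nxt)
        if nxt == EOS:
            break
    return [VOCAB[t] for t in tokens]


if __name__ == "__main__":
    print(generate("a calm piano melody"))
```

In the real model the decoder step would return a probability distribution over a much larger MIDI token vocabulary, and the next token would be sampled from it rather than computed by a fixed rule.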
Training
To train Text2midi, the accelerate library is recommended for efficient multi-GPU support. The training process involves configuring accelerate and then launching the training script with parameters that specify the encoder and decoder models, dataset names, batch size, learning rate, and number of epochs.
accelerate config
accelerate launch train.py \
--encoder_model="google/flan-t5-large" \
--decoder_model="configs/transformer_decoder_config.json" \
--dataset_name="amaai-lab/MidiCaps" \
--pretrain_dataset="amaai-lab/SymphonyNet" \
--batch_size=16 \
--learning_rate=1e-4 \
--epochs=40
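As a sketch of how these parameters fit together, the snippet below assembles the same `accelerate launch` command from a Python dictionary, which can be convenient when sweeping hyperparameters. The flag names and values are copied from the command above. The helper name `build_launch_command` is hypothetical, and the snippet only prints the command rather than running it.

```python
import shlex
from typing import Dict, List

# Values copied from the launch command above; edit them for your own runs.
# The learning rate is kept as a string so it is emitted exactly as written.
TRAIN_ARGS: Dict[str, object] = {
    "encoder_model": "google/flan-t5-large",
    "decoder_model": "configs/transformer_decoder_config.json",
    "dataset_name": "amaai-lab/MidiCaps",
    "pretrain_dataset": "amaai-lab/SymphonyNet",
    "batch_size": 16,
    "learning_rate": "1e-4",
    "epochs": 40,
}


def build_launch_command(args: Dict[str, object], script: str = "train.py") -> List[str]:
    """Return the argv list for `accelerate launch <script> --key=value ...`."""
    command = ["accelerate", "launch", script]
    command += [f"--{key}={value}" for key, value in args.items()]
    return command


if __name__ == "__main__":
    # shlex.join quotes arguments where needed, so the line can be pasted into a shell.
    print(shlex.join(build_launch_command(TRAIN_ARGS)))
```

Passing a modified copy of `TRAIN_ARGS` (for example with a different `batch_size`) yields a new command without editing the shell line by hand.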
Guide: Running Locally
Basic Steps
- Clone the Repository:
  git clone https://github.com/AMAAI-Lab/text-2-midi
  cd text-2-midi
- Install Dependencies:
  - For CUDA-supported machines:
    pip install -r requirements.txt
  - For MPS-supported machines:
    pip install -r requirements-mac.txt
- Run Inference:
  Ensure you have the correct requirements installed based on your hardware. Use the following command to run inference:
  python model/transformer_model.py --caption <your intended descriptions>
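The steps above can also be scripted. The following sketch picks the requirements file by operating system and builds the inference command with the caption passed as a single argument, which avoids shell-quoting problems for captions that contain spaces. The helper names and the example caption are hypothetical, and treating macOS as the MPS case is an assumption. The file names, script path, and `--caption` flag come from the steps above.

```python
import platform
import shlex
import sys
from typing import List


def requirements_file(system: str = "") -> str:
    """Pick the requirements file: macOS (assumed MPS) gets the mac variant."""
    system = system or platform.system()
    return "requirements-mac.txt" if system == "Darwin" else "requirements.txt"


def inference_command(caption: str) -> List[str]:
    """Build argv for the inference script; the caption stays one argument."""
    return [sys.executable, "model/transformer_model.py", "--caption", caption]


if __name__ == "__main__":
    print("pip install -r", requirements_file())
    # Hypothetical caption, used only to show the quoting.
    cmd = inference_command("A cheerful pop song with piano and a steady drum beat")
    # Printed rather than executed; run it from the repository root.
    print(shlex.join(cmd))
```

To execute the command instead of printing it, the same list can be handed to `subprocess.run(cmd, check=True)` from the repository root.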
Suggested Cloud GPUs
For faster training and inference, consider cloud GPU services such as AWS EC2, Google Cloud Platform, or Azure, which provide scalable GPU resources.
License
Text2midi is distributed under the Apache-2.0 license, allowing for widespread use and modification in both personal and commercial projects.