Introduction

Text2midi is an end-to-end model that generates MIDI files from text descriptions. It combines pretrained large language models with an autoregressive transformer decoder to produce symbolic music from detailed prompts, which can specify musical elements such as chords, tempo, and style.

Architecture

Text2midi pairs a pretrained language model, which encodes the text prompt, with a transformer decoder, which generates the symbolic music autoregressively, one token at a time. The training example in this document uses google/flan-t5-large as the encoder and reads the decoder settings from configs/transformer_decoder_config.json.
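
The encode-then-decode flow described above can be illustrated with a toy loop. This is only a sketch of autoregressive generation in general, not Text2midi's code: `toy_encode`, `toy_next_token`, and the `BOS`/`EOS` token ids are hypothetical stand-ins for the real encoder, decoder, and vocabulary.

```python
from typing import Callable, List

BOS, EOS = 0, 1  # hypothetical start/end token ids


def toy_encode(caption: str) -> List[int]:
    """Stand-in for the pretrained text encoder: one integer per word."""
    return [len(word) for word in caption.split()]


def toy_next_token(context: List[int], generated: List[int]) -> int:
    """Stand-in for the decoder: emits one token per context entry, then EOS."""
    step = len(generated) - 1  # tokens produced so far, excluding BOS
    return context[step] + 2 if step < len(context) else EOS


def generate(
    caption: str,
    max_len: int = 32,
    next_token: Callable[[List[int], List[int]], int] = toy_next_token,
) -> List[int]:
    """Encode the prompt once, then append one token per step until EOS."""
    context = toy_encode(caption)
    generated = [BOS]
    while len(generated) < max_len:
        # Each step is conditioned on the prompt and on all previous tokens.
        token = next_token(context, generated)
        generated.append(token)
        if token == EOS:
            break
    return generated


tokens = generate("slow jazz piano")
print(tokens)
```

In the real model, the returned token sequence would be converted to a MIDI file; that step is omitted here.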

Training

To train Text2midi, use the accelerate library, which is recommended for efficient multi-GPU training. First run accelerate config to describe your hardware setup, then launch the training script with the encoder and decoder models, dataset names, batch size, learning rate, and number of epochs:

accelerate config

accelerate launch train.py \
--encoder_model="google/flan-t5-large" \
--decoder_model="configs/transformer_decoder_config.json" \
--dataset_name="amaai-lab/MidiCaps" \
--pretrain_dataset="amaai-lab/SymphonyNet" \
--batch_size=16 \
--learning_rate=1e-4 \
--epochs=40
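
If you launch training from a Python script rather than a shell, the same command can be assembled as an argument list. The sketch below reuses only the flags and values shown above; `build_launch_command` and `TRAIN_OPTIONS` are hypothetical helper names, and the code prints the command instead of running it.

```python
import shlex
from typing import Dict, List


def build_launch_command(script: str, options: Dict[str, object]) -> List[str]:
    """Return an argument list for `accelerate launch <script> --key=value ...`."""
    command = ["accelerate", "launch", script]
    for name, value in options.items():
        command.append(f"--{name}={value}")
    return command


# Values copied from the training command above.
# The learning rate is kept as a string so it is passed through as "1e-4".
TRAIN_OPTIONS = {
    "encoder_model": "google/flan-t5-large",
    "decoder_model": "configs/transformer_decoder_config.json",
    "dataset_name": "amaai-lab/MidiCaps",
    "pretrain_dataset": "amaai-lab/SymphonyNet",
    "batch_size": 16,
    "learning_rate": "1e-4",
    "epochs": 40,
}

command = build_launch_command("train.py", TRAIN_OPTIONS)
print(shlex.join(command))
# To actually start training: subprocess.run(command, check=True)
```

Passing a list to subprocess.run avoids shell quoting and line-continuation mistakes, such as a stray trailing backslash.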

Guide: Running Locally

Basic Steps

  1. Clone the Repository:

    git clone https://github.com/AMAAI-Lab/text-2-midi
    cd text-2-midi
    
  2. Install Dependencies:

    • For CUDA-supported machines:
      pip install -r requirements.txt
      
    • For MPS-supported machines:
      pip install -r requirements-mac.txt
      
  3. Run Inference:
    Make sure you have installed the requirements that match your hardware, then run inference. Quote the caption so the shell passes it as a single argument:

    python model/transformer_model.py --caption "<your intended description>"
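
Steps 2 and 3 can also be scripted. The following is a minimal sketch, assuming that MPS-supported machines are macOS systems (reported by Python as "Darwin") and that the script is run from the repository root. `requirements_file` and `build_inference_command` are hypothetical helpers, and the example caption is illustrative only.

```python
import platform
import sys
from typing import List


def requirements_file(system: str = "") -> str:
    """Pick the requirements file; assumes MPS machines are macOS ("Darwin")."""
    system = system or platform.system()
    return "requirements-mac.txt" if system == "Darwin" else "requirements.txt"


def build_inference_command(caption: str) -> List[str]:
    """Argument list for the inference script, with the caption as one argument."""
    if not caption.strip():
        raise ValueError("caption must not be empty")
    return [sys.executable, "model/transformer_model.py", "--caption", caption]


caption = "A slow, melancholic piano piece in A minor"  # example prompt, not from the source
print("pip install -r", requirements_file())
print(build_inference_command(caption))
# To run inference: subprocess.run(build_inference_command(caption), check=True)
```

Because the caption is passed as a separate list element, spaces and punctuation in the description need no extra shell quoting.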
    

Suggested Cloud GPUs

For faster training and inference, consider a cloud GPU service such as AWS EC2, Google Cloud Platform, or Azure, each of which provides scalable GPU resources.

License

Text2midi is distributed under the Apache-2.0 license, which permits use, modification, and distribution in both personal and commercial projects.
