MusicGen Large (facebook/musicgen-large)

Introduction

MusicGen is a text-to-music model developed by Meta AI. It is capable of generating high-quality music samples based on text descriptions or audio prompts using a single-stage auto-regressive Transformer model. The model does not require a self-supervised semantic representation, making it distinct from other methods like MusicLM. MusicGen offers four checkpoints: small, medium, large, and melody.

Architecture

MusicGen employs an auto-regressive Transformer architecture trained over a 32 kHz EnCodec tokenizer with four codebooks sampled at 50 Hz. By introducing a small delay between the codebooks, the model predicts all four in parallel, requiring only 50 auto-regressive steps per second of audio.
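The codebook delay can be sketched numerically. The toy function below (an illustrative sketch, not the actual audiocraft implementation; the name `delay_pattern` is invented here) shows why interleaving helps: a naive flattened decoding of 4 codebooks at 50 Hz would take 4 × 50 = 200 steps per second, while delaying codebook k by k steps lets one step emit all four codebooks at once.

```python
NUM_CODEBOOKS = 4
FRAME_RATE_HZ = 50

def delay_pattern(num_frames, num_codebooks=NUM_CODEBOOKS):
    """Return, for each decoding step, the (codebook, frame) pairs emitted.

    Codebook k is delayed by k steps relative to codebook 0, so after a short
    ramp-up every step predicts one token from each codebook in parallel.
    """
    total_steps = num_frames + num_codebooks - 1  # a few extra steps for the delays
    steps = []
    for t in range(total_steps):
        emitted = [(k, t - k) for k in range(num_codebooks) if 0 <= t - k < num_frames]
        steps.append(emitted)
    return steps

pattern = delay_pattern(num_frames=FRAME_RATE_HZ)  # one second of audio
print(len(pattern))  # 53 steps (50 + 4 - 1) instead of 200
```

After the first three ramp-up steps, each step emits one token per codebook, which is how the cost stays near 50 steps per generated second.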

Training

The model was trained between April and May 2023 using licensed data from sources like the Meta Music Initiative Sound Collection, Shutterstock music collection, and Pond5 music collection. The training set consists of 20K hours of data.

Guide: Running Locally

Using Transformers Library

  1. Install Requirements:

    pip install --upgrade pip
    pip install --upgrade transformers scipy
    
  2. Run Inference:

    from transformers import pipeline
    import scipy
    
    # Load the text-to-audio pipeline with the large checkpoint.
    synthesiser = pipeline("text-to-audio", "facebook/musicgen-large")
    # Sampling (do_sample=True) produces more varied output than greedy decoding.
    music = synthesiser("lo-fi music with a soothing melody", forward_params={"do_sample": True})
    
    # Write the generated waveform to a WAV file at the model's sampling rate.
    scipy.io.wavfile.write("musicgen_out.wav", rate=music["sampling_rate"], data=music["audio"])
    
  3. Listen to Audio:
    Use IPython's Audio or save as a .wav file with scipy.
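For quick playback in a notebook, IPython's Audio widget can render a waveform inline (a minimal sketch; assumes IPython and NumPy are installed, and uses a placeholder tone in place of the `music` dict returned by the pipeline above):

```python
import numpy as np
from IPython.display import Audio

# With the pipeline output from step 2 this would be:
#   Audio(music["audio"], rate=music["sampling_rate"])
# Here we build a 1-second 440 Hz placeholder tone instead.
sampling_rate = 32000  # MusicGen generates audio at 32 kHz
samples = np.sin(2 * np.pi * 440 * np.arange(sampling_rate) / sampling_rate)
player = Audio(samples, rate=sampling_rate)  # displays an inline audio player in Jupyter
```

Alternatively, `Audio("musicgen_out.wav")` plays the file saved with scipy.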

Using Audiocraft Library

  1. Install Audiocraft Library:

    pip install git+https://github.com/facebookresearch/audiocraft.git
    
  2. Ensure FFmpeg is Installed:

    apt-get install ffmpeg
    
  3. Run the Model:

    from audiocraft.models import MusicGen
    from audiocraft.data.audio import audio_write
    
    # Recent audiocraft releases expect the full Hub id; older ones accepted "large".
    model = MusicGen.get_pretrained("facebook/musicgen-large")
    model.set_generation_params(duration=8)  # seconds of audio per sample
    
    descriptions = ["happy rock", "energetic EDM"]
    wav = model.generate(descriptions)  # one waveform per description
    
    for idx, one_wav in enumerate(wav):
        # Saves {idx}.wav with loudness normalization at the model's sample rate.
        audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness")
    

Cloud GPUs

For optimal performance, consider running MusicGen on cloud GPUs via platforms that support it, such as Google Colab.

License

MusicGen's code is released under the MIT license, and model weights are under the CC-BY-NC 4.0 license.
