facebook/musicgen-large
Introduction
MusicGen is a text-to-music model developed by Meta AI. It generates high-quality music samples conditioned on text descriptions or audio prompts, using a single-stage auto-regressive Transformer model. Unlike methods such as MusicLM, it does not require a self-supervised semantic representation. MusicGen is available in four checkpoints: small, medium, large, and melody.
Architecture
MusicGen is an auto-regressive Transformer trained over tokens from a 32 kHz EnCodec tokenizer with four codebooks sampled at 50 Hz. By introducing a small delay between the codebooks, the model can predict them in parallel, so generating one second of audio takes only 50 auto-regressive steps.
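To make the step count concrete, here is a minimal sketch of a delay pattern, assuming each codebook is shifted by exactly one step relative to the previous one (the text above says only that the delay is small). The function names and the `PAD` placeholder are hypothetical and are not part of any MusicGen API.

```python
PAD = -1  # hypothetical placeholder for positions that hold no token


def apply_delay_pattern(codes, pad=PAD):
    """Shift codebook k right by k steps.

    codes: list of K token lists, each of length T (one list per codebook).
    Returns K lists of length T + K - 1, so that every column can be
    predicted in a single auto-regressive step.
    """
    num_codebooks = len(codes)
    return [
        [pad] * k + list(row) + [pad] * (num_codebooks - 1 - k)
        for k, row in enumerate(codes)
    ]


def undo_delay_pattern(delayed):
    """Invert apply_delay_pattern and recover the aligned K x T grid."""
    num_codebooks = len(delayed)
    num_frames = len(delayed[0]) - (num_codebooks - 1)
    return [row[k:k + num_frames] for k, row in enumerate(delayed)]


frame_rate = 50     # frames per second, from the text above
num_codebooks = 4   # from the text above
seconds = 1

# Dummy token ids, unique per (codebook, frame) so the round trip can be checked.
codes = [
    [k * 1000 + t for t in range(frame_rate * seconds)]
    for k in range(num_codebooks)
]
delayed = apply_delay_pattern(codes)

flattened_steps = num_codebooks * frame_rate * seconds  # one token per step
delayed_steps = len(delayed[0])                         # one column per step

print("flattened:", flattened_steps, "steps; delayed:", delayed_steps, "steps")
```

With four codebooks, one second of audio takes 53 steps under this assumed pattern instead of the 200 that a fully flattened sequence would need. The three extra steps are a fixed cost, so longer clips approach 50 steps per second.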
Training
The model was trained between April and May 2023 using licensed data from sources like the Meta Music Initiative Sound Collection, Shutterstock music collection, and Pond5 music collection. The training set consists of 20K hours of data.
Guide: Running Locally
Using Transformers Library
- Install Requirements:
pip install --upgrade pip
pip install --upgrade transformers scipy
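- Run Inference:
from transformers import pipeline
import scipy.io.wavfile

# Load the text-to-audio pipeline with the large checkpoint.
synthesiser = pipeline("text-to-audio", "facebook/musicgen-large")

# Sampling (do_sample=True) is passed through to the model's generate call.
music = synthesiser("lo-fi music with a soothing melody", forward_params={"do_sample": True})

# Write the generated audio to disk at the sampling rate reported by the pipeline.
scipy.io.wavfile.write("musicgen_out.wav", rate=music["sampling_rate"], data=music["audio"])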
- Listen to Audio:
Play the result in a notebook with IPython's Audio class, or save it as a .wav file with scipy.
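If scipy is not available, the save step can also be sketched with Python's standard-library `wave` module. This is an illustration under stated assumptions, not the method the model card uses: the helper name `write_pcm16_wav` is hypothetical, the sine tone stands in for model output, the audio is assumed to be mono floats in [-1, 1], and the 32000 Hz rate is taken from the 32 kHz EnCodec tokenizer mentioned in the Architecture section.

```python
import math
import struct
import wave


def write_pcm16_wav(path, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))  # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)       # mono
        wav_file.setsampwidth(2)       # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


sample_rate = 32000  # assumption: matches the 32 kHz tokenizer

# One second of a 440 Hz tone as a stand-in for generated audio.
tone = [
    0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
    for n in range(sample_rate)
]
write_pcm16_wav("tone_demo.wav", tone, sample_rate)
```

To use it with real output, pass a flat sequence of floats and the sampling rate returned by the pipeline in place of the tone and the assumed rate.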
Using Audiocraft Library
- Install Audiocraft Library:
pip install git+https://github.com/facebookresearch/audiocraft.git
- Ensure FFmpeg is Installed:
apt-get install ffmpeg
- Run the Model:
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("large")
model.set_generation_params(duration=8)  # generate 8 seconds of audio

descriptions = ["happy rock", "energetic EDM"]
wav = model.generate(descriptions)  # one waveform per description

for idx, one_wav in enumerate(wav):
    # Saves {idx}.wav using the loudness normalization strategy.
    audio_write(f"{idx}", one_wav.cpu(), model.sample_rate, strategy="loudness")
Cloud GPUs
Inference with the large checkpoint is much faster on a GPU. If you do not have one locally, consider a hosted GPU platform such as Google Colab.
License
MusicGen's code is released under the MIT license, and model weights are under the CC-BY-NC 4.0 license.