MusicGen Small (facebook/musicgen-small)

Introduction
MusicGen is a text-to-music model developed by Meta AI's FAIR team. It generates high-quality music samples based on text descriptions or audio prompts using an auto-regressive Transformer model. MusicGen's innovative approach allows for efficient parallel prediction of audio codebooks.
Architecture
MusicGen employs a single-stage auto-regressive Transformer model trained with a 32kHz EnCodec tokenizer and uses four codebooks sampled at 50 Hz. The model is designed to predict these codebooks in parallel, reducing the computational steps required per second of audio.
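The arithmetic behind that efficiency claim can be sketched in a few lines. This is a back-of-the-envelope check using only the figures above (4 codebooks, 50 Hz frame rate), ignoring the small fixed overhead of the codebook delay pattern:

```python
# Token arithmetic for MusicGen's EnCodec tokenizer, using the
# model-card figures: 4 codebooks at a 50 Hz frame rate.
CODEBOOKS = 4
FRAME_RATE_HZ = 50  # EnCodec frames per second of audio

def tokens_per_second(codebooks=CODEBOOKS, frame_rate=FRAME_RATE_HZ):
    """Total discrete tokens representing one second of audio."""
    return codebooks * frame_rate

def decoding_steps_per_second(frame_rate=FRAME_RATE_HZ):
    """With parallel codebook prediction, one auto-regressive step
    covers all codebooks of a frame, so steps track the frame rate."""
    return frame_rate

print(tokens_per_second())        # 200 tokens per second of audio
print(decoding_steps_per_second())  # 50 decoding steps per second
```

Predicting the four codebooks sequentially would instead require 200 steps per second of audio, which is why the parallel scheme is roughly four times cheaper per generated second.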
Training
The model was trained on 20,000 hours of licensed music drawn from the Meta Music Initiative Sound Collection, Shutterstock, and Pond5. Vocals were removed from the training data using state-of-the-art music source separation methods.
Guide: Running Locally
Steps
- Install Required Libraries: Ensure you have the latest versions of the 🤗 Transformers library and scipy installed.

  ```shell
  pip install --upgrade pip
  pip install --upgrade transformers scipy
  ```
- Run Inference with the Text-to-Audio Pipeline:

  ```python
  from transformers import pipeline
  import scipy.io.wavfile

  synthesiser = pipeline("text-to-audio", "facebook/musicgen-small")
  music = synthesiser("lo-fi music with a soothing melody", forward_params={"do_sample": True})
  scipy.io.wavfile.write("musicgen_out.wav", rate=music["sampling_rate"], data=music["audio"])
  ```
- Use the Transformers Modeling Code for More Control:

  ```python
  from transformers import AutoProcessor, MusicgenForConditionalGeneration

  processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
  model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")

  inputs = processor(
      text=[
          "80s pop track with bassy drums and synth",
          "90s rock song with loud guitars and heavy drums",
      ],
      padding=True,
      return_tensors="pt",
  )
  audio_values = model.generate(**inputs, max_new_tokens=256)
  ```
- Listen to or Save the Generated Audio:

  ```python
  from IPython.display import Audio
  import scipy.io.wavfile

  sampling_rate = model.config.audio_encoder.sampling_rate
  # Listen in a notebook:
  Audio(audio_values[0].numpy(), rate=sampling_rate)
  # Or save the first batch item to disk:
  scipy.io.wavfile.write("musicgen_out.wav", rate=sampling_rate, data=audio_values[0, 0].numpy())
  ```
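As a rule of thumb, the `max_new_tokens` argument in the steps above maps to clip length through the model's 50 Hz frame rate (roughly one decoding step per audio frame). A hypothetical helper, `approx_duration_seconds`, sketches the conversion:

```python
# Rough clip-length estimate, assuming the 50 Hz EnCodec frame rate
# stated in the model card (one generation step per audio frame).
FRAME_RATE_HZ = 50

def approx_duration_seconds(max_new_tokens, frame_rate=FRAME_RATE_HZ):
    """Approximate seconds of audio produced for a given token budget."""
    return max_new_tokens / frame_rate

print(approx_duration_seconds(256))  # 256 tokens / 50 Hz ≈ 5.1 seconds
```

So `max_new_tokens=256` in the example yields a clip of about five seconds; increase the budget proportionally for longer pieces.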
Cloud GPUs
For better performance, consider using cloud GPU services such as Google Colab or AWS.
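On such a machine, the model runs on the GPU only if you move it there explicitly. A minimal sketch, assuming PyTorch is installed (it is a dependency of the MusicGen modeling code); `pick_device` is a hypothetical helper, not part of the Transformers API:

```python
import torch

def pick_device() -> str:
    """Prefer a CUDA GPU when PyTorch reports one, else fall back to CPU."""
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()
# Both the model and the processor outputs must live on the same device
# before calling generate():
#   model = model.to(device)
#   inputs = {name: tensor.to(device) for name, tensor in inputs.items()}
print(device)
```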
License
The MusicGen model weights are released under the Creative Commons BY-NC 4.0 license, while the code is under the MIT license. This means the model is free to use for non-commercial purposes with appropriate credit.