MusicGen Small (facebook/musicgen-small)

Introduction

MusicGen is a text-to-music model developed by Meta AI's FAIR team. It generates high-quality music samples from text descriptions or audio prompts using a single auto-regressive Transformer model. Unlike cascaded approaches, MusicGen predicts all audio codebooks in one pass, which keeps generation efficient.

Architecture

MusicGen employs a single-stage auto-regressive Transformer trained with a 32 kHz EnCodec tokenizer that produces four codebooks sampled at 50 Hz. By introducing a small delay between the codebooks, the model predicts them in parallel, so generating one second of audio takes only 50 auto-regressive steps rather than one step per token.
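
To make the step count concrete, here is a minimal sketch (plain Python; the constants come from the figures above) comparing the auto-regressive step counts with and without parallel codebook prediction:

    # Step-count arithmetic for MusicGen's EnCodec token representation.
    # Constants come from the architecture description above.
    CODEBOOKS = 4    # parallel EnCodec codebooks
    FRAME_RATE = 50  # frames per second per codebook (Hz)

    def steps_per_second(parallel: bool) -> int:
        """Auto-regressive steps needed to generate one second of audio."""
        if parallel:
            # With the delay pattern, one step emits a token for every
            # codebook, so only the frame rate matters.
            return FRAME_RATE
        # A flattened token stream would need one step per token instead.
        return FRAME_RATE * CODEBOOKS

    print(steps_per_second(parallel=True))   # 50
    print(steps_per_second(parallel=False))  # 200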

Training

The model was trained on 20,000 hours of licensed music from the Meta Music Initiative Sound Collection, Shutterstock, and Pond5. The training data did not include vocals; these were removed using state-of-the-art music source separation methods.

Guide: Running Locally

Steps

  1. Install Required Libraries: Ensure you have the latest versions of the 🤗 Transformers library and scipy installed.

    pip install --upgrade pip
    pip install --upgrade transformers scipy
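
  A quick sanity check, assuming both packages installed cleanly, is to print the library versions:

    # Print the installed versions to confirm the upgrade worked.
    import transformers
    import scipy

    print(transformers.__version__)
    print(scipy.__version__)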
    
  2. Run Inference with Text-to-Audio Pipeline:

    from transformers import pipeline
    import scipy.io.wavfile  # import the submodule explicitly so scipy.io.wavfile is available
    
    # Load the text-to-audio pipeline with the small MusicGen checkpoint.
    synthesiser = pipeline("text-to-audio", "facebook/musicgen-small")
    
    # Sampling (do_sample=True) generally gives better music than greedy decoding.
    music = synthesiser("lo-fi music with a soothing melody", forward_params={"do_sample": True})
    
    # The pipeline returns the waveform and its sampling rate; write them to a WAV file.
    scipy.io.wavfile.write("musicgen_out.wav", rate=music["sampling_rate"], data=music["audio"])
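
  The pipeline forwards extra generation arguments to the model, so clip length can be controlled with max_new_tokens (a sketch; at roughly 50 tokens per second, 256 tokens is about 5 seconds of audio):

    # Generate a longer clip by raising the token budget.
    music = synthesiser(
        "lo-fi music with a soothing melody",
        forward_params={"do_sample": True, "max_new_tokens": 256},
    )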
    
  3. Use Transformers Modeling Code for More Control:

    from transformers import AutoProcessor, MusicgenForConditionalGeneration
    
    # Load the processor (text tokenizer) and the model weights.
    processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
    model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")
    
    # Batch two text prompts; padding aligns them to the same length.
    inputs = processor(
        text=["80s pop track with bassy drums and synth", "90s rock song with loud guitars and heavy drums"],
        padding=True,
        return_tensors="pt",
    )
    
    # 256 new tokens at 50 Hz is roughly 5 seconds of audio per prompt.
    audio_values = model.generate(**inputs, max_new_tokens=256)
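
  Sampling and classifier-free guidance usually improve how closely the output follows the prompt; a sketch, using the generate keywords MusicGen supports in Transformers:

    # do_sample=True enables sampling; guidance_scale=3 applies
    # classifier-free guidance (3 is a commonly used value).
    audio_values = model.generate(**inputs, do_sample=True, guidance_scale=3, max_new_tokens=256)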
    
  4. Listen or Save the Generated Audio:

    from IPython.display import Audio
    import scipy.io.wavfile  # import the submodule explicitly
    
    # The audio encoder's config stores the output sampling rate (32 kHz).
    sampling_rate = model.config.audio_encoder.sampling_rate
    
    # Listen in a notebook; audio_values has shape (batch, channels, samples).
    Audio(audio_values[0].numpy(), rate=sampling_rate)
    
    # Save the first sample's mono channel as a WAV file.
    scipy.io.wavfile.write("musicgen_out.wav", rate=sampling_rate, data=audio_values[0, 0].numpy())
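
  If needed, the clip duration follows directly from the array shape and sampling rate (a minimal sketch using the variables above):

    # Duration in seconds = number of samples / samples per second.
    duration_s = audio_values.shape[-1] / sampling_rate
    print(f"Generated {duration_s:.2f} s of audio")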
    

Cloud GPUs

Generation on CPU is slow; for faster inference, consider cloud GPU services such as Google Colab or AWS.
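
On a GPU machine, standard PyTorch device placement applies (a sketch, assuming a CUDA device and the model and inputs from the guide above):

    import torch
    
    # Move the model and the tokenized inputs to the GPU when available.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)
    inputs = inputs.to(device)
    
    # Move the result back to the CPU before converting to NumPy.
    audio_values = model.generate(**inputs, max_new_tokens=256).cpu()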

License

The MusicGen model weights are released under the Creative Commons BY-NC 4.0 license, while the code is under the MIT license. This means the weights are free to use for non-commercial purposes with appropriate credit.
