musika_ae
Introduction
The Musika Audio Autoencoder is a pretrained, universal model for fast, infinite waveform music generation. It is the representation-learning component of the Musika system, introduced in the paper "Musika! Fast Infinite Waveform Music Generation" (Pasini & Schlüter, ISMIR 2022).
Architecture
The Musika autoencoder consists of two hierarchical stages that are trained independently. It encodes and reconstructs general 44.1 kHz waveform music at a 4096x time-compression ratio: roughly 23 seconds of 44.1 kHz audio are transformed into a sequence of 256 latent vectors, each of dimension 64.
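The figures above can be checked with plain arithmetic; the following sketch is independent of the Musika implementation and only verifies the shape and duration bookkeeping:

```python
# Sanity-check the compression figures quoted above (pure arithmetic,
# no Musika code involved).
SAMPLE_RATE = 44_100   # Hz
COMPRESSION = 4096     # time-compression ratio of the autoencoder
SEQ_LEN = 256          # number of latent vectors
LATENT_DIM = 64        # dimension of each latent vector

samples = SEQ_LEN * COMPRESSION        # waveform samples covered by one sequence
seconds = samples / SAMPLE_RATE        # duration of that waveform segment
latent_floats = SEQ_LEN * LATENT_DIM   # size of the latent representation

print(f"{samples} samples = {seconds:.1f} s of audio")
print(f"latent representation: {SEQ_LEN} x {LATENT_DIM} = {latent_floats} floats")
```

This is why the card says "roughly" 23 seconds: 256 x 4096 = 1,048,576 samples, which is 23.8 s at 44.1 kHz.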
Training
The autoencoder was trained on the SXSW dataset (a diverse music collection) and the VCTK dataset (a speech corpus), so that the learned representations generalize across different audio types.
Guide: Running Locally
- Setup: Ensure Python and the required dependencies are installed. The model uses Keras and TensorFlow.
- Download: The pretrained autoencoder weights are downloaded automatically the first time the Musika system runs.
- Run: Launch the Musika system to start generating music.
- Cloud GPUs: For best performance, consider using cloud GPUs from providers such as AWS or Google Cloud.
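The steps above can be sketched as a shell session. The repository URL and script name below are assumptions for illustration (the model card does not specify them; check the official Musika repository for the exact entry point):

```shell
# Sketch of a local setup; repository URL and script names are assumptions.
# 1. Get the code and install dependencies (Keras/TensorFlow are pulled in here).
git clone https://github.com/marcoppasini/musika.git
cd musika
pip install -r requirements.txt

# 2. First run: the pretrained autoencoder is downloaded automatically,
#    then generation starts.
python musika_test.py
```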
License
The Musika Audio Autoencoder is released under the MIT License, which permits free use, modification, and redistribution.