Dereverb-Echo_Mel_Band_Roformer
Sucial
Introduction
The Dereverb-Echo Mel Band Roformer models separate reverb and delay effects from vocal tracks and can also remove harmonies. Because of changes to the training dataset, the models do not treat high frequencies overly aggressively.
Architecture
The models include several specialized configurations:
- Fused Models: A combination of three models to handle small and large reverb effects.
- Big Reverb Models: Focused on removing larger reverb effects.
- V2 Models: Finetuned with over 1000 songs for improved performance.
- V1 Models: An earlier version with a different configuration and dataset basis.
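The source does not say how the three models are fused. One common way to combine checkpoints that share an architecture is a weighted average of their parameters, and the sketch below illustrates only that idea. The function name `fuse_state_dicts`, the parameter names, and the use of flat lists in place of real tensors are all assumptions made for illustration, not the author's fusion script.

```python
def fuse_state_dicts(state_dicts, weights=None):
    """Weighted average of parameter dicts that share names and sizes.

    Each dict maps a parameter name to a flat list of floats, a
    stand-in for real tensors (hypothetical layout, for illustration).
    """
    if not state_dicts:
        raise ValueError("need at least one state dict")
    if weights is None:
        weights = [1.0] * len(state_dicts)
    if len(weights) != len(state_dicts):
        raise ValueError("one weight per state dict is required")
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize so weights sum to 1

    reference = state_dicts[0]
    for sd in state_dicts[1:]:
        if set(sd) != set(reference):
            raise ValueError("state dicts have different parameter names")
        for name in reference:
            if len(sd[name]) != len(reference[name]):
                raise ValueError(f"size mismatch for parameter {name!r}")

    fused = {}
    for name in reference:
        # Walk the same position of this parameter across all checkpoints.
        columns = zip(*(sd[name] for sd in state_dicts))
        fused[name] = [sum(w * v for w, v in zip(weights, col)) for col in columns]
    return fused


# Toy example with three hypothetical checkpoints.
small_reverb = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
big_reverb = {"layer.weight": [3.0, 4.0], "layer.bias": [3.0]}
echo = {"layer.weight": [5.0, 6.0], "layer.bias": [6.0]}
fused = fuse_state_dicts([small_reverb, big_reverb, echo])
```

Equal weights give a plain mean. Unequal weights would bias the fused model toward one checkpoint, for example toward the large-reverb one.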
Training
- Datasets: 270 songs from the Opencpop and GTSinger datasets were used, with additional validation on 30 songs from a private collection.
- Process: Random reverb and delay effects were generated with a custom Python script, and the results were arranged in the MUSDB18 dataset format.
- Finetuning: The models were finetuned from earlier versions to enhance separation performance.
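The data-generation step described above can be sketched as follows. This is a minimal illustration, not the author's script: the reverb is approximated by exponentially decaying noise, the delay by a few decaying echoes, and every function name, parameter range, and stem name (`mixture`, `dry`, `effects`) is an assumption. It uses only the standard library and a toy sample rate so that it runs quickly.

```python
import math
import random


def make_impulse_response(rng, sample_rate, decay_seconds, tail_gain):
    """Exponentially decaying noise: a crude stand-in for a room response."""
    length = max(2, int(sample_rate * decay_seconds))
    rate = math.log(1000.0) / length  # about 60 dB of decay over the tail
    ir = [tail_gain * rng.uniform(-1.0, 1.0) * math.exp(-rate * n)
          for n in range(length)]
    ir[0] = 1.0  # the direct sound passes through unchanged
    return ir


def convolve(signal, ir):
    """Direct-form convolution, truncated to the input length."""
    out = [0.0] * len(signal)
    for i, x in enumerate(signal):
        if x == 0.0:
            continue
        for j, h in enumerate(ir):
            if i + j >= len(signal):
                break
            out[i + j] += x * h
    return out


def add_delay(signal, delay_samples, feedback, repeats):
    """Add `repeats` echoes, each `delay_samples` later and quieter."""
    out = list(signal)
    gain = feedback
    for k in range(1, repeats + 1):
        offset = k * delay_samples
        for i in range(len(signal) - offset):
            out[i + offset] += gain * signal[i]
        gain *= feedback
    return out


def make_training_pair(dry, sample_rate, seed=None):
    """Return a dry signal, its randomly degraded version, and the residual."""
    rng = random.Random(seed)
    ir = make_impulse_response(
        rng, sample_rate,
        decay_seconds=rng.uniform(0.05, 0.3),
        tail_gain=rng.uniform(0.1, 0.5),
    )
    wet = convolve(dry, ir)
    wet = add_delay(
        wet,
        delay_samples=max(1, int(sample_rate * rng.uniform(0.05, 0.2))),
        feedback=rng.uniform(0.2, 0.5),
        repeats=3,
    )
    effects = [w - d for w, d in zip(wet, dry)]
    # Stem names are hypothetical; the two stems sum to the mixture.
    return {"mixture": wet, "dry": list(dry), "effects": effects}


# Toy input: a 0.2 s sine burst followed by silence, at a 1 kHz sample rate.
sample_rate = 1000
dry = [math.sin(2 * math.pi * 50 * n / sample_rate) if n < 200 else 0.0
       for n in range(1000)]
pair = make_training_pair(dry, sample_rate, seed=0)
```

Keeping the direct sound at unit gain means the mixture is the dry signal plus the effects, so the stems add up to the mixture in the way a stem-based dataset layout expects.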
Guide: Running Locally
- Environment Setup: Clone the repository and install necessary dependencies.
- Configuration: Use the provided YAML configuration files to adjust model parameters as required.
- Execution: Run the model fusion script or individual model files to process audio.
- Hardware Suggestions: For best performance, consider a cloud GPU instance from a provider such as AWS EC2, Google Cloud, or Azure.
License
The models and associated content are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (cc-by-nc-sa-4.0). Sharing and adaptation are allowed with attribution and under the same license terms; commercial use is not.