bigvgan_v2_44khz_128band_512x
nvidia
Introduction
BigVGAN is a universal neural vocoder trained at large scale. Developed by Sang-Gil Lee, Wei Ping, Boris Ginsburg, Bryan Catanzaro, and Sungroh Yoon, it generates audio waveforms from mel spectrograms. The model supports various audio configurations, which makes it usable across a wide range of audio-to-audio applications.
Architecture
BigVGAN includes a custom CUDA kernel for accelerated inference, which speeds up synthesis on GPUs such as NVIDIA's A100. The model is trained with a multi-scale sub-band CQT discriminator and a multi-scale mel spectrogram loss. These training objectives target synthesis quality, while the CUDA kernel is what improves inference speed.
Training
BigVGAN is trained on a large-scale compilation of datasets covering diverse audio types, including multilingual speech, environmental sounds, and musical instruments. The pretrained model supports up to a 44 kHz sampling rate and a 512x upsampling ratio.
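To make the sampling rate and upsampling ratio concrete, the sketch below works out how mel frames map to audio samples. It assumes that "44 kHz" means 44,100 Hz and that the 512x ratio means each mel frame expands to 512 output samples. The constant and function names are illustrative and are not read from the model's configuration.

```python
# Assumed values: "44 kHz" is taken to mean 44,100 Hz, and the 512x
# upsampling ratio is taken to mean 512 output samples per mel frame.
SAMPLING_RATE = 44_100
UPSAMPLING_RATIO = 512


def samples_for_frames(num_frames: int) -> int:
    """Number of waveform samples synthesized from num_frames mel frames."""
    return num_frames * UPSAMPLING_RATIO


def duration_seconds(num_frames: int) -> float:
    """Duration in seconds of the audio synthesized from num_frames frames."""
    return samples_for_frames(num_frames) / SAMPLING_RATE


def frames_per_second() -> float:
    """Mel frames needed to represent one second of audio."""
    return SAMPLING_RATE / UPSAMPLING_RATIO


if __name__ == "__main__":
    print(samples_for_frames(100))
    print(round(duration_seconds(100), 3))
    print(round(frames_per_second(), 2))
```

Under these assumptions, 100 mel frames expand to 51,200 samples, or roughly 1.16 seconds of audio.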
Guide: Running Locally
- Installation
  - Ensure Git LFS is installed:
      git lfs install
  - Clone the repository:
      git clone https://huggingface.co/nvidia/bigvgan_v2_44khz_128band_512x
- Usage
  - Load the pretrained model, compute the mel spectrogram from an input waveform, and generate the synthesized waveform.
  - Example code snippet:
      import torch
      import bigvgan
      import librosa
      from meldataset import get_mel_spectrogram

      device = 'cuda'

      # Load the pretrained model (set use_cuda_kernel=True for faster inference)
      model = bigvgan.BigVGAN.from_pretrained('nvidia/bigvgan_v2_44khz_128band_512x', use_cuda_kernel=False)

      # Remove weight norm and switch to eval mode
      model.remove_weight_norm()
      model = model.eval().to(device)

      # Load the input audio as mono at the model's sampling rate
      wav_path = '/path/to/your/audio.wav'
      wav, sr = librosa.load(wav_path, sr=model.h.sampling_rate, mono=True)
      wav = torch.FloatTensor(wav).unsqueeze(0)  # add a batch dimension

      # Compute the mel spectrogram from the input waveform
      mel = get_mel_spectrogram(wav, model.h).to(device)

      # Generate the waveform from the mel spectrogram
      with torch.inference_mode():
          wav_gen = model(mel)
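The snippet leaves the result in wav_gen as a tensor. One way to write it to disk is to convert the samples to 16-bit PCM, sketched below using only the Python standard library. The helper names float_to_int16 and write_wav are hypothetical and not part of BigVGAN. The sketch assumes the samples are floats nominally in [-1, 1] and have first been flattened to a plain sequence, for example with wav_gen.squeeze().cpu().tolist().

```python
import array
import sys
import wave


def float_to_int16(samples):
    """Convert float samples to 16-bit PCM, clipping values outside [-1, 1]."""
    pcm = array.array("h")
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))
        pcm.append(int(round(s * 32767.0)))
    return pcm


def write_wav(target, samples, sampling_rate):
    """Write mono 16-bit PCM audio to a path or a binary file-like object."""
    pcm = float_to_int16(samples)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    with wave.open(target, "wb") as f:
        f.setnchannels(1)       # mono
        f.setsampwidth(2)       # 2 bytes per sample = 16-bit
        f.setframerate(sampling_rate)
        f.writeframes(pcm.tobytes())
```

With the variables from the snippet above, a call might look like write_wav('output.wav', wav_gen.squeeze().cpu().tolist(), model.h.sampling_rate), where 'output.wav' is a placeholder path.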
- Using Custom CUDA Kernel
  - For faster synthesis, enable the custom CUDA kernel:
      model = bigvgan.BigVGAN.from_pretrained('nvidia/bigvgan_v2_44khz_128band_512x', use_cuda_kernel=True)
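To check whether the kernel helps on a given machine, one option is to time synthesis with each setting. The helper below is a hypothetical sketch and not part of BigVGAN: it times any zero-argument callable and reports a real-time factor, meaning seconds of audio produced per second of wall-clock compute. On a GPU, the callable should also synchronize the device before returning (for example with torch.cuda.synchronize()) so that the timing covers the full computation.

```python
import time


def real_time_factor(synthesize, audio_seconds, repeats=5, warmup=1):
    """Return seconds of audio generated per second of wall-clock compute.

    synthesize: zero-argument callable that runs one synthesis pass.
    audio_seconds: duration of the audio produced by one pass.
    """
    for _ in range(warmup):
        synthesize()  # warm-up passes are not timed
    start = time.perf_counter()
    for _ in range(repeats):
        synthesize()
    elapsed = time.perf_counter() - start
    return (audio_seconds * repeats) / elapsed
```

Running the helper once on a model loaded with use_cuda_kernel=False and once with use_cuda_kernel=True, using the same mel input, gives two numbers that can be compared directly. A higher value means faster synthesis.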
- Suggested Cloud GPUs
  - Consider using NVIDIA A100 or similar GPUs for optimal performance.
License
BigVGAN is licensed under the MIT License.