bigvgan_v2_22khz_80band_256x
nvidia
Introduction
BigVGAN is a universal neural vocoder developed by NVIDIA and trained at large scale on diverse audio. It generates high-quality audio waveforms from mel spectrograms. The model is implemented in PyTorch and offers an optional optimized CUDA kernel for accelerated inference.
Architecture
BigVGAN offers a custom CUDA kernel that fuses the upsampling, activation, and downsampling operations into a single step. Fusing these operations reduces inference time when the model runs on a compatible NVIDIA GPU. The kernel is enabled with use_cuda_kernel=True when loading the model.
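The upsample, activate, downsample sequence that the kernel fuses can be illustrated with an unfused reference in plain Python. This is a minimal sketch, not the repository's implementation: the Snake-style nonlinearity, the linear-interpolation upsampler, and the pair-averaging downsampler are all assumptions chosen for brevity, and the function names are hypothetical.

```python
import math

def snake(x, alpha=1.0):
    """Periodic activation x + sin^2(alpha * x) / alpha (assumed for illustration)."""
    return x + math.sin(alpha * x) ** 2 / alpha

def upsample2x(signal):
    """Double the sample count by linear interpolation (crude stand-in for a low-pass filter)."""
    out = []
    for i, s in enumerate(signal):
        nxt = signal[i + 1] if i + 1 < len(signal) else s
        out.append(s)
        out.append(0.5 * (s + nxt))
    return out

def downsample2x(signal):
    """Halve the sample count by averaging adjacent pairs (crude low-pass stand-in)."""
    return [0.5 * (signal[i] + signal[i + 1]) for i in range(0, len(signal) - 1, 2)]

def antialiased_activation(signal, alpha=1.0):
    """Unfused reference: upsample, apply the nonlinearity, then downsample."""
    up = upsample2x(signal)
    activated = [snake(v, alpha) for v in up]
    return downsample2x(activated)

x = [0.0, 0.5, -0.5, 1.0]
y = antialiased_activation(x)  # same length as x
```

Applying the nonlinearity at twice the sample rate and then filtering back down limits the aliasing that a nonlinearity would otherwise introduce. A fused kernel performs the three stages in one pass instead of materializing the intermediate lists.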
Training
BigVGAN-v2 is trained on a large, diverse collection of audio datasets, using a multi-scale sub-band CQT discriminator and a multi-scale mel spectrogram loss to improve audio quality. The BigVGAN-v2 family also supports higher sampling rates and upsampling ratios than earlier releases. As its name indicates, this checkpoint targets 22 kHz audio with 80 mel bands and a 256x upsampling ratio.
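The upsampling ratio fixes the relationship between mel frames and waveform samples: each mel frame expands into that many output samples. The arithmetic for this checkpoint can be sketched as follows, assuming that "22khz" in the model name stands for exactly 22050 Hz (an assumption; the authoritative value is in the model's configuration). The helper names are hypothetical.

```python
SAMPLING_RATE = 22050   # assumed exact value behind "22khz" in the model name
UPSAMPLE_RATIO = 256    # "256x" in the model name: waveform samples per mel frame
NUM_MELS = 80           # "80band" in the model name

def samples_for_frames(num_frames, ratio=UPSAMPLE_RATIO):
    """Number of waveform samples generated from num_frames mel frames."""
    return num_frames * ratio

def frames_per_second(sampling_rate=SAMPLING_RATE, ratio=UPSAMPLE_RATIO):
    """Mel frame rate implied by the sampling rate and upsampling ratio."""
    return sampling_rate / ratio

def duration_seconds(num_frames, sampling_rate=SAMPLING_RATE, ratio=UPSAMPLE_RATIO):
    """Duration of the audio generated from num_frames mel frames."""
    return samples_for_frames(num_frames, ratio) / sampling_rate

print(samples_for_frames(100))   # 25600
print(frames_per_second())       # 86.1328125
```

Under these assumptions, a mel spectrogram of shape [1, 80, 100] would yield a waveform of 25600 samples, a little over one second of audio.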
Guide: Running Locally
- Prerequisites: Ensure git lfs is installed. Clone the repository using:
    git lfs install
    git clone https://huggingface.co/nvidia/bigvgan_v2_22khz_80band_256x
- Setup Environment: Install PyTorch and librosa, plus the CUDA dependencies if you plan to use GPU acceleration. Run the Python code from inside the cloned directory so that the bigvgan and meldataset modules can be imported.
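- Load Pretrained Model:
    import torch
    import bigvgan
    device = 'cuda'
    # use_cuda_kernel=True enables the fused CUDA kernel; set it to False to use plain PyTorch ops
    model = bigvgan.BigVGAN.from_pretrained('nvidia/bigvgan_v2_22khz_80band_256x', use_cuda_kernel=True)
    # Remove weight normalization and switch to evaluation mode for inference
    model.remove_weight_norm()
    model = model.eval().to(device)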
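- Inference Pipeline: Load your audio file, compute its mel spectrogram, and generate audio with the model loaded in the previous step:
    import librosa
    from meldataset import get_mel_spectrogram
    wav_path = '/path/to/your/audio.wav'
    # Load as mono at the model's sampling rate; wav is a NumPy array of shape [T_time]
    wav, sr = librosa.load(wav_path, sr=model.h.sampling_rate, mono=True)
    wav = torch.FloatTensor(wav).unsqueeze(0)  # shape [1, T_time]
    mel = get_mel_spectrogram(wav, model.h).to(device)  # shape [1, C_mel, T_frame]
    # Generate the waveform from the mel spectrogram
    with torch.inference_mode():
        wav_gen = model(mel)  # shape [1, 1, T_time]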
- Recommended Cloud GPUs: For faster inference, consider a cloud instance with a data-center GPU such as the NVIDIA A100.
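The inference step leaves the generated waveform in memory as floating-point samples. One way to write such samples to disk is as 16-bit PCM using only the Python standard library. This is a minimal sketch: float_to_pcm16 and write_wav are hypothetical helpers, not part of the repository, and the sine tone and 22050 Hz rate stand in for real model output.

```python
import math
import os
import struct
import tempfile
import wave

def float_to_pcm16(samples):
    """Clip samples to [-1, 1] and scale them to signed 16-bit integers."""
    return [int(max(-1.0, min(1.0, s)) * 32767.0) for s in samples]

def write_wav(path, samples, sampling_rate):
    """Write mono float samples to a 16-bit PCM WAV file."""
    pcm = float_to_pcm16(samples)
    with wave.open(path, "wb") as f:
        f.setnchannels(1)               # mono
        f.setsampwidth(2)               # 2 bytes per sample = 16 bit
        f.setframerate(sampling_rate)
        f.writeframes(struct.pack("<%dh" % len(pcm), *pcm))

# Stand-in for generated audio: one second of a 440 Hz tone at an assumed 22050 Hz
rate = 22050
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
out_path = os.path.join(tempfile.gettempdir(), "bigvgan_demo.wav")
write_wav(out_path, tone, rate)
```

With the pipeline above, wav_gen.squeeze().cpu().tolist() would supply the samples and model.h.sampling_rate the rate.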
License
BigVGAN is released under the MIT License. The full license text is available in the LICENSE file of the model repository. This permissive license allows modification, distribution, and private use, provided the copyright and license notice are retained.