suno/bark-small
Introduction
Bark is a transformer-based text-to-audio model developed by Suno. It is designed to generate realistic, multilingual speech and other audio types, including music and sound effects. The model can also produce nonverbal cues like laughter and sighs. Bark is available in two pretrained versions, bark-small and bark-large, both intended for research purposes. The model outputs are uncensored and should be used with caution.
Architecture
Bark consists of three main transformer models that convert text into audio:
- Text to semantic tokens: Tokenizes the input text with a BERT tokenizer and predicts semantic tokens that encode the audio to be generated.
- Semantic to coarse tokens: Predicts coarse tokens, which are the first two codebooks of the EnCodec codec, from the semantic tokens.
- Coarse to fine tokens: Refines the coarse tokens into fine tokens by predicting the remaining EnCodec codebooks, for eight codebooks in total.
Each model has roughly 80 million parameters in the small version and 300 million in the large version. The first two models use causal attention, while the coarse-to-fine model uses non-causal attention.
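The flow of data through the three stages can be sketched with stand-in functions. This is a toy illustration only: random integers replace the real transformer outputs, the one-token-per-word mapping and the vocabulary sizes are assumptions made for the example, and the function names are hypothetical. Only the codebook counts (two coarse, eight fine) follow Bark's published description.

```python
import random

SEMANTIC_VOCAB = 10_000  # assumption: illustrative semantic vocabulary size
CODEBOOK_SIZE = 1024     # assumption: entries per EnCodec codebook
N_COARSE = 2             # coarse tokens cover the first two codebooks
N_FINE = 8               # fine tokens cover all eight codebooks


def text_to_semantic(text, rng):
    # Stand-in for model 1: one random semantic token per word.
    return [rng.randrange(SEMANTIC_VOCAB) for _ in text.split()]


def semantic_to_coarse(semantic, rng):
    # Stand-in for model 2: N_COARSE codebook rows, one column per frame.
    return [[rng.randrange(CODEBOOK_SIZE) for _ in semantic]
            for _ in range(N_COARSE)]


def coarse_to_fine(coarse, rng):
    # Stand-in for model 3: keep the coarse rows, add the remaining codebooks.
    n_frames = len(coarse[0])
    extra = [[rng.randrange(CODEBOOK_SIZE) for _ in range(n_frames)]
             for _ in range(N_FINE - len(coarse))]
    return coarse + extra


rng = random.Random(0)
semantic = text_to_semantic("Hello, my name is Suno", rng)
coarse = semantic_to_coarse(semantic, rng)
fine = coarse_to_fine(coarse, rng)
print(len(semantic), len(coarse), len(fine))  # 5 2 8
```

In the real model the fine tokens are then decoded to a waveform by the EnCodec decoder; that step is omitted here.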
Training
The release includes pretrained checkpoints that are ready for inference. No further training details are provided.
Guide: Running Locally
Using 🤗 Transformers
- Install Libraries:
  pip install --upgrade pip
  pip install --upgrade transformers scipy
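- Inference with TTS Pipeline:
  from transformers import pipeline
  import scipy.io.wavfile  # import the submodule explicitly so scipy.io.wavfile resolves
  # Build a text-to-speech pipeline backed by the small checkpoint.
  synthesiser = pipeline("text-to-speech", "suno/bark-small")
  # Sampling is enabled through the forward parameters.
  speech = synthesiser("Hello, my dog is cooler than you!", forward_params={"do_sample": True})
  # Write the generated waveform to disk at the rate reported by the pipeline.
  scipy.io.wavfile.write("bark_out.wav", rate=speech["sampling_rate"], data=speech["audio"])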
- Inference with Transformers Modelling Code:
  from transformers import AutoProcessor, AutoModel
  processor = AutoProcessor.from_pretrained("suno/bark-small")
  model = AutoModel.from_pretrained("suno/bark-small")
  inputs = processor(text=["Hello, my name is Suno..."], return_tensors="pt")
  speech_values = model.generate(**inputs, do_sample=True)
- Play or Save Audio:
  Use IPython to play the audio in a notebook, or scipy to save it as a WAV file.
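As a minimal sketch of the save step, the helper below converts a float waveform to 16-bit PCM before writing it with scipy, since some players do not handle floating-point WAV files. The helper name `save_waveform` is hypothetical, and a synthetic 440 Hz tone at an assumed 24 kHz rate stands in for real model output so the example runs without downloading a checkpoint.

```python
import os
import tempfile

import numpy as np
import scipy.io.wavfile


def save_waveform(audio, sampling_rate, path):
    """Write a float waveform in [-1, 1] to a 16-bit PCM WAV file."""
    # Drop singleton batch dimensions, e.g. (1, n) -> (n,).
    samples = np.asarray(audio, dtype=np.float32).squeeze()
    # Clip to avoid integer wrap-around, then scale to the int16 range.
    pcm = (np.clip(samples, -1.0, 1.0) * 32767).astype(np.int16)
    scipy.io.wavfile.write(path, rate=sampling_rate, data=pcm)
    return pcm


# Stand-in for model output: one second of a 440 Hz tone (assumed 24 kHz rate).
sampling_rate = 24_000
t = np.arange(sampling_rate) / sampling_rate
fake_speech = 0.5 * np.sin(2 * np.pi * 440 * t)[None, :]  # shape (1, 24000)

out_path = os.path.join(tempfile.gettempdir(), "bark_out.wav")
pcm = save_waveform(fake_speech, sampling_rate, out_path)
```

With the Transformers modelling code, the same helper could presumably be called as `save_waveform(speech_values.cpu().numpy(), model.generation_config.sample_rate, "bark_out.wav")`. In a notebook, `IPython.display.Audio(samples, rate=sampling_rate)` plays the array directly.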
Using Suno's Original Bark Library
- Install Library:
  Follow the instructions at Suno's GitHub.
- Generate Audio:
  from bark import SAMPLE_RATE, generate_audio, preload_models
  from IPython.display import Audio
  # Download and cache all model checkpoints.
  preload_models()
  # Generate audio from text.
  text_prompt = "Hello, my name is Suno..."
  speech_array = generate_audio(text_prompt)
  # Play the result in a notebook.
  Audio(speech_array, rate=SAMPLE_RATE)
Cloud GPU Suggestion
For better performance, consider using cloud GPUs such as those offered by AWS, Google Cloud, or Azure.
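Whether the code runs locally or on a cloud instance, it helps to check that a GPU is actually visible before loading the model. The helper below is a small sketch, assuming a PyTorch backend; `pick_device` is a hypothetical name, and it falls back to the CPU when PyTorch or CUDA is unavailable.

```python
def pick_device():
    """Return "cuda" when a CUDA GPU is visible to PyTorch, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running on: {device}")
```

With the Transformers example, the model and its inputs would then be moved to that device with `.to(device)` before calling `generate`.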
License
Bark is licensed under the MIT License, allowing commercial use. For further details, refer to the license file.