bark small

suno

Introduction

Bark is a transformer-based text-to-audio model developed by Suno. It generates realistic, multilingual speech as well as other audio, including music and sound effects, and it can also produce nonverbal sounds such as laughter and sighs. Suno released two pretrained checkpoints, small (suno/bark-small, covered here) and large (suno/bark), both intended for research purposes. Model outputs are not censored, so use them with caution.

Architecture

Bark consists of three main transformer models that convert text into audio:

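  • Text to semantic tokens: Converts input text, tokenized with a BERT tokenizer, into semantic tokens that encode the audio to be generated.
  • Semantic to coarse tokens: Maps the semantic tokens to coarse acoustic tokens, namely the first two codebooks of Meta's EnCodec audio codec.
  • Coarse to fine tokens: Takes the first two EnCodec codebooks and predicts the remaining ones, producing all eight codebooks, which EnCodec then decodes into a waveform.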

Each of the three models has roughly 80 million parameters in the small version and 300 million in the large version. The first two stages use causal attention; the coarse-to-fine stage uses non-causal attention.
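To make the data flow between the three stages concrete, here is a toy Python sketch. The stage functions are hypothetical stand-ins that emit random token ids with plausible shapes only; they are not Bark's models. The codebook counts (two coarse EnCodec codebooks, eight after the fine stage) follow Bark's model card, while the codebook size of 1024, the semantic vocabulary size, and the one-frame-per-token simplification are assumptions for illustration (the real stages run at different frame rates).

```python
import random

# Toy stand-ins for Bark's three stages: they only produce random token ids with
# the right *shape*. They are NOT Bark's models and encode no real audio.
SEMANTIC_VOCAB = 10_000   # assumption: semantic vocabulary size
CODEBOOK_SIZE = 1024      # assumption: entries per EnCodec codebook
N_COARSE_CODEBOOKS = 2    # coarse stage predicts the first two EnCodec codebooks
N_FINE_CODEBOOKS = 8      # fine stage outputs all eight codebooks


def text_to_semantic(text, tokens_per_char=2, rng=random):
    """Stage 1 (toy): text -> one stream of semantic token ids."""
    return [rng.randrange(SEMANTIC_VOCAB) for _ in range(tokens_per_char * len(text))]


def semantic_to_coarse(semantic, rng=random):
    """Stage 2 (toy): semantic ids -> N_COARSE_CODEBOOKS rows of codec ids.

    For simplicity the toy emits one coarse frame per semantic token.
    """
    return [[rng.randrange(CODEBOOK_SIZE) for _ in semantic]
            for _ in range(N_COARSE_CODEBOOKS)]


def coarse_to_fine(coarse, rng=random):
    """Stage 3 (toy): keep the coarse rows and append the remaining codebooks."""
    n_frames = len(coarse[0])
    extra = [[rng.randrange(CODEBOOK_SIZE) for _ in range(n_frames)]
             for _ in range(N_FINE_CODEBOOKS - N_COARSE_CODEBOOKS)]
    return [list(row) for row in coarse] + extra


def toy_bark_tokens(text):
    """Run the three toy stages in order; EnCodec would then decode the result."""
    return coarse_to_fine(semantic_to_coarse(text_to_semantic(text)))


if __name__ == "__main__":
    fine = toy_bark_tokens("Hello")
    print(len(fine), "codebooks x", len(fine[0]), "frames")
```

Running it prints `8 codebooks x 10 frames`. The fine stage keeps the two coarse codebooks unchanged and only fills in the rest, which is why the sketch copies them through.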

Training

The release provides pretrained checkpoints that are ready for inference; training details are not covered here.

Guide: Running Locally

Using 🤗 Transformers

  1. Install Libraries:

    pip install --upgrade pip
    pip install --upgrade transformers scipy
    
  2. Inference with TTS Pipeline:

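    from transformers import pipeline
    import scipy.io.wavfile
    
    # Load bark-small behind the text-to-speech pipeline
    synthesiser = pipeline("text-to-speech", "suno/bark-small")
    
    # do_sample=True samples tokens, so each run can sound different
    speech = synthesiser("Hello, my dog is cooler than you!", forward_params={"do_sample": True})
    
    # speech holds the waveform ("audio") and its "sampling_rate"; squeeze() drops any batch axis
    scipy.io.wavfile.write("bark_out.wav", rate=speech["sampling_rate"], data=speech["audio"].squeeze())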
    
  3. Inference with Transformers Modelling Code:

    from transformers import AutoProcessor, AutoModel
    
    # Load the processor (tokenizer) and the Bark model
    processor = AutoProcessor.from_pretrained("suno/bark-small")
    model = AutoModel.from_pretrained("suno/bark-small")
    
    # Tokenize the prompt, then generate a batch of waveforms (one per prompt)
    inputs = processor(text=["Hello, my name is Suno..."], return_tensors="pt")
    speech_values = model.generate(**inputs, do_sample=True)
    
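  4. Play or Save Audio:
    In a notebook, play the result with IPython.display.Audio; otherwise save it as a WAV file with scipy.io.wavfile.write. For the modelling-code output, pass speech_values.cpu().numpy().squeeze() as the audio and model.generation_config.sample_rate as the rate.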
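The examples save audio with scipy.io.wavfile.write. Bark produces floating-point samples, and scipy writes a float array as a floating-point WAV, which some players and tools handle poorly. If you want a widely compatible 16-bit PCM file without scipy, here is a minimal standard-library sketch. `save_wav_int16` is a hypothetical helper (not part of Bark or Transformers), and it assumes a mono, one-dimensional sequence of samples in the range [-1, 1]; the 24 kHz default is Bark's output rate (the value of the bark library's SAMPLE_RATE).

```python
import struct
import wave


def save_wav_int16(path, samples, sample_rate=24_000):
    """Write mono float samples in [-1.0, 1.0] to `path` as a 16-bit PCM WAV.

    Hypothetical helper, not part of Bark or Transformers. Out-of-range values
    are clipped, then scaled to the int16 range.
    """
    frames = bytearray()
    for x in samples:
        x = max(-1.0, min(1.0, float(x)))                   # clip to [-1, 1]
        frames += struct.pack("<h", int(round(x * 32767)))  # little-endian int16
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)            # mono
        wf.setsampwidth(2)            # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)  # e.g. 24_000 for Bark's output
        wf.writeframes(bytes(frames))
```

Pass a one-dimensional array (squeeze away any batch axis first) together with the sampling rate reported by whichever library produced the audio.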

Using Suno's Original Bark Library

  1. Install Library:
    Follow the installation instructions in Suno's GitHub repository, https://github.com/suno-ai/bark.

  2. Generate Audio:

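    import os
    os.environ["SUNO_USE_SMALL_MODELS"] = "True"  # use the small models; set before importing bark
    from bark import SAMPLE_RATE, generate_audio, preload_models
    from IPython.display import Audio
    
    # Download and load all Bark models
    preload_models()
    
    # Generate audio from text
    text_prompt = "Hello, my name is Suno..."
    speech_array = generate_audio(text_prompt)
    
    # Play the generated audio in a notebook
    Audio(speech_array, rate=SAMPLE_RATE)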
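Generation here is stochastic: the Transformers examples pass do_sample=True, so tokens are drawn at random from the model's predicted distribution rather than always taking the most likely one, and repeated runs on the same prompt generally give different audio. The toy sketch below illustrates that difference on a single hypothetical logit vector, using temperature-scaled softmax sampling as a generic technique. It is not Bark's decoding code; `softmax`, `next_token`, and the temperature value are assumptions for illustration.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)                        # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def next_token(logits, do_sample, temperature=1.0, rng=random):
    """Pick one token id from a toy logit vector (not Bark's decoder)."""
    if not do_sample:
        # Greedy decoding: always the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    # Sampling: draw an id with probability proportional to probs.
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.5, 0.1]                 # hypothetical scores for 3 tokens
    rng = random.Random(0)
    print("greedy:", [next_token(logits, False) for _ in range(5)])
    print("sampled:", [next_token(logits, True, 0.7, rng) for _ in range(5)])
```

Greedy decoding always returns token 0 for these logits, while sampling mixes in the other ids; lowering the temperature makes sampling behave more like greedy decoding.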
    

Cloud GPU Suggestion

Generation is much faster on a GPU than on a CPU. If you do not have a local GPU, consider cloud GPUs from providers such as AWS, Google Cloud, or Azure.
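The Transformers examples above run on the CPU unless the model and its inputs are moved to a GPU. A minimal sketch of device selection, assuming PyTorch is the backend (the examples request return_tensors="pt"); `pick_device` is a hypothetical helper that falls back to the CPU when PyTorch or a CUDA GPU is unavailable.

```python
def pick_device():
    """Return "cuda" when PyTorch sees a CUDA GPU, otherwise "cpu".

    Hypothetical helper; importing torch lazily lets it run without PyTorch.
    """
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    device = pick_device()
    print("using", device)
    # With the Transformers example, move both the model and the inputs:
    #   model = AutoModel.from_pretrained("suno/bark-small").to(device)
    #   inputs = processor(text=["Hello, my name is Suno..."], return_tensors="pt").to(device)
```

Keeping the model and its inputs on the same device avoids device-mismatch errors at generation time.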

License

Bark is licensed under the MIT License, allowing commercial use. For further details, refer to the license file.
