slu-direct-fluent-speech-commands-librispeech-asr

speechbrain

Introduction

The slu-direct-fluent-speech-commands-librispeech-asr model is an end-to-end spoken language understanding (SLU) system built with SpeechBrain. It interprets spoken smart-home commands and is trained on the Fluent Speech Commands dataset. It uses an attention-based RNN sequence-to-sequence architecture and reaches 99.6% accuracy on the test set.

Architecture

The model is an attention-based RNN sequence-to-sequence system trained on the Fluent Speech Commands dataset. It uses a pretrained ASR model, speechbrain/asr-crdnn-rnnlm-librispeech (trained on LibriSpeech), to extract features from the input audio. A decoder then maps these features to an intent and slot labels using beam search. The model reaches 99.6% accuracy on the test set.
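
To make the beam search step more concrete, here is a minimal toy sketch in plain Python. It is not SpeechBrain's decoder: the lookup table TOY_MODEL, the function beam_search, and the probabilities are all invented for illustration, and the search runs for a fixed number of steps with no end-of-sequence handling. It only shows the core idea of keeping the best few partial outputs at each step instead of a single greedy choice.

```python
import math

# Hypothetical toy "model": maps an output prefix to next-token probabilities.
# A real decoder would compute these from the encoded audio features.
TOY_MODEL = {
    (): {"turn": 0.6, "switch": 0.4},
    ("turn",): {"on": 0.5, "off": 0.5},
    ("switch",): {"on": 0.9, "off": 0.1},
}


def beam_search(model, num_steps, beam_size):
    """Return the best (tokens, log_prob) hypotheses, highest score first."""
    beams = [([], 0.0)]  # each hypothesis: (token list, summed log-probability)
    for _ in range(num_steps):
        candidates = []
        for tokens, score in beams:
            # Extend every surviving hypothesis by every possible next token.
            for token, prob in model[tuple(tokens)].items():
                candidates.append((tokens + [token], score + math.log(prob)))
        # Keep only the beam_size highest-scoring extensions.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_size]
    return beams


greedy_tokens, greedy_score = beam_search(TOY_MODEL, num_steps=2, beam_size=1)[0]
best_tokens, best_score = beam_search(TOY_MODEL, num_steps=2, beam_size=2)[0]
print(greedy_tokens, round(math.exp(greedy_score), 2))
print(best_tokens, round(math.exp(best_score), 2))
```

With a beam of one, the search commits to "turn" because it is the likelier first token and ends with total probability 0.30. With a beam of two it also keeps "switch" and finds the more probable sequence "switch on" at 0.36.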

Training

The model is trained with the SpeechBrain toolkit on recordings sampled at 16 kHz. Training involves:

  1. Cloning the SpeechBrain repository.
  2. Installing dependencies.
  3. Executing the training script located in the recipes/fluent-speech-commands directory.

Training results, including models and logs, are available on a shared Google Drive for further analysis and reference.
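
Since the training recordings are sampled at 16 kHz, it can be useful to check the sample rate of your own audio before using it with this model. The sketch below uses only Python's standard wave module, so it assumes uncompressed PCM WAV files. The helper names wav_sample_rate and is_expected_rate are hypothetical and not part of SpeechBrain.

```python
import wave

EXPECTED_RATE = 16000  # 16 kHz, the sample rate of the training recordings


def wav_sample_rate(path):
    """Return the sample rate in Hz stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_expected_rate(path, expected=EXPECTED_RATE):
    """Return True if the file's sample rate matches the expected rate."""
    return wav_sample_rate(path) == expected
```

For example, is_expected_rate("path_to_your_audio_file.wav") returns False for an 8 kHz recording, which signals that the file differs from the training condition.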

Guide: Running Locally

To run this model locally:

  1. Clone the SpeechBrain repository:
    git clone https://github.com/speechbrain/speechbrain/
    
  2. Install dependencies:
    cd speechbrain
    pip install -r requirements.txt
    pip install -e .
    
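  3. Run inference (Python):
    from speechbrain.inference.SLU import EndToEndSLU
    # Load the pretrained model; it is downloaded on first use.
    slu = EndToEndSLU.from_hparams(source="speechbrain/slu-direct-fluent-speech-commands-librispeech-asr")
    # Decode one audio file and print the predicted intent and slots.
    print(slu.decode_file("path_to_your_audio_file.wav"))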
    
  4. GPU Usage: For GPU inference, add run_opts={"device":"cuda"} to the from_hparams call.
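
The GPU option from step 4 can be wrapped in a small helper that falls back to the CPU when CUDA is not available. This is a minimal sketch, not part of the SpeechBrain API: the function names choose_run_opts and load_slu are hypothetical, and only the run_opts argument of from_hparams comes from the steps above.

```python
def choose_run_opts(prefer_gpu=True):
    """Build a run_opts dict for from_hparams, falling back to CPU."""
    try:
        import torch  # imported lazily so the helper works without it

        cuda_ok = torch.cuda.is_available()
    except ImportError:
        cuda_ok = False
    device = "cuda" if (prefer_gpu and cuda_ok) else "cpu"
    return {"device": device}


def load_slu(prefer_gpu=True):
    """Load the pretrained SLU model on the selected device (hypothetical helper)."""
    from speechbrain.inference.SLU import EndToEndSLU

    return EndToEndSLU.from_hparams(
        source="speechbrain/slu-direct-fluent-speech-commands-librispeech-asr",
        run_opts=choose_run_opts(prefer_gpu),
    )
```

Calling load_slu() then returns a model placed on the GPU when one is usable, and load_slu(prefer_gpu=False) forces CPU inference.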

Cloud GPUs

For faster training or inference when no local GPU is available, consider cloud GPUs from services such as AWS, Google Cloud, or Azure.

License

The slu-direct-fluent-speech-commands-librispeech-asr model is released under the CC0-1.0 license, a public-domain dedication that permits use, modification, and redistribution without restrictions.
