slu-direct-fluent-speech-commands-librispeech-asr (SpeechBrain)

Introduction
The SLU-DIRECT-FLUENT-SPEECH-COMMANDS-LIBRISPEECH-ASR model is an end-to-end spoken language understanding (SLU) system developed by SpeechBrain. It focuses on interpreting spoken commands for smart-home devices using the Fluent Speech Commands dataset. The model achieves high accuracy by leveraging an attention-based RNN sequence-to-sequence architecture.
Architecture
The model employs an attention-based RNN sequence-to-sequence approach trained on the Fluent Speech Commands dataset. It uses an ASR model, speechbrain/asr-crdnn-rnnlm-librispeech, to extract features from the audio input, which are then mapped to intents and slot labels via beam-search decoding. With this architecture the model reaches 99.6% accuracy on the test set.
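To make the beam-search step concrete, here is a toy sketch of how a beam search keeps the top-scoring label sequences at each decoding step. It is illustrative only: the real decoder conditions each step on attention over the encoder states rather than on a fixed per-step score table, and the vocabulary below is invented for the example.

```python
import math

def beam_search(step_scores, beam_width=2):
    """Toy beam search over per-step log-probabilities.

    step_scores: list of dicts mapping label -> log-probability.
    Returns the highest-scoring label sequence.
    """
    beams = [([], 0.0)]  # (sequence, cumulative log-prob)
    for scores in step_scores:
        # Expand every beam with every candidate label.
        candidates = [
            (seq + [label], lp + label_lp)
            for seq, lp in beams
            for label, label_lp in scores.items()
        ]
        # Keep only the beam_width best partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][0]

# Invented three-step "slot vocabulary": action, object, location.
steps = [
    {"activate": math.log(0.7), "deactivate": math.log(0.3)},
    {"lights": math.log(0.6), "music": math.log(0.4)},
    {"kitchen": math.log(0.8), "bedroom": math.log(0.2)},
]
print(beam_search(steps))  # → ['activate', 'lights', 'kitchen']
```

A larger beam width trades decoding time for a lower risk of discarding the globally best sequence early.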
Training
The model is trained using the SpeechBrain toolkit, with recordings sampled at 16kHz. Training involves:
- Cloning the SpeechBrain repository.
- Installing dependencies.
- Executing the training script located in the recipes/fluent-speech-commands directory.
Training results, including models and logs, are available on a shared Google Drive for further analysis and reference.
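Because the model expects recordings sampled at 16kHz, audio captured at other rates must be resampled before decoding. A minimal stdlib-only sketch of linear-interpolation resampling (a real pipeline would use a proper resampler such as torchaudio.transforms.Resample):

```python
def resample_linear(signal, orig_sr, target_sr=16000):
    """Naive linear-interpolation resampler (illustration only)."""
    n_out = round(len(signal) * target_sr / orig_sr)
    out = []
    for i in range(n_out):
        pos = i * orig_sr / target_sr        # position in the input signal
        j = int(pos)
        frac = pos - j
        j_next = min(j + 1, len(signal) - 1)  # clamp at the last sample
        out.append(signal[j] * (1 - frac) + signal[j_next] * frac)
    return out

# One second of 8 kHz audio becomes 16,000 samples at 16 kHz.
ramp = [i / 8000 for i in range(8000)]
print(len(resample_linear(ramp, 8000)))  # → 16000
```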
Guide: Running Locally
To run this model locally:
- Clone the SpeechBrain repository:

```shell
git clone https://github.com/speechbrain/speechbrain/
```
- Install dependencies:

```shell
cd speechbrain
pip install -r requirements.txt
pip install -e .
```
- Run inference:

```python
from speechbrain.inference.SLU import EndToEndSLU

slu = EndToEndSLU.from_hparams("speechbrain/slu-direct-fluent-speech-commands-librispeech-asr")
slu.decode_file("path_to_your_audio_file.wav")
```
- GPU Usage: For GPU inference, add run_opts={"device": "cuda"} to the from_hparams call.
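Fluent Speech Commands labels each utterance with three slots: action, object, and location. Assuming the decoded output has been flattened to a plain label string (a hypothetical format; inspect the actual return value of decode_file), a small post-processing step could map it to slots:

```python
# Hypothetical post-processing: split a flat decoded label string
# into Fluent Speech Commands' three slots (action, object, location).
def to_slots(decoded):
    return dict(zip(("action", "object", "location"), decoded.split()))

print(to_slots("activate lights kitchen"))
# → {'action': 'activate', 'object': 'lights', 'location': 'kitchen'}
```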
Cloud GPUs
For enhanced performance, consider using cloud GPUs available through services like AWS, Google Cloud, or Azure.
License
The SLU-DIRECT-FLUENT-SPEECH-COMMANDS-LIBRISPEECH-ASR model is released under the CC0-1.0 license, which places it in the public domain and permits unrestricted use.