ASR-Transformer-AISHELL (SpeechBrain)
Introduction
ASR-Transformer-AISHELL is an end-to-end automatic speech recognition (ASR) system for Mandarin Chinese, developed as part of the SpeechBrain project. It pairs a subword tokenizer with a transformer-based acoustic model to convert speech to text.
Architecture
The ASR system consists of two main components:
- A Tokenizer that converts words into subword units using a unigram model trained on LibriSpeech transcriptions.
- An Acoustic Model combining a transformer encoder with a joint decoder integrating Connectionist Temporal Classification (CTC) and transformer-based decoding.
The system processes audio sampled at 16 kHz and automatically normalizes audio inputs (resampling, mono downmixing) when transcribing files.
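Note that this normalization happens when transcribing files; if you pass raw tensors to the model instead, resampling and downmixing are up to you. A minimal sketch with torchaudio (the input file name is hypothetical):
```python
import torchaudio

# Minimal sketch (assumption: a local file "my_recording.wav" exists).
# transcribe_file performs this normalization automatically; manual
# resampling/downmixing is only needed when feeding raw tensors.
signal, sample_rate = torchaudio.load("my_recording.wav")
if sample_rate != 16000:
    signal = torchaudio.functional.resample(signal, sample_rate, 16000)
if signal.shape[0] > 1:  # (channels, time): downmix multi-channel to mono
    signal = signal.mean(dim=0, keepdim=True)
```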
Training
The model was trained with SpeechBrain using the AISHELL-1 transformer recipe. To retrain it from scratch:
- Clone the SpeechBrain repository:
```bash
git clone https://github.com/speechbrain/speechbrain/
```
- Install dependencies:
```bash
cd speechbrain
pip install -r requirements.txt
pip install -e .
```
- Execute the training script:
```bash
cd recipes/AISHELL-1/ASR/transformer/
python train.py hparams/train_ASR_transformer.yaml --data_folder=your_data_folder
```
Training results (models, logs, etc.) are available on Google Drive.
Guide: Running Locally
- Install SpeechBrain:
```bash
pip install speechbrain
```
- Transcribe Audio Files:
```python
from speechbrain.inference.ASR import EncoderDecoderASR

asr_model = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-transformer-aishell",
    savedir="pretrained_models/asr-transformer-aishell",
)
asr_model.transcribe_file("your_audio_file.flac")
```
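To transcribe several files at once, the model also exposes transcribe_batch. A minimal sketch reusing asr_model from above, assuming two hypothetical local recordings; load_audio applies the same 16 kHz normalization described earlier:
```python
import torch

# Minimal sketch (assumption: local files "utt1.wav"/"utt2.wav" exist).
# load_audio returns each waveform normalized to 16 kHz mono.
sigs = [asr_model.load_audio(f) for f in ["utt1.wav", "utt2.wav"]]
lengths = torch.tensor([s.shape[0] for s in sigs], dtype=torch.float)
wav_lens = lengths / lengths.max()  # relative lengths in (0, 1]
batch = torch.nn.utils.rnn.pad_sequence(sigs, batch_first=True)
transcripts, _ = asr_model.transcribe_batch(batch, wav_lens)
print(transcripts)
```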
- Inference on GPU:
To run inference on a GPU, pass `run_opts={"device":"cuda"}` to `from_hparams`:
```python
asr_model = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-transformer-aishell",
    savedir="pretrained_models/asr-transformer-aishell",
    run_opts={"device": "cuda"},
)
```
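If the same script should also run on machines without a GPU, one option is to select the device at runtime. A minimal sketch, assuming PyTorch is available (it is installed as a SpeechBrain dependency):
```python
import torch
from speechbrain.inference.ASR import EncoderDecoderASR

# Fall back to CPU when no GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
asr_model = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-transformer-aishell",
    savedir="pretrained_models/asr-transformer-aishell",
    run_opts={"device": device},
)
```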
For faster inference on large workloads, consider cloud GPU instances such as AWS EC2 instances with NVIDIA V100 or A100 GPUs.
License
The ASR-Transformer-AISHELL is licensed under the Apache 2.0 license, allowing for broad use and modification within the terms specified.