icefall_asr_tal-csasr_pruned_transducer_stateless5
luomingshuang
Introduction
This document gives an overview of the pre-trained Transducer-Stateless5 models for the TAL_CSASR dataset. The models were trained on the far data subset of TAL_CSASR with the Icefall recipe scripts and the latest version of k2.
Architecture
The model uses the Transducer-Stateless5 architecture (the pruned_transducer_stateless5 recipe) for speech recognition on the TAL_CSASR dataset. It is part of the Icefall project, which is built on the k2 framework for efficient and scalable speech processing.
Training
The training process involves the following key repositories:
- k2: k2-fsa/k2
- Icefall: k2-fsa/icefall
- Lhotse: lhotse-speech/lhotse
Steps for training include:
- Install k2 and Lhotse following their respective installation guides.
- Clone the Icefall repository and navigate to the relevant directory.
    git clone https://github.com/k2-fsa/icefall
    cd icefall
- Prepare the data using the provided script.
    cd egs/tal_csasr/ASR
    bash ./prepare.sh
- Run the training script on GPUs. The number of devices listed in CUDA_VISIBLE_DEVICES should match --world-size (six in this example).
    export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5"
    ./pruned_transducer_stateless5/train.py \
      --world-size 6 \
      --num-epochs 30 \
      --start-epoch 1 \
      --exp-dir pruned_transducer_stateless5/exp \
      --lang-dir data/lang_char \
      --max-duration 90
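The training invocation above hard-codes both the GPU list and --world-size, so the two can drift apart when the command is adapted to a different machine. The following is a minimal sketch, not part of Icefall, that derives --world-size from a CUDA_VISIBLE_DEVICES-style string and assembles the same argument list. The helper names visible_gpu_count and build_train_command are hypothetical; the script path and flags are copied from the command above.

```python
import shlex


def visible_gpu_count(cuda_visible_devices: str) -> int:
    """Count the device IDs in a CUDA_VISIBLE_DEVICES-style string."""
    return len([d for d in cuda_visible_devices.split(",") if d.strip()])


def build_train_command(cuda_visible_devices: str,
                        num_epochs: int = 30,
                        max_duration: int = 90) -> list:
    """Assemble the train.py arguments, deriving --world-size from the GPU list.

    Hypothetical helper: the flags mirror the command shown in this document.
    """
    world_size = visible_gpu_count(cuda_visible_devices)
    if world_size == 0:
        raise ValueError("no GPU IDs found in CUDA_VISIBLE_DEVICES")
    return [
        "./pruned_transducer_stateless5/train.py",
        "--world-size", str(world_size),
        "--num-epochs", str(num_epochs),
        "--start-epoch", "1",
        "--exp-dir", "pruned_transducer_stateless5/exp",
        "--lang-dir", "data/lang_char",
        "--max-duration", str(max_duration),
    ]


# Six GPUs, as in the example above.
cmd = build_train_command("0,1,2,3,4,5")
print(shlex.join(cmd))
```

The resulting list could be passed to subprocess.run from within egs/tal_csasr/ASR, with CUDA_VISIBLE_DEVICES set in the environment. On a machine with fewer GPUs, only the device string would need to change.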
Guide: Running Locally
To run the model locally, follow these steps:
- Install Dependencies: Ensure you have k2, Icefall, and Lhotse installed, along with their dependencies.
- Clone the Repository: Get the Icefall repository and navigate to the egs/tal_csasr/ASR directory.
- Prepare Data: Use the provided script to set up the dataset.
- Train the Model: Configure and execute the training script as described above.
Training is computationally intensive (the example above uses six GPUs). If suitable hardware is not available locally, cloud GPU services such as AWS, Google Cloud, or Azure are an option.
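Before preparing data or training, it can help to confirm that the dependencies from the first step of the guide are visible to the Python interpreter. The sketch below uses only the standard library. It assumes the packages are importable as torch, k2, lhotse, and icefall; torch is included on the assumption that k2 is installed against PyTorch, and missing_modules is a hypothetical helper name.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# Assumed import names for the dependencies listed in the guide.
required = ["torch", "k2", "lhotse", "icefall"]

missing = missing_modules(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```

This check only reports whether each package can be located. It does not verify versions or that k2 was built for the installed CUDA and PyTorch versions.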
License
The code and models are shared under the Apache License 2.0, allowing free use, modification, and distribution. Please ensure compliance with the license terms when utilizing the resources.