icefall_asr_tal-csasr_pruned_transducer_stateless5

luomingshuang

Introduction

This document gives an overview of the pre-trained Transducer-Stateless5 models for the TAL_CSASR dataset, built with Icefall. The models were trained on the far data subset of TAL_CSASR using the Icefall recipe scripts (pruned_transducer_stateless5) and the latest version of k2 available at the time of training.

Architecture

The model uses the Transducer-Stateless5 architecture for speech recognition on the TAL_CSASR dataset. The recipe is part of the Icefall project, which is built on the k2 framework.

Training

The training process relies on three key repositories: k2, Icefall, and Lhotse.

Steps for training include:

  1. Install k2 and Lhotse following their respective installation guides.
  2. Clone the Icefall repository and navigate to the relevant directory.

       git clone https://github.com/k2-fsa/icefall
       cd icefall

  3. Prepare the data using the provided script.

       cd egs/tal_csasr/ASR
       bash ./prepare.sh

  4. Execute the training script with CUDA support for GPU acceleration.

       export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5"
       ./pruned_transducer_stateless5/train.py \
           --world-size 6 \
           --num-epochs 30 \
           --start-epoch 1 \
           --exp-dir pruned_transducer_stateless5/exp \
           --lang-dir data/lang_char \
           --max-duration 90
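
The training command lists six GPU indices in CUDA_VISIBLE_DEVICES and passes --world-size 6, so the two values have to be kept in step by hand. The sketch below is one way to derive the second from the first. The helper names count_visible_gpus and print_train_cmd are hypothetical and not part of Icefall; the script only prints the command as a dry run and assumes the remaining flags stay as shown in the steps above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Icefall): count the comma-separated
# entries in a CUDA_VISIBLE_DEVICES-style string. Prints 0 for an empty list.
count_visible_gpus() {
    printf '%s\n' "$1" | tr ',' '\n' | grep -c . || true
}

# Hypothetical helper: print (do not run) the training command for a device
# list, with --world-size derived from the list instead of typed by hand.
print_train_cmd() {
    world_size=$(count_visible_gpus "$1")
    if [ "$world_size" -lt 1 ]; then
        echo "no GPUs listed" >&2
        return 1
    fi
    printf 'CUDA_VISIBLE_DEVICES="%s" ./pruned_transducer_stateless5/train.py --world-size %s --num-epochs 30 --start-epoch 1 --exp-dir pruned_transducer_stateless5/exp --lang-dir data/lang_char --max-duration 90\n' "$1" "$world_size"
}

# Dry run with the device list used in the steps above.
print_train_cmd "0,1,2,3,4,5"
```

On a machine with fewer GPUs, passing a shorter list such as "0,1" yields a command with --world-size 2; the other flags are left untouched here, which may or may not suit a smaller setup.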

Guide: Running Locally

To run the model locally, follow these steps:

  1. Install Dependencies: Ensure you have k2, Icefall, and Lhotse installed, along with their dependencies.
  2. Clone the Repository: Clone the Icefall repository and change into the project directory.
  3. Prepare Data: Use the provided script to set up the dataset.
  4. Train the Model: Configure and execute the training script as described above.
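
Before preparing data or training, it can help to confirm that the clone contains the recipe files the steps refer to. The following is a minimal sketch, assuming the checkout lives in ./icefall; check_recipe_layout is a hypothetical helper, not an Icefall command, and it only tests for the two script paths named in the training steps.

```shell
#!/bin/sh
# Hypothetical helper (not part of Icefall): confirm that a checkout contains
# the recipe scripts referenced in the training steps.
check_recipe_layout() {
    root="${1:-.}"
    recipe="$root/egs/tal_csasr/ASR"
    missing=0
    for f in prepare.sh pruned_transducer_stateless5/train.py; do
        if [ ! -f "$recipe/$f" ]; then
            echo "missing: $recipe/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: run from the directory that contains the icefall checkout
# (the ./icefall location is an assumption).
if check_recipe_layout ./icefall; then
    echo "recipe layout looks complete"
else
    echo "recipe layout incomplete; re-check the clone step"
fi
```

The check does not verify that k2 or Lhotse are installed; it only guards against running the later steps from the wrong directory or against an incomplete clone.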

Training is computationally intensive: the example command above uses six GPUs. If comparable hardware is not available locally, consider a cloud GPU service such as AWS, Google Cloud, or Azure.

License

The code and models are shared under the Apache License 2.0, allowing free use, modification, and distribution. Please ensure compliance with the license terms when utilizing the resources.
