Belle-whisper-large-v3-turbo-zh

BELLE-2

Introduction

BELLE-WHISPER-LARGE-V3-TURBO-ZH is a model fine-tuned to enhance Chinese speech recognition. It is based on the OpenAI Whisper Large V3 Turbo model and shows a 24-64% relative improvement over the base model on various Chinese ASR benchmarks, including AISHELL1, AISHELL2, WENETSPEECH, and HKUST. Punctuation marks are sourced from the punc_ct-transformer_cn-en-common-vocab471067-large model.

Architecture

The model is built on the OpenAI Whisper Large V3 Turbo architecture, an encoder-decoder Transformer for automatic speech recognition (ASR). The Hugging Face Transformers library is used to load the pre-trained model and fine-tune it for Chinese ASR.
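As a rough illustration, the checkpoint can also be loaded directly with the Transformers Whisper classes instead of the pipeline used later in this guide. This is a minimal sketch, not taken from the model card itself; the printed fields are standard WhisperConfig attributes.

    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    model_id = "BELLE-2/Belle-whisper-large-v3-turbo-zh"

    # The processor bundles the feature extractor (log-Mel spectrograms) and the tokenizer
    processor = WhisperProcessor.from_pretrained(model_id)
    model = WhisperForConditionalGeneration.from_pretrained(model_id)

    # Whisper is an encoder-decoder Transformer; inspect its depth and input features
    print(model.config.encoder_layers, model.config.decoder_layers)
    print(model.config.num_mel_bins)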

Training

The model was fine-tuned on 16 kHz audio using the AISHELL-1, AISHELL-2, WenetSpeech, and HKUST datasets. Full fine-tuning was employed, yielding significant reductions in character error rate (CER) compared to previous versions.
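For context, CER is the character-level edit distance between a hypothesis and its reference, divided by the reference length. Below is a minimal sketch of how it could be measured with the Hugging Face evaluate library; the transcript strings are hypothetical placeholders, not drawn from the training data.

    import evaluate

    # Hypothetical reference transcript and model output (one character differs)
    references = ["甚至出现交易几乎停滞的情况"]
    predictions = ["甚至出现交易几乎停止的情况"]

    # CER = character-level edit distance / number of reference characters
    cer_metric = evaluate.load("cer")
    cer = cer_metric.compute(predictions=predictions, references=references)
    print(f"CER: {cer:.4f}")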

Guide: Running Locally

  1. Install Transformers Library
    Ensure you have the transformers library installed:

    pip install transformers
    
  2. Set Up the Transcriber
    Use the following code to set up the transcriber:

    from transformers import pipeline

    # Load the fine-tuned Chinese model as an ASR pipeline
    transcriber = pipeline(
      "automatic-speech-recognition",
      model="BELLE-2/Belle-whisper-large-v3-turbo-zh"
    )

    # Force Chinese transcription (instead of language auto-detection or translation)
    transcriber.model.config.forced_decoder_ids = (
      transcriber.tokenizer.get_decoder_prompt_ids(
        language="zh",
        task="transcribe"
      )
    )

    # The pipeline returns a dict; the recognized text is under the "text" key
    transcription = transcriber("my_audio.wav")
    print(transcription["text"])
    
  3. Cloud GPU Recommendation
    For optimal performance, consider using cloud GPUs from providers such as AWS, Google Cloud, or Azure to handle the model's computational demands; a sketch of running the pipeline on a GPU with chunked long-form decoding follows this list.
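The sketch below combines GPU placement with chunked decoding so that recordings longer than Whisper's 30-second window can be transcribed in pieces. device, chunk_length_s, and batch_size are standard pipeline arguments, but the specific values and the file name long_meeting_recording.wav are illustrative assumptions rather than recommendations from the model card.

    import torch
    from transformers import pipeline

    # Run the pipeline on a GPU when available, splitting long audio into
    # overlapping 30-second chunks that are decoded in batches.
    transcriber = pipeline(
      "automatic-speech-recognition",
      model="BELLE-2/Belle-whisper-large-v3-turbo-zh",
      device="cuda:0" if torch.cuda.is_available() else "cpu",
      chunk_length_s=30,  # illustrative chunk length
      batch_size=8        # illustrative batch size for chunked inference
    )

    transcriber.model.config.forced_decoder_ids = (
      transcriber.tokenizer.get_decoder_prompt_ids(language="zh", task="transcribe")
    )

    print(transcriber("long_meeting_recording.wav")["text"])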

License

This model is licensed under the Apache-2.0 License, which permits both commercial and private use and requires that modified files carry notices stating the changes made.
