Belle-whisper-large-v3-turbo-zh
Introduction
Belle-whisper-large-v3-turbo-zh is a fine-tuned version of the OpenAI Whisper Large V3 Turbo model aimed at improving Chinese speech recognition. It achieves a 24-64% relative improvement over the base model on Chinese ASR benchmarks including AISHELL-1, AISHELL-2, WenetSpeech, and HKUST. The model's punctuation marks are sourced from the punc_ct-transformer_cn-en-common-vocab471067-large model.
Architecture
The model is built on the OpenAI Whisper Large V3 Turbo architecture and targets automatic speech recognition (ASR). It is distributed in the Hugging Face Transformers format, so it can be loaded, run, and further fine-tuned with the Transformers library.
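For users who prefer the lower-level Transformers API to the pipeline shown in the guide below, a minimal loading sketch might look like the following (the model ID comes from this card; the dtype and device handling are assumptions):

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

# Model ID taken from this card; precision/device choices are assumptions.
model_id = "BELLE-2/Belle-whisper-large-v3-turbo-zh"
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)
```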
Training
The model was fine-tuned on 16 kHz audio using the AISHELL-1, AISHELL-2, WenetSpeech, and HKUST datasets. Full fine-tuning was employed, yielding significant reductions in character error rate (CER) compared to previous versions.
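Because the model expects 16 kHz input, audio recorded at other sampling rates should be resampled before transcription. The sketch below assumes librosa for loading and resampling (torchaudio or soundfile would also work) and reuses the transcriber pipeline defined in the guide below; my_audio.wav is a hypothetical file name:

```python
import librosa

# Whisper-family models expect 16 kHz mono input; librosa resamples on load.
audio, sr = librosa.load("my_audio.wav", sr=16000, mono=True)

# The ASR pipeline accepts a dict of raw samples plus their sampling rate.
result = transcriber({"raw": audio, "sampling_rate": sr})
print(result["text"])
```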
Guide: Running Locally
- Install the Transformers Library

  Ensure you have the transformers library installed:

  ```bash
  pip install transformers
  ```
- Set Up the Transcriber

  Use the following code to set up the transcriber:

  ```python
  from transformers import pipeline

  transcriber = pipeline(
      "automatic-speech-recognition",
      model="BELLE-2/Belle-whisper-large-v3-turbo-zh"
  )

  transcriber.model.config.forced_decoder_ids = (
      transcriber.tokenizer.get_decoder_prompt_ids(
          language="zh",
          task="transcribe"
      )
  )

  transcription = transcriber("my_audio.wav")
  ```
- Cloud GPU Recommendation

  For optimal performance, consider using cloud GPUs from providers such as AWS, Google Cloud, or Azure to handle the computational demands of the model (see the sketch after this list).
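As referenced in the GPU recommendation above, the same pipeline can be placed on a CUDA device and run with chunked inference for long recordings. This is a sketch under stated assumptions: the half-precision and 30-second chunk settings are choices for illustration, not values from this card.

```python
import torch
from transformers import pipeline

# Assumes a CUDA-capable GPU is available; falls back to CPU otherwise.
device = "cuda:0" if torch.cuda.is_available() else "cpu"

transcriber = pipeline(
    "automatic-speech-recognition",
    model="BELLE-2/Belle-whisper-large-v3-turbo-zh",
    torch_dtype=torch.float16 if device != "cpu" else torch.float32,
    device=device,
    chunk_length_s=30,  # split long audio into 30-second chunks
)

transcriber.model.config.forced_decoder_ids = (
    transcriber.tokenizer.get_decoder_prompt_ids(language="zh", task="transcribe")
)

print(transcriber("my_audio.wav")["text"])
```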
License
This model is licensed under the Apache-2.0 License, which permits commercial and private use and requires that modified files carry notices stating that changes were made.