Faster-Whisper-Large-V3

Systran

Introduction

The Faster-Whisper-Large-V3 model by SYSTRAN is a conversion of the OpenAI Whisper Large-V3 model into a format compatible with CTranslate2. It supports automatic speech recognition for 100 languages.

Architecture

This model has been converted to the CTranslate2 format, allowing it to be used in applications that leverage this library, such as SYSTRAN's Faster-Whisper project.

Training

The model's weights are stored in FP16 (16-bit floating point). The conversion was performed with the ct2-transformers-converter command, which takes options for the output directory, for copying the tokenizer and preprocessor files, and for quantizing the weights to float16.
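
For reference, a conversion along these lines can be reproduced with the CTranslate2 converter CLI. The output directory name and copied files below follow the usual Whisper conversion recipe and are illustrative rather than taken from this model card:

    ct2-transformers-converter --model openai/whisper-large-v3 \
        --output_dir faster-whisper-large-v3 \
        --copy_files tokenizer.json preprocessor_config.json \
        --quantization float16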

Guide: Running Locally

To run the model locally, follow these steps:

  1. Install the required libraries: make sure faster-whisper is installed (for example with pip install faster-whisper); it pulls in CTranslate2 as a dependency.
  2. Load the model: Use the provided Python code to load and run the model.
    from faster_whisper import WhisperModel

    # Load the converted large-v3 model (downloaded automatically on first use if not cached locally)
    model = WhisperModel("large-v3")

    # transcribe() returns a generator of timestamped segments plus metadata about the transcription
    segments, info = model.transcribe("audio.mp3")
    for segment in segments:
        print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
    
  3. Transcribe audio: Replace "audio.mp3" with the path to your own audio file. The info object returned alongside the segments also reports the detected language, as shown in the note after this list.
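
In addition to the segments, transcribe() returns an info object whose language and language_probability attributes come from Faster-Whisper's language detection. The line below is a small sketch of how they can be printed:

    print("Detected language '%s' with probability %f" % (info.language, info.language_probability))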

For improved performance, consider using cloud GPUs such as those provided by AWS, Google Cloud, or Azure.
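
On a CUDA-capable GPU, the model can be loaded with an explicit device and precision. The snippet below is a minimal sketch using the device and compute_type arguments of WhisperModel; the "cuda" and "float16" values assume an NVIDIA GPU with FP16 support, and beam_size is an optional decoding parameter:

    from faster_whisper import WhisperModel

    # Run on GPU with FP16 weights; "int8_float16" further reduces memory at some accuracy cost
    model = WhisperModel("large-v3", device="cuda", compute_type="float16")

    segments, info = model.transcribe("audio.mp3", beam_size=5)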

License

The Faster-Whisper-Large-V3 model is released under the MIT license, allowing for broad use and modification.
