faster-whisper-large-v3-turbo-ct2

deepdml

Introduction

The faster-whisper-large-v3-turbo-ct2 model is an automatic speech recognition model supporting 100 languages. It is designed for use with CTranslate2 and projects built on it, such as faster-whisper. The model is available under the MIT license.

Architecture

The model is a conversion of deepdml/whisper-large-v3-turbo to the CTranslate2 format, which enables optimized inference in applications built on CTranslate2, such as faster-whisper. The conversion preserves the Whisper architecture and is used for audio transcription and other automatic speech recognition tasks.
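
For reference, a conversion of this kind can be reproduced with CTranslate2's Transformers converter. The sketch below is illustrative rather than the exact command used for this release: it assumes the ctranslate2 and transformers packages are installed, and the output directory name is arbitrary.

    from ctranslate2.converters import TransformersConverter
    
    # Convert the original Transformers checkpoint to the CTranslate2 format,
    # copying the tokenizer files next to the converted weights.
    converter = TransformersConverter(
        "deepdml/whisper-large-v3-turbo",
        copy_files=["tokenizer.json", "preprocessor_config.json"],
    )
    converter.convert(
        "faster-whisper-large-v3-turbo-ct2",  # illustrative output directory
        quantization="float16",               # store weights in FP16, as in this release
    )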

Training

The model's weights are stored in FP16 for reduced memory use and faster inference. The compute type can be changed when the model is loaded using the compute_type option in CTranslate2, allowing it to match the available hardware.
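
For example, the following sketch (assuming the faster-whisper package, which passes compute_type through to CTranslate2) keeps the FP16 weights on disk but runs the computation in INT8 on CPU:

    from faster_whisper import WhisperModel
    
    # The weights on disk stay in FP16; compute_type sets the precision used
    # at inference time ("int8" here; "float16" is typical on GPU).
    model = WhisperModel(
        "deepdml/faster-whisper-large-v3-turbo-ct2",
        device="cpu",
        compute_type="int8",
    )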

Guide: Running Locally

  1. Install Required Libraries: Ensure faster-whisper is installed (for example, via pip); it depends on CTranslate2, which is installed alongside it.
  2. Load the Model: Use the following example to load and run the model:
    from faster_whisper import WhisperModel
    
    # Downloads the converted model from the Hugging Face Hub on first use.
    model = WhisperModel("deepdml/faster-whisper-large-v3-turbo-ct2")
    
    # segments is a generator: transcription runs lazily as you iterate over it.
    segments, info = model.transcribe("audio.mp3")
    for segment in segments:
        print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
    
  3. Run on Cloud GPUs: For optimal performance, consider using cloud platforms like AWS, GCP, or Azure that provide GPU support.
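
For GPU execution, whether on a cloud instance or a local machine, the model can be placed on CUDA with FP16 computation. The sketch below assumes faster-whisper is installed and a CUDA-capable GPU with the cuBLAS and cuDNN libraries is available:

    from faster_whisper import WhisperModel
    
    # Run on the first CUDA device with FP16 computation; device_index
    # selects the GPU when several are available.
    model = WhisperModel(
        "deepdml/faster-whisper-large-v3-turbo-ct2",
        device="cuda",
        device_index=0,
        compute_type="float16",
    )
    segments, info = model.transcribe("audio.mp3", beam_size=5)
    print("Detected language:", info.language)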

License

This model is released under the MIT License, allowing for flexible use and distribution in various projects.