whisper-large-v3-mlx

mlx-community

Introduction

The whisper-large-v3-mlx model is the MLX community's release of OpenAI's Whisper large-v3 on Hugging Face, converted for efficient speech transcription with the mlx-whisper library.

Architecture

The model retains the Whisper large-v3 encoder-decoder Transformer architecture; its weights are converted to the MLX format so inference runs natively within the MLX framework, enabling accurate speech-to-text on Apple Silicon.

Training

Specific details on the model's training process are not provided in the README. The MLX release is a weight conversion rather than a retrained model; the underlying Whisper large-v3 was trained by OpenAI on large-scale multilingual speech data to deliver accurate transcription.

Guide: Running Locally

To use the whisper-large-v3-mlx model:

  1. Install MLX-Whisper:
    Ensure you have Python installed, then execute:

    pip install mlx-whisper
    
  2. Transcribe Speech:
    Use the following snippet to transcribe an audio file (a fuller sketch that reads the result follows this list):

    import mlx_whisper
    
    # Placeholder path; point this at your own audio file.
    speech_file = "audio.mp3"
    
    result = mlx_whisper.transcribe(
      speech_file,
      path_or_hf_repo="mlx-community/whisper-large-v3-mlx")
    
  3. Hardware Suggestions:
    MLX targets Apple Silicon, so run the model locally on a Mac with an M-series chip; the large-v3 weights benefit from ample unified memory.
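
As an extension of step 2, the sketch below reads the returned dictionary. The file name is a placeholder, and the word_timestamps option and the segments layout are assumptions based on mlx-whisper mirroring openai-whisper's transcribe() interface, not details stated in this README.

    import mlx_whisper
    
    # Assumed placeholder path; replace with a real audio file.
    speech_file = "interview.mp3"
    
    # word_timestamps is assumed to be supported, following openai-whisper's options.
    result = mlx_whisper.transcribe(
      speech_file,
      path_or_hf_repo="mlx-community/whisper-large-v3-mlx",
      word_timestamps=True)
    
    # The result is assumed to follow the openai-whisper layout:
    # the full transcript plus a list of timed segments.
    print(result["text"])
    for segment in result["segments"]:
        print(f"[{segment['start']:.2f}s -> {segment['end']:.2f}s] {segment['text']}")

Word-level timestamps add decoding overhead, so drop the option when a plain transcript is enough.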

License

The model is licensed under the MIT License, permitting reuse and modification provided the copyright and license notice are retained.
