whisper large v3 turbo german

primeline

Introduction

The Whisper Large V3 Turbo German model is a speech recognition system optimized for spoken German. Built on OpenAI's Whisper, it targets transcription, voice commands, automatic subtitling, voice-based search queries, and dictation.

Architecture

This model belongs to a family of German Whisper models that also includes Whisper Large V3 German and Distil-Whisper Large V3 German, each with a different parameter count to suit different needs. Whisper Large V3 Turbo German has 809M parameters, a balance between accuracy and computational efficiency.
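
As a rough illustration of what the parameter count means in practice, the memory needed for the weights can be estimated from the number of parameters and the bytes stored per parameter. This is a back-of-the-envelope sketch only: the helper name weight_memory_gb is hypothetical, and the estimate ignores activations, the decoding cache, and framework overhead.

```python
# Bytes needed to store one parameter in common floating-point formats.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params, dtype="float16"):
    """Estimate weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# 809M parameters, as stated for Whisper Large V3 Turbo German.
NUM_PARAMS = 809_000_000

for dtype in ("float16", "float32"):
    print(f"{dtype}: {weight_memory_gb(NUM_PARAMS, dtype):.2f} GB")
```

Under these assumptions the weights take roughly 1.6 GB in half precision and about 3.2 GB in single precision.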

Training

The model was trained on a diverse dataset of spoken German, the German ASR Data-Mix. It achieved a word error rate (WER) of 2.628% on test data. Training used the following hyperparameters:

  • Batch size: 12288
  • Epochs: 3
  • Learning rate: 1e-6
  • Optimizer: AdEMAMix
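
To make the WER figure quoted above concrete: WER is the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model output, divided by the number of words in the reference. The sketch below is a minimal illustration of that definition. The function name is hypothetical, and the text normalization behind the reported 2.628% is not stated here, so this code is not guaranteed to reproduce that number.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One of four reference words is missing from the hypothesis.
print(word_error_rate("das ist ein test", "das ist test"))  # 0.25
```

Real evaluations usually lowercase the text and strip punctuation before scoring, which can change the result noticeably.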

Guide: Running Locally

  1. Installation: Ensure you have Python and PyTorch installed, then install the transformers and datasets libraries from Hugging Face (for example, with pip install transformers datasets).

  2. Load Model and Processor:

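    import torch
    from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

    # Use the first GPU if one is available, otherwise fall back to CPU.
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model_id = "primeline/whisper-large-v3-turbo-german"

    # Download (or load from cache) the weights and move them to the device.
    model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id).to(device)
    # The processor bundles the tokenizer and the audio feature extractor.
    processor = AutoProcessor.from_pretrained(model_id)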
    
  3. Set Up the Pipeline:

    pipe = pipeline(
        "automatic-speech-recognition",
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
        device=device
    )
    
  4. Inference: Load a German audio sample and pass it to pipe. The result is a dictionary whose "text" entry holds the transcription.

  5. Cloud GPUs: The model also runs on CPU, but more slowly. If no local GPU is available, cloud GPU services such as AWS, Google Cloud, or Azure can be used to run it.
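
For step 4, a minimal sketch of the inference call might look like the following. It assumes pipe is the pipeline object built in step 3 and that the audio is already decoded into a mono, one-dimensional NumPy float array. The helper name transcribe and the 16 kHz default are illustrative choices, not part of the model card.

```python
def transcribe(pipe, waveform, sampling_rate=16_000):
    """Run an ASR pipeline on one decoded waveform and return the text.

    `pipe` is any callable that behaves like the Hugging Face
    automatic-speech-recognition pipeline: it takes a dict with the keys
    "raw" and "sampling_rate" and returns a dict with a "text" entry.
    """
    inputs = {"raw": waveform, "sampling_rate": sampling_rate}
    result = pipe(inputs)
    # Whisper output often starts with a space, so trim the whitespace.
    return result["text"].strip()
```

With a decoded item from a Hugging Face audio dataset, a call could look like transcribe(pipe, sample["audio"]["array"], sample["audio"]["sampling_rate"]), where the layout of sample is an assumption about the dataset in use. The pipeline can also be called directly with a path to an audio file, which relies on ffmpeg being installed for decoding.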

License

The Whisper Large V3 Turbo German model is available under the Apache-2.0 License, allowing for broad usage with minimal restrictions.
