Whisper Large V3 Turbo German
Introduction
The Whisper Large V3 Turbo German model, published by primeline, is a speech recognition system optimized for recognizing and processing spoken German. Built on OpenAI's Whisper architecture, it supports transcription, voice commands, automatic subtitling, voice-based search queries, and dictation.
Architecture
This model belongs to a family of German Whisper models that also includes Whisper Large V3 German and Distil-Whisper Large V3 German, each with a different parameter count to suit different needs. Whisper Large V3 Turbo German has 809M parameters, offering a balance between accuracy and computational efficiency.
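As a quick sanity check (a sketch that assumes the checkpoint can be downloaded from the Hugging Face Hub), the parameter count can be read directly from the loaded model:

```python
from transformers import AutoModelForSpeechSeq2Seq

# Load the checkpoint and count its parameters (expected to be roughly 809M)
model = AutoModelForSpeechSeq2Seq.from_pretrained("primeline/whisper-large-v3-turbo-german")
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")
```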
Training
The model was trained on a diverse dataset of spoken German, known as the German ASR Data-Mix. It achieved a Word Error Rate (WER) of 2.628% on test data, indicating high accuracy. The training process utilized the following hyperparameters:
- Batch size: 12288
- Epochs: 3
- Learning rate: 1e-6
- Optimizer: AdEMAMix
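For reference, a WER figure like the one reported above can be computed with the Hugging Face evaluate library; the snippet below is a minimal sketch with toy sentences, not the evaluation script actually used for this model:

```python
import evaluate

# Word error rate between reference transcripts and model outputs (toy example)
wer_metric = evaluate.load("wer")
references = ["heute scheint die sonne in berlin"]
predictions = ["heute scheint die sonne in berlin"]
print(wer_metric.compute(references=references, predictions=predictions))  # 0.0
```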
Guide: Running Locally
- Installation: Ensure you have Python and PyTorch installed, then install the transformers and datasets libraries from Hugging Face (for example, pip install transformers datasets).
- Load Model and Processor:
```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Use a GPU if available, otherwise fall back to the CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model_id = "primeline/whisper-large-v3-turbo-german"

# Load the fine-tuned model and its processor (tokenizer + feature extractor)
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id).to(device)
processor = AutoProcessor.from_pretrained(model_id)
```
- Set Up the Pipeline:
```python
# Build an automatic speech recognition pipeline around the loaded model
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    device=device,
)
```
- Inference: Load a German audio sample and run the pipeline to get a transcription, as in the sketch below.
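A minimal sketch of this step, assuming a local file sample_german.wav as a placeholder for your own recording (the pipeline relies on ffmpeg to decode common audio formats):

```python
# Transcribe a local German audio file; "sample_german.wav" is a placeholder path.
result = pipe("sample_german.wav")
print(result["text"])
```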
- Cloud GPUs: For optimal performance, run the model on a GPU, for example via cloud services such as AWS, Google Cloud, or Azure (see the sketch below).
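As an optional refinement when a GPU is available, the model can be loaded in half precision to reduce memory use and speed up inference; the torch_dtype setting below is an assumption for illustration, not something the original guide prescribes:

```python
# Assumption: float16 weights on GPU for lower memory use and faster inference
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch.float16
).to(device)
```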
License
The Whisper Large V3 Turbo German model is available under the Apache-2.0 License, allowing for broad usage with minimal restrictions.