Whisper Medium (openai/whisper-medium)

Introduction
Whisper is a pre-trained model developed by OpenAI for automatic speech recognition (ASR) and speech translation. It is trained on 680,000 hours of labeled data, showcasing strong generalization across various datasets and domains. The model is particularly effective for English speech recognition.
Architecture
Whisper employs a Transformer-based encoder-decoder architecture, functioning as a sequence-to-sequence model. It is available in English-only and multilingual configurations and handles both speech recognition and speech translation. The model comes in five sizes, ranging from tiny (39M parameters) to large-v2 (1550M parameters), all accessible on the Hugging Face Hub.
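The checkpoint family described above can be summarized as follows. Only the tiny and large-v2 parameter counts are stated in this card; the base, small, and medium figures are the commonly reported sizes for the other released checkpoints, so treat them as assumptions:

```python
# Approximate parameter counts (in millions) for the Whisper checkpoint family.
# tiny and large-v2 come from the text above; base, small, and medium are the
# commonly reported figures for the remaining checkpoints (assumed here).
WHISPER_SIZES_M = {
    "tiny": 39,
    "base": 74,
    "small": 244,
    "medium": 769,
    "large-v2": 1550,
}

# whisper-medium, the checkpoint this card describes:
print(WHISPER_SIZES_M["medium"])  # 769
```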
Training
Whisper was trained using large-scale weak supervision on diverse datasets. The training data comprises 65% English-language audio, 18% non-English audio paired with English transcripts, and 17% non-English audio paired with transcripts in the same language, covering 98 languages in total. The model's performance in a given language correlates with the amount of training data available for it.
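In absolute terms, the data mix above works out to roughly the following hour counts (a simple back-of-the-envelope calculation from the stated percentages):

```python
TOTAL_HOURS = 680_000  # total labeled training data, as stated above

# Shares of the training mix, as stated above.
mix = {"english": 0.65, "non_english_to_english": 0.18, "non_english": 0.17}

hours = {name: round(TOTAL_HOURS * share) for name, share in mix.items()}
print(hours)
# {'english': 442000, 'non_english_to_english': 122400, 'non_english': 115600}
```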
Guide: Running Locally
- Install Required Libraries: Ensure you have Python and PyTorch installed, then install the necessary packages:

```
pip install transformers datasets
```
- Load Model and Processor:

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("openai/whisper-medium")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-medium")
```
- Prepare Audio Input: Convert audio files to 16 kHz mono (the sampling rate Whisper expects) and load them using the datasets library.
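The processor in the next step expects a sample with an "array" of float waveform values and a "sampling_rate" key, which is exactly what the datasets library yields for an audio column. For offline experimentation, a minimal stand-in with the same shape (one second of silence at 16 kHz, purely hypothetical audio) looks like this:

```python
# Hypothetical stand-in for a loaded audio sample: 1 second of silence at 16 kHz.
# Real samples from the datasets library expose the same two keys.
sample = {
    "array": [0.0] * 16_000,   # raw mono waveform as float samples
    "sampling_rate": 16_000,   # Whisper models expect 16 kHz input
}

print(len(sample["array"]) / sample["sampling_rate"])  # 1.0 (seconds of audio)
```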
- Generate Transcription:

```python
input_features = processor(
    sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt"
).input_features
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
```
- Cloud GPUs: For efficient processing, consider using cloud GPU services such as AWS, Google Cloud, or Azure.
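Before reaching for a cloud GPU, it is worth checking whether a local CUDA device is already visible to PyTorch. A minimal sketch, assuming PyTorch is installed per the first step:

```python
import torch

# Pick a CUDA GPU when one is visible, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Running on: {device}")
```

The loaded model and the input features can then be moved to the chosen device with `.to(device)` before calling `generate`.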
License
Whisper is released under the Apache-2.0 license, allowing for both personal and commercial use with attribution.