whisper small hi
sanchit-gandhi
Introduction
The Whisper Small Hi model by Sanchit Gandhi is a fine-tuned version of OpenAI's Whisper Small model, adapted for Hindi automatic speech recognition (ASR). It was trained on the Hindi portion of the Mozilla Foundation's Common Voice 11.0 dataset and achieves a word error rate (WER) of 32.0113 on the evaluation set.
Architecture
This model is a fine-tuned copy of OpenAI's Whisper Small, so it shares that architecture. Whisper is an encoder-decoder Transformer: the encoder reads a log-Mel spectrogram of the input audio, and the decoder generates the transcript token by token. The model card gives no further architectural details. For reference, the Small checkpoint has roughly 244M parameters.
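Whisper's encoder always receives a fixed-size input. The sketch below works through the arithmetic, assuming the standard upstream Whisper front end: 16 kHz mono audio padded or truncated to a 30-second window, a hop of 160 samples, and 80 Mel bins. These constants come from the original Whisper release, not from this model card. `fit_to_window` is a hypothetical helper for illustration only, and it computes no actual spectrogram.

```python
# Assumed Whisper front-end constants (standard upstream values, not stated in this card)
SAMPLING_RATE = 16_000  # Hz; input audio must be resampled to this rate
CHUNK_SECONDS = 30      # every input is padded or truncated to one 30 s window
HOP_LENGTH = 160        # samples between spectrogram frames (10 ms at 16 kHz)
N_MELS = 80             # Mel frequency bins used by Whisper Small


def fit_to_window(num_samples: int) -> dict:
    """Describe how a mono 16 kHz clip of `num_samples` samples maps onto
    Whisper's fixed 30-second input window (arithmetic only, no audio processing)."""
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    window = SAMPLING_RATE * CHUNK_SECONDS        # 480,000 samples
    kept = min(num_samples, window)               # samples that reach the encoder
    return {
        "kept": kept,
        "padded": window - kept,                  # zeros appended to short clips
        "dropped": max(0, num_samples - window),  # tail cut from long clips
        "shape": (N_MELS, window // HOP_LENGTH),  # (mel bins, frames) = (80, 3000)
    }


print(fit_to_window(5 * SAMPLING_RATE))   # 5 s clip: mostly padding
print(fit_to_window(45 * SAMPLING_RATE))  # 45 s clip: last 15 s dropped
```

Because of this fixed window, the feature extractor truncates clips longer than 30 seconds by default. Long recordings therefore need to be split into chunks to be transcribed in full.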
Training
Training Procedure
The model was fine-tuned using the following hyperparameters:
- Learning Rate: 1e-05
- Train Batch Size: 16
- Eval Batch Size: 16
- Seed: 42
- Optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- Learning Rate Scheduler Type: Linear
- Learning Rate Scheduler Warmup Steps: 500
- Training Steps: 5000
- Mixed Precision Training: Native AMP
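The "Linear" scheduler with 500 warmup steps raises the learning rate from 0 to its peak of 1e-05. It then decays the rate linearly to 0 by step 5000. The pure-Python sketch below reproduces that shape using the same formula as Transformers' `get_linear_schedule_with_warmup`. `linear_schedule_lr` is a hypothetical helper, not part of any library.

```python
# Values from the training procedure listed above
PEAK_LR = 1e-5
WARMUP_STEPS = 500
TOTAL_STEPS = 5000


def linear_schedule_lr(step: int) -> float:
    """Learning rate after `step` optimizer steps: linear warmup from 0 to
    PEAK_LR, then linear decay back to 0 at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)  # clamp so steps past the end give 0
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


for step in (0, 250, 500, 2750, 5000):
    print(f"step {step:>4}: lr = {linear_schedule_lr(step):.2e}")
```

With a train batch size of 16, the 5000 steps correspond to 5000 × 16 = 80,000 training examples. This assumes a single device and no gradient accumulation, neither of which the card states.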
Training Results
- Training and validation loss were tracked throughout the 5000 training steps.
- The final evaluation achieved a WER of 32.0113.
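WER counts the word substitutions (S), deletions (D), and insertions (I) needed to turn the model's transcript into the reference. That total is divided by the number of reference words (N): WER = (S + D + I) / N. Below is a minimal pure-Python sketch that assumes whitespace tokenization and no text normalization. The card does not say how its evaluation text was normalized. `word_error_rate` is a hypothetical helper; libraries such as `jiwer` or Hugging Face `evaluate` provide tested implementations.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed as a word-level Levenshtein distance. Words are split on whitespace."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix processed so far
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


reference = "मैं घर जा रहा हूँ"
hypothesis = "मैं घर जा रही हूँ"  # one substituted word out of five
print(f"WER: {100 * word_error_rate(reference, hypothesis):.1f}%")  # WER: 20.0%
```

The reported value of 32.0113 is most likely a percentage, since the Hugging Face Whisper fine-tuning recipe multiplies WER by 100. That would mean roughly 32 word errors per 100 reference words.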
Framework Versions
- Transformers: 4.25.0.dev0
- PyTorch: 1.12.1
- Datasets: 2.5.3.dev0
- Tokenizers: 0.12.1
Guide: Running Locally
To run the Whisper Small Hi model locally, follow these basic steps:
- Install Dependencies: Make sure PyTorch, torchaudio (used to load audio files in a later step), and Transformers are installed. You can install them with pip:
    pip install torch torchaudio transformers
- Load the Model: Use the Transformers library to load the model.
    from transformers import WhisperForConditionalGeneration, WhisperProcessor
    model = WhisperForConditionalGeneration.from_pretrained("sanchit-gandhi/whisper-small-hi")
    processor = WhisperProcessor.from_pretrained("sanchit-gandhi/whisper-small-hi")
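- Prepare the Input: Load your audio file and convert it into the log-Mel input features the model expects. Whisper's feature extractor requires 16 kHz mono audio, so downmix and resample first.
    import torchaudio
    waveform, sample_rate = torchaudio.load("path_to_your_audio_file.wav")
    # Average the channels to get mono audio, then resample to 16 kHz
    waveform = torchaudio.functional.resample(waveform.mean(dim=0), sample_rate, 16000)
    inputs = processor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")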
- Generate Transcriptions: Generate token IDs with the model, then decode them into text.
    # Whisper's processor returns the log-Mel features under "input_features"
    predicted_ids = model.generate(inputs["input_features"])
    transcription_text = processor.batch_decode(predicted_ids, skip_special_tokens=True)
    print(transcription_text)
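Whisper's feature extractor expects 16 kHz mono input, so it helps to check a file's format before transcribing it. For uncompressed PCM WAV files, Python's standard-library `wave` module reports the channel count, sample rate, and duration with no extra dependencies. This sketch assumes PCM WAV input. `describe_wav` and its output keys are hypothetical, and compressed formats such as MP3 or FLAC need a decoder like torchaudio instead.

```python
import io
import wave

TARGET_RATE = 16_000  # Whisper's feature extractor expects 16 kHz input


def describe_wav(source) -> dict:
    """Read basic format info from a PCM WAV file path or binary file-like object."""
    with wave.open(source, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        frames = wav.getnframes()
    return {
        "channels": channels,
        "sample_rate": rate,
        "duration_s": frames / rate,
        "needs_downmix": channels > 1,        # more than one channel: average to mono
        "needs_resample": rate != TARGET_RATE,
    }


# Usage example: build a 1-second, stereo, 44.1 kHz silent WAV in memory
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)                       # 16-bit PCM
    out.setframerate(44_100)
    out.writeframes(b"\x00\x00" * 2 * 44_100)  # 2 channels x 44,100 frames
buffer.seek(0)
print(describe_wav(buffer))
```

If either flag is true, downmix and/or resample the audio (for example with `torchaudio.functional.resample`) before passing it to the processor.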
Whisper Small can run on a CPU, but transcription is much faster on a GPU. If you do not have local GPU hardware, consider cloud GPUs from providers such as AWS, Google Cloud, or Azure.
License
The Whisper Small Hi model is licensed under the Apache 2.0 License, allowing for both personal and commercial use, distribution, and modification.