whisper small hi

sanchit-gandhi

Introduction

The Whisper Small Hi model by Sanchit Gandhi is a fine-tuned variant of OpenAI's Whisper Small model for Hindi automatic speech recognition. It was fine-tuned on the Mozilla Foundation's Common Voice 11.0 dataset and reports a Word Error Rate (WER) of 32.0113 on its evaluation set.

Architecture

This model keeps the architecture of OpenAI's Whisper Small: an encoder-decoder Transformer that takes a log-Mel spectrogram of the audio as input and generates text tokens. Fine-tuning changes the weights, not the architecture. The model card itself does not list layer counts or other architectural details.
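
For orientation, the sketch below works out the input shape a Whisper encoder sees for one audio window. The constants (16 kHz sampling, 30-second windows, a 10 ms hop, 80 Mel bins, and a stride-2 convolution in front of the encoder layers) come from the published Whisper design rather than from this model card, so treat them as assumptions here.

```python
# Assumed Whisper front-end constants (from the published Whisper design,
# not stated in this model card).
SAMPLE_RATE = 16_000   # Hz
CHUNK_SECONDS = 30     # audio is padded or trimmed to this length
HOP_LENGTH = 160       # samples between spectrogram frames (10 ms)
N_MELS = 80            # Mel frequency bins
CONV_STRIDE = 2        # downsampling applied before the encoder layers


def encoder_input_shape():
    """Return (mel_bins, frames) for one 30-second window."""
    samples = SAMPLE_RATE * CHUNK_SECONDS
    frames = samples // HOP_LENGTH
    return N_MELS, frames


def encoder_positions():
    """Number of time positions the encoder layers attend over."""
    _, frames = encoder_input_shape()
    return frames // CONV_STRIDE


print(encoder_input_shape())  # (80, 3000)
print(encoder_positions())    # 1500
```

Under these assumptions, every clip is presented to the model as a fixed-size 80 x 3000 feature matrix, regardless of how long the original recording was.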

Training

Training Procedure

The model was fine-tuned using the following hyperparameters:

  • Learning Rate: 1e-05
  • Train Batch Size: 16
  • Eval Batch Size: 16
  • Seed: 42
  • Optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • Learning Rate Scheduler Type: Linear
  • Learning Rate Scheduler Warmup Steps: 500
  • Training Steps: 5000
  • Mixed Precision Training: Native AMP
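
To make the scheduler settings concrete, here is a minimal sketch of the learning rate over the run. It assumes "Linear" means a linear warmup from zero followed by a linear decay to zero, which is how the linear schedule in Transformers behaves; the function name is illustrative and not part of any library.

```python
BASE_LR = 1e-05       # Learning Rate from the list above
WARMUP_STEPS = 500    # Learning Rate Scheduler Warmup Steps
TOTAL_STEPS = 5000    # Training Steps


def linear_schedule_lr(step):
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Warmup: ramp linearly from 0 up to BASE_LR.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Decay: fall linearly from BASE_LR to 0 at TOTAL_STEPS.
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


for step in (0, 250, 500, 2750, 5000):
    print(step, linear_schedule_lr(step))
```

With these values the rate peaks at 1e-05 at step 500, is back to half of that at step 2750, and reaches zero at step 5000.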

Training Results

  • Training Loss and Validation Loss were monitored over 5000 steps.
  • The final evaluation achieved a WER of 32.0113.
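
For readers unfamiliar with the metric, the sketch below shows one common way to compute WER: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It is an illustration only, not the evaluation code used for this model, and it skips the text normalization that real evaluations usually apply. The reported 32.0113 is presumably a percentage, so it corresponds to about 0.32 on the scale used here.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{100 * wer:.2f}")  # 33.33
```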

Framework Versions

  • Transformers: 4.25.0.dev0
  • PyTorch: 1.12.1
  • Datasets: 2.5.3.dev0
  • Tokenizers: 0.12.1

Guide: Running Locally

To run the Whisper Small Hi model locally, follow these basic steps:

  1. Install Dependencies: Install PyTorch, torchaudio (used below to load audio), and Transformers with pip:

    pip install torch torchaudio transformers
    
  2. Load the Model: Use the Transformers library to load the model and its processor.

    from transformers import WhisperForConditionalGeneration, WhisperProcessor
    
    model = WhisperForConditionalGeneration.from_pretrained("sanchit-gandhi/whisper-small-hi")
    processor = WhisperProcessor.from_pretrained("sanchit-gandhi/whisper-small-hi")
    
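  3. Prepare the Input: Load the audio file, convert it to 16 kHz mono (the format the Whisper processor expects), and extract the input features.

    import torchaudio
    
    # torchaudio returns a (channels, samples) tensor at the file's own rate
    waveform, sample_rate = torchaudio.load("path_to_your_audio_file.wav")
    waveform = waveform.mean(dim=0)  # downmix to mono
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)
    inputs = processor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")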
    
  4. Generate Transcriptions: Generate token IDs with the model and decode them to text.

    # The Whisper processor returns "input_features", not "input_values"
    predicted_ids = model.generate(inputs["input_features"])
    transcription_text = processor.batch_decode(predicted_ids, skip_special_tokens=True)
    print(transcription_text)
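
Before running the steps above, it can help to check whether a recording already matches the 16 kHz mono format that the Whisper feature extractor expects. The following is a minimal sketch using only Python's standard `wave` module, which reads uncompressed PCM WAV files; `describe_wav` is a hypothetical helper, not part of Transformers or torchaudio.

```python
import wave

TARGET_SAMPLE_RATE = 16_000  # rate the Whisper feature extractor expects


def describe_wav(path):
    """Read basic properties of a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav_file:
        channels = wav_file.getnchannels()
        sample_rate = wav_file.getframerate()
        frames = wav_file.getnframes()
    return {
        "channels": channels,
        "sample_rate": sample_rate,
        "duration_seconds": frames / sample_rate,
        # True when the audio must be averaged down to one channel.
        "needs_downmix": channels != 1,
        # True when the audio must be resampled to 16 kHz.
        "needs_resample": sample_rate != TARGET_SAMPLE_RATE,
    }
```

Calling `describe_wav("path_to_your_audio_file.wav")` returns a dictionary; if `needs_downmix` or `needs_resample` is true, the audio has to be converted before it is passed to the processor.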
    

Inference runs faster on a GPU. If you do not have one locally, consider cloud GPUs from providers such as AWS, Google Cloud, or Azure.
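
As a rough way to size hardware, weight memory can be estimated from the parameter count. The sketch below assumes about 244 million parameters, the figure OpenAI publishes for Whisper Small (it is not stated in this model card), and counts weights only; activations and decoding state add to this.

```python
APPROX_PARAMETERS = 244_000_000  # assumed: published size of Whisper Small

BYTES_PER_PARAMETER = {"float32": 4, "float16": 2}


def weight_memory_gib(dtype):
    """Approximate memory needed to hold the weights, in GiB."""
    return APPROX_PARAMETERS * BYTES_PER_PARAMETER[dtype] / 1024**3


for dtype in BYTES_PER_PARAMETER:
    print(f"{dtype}: {weight_memory_gib(dtype):.2f} GiB")
```

Under that assumption the weights take a little under 1 GiB in 32-bit precision, so a single modest GPU should be sufficient for inference.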

License

The Whisper Small Hi model is licensed under the Apache 2.0 License, allowing for both personal and commercial use, distribution, and modification.
