Introduction

Moonshine is a model developed by Useful Sensors for automatic speech recognition (ASR), specifically designed to transcribe English speech into text. It aims to enable real-time transcription on low-cost hardware. The model card provides details on the model’s architecture, training, and intended usage.

Architecture

Moonshine is a sequence-to-sequence ASR model. It comes in two variants: a tiny model with 27 million parameters and a base model with 61 million parameters. Both variants are English-only and transcribe English speech into English text.
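
To put the parameter counts in context for low-memory hardware, the following rough sketch estimates the storage needed for the weights alone. It assumes 4 bytes per parameter (float32), which is an assumption made here for illustration and not a figure from the model card; activations and runtime overhead are ignored, and weight_megabytes is a hypothetical helper name.

```python
def weight_megabytes(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate size of the weights alone, in megabytes (10**6 bytes)."""
    return num_params * bytes_per_param / 1_000_000


# Parameter counts as stated above; 4 bytes per parameter assumes float32.
for name, params in {"tiny": 27_000_000, "base": 61_000_000}.items():
    print(f"{name}: ~{weight_megabytes(params):.0f} MB of weights")
```

Under that assumption the tiny model needs roughly 108 MB for its weights and the base model roughly 244 MB; a 2-byte format would halve both figures.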

Training

The Moonshine models were trained on 200,000 hours of audio and corresponding transcripts collected from the internet and from public datasets available on Hugging Face. They are optimized for platforms with limited memory and compute. Evaluation shows improved accuracy over existing ASR systems of similar size, but known failure modes remain: the models can hallucinate text that was never spoken and can produce repetitive output.

Guide: Running Locally

  1. Install uv for environment management by following the uv installation guide.
  2. Set up and activate a virtual environment:
    uv venv env_moonshine
    source env_moonshine/bin/activate
    
  3. Install the Moonshine package:
    • For PyTorch backend:
      uv pip install useful-moonshine@git+https://github.com/usefulsensors/moonshine.git
      export KERAS_BACKEND=torch
      
    • For TensorFlow backend:
      uv pip install useful-moonshine[tensorflow]@git+https://github.com/usefulsensors/moonshine.git
      export KERAS_BACKEND=tensorflow
      
    • For JAX backend:
      uv pip install useful-moonshine[jax]@git+https://github.com/usefulsensors/moonshine.git
      export KERAS_BACKEND=jax
      # Use useful-moonshine[jax-cuda] for JAX on GPU
      
  4. Test transcription:
    import moonshine
    moonshine.transcribe(moonshine.ASSETS_DIR / 'beckett.wav', 'moonshine/tiny')
    
    The first argument is the path to the audio file, and the second is the model name.
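
The backend selection in step 3 and the test call in step 4 can also be combined in one Python script. The following is a minimal sketch, assuming the package is installed as described above and that setting KERAS_BACKEND from Python before the first import of moonshine has the same effect as the shell export. The helper names configure_backend and transcribe_file are hypothetical and not part of the Moonshine package; only moonshine.transcribe and moonshine.ASSETS_DIR come from the guide.

```python
import os
from pathlib import Path

# Backend names used in the install step above.
VALID_BACKENDS = {"torch", "tensorflow", "jax"}


def configure_backend(name: str = "torch") -> str:
    """Set KERAS_BACKEND for this process and return the value that was set."""
    if name not in VALID_BACKENDS:
        raise ValueError(f"unknown backend {name!r}; expected one of {sorted(VALID_BACKENDS)}")
    os.environ["KERAS_BACKEND"] = name
    return name


def transcribe_file(audio_path=None, model="moonshine/tiny", backend="torch"):
    """Transcribe one audio file; defaults to the bundled beckett.wav sample."""
    configure_backend(backend)
    # Imported lazily so the environment variable is already set (assumption:
    # the backend is read when the package is first imported).
    import moonshine

    path = Path(audio_path) if audio_path else moonshine.ASSETS_DIR / "beckett.wav"
    return moonshine.transcribe(path, model)
```

Calling print(transcribe_file()) should reproduce the test in step 4; passing a different path or backend name switches the input file or the Keras backend.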

Cloud GPUs: Consider using services like AWS, Azure, or Google Cloud for GPU support if needed.

License

Moonshine is released under the MIT License, allowing for broad usage and modification.
