hertz-dev
si-pbc
Introduction
Hertz-dev is an open-source base model for full-duplex conversational audio. It is an 8.5-billion-parameter transformer trained on 20 million hours of high-quality audio, and it supports both mono-channel and full-duplex generation. Because it closely models human speech patterns, including pauses and emotional inflection, it is well suited to tasks such as live translation and classification.
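"Full-duplex" means both sides of a conversation are present at the same time, so speakers can overlap instead of strictly taking turns. As a minimal illustration, assuming each speaker is captured as a separate mono stream at the same sample rate, the two streams can be interleaved into one two-channel buffer in which every time step carries both speakers. This is a generic sketch with hypothetical helper names; it is not a description of how hertz-dev represents its input internally.

```python
from typing import List, Sequence, Tuple


def interleave_duplex(speaker_a: Sequence[int], speaker_b: Sequence[int]) -> List[int]:
    """Interleave two equal-rate mono streams into one stereo stream.

    Hypothetical helper. Output layout is [a0, b0, a1, b1, ...]:
    channel 0 is speaker A, channel 1 is speaker B.
    """
    if len(speaker_a) != len(speaker_b):
        raise ValueError("both channels must have the same number of samples")
    stereo: List[int] = []
    for a, b in zip(speaker_a, speaker_b):
        stereo.extend((a, b))
    return stereo


def split_duplex(stereo: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Inverse of interleave_duplex: recover the two mono channels."""
    return list(stereo[0::2]), list(stereo[1::2])


# Speaker B starts talking while speaker A is still speaking (an overlap).
a = [100, 120, 90, 0, 0]
b = [0, 0, 40, 60, 80]
stereo = interleave_duplex(a, b)
print(stereo)  # [100, 0, 120, 0, 90, 40, 0, 60, 0, 80]
```

In a turn-based (half-duplex) system only one of these channels would be active at a time; the third time step above, where both are non-zero, is exactly the case full-duplex modeling has to handle.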
Architecture
Hertz-dev uses a transformer architecture with 8.5 billion parameters, trained on a large corpus of conversational audio. It reports an average real-world latency of 120 ms on a single RTX 4090, which the authors present as state of the art and significantly lower than previous models. Latency this low is crucial for natural-sounding, real-time conversation.
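The parameter count alone gives a rough idea of why a single 24 GB card such as the RTX 4090 is enough. The sketch below counts only the raw weights; it ignores activations, caches, and framework overhead. The bytes-per-parameter values are assumptions for illustration: the dtype of the released checkpoint is not stated here, and the int8 row is a hypothetical quantized case.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory for the raw weights only, in decimal gigabytes (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


PARAMS = 8.5e9          # parameter count stated above
RTX_4090_VRAM_GB = 24   # the RTX 4090 has 24 GB of memory

# ASSUMPTION: candidate storage precisions; the real checkpoint dtype may differ.
for label, nbytes in [("fp32", 4), ("bf16/fp16", 2), ("int8", 1)]:
    need = weight_memory_gb(PARAMS, nbytes)
    fits = need < RTX_4090_VRAM_GB
    print(f"{label:>9}: {need:5.1f} GB of weights, fits in 24 GB: {fits}")
```

At 16-bit precision the weights take about 17 GB, leaving roughly 7 GB of the card for everything else; 32-bit weights (34 GB) would not fit at all.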
Training
The model was trained on 20 million unique hours of conversational audio. It is a base model: it has not been fine-tuned, trained with Reinforcement Learning from Human Feedback (RLHF), or taught instruction-following behavior. Users can fine-tune Hertz-dev for various audio modeling tasks.
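To put the training-set size in perspective, converting hours of audio into years of continuous playback is simple arithmetic (this sketch ignores leap years):

```python
HOURS_OF_AUDIO = 20_000_000   # training-set size stated above
HOURS_PER_YEAR = 24 * 365     # 8,760 hours; leap years ignored

years = HOURS_OF_AUDIO / HOURS_PER_YEAR
print(f"{years:,.0f} years of continuous audio")  # 2,283 years of continuous audio
```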
Guide: Running Locally
- Clone the Repository:
  git clone https://github.com/Standard-Intelligence/hertz-dev
  cd hertz-dev
- Set Up Environment:
  python3 -m venv .venv
  source .venv/bin/activate
  pip install -r requirements.txt
  - For Ubuntu, install additional dependencies:
    sudo apt-get install libportaudio2
- Install PyTorch with CUDA Support:
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
- Run Inference Notebook:
  Use inference.ipynb to generate audio completions. Note that Windows setups may require adjustments because of the flash attention dependencies.
- Live Interaction:
  Use inference_client.py and inference_server.py for live interaction through a microphone. These scripts are tested mainly on Ubuntu (server) and macOS (client).
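After completing the setup steps, a short Python check can confirm that the components they install are visible. This is a hypothetical helper, not part of the hertz-dev repository. It assumes PortAudio is discoverable under the library name "portaudio" (which libportaudio2 provides on Ubuntu) and that flash attention, if present, is the `flash_attn` package; every check degrades gracefully so the script also runs on CPU-only machines.

```python
import importlib.util
from ctypes.util import find_library


def check_environment() -> dict:
    """Report whether the dependencies from the setup steps are visible.

    Hypothetical helper; not shipped with hertz-dev.
    """
    report = {
        # ASSUMPTION: libportaudio2 exposes the shared library "portaudio".
        # find_library returns a name such as "libportaudio.so.2", or None.
        "portaudio": find_library("portaudio"),
        # find_spec returns None when a top-level package is not installed.
        "flash_attn_installed": importlib.util.find_spec("flash_attn") is not None,
        "torch_version": None,
        "cuda_available": False,
        "torch_cuda_build": None,
    }
    try:
        import torch
    except ImportError:
        return report  # PyTorch not installed yet
    report["torch_version"] = torch.__version__
    report["cuda_available"] = torch.cuda.is_available()
    # CUDA version the wheel was built against (None for CPU-only builds).
    report["torch_cuda_build"] = torch.version.cuda
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key:22}: {value}")
```

If `cuda_available` is False after installing the cu121 wheels, the usual suspects are a missing NVIDIA driver or a CPU-only torch build left over from `requirements.txt`.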
Cloud GPUs: If you lack a suitable local GPU, consider a cloud GPU service such as AWS, Google Cloud, or Azure, choosing a GPU at least as capable as an NVIDIA RTX 4090.
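When comparing a local machine or cloud instance against the RTX 4090 reference point (about 24 GB of memory), the name and memory of each visible GPU can be read through PyTorch. A small sketch, assuming PyTorch is installed; the helper name is hypothetical, and it returns an empty list instead of failing when PyTorch or CUDA is unavailable.

```python
from typing import List, Tuple


def list_gpus() -> List[Tuple[str, float]]:
    """Return (device name, total memory in decimal GB) per visible CUDA device.

    Hypothetical helper. Returns [] if PyTorch is missing or no CUDA device
    is visible.
    """
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    gpus = []
    for index in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(index)
        gpus.append((props.name, props.total_memory / 1e9))
    return gpus


if __name__ == "__main__":
    found = list_gpus()
    if not found:
        print("No CUDA device visible")
    for name, mem_gb in found:
        print(f"{name}: {mem_gb:.1f} GB")
```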
License
Hertz-dev is licensed under the Apache-2.0 License, allowing for open use and modification under the terms specified.