Mini-Omni
Introduction
Mini-Omni is an open-source multimodal large language model from the gpt-omni project, distributed via Hugging Face. It processes speech input in real time and produces streaming audio output, enabling seamless voice conversation without separate Automatic Speech Recognition (ASR) or Text-to-Speech (TTS) models. It supports "talking while thinking", generating text and audio simultaneously, and offers both "audio-to-text" and "audio-to-audio" batch inference to boost performance.
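"Talking while thinking" amounts to the backbone emitting a text token and a set of audio-codec tokens at every decoding step, so speech can stream out while the textual answer is still forming. The sketch below only illustrates this parallel-decoding idea; generate_streaming, init_state, lm_step, and eos_id are hypothetical names, not the repository's API.

```python
# Illustrative parallel decoding: each step yields one text token plus a set
# of audio-codec tokens, so audio can stream while text is still generating.
# All model methods here (init_state, lm_step, eos_id) are hypothetical
# placeholders, not Mini-Omni's actual interface.
def sample(logits):
    return max(range(len(logits)), key=logits.__getitem__)  # greedy pick

def generate_streaming(model, audio_features, max_steps=256):
    state = model.init_state(audio_features)              # condition on speech input
    for _ in range(max_steps):
        text_logits, audio_logits = model.lm_step(state)  # one shared backbone pass
        text_token = sample(text_logits)                  # next text token
        audio_tokens = [sample(a) for a in audio_logits]  # one token per audio codebook
        yield text_token, audio_tokens                    # stream both modalities
        if text_token == model.eos_id:                    # stop at end-of-sequence
            break
```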
Architecture
Mini-Omni is built on the Qwen2-0.5B base model and integrates several components; a conceptual sketch of the audio path follows the list:
- Qwen2 serves as the language model backbone.
- litGPT is used for training and inference.
- Whisper handles audio encoding.
- SNAC handles audio decoding.
- CosyVoice generates synthetic speech.
- OpenOrca and MOSS are used for model alignment.
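To make the division of labor concrete, here is a minimal sketch of the audio path using the public Whisper and SNAC packages directly. The checkpoint names are illustrative assumptions, and the Qwen2 generation step is elided since that wiring is specific to the repository.

```python
# Conceptual sketch of the Mini-Omni audio path, using the public Whisper and
# SNAC packages directly (the repo integrates these into its own code).
import torch
from transformers import WhisperFeatureExtractor, WhisperModel
from snac import SNAC  # pip install snac

# 1) Speech -> continuous features (Whisper encoder) for the LM to attend to.
extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-small")
encoder = WhisperModel.from_pretrained("openai/whisper-small").encoder
speech = torch.randn(16000 * 3)  # stand-in for 3 s of 16 kHz input audio
feats = extractor(speech.numpy(), sampling_rate=16000, return_tensors="pt")
audio_features = encoder(feats.input_features).last_hidden_state

# 2) The Qwen2-0.5B backbone (elided here) maps these features to text tokens
#    and to discrete SNAC codes for the spoken reply.

# 3) Discrete codes -> waveform (SNAC decoder). Shown as a round trip through
#    the codec, since step 2 is elided.
snac = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval()
with torch.inference_mode():
    codes = snac.encode(torch.randn(1, 1, 24000))  # placeholder 1 s of audio
    waveform = snac.decode(codes)                  # (1, 1, samples) output
```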
Training
The model relies on the litGPT framework for training and inference, which is designed to handle large language models efficiently. Detailed training methodology and configurations are available in the Mini-Omni GitHub repository.
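As a general illustration of the framework (not Mini-Omni's own training recipe), recent litGPT versions expose a small Python API alongside their CLI; the checkpoint name below is only an example and should be swapped for any model your litGPT version lists as supported.

```python
# Generic litGPT usage, shown only to illustrate the framework Mini-Omni
# builds on; Mini-Omni's own entry points live in its repository.
from litgpt import LLM  # pip install litgpt

# Checkpoint name is illustrative; substitute any litGPT-supported model.
llm = LLM.load("Qwen/Qwen2-0.5B")
print(llm.generate("What is a multimodal model?", max_new_tokens=64))
```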
Guide: Running Locally
Set Up Environment
- Create a new conda environment and activate it:

```bash
conda create -n omni python=3.10
conda activate omni
```
Clone Repository and Install Dependencies
- Clone the Mini-Omni repository and install the required packages:

```bash
git clone https://github.com/gpt-omni/mini-omni.git
cd mini-omni
pip install -r requirements.txt
```
Start Server
- Launch the server to host the interactive demos (a hypothetical client sketch for this endpoint appears after the steps):

```bash
python3 server.py --ip '0.0.0.0' --port 60808
```
Run Demos
- Streamlit demo (requires PyAudio for local execution):

```bash
pip install PyAudio==0.2.14
API_URL=http://0.0.0.0:60808/chat streamlit run webui/omni_streamlit.py
```

- Gradio demo:

```bash
API_URL=http://0.0.0.0:60808/chat python3 webui/omni_gradio.py
```
Local Testing
- Test run with the preset audio samples (a snippet for recording your own sample follows):

```bash
python inference.py
```
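Once the server is running, both demos reach it through the API_URL environment variable pointing at the /chat endpoint. The exact request and response schema is defined in the repository's server.py; the snippet below is a hypothetical sketch that assumes a raw-audio POST with a streamed audio reply, and would need to be adapted to the real schema.

```python
import requests

# Hypothetical client for the /chat endpoint; the real request/response
# format is defined in server.py, so treat this as a sketch only.
API_URL = "http://0.0.0.0:60808/chat"

with open("sample.wav", "rb") as f:
    resp = requests.post(API_URL, data=f.read(), stream=True, timeout=120)
resp.raise_for_status()

with open("reply.wav", "wb") as out:
    for chunk in resp.iter_content(chunk_size=4096):  # consume streamed bytes
        out.write(chunk)
```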
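inference.py ships with preset samples; if you want to test with your own voice, the already-required PyAudio package can capture a clip. The snippet below is a standard PyAudio recording loop; the filename is arbitrary, and how inference.py consumes custom audio is documented in the repository.

```python
import wave
import pyaudio

# Record a short mono 16 kHz WAV clip with PyAudio for use as a test sample.
RATE, CHUNK, SECONDS = 16000, 1024, 5
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, frames_per_buffer=CHUNK)
frames = [stream.read(CHUNK) for _ in range(int(RATE / CHUNK * SECONDS))]
stream.stop_stream()
stream.close()
sample_width = pa.get_sample_size(pyaudio.paInt16)
pa.terminate()

with wave.open("sample.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(sample_width)
    wf.setframerate(RATE)
    wf.writeframes(b"".join(frames))
```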
For enhanced performance, consider utilizing cloud GPUs from providers like AWS, Google Cloud, or Azure.
License
Mini-Omni is released under the MIT License, allowing for extensive use, modification, and distribution.