Fish Agent V0.1 3B
fishaudio
Introduction
Fish Agent V0.1 3B is a voice-to-voice model that can capture and generate environmental audio information with high accuracy. Its semantic-token-free architecture removes the need for traditional semantic encoders and decoders. The model is also a state-of-the-art text-to-speech (TTS) system, trained on 700,000 hours of multilingual audio. It is a continue-pretrained version of Qwen-2.5-3B-Instruct, trained further on 200 billion voice and text tokens.
Architecture
Fish Agent V0.1 3B uses a semantic-token-free design: it processes and generates audio without relying on traditional semantic encoders and decoders such as Whisper and CosyVoice. The design is intended to make audio handling more efficient and more accurate.
Training
The model was trained on roughly 700,000 hours of multilingual audio. The supported languages and their approximate training data sizes are:
- English (en): ~300,000 hours
- Chinese (zh): ~300,000 hours
- German (de): ~20,000 hours
- Japanese (ja): ~20,000 hours
- French (fr): ~20,000 hours
- Spanish (es): ~20,000 hours
- Korean (ko): ~20,000 hours
- Arabic (ar): ~20,000 hours
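The per-language figures above are approximate, so they need not add up exactly to the rounded 700,000-hour headline. The short sketch below tallies them and computes each language's share of the total. The dictionary simply copies the numbers from the list; the function names are illustrative and not part of any Fish Audio tooling.

```python
# Approximate training hours per language, copied from the list above.
HOURS = {
    "en": 300_000,
    "zh": 300_000,
    "de": 20_000,
    "ja": 20_000,
    "fr": 20_000,
    "es": 20_000,
    "ko": 20_000,
    "ar": 20_000,
}


def total_hours(hours):
    """Sum the approximate hours across all languages."""
    return sum(hours.values())


def shares(hours):
    """Return each language's fraction of the summed hours."""
    total = total_hours(hours)
    return {lang: h / total for lang, h in hours.items()}


print(f"total: {total_hours(HOURS):,} hours")
for lang, share in shares(HOURS).items():
    print(f"{lang}: {share:.1%}")
```

The listed values sum to about 720,000 hours, which is consistent with the rounded 700,000 figure given that every entry is an estimate. English and Chinese together account for roughly five sixths of the data.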
Guide: Running Locally
To run the Fish Agent V0.1 3B model locally, follow these basic steps:
- Clone the Fish Speech GitHub repository:
  git clone https://github.com/fishaudio/fish-speech
- Install the required dependencies as listed in the repository's documentation.
- Configure your environment according to the guidelines provided in the repository.
- Execute the model using the provided scripts and instructions.
For optimal performance, consider using cloud-based GPUs, such as those available from Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure.
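Before following the steps above, it can help to check that the basic tools are present. The sketch below is a minimal preflight helper, not part of the Fish Speech repository. It uses only the Python standard library and the repository URL from the guide. The helper names are hypothetical, and detecting a GPU through the `nvidia-smi` tool is an assumption that only applies to NVIDIA setups. Dependency installation and model execution are left to the repository's own documentation.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path

# Repository URL taken from the guide above.
REPO_URL = "https://github.com/fishaudio/fish-speech"


def clone_command(dest: str = "fish-speech") -> list[str]:
    """Build the git clone command from the first step of the guide."""
    return ["git", "clone", REPO_URL, dest]


def preflight() -> dict[str, bool]:
    """Report whether the tools this sketch relies on are on PATH."""
    return {
        "git": shutil.which("git") is not None,
        # Assumption: an NVIDIA GPU setup exposes the nvidia-smi tool.
        "nvidia_gpu_tools": shutil.which("nvidia-smi") is not None,
    }


def clone_repo(dest: str = "fish-speech") -> Path:
    """Clone the repository unless the destination already exists."""
    target = Path(dest)
    if not target.exists():
        # check=True raises CalledProcessError if git exits with an error.
        subprocess.run(clone_command(dest), check=True)
    return target
```

Calling `preflight()` returns a dictionary of booleans, and `clone_repo()` performs the clone only when the target directory is missing, so it is safe to call repeatedly.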
License
This model and its associated code are released under the Creative Commons BY-NC-SA 4.0 license. The license permits non-commercial use with proper attribution and requires derivative works to be shared under the same terms.