F5 Spanish
jpgallegoar
Introduction
This F5-TTS model is a text-to-speech system fine-tuned for Spanish speech synthesis. It aims to provide high-quality speech for Spanish speakers across a range of regional varieties.
Architecture
The base model is SWivid/F5-TTS. Training involved 218 hours of audio data, a batch size of 3200, a maximum of 64 samples per batch, and 1,200,000 training steps.
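The stated training settings can be collected into a single record for reference. This is a minimal Python sketch. The class and field names are hypothetical and are not F5-TTS configuration keys. The card does not state the unit of the batch size. The `checkpoint_name` property assumes the checkpoint file is named after the final step count, which the file name model_1200000.safetensors suggests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinetuneSummary:
    """Training settings stated in this card (field names are hypothetical)."""

    base_model: str = "SWivid/F5-TTS"
    audio_hours: float = 218.0
    batch_size: int = 3200        # unit not stated in the card
    max_samples: int = 64         # maximum number of samples per batch
    total_steps: int = 1_200_000

    @property
    def checkpoint_name(self) -> str:
        # Assumption: the checkpoint file is named after the final step count.
        return f"model_{self.total_steps}.safetensors"


summary = FinetuneSummary()
print(summary.checkpoint_name)
```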
Training
The model was trained on several datasets: the VoxPopuli dataset; crowdsourced high-quality Spanish speech data from regions including Argentina, Chile, Colombia, Peru, Puerto Rico, and Venezuela; and the TEDx Spanish Corpus.
Guide: Running Locally
Method 1: Manual Model Replacement
- Run the Application: Launch the F5-TTS application and check the terminal for the model file path.
- Replace the Model File:
  - Navigate to the file location.
  - Rename the existing model file to model_1200000.safetensors.bak.
  - Download model_1200000.safetensors from the repository and save it to the same location.
- Restart the Application: Relaunch it to load the updated model.
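The file swap in Method 1 can also be scripted. The following is a minimal sketch that uses only the Python standard library. The function names `replace_checkpoint` and `restore_checkpoint` are hypothetical helpers, not part of F5-TTS. The first argument must be the model path printed in the terminal, and the second must be wherever you saved the downloaded checkpoint.

```python
import shutil
from pathlib import Path


def replace_checkpoint(existing: Path, new_checkpoint: Path) -> Path:
    """Rename `existing` to <name>.bak, then copy `new_checkpoint` into its place.

    Returns the path of the backup file.
    """
    existing = Path(existing)
    new_checkpoint = Path(new_checkpoint)
    if not existing.is_file():
        raise FileNotFoundError(existing)
    if not new_checkpoint.is_file():
        raise FileNotFoundError(new_checkpoint)
    if existing.resolve() == new_checkpoint.resolve():
        raise ValueError("the downloaded checkpoint must be saved somewhere else first")
    backup = existing.with_name(existing.name + ".bak")
    if backup.exists():
        # Refuse to overwrite an earlier backup of the original model.
        raise FileExistsError(backup)
    existing.rename(backup)                   # keep the original model as .bak
    shutil.copy2(new_checkpoint, existing)    # put the new model under the original name
    return backup


def restore_checkpoint(existing: Path) -> None:
    """Move <name>.bak back over `existing`, undoing replace_checkpoint."""
    existing = Path(existing)
    backup = existing.with_name(existing.name + ".bak")
    if not backup.is_file():
        raise FileNotFoundError(backup)
    backup.replace(existing)  # overwrites the replacement model
```

For example, `replace_checkpoint(Path("/path/from/terminal/model_1200000.safetensors"), Path("downloads/model_1200000.safetensors"))` keeps the original model as model_1200000.safetensors.bak. Calling `restore_checkpoint` with the same first path puts the original back. Both paths here are placeholders. The function refuses to overwrite an existing .bak file, so a second run cannot destroy the only copy of the original model.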
Alternative Methods
- GitHub Repository: Clone the Spanish-F5 repository and follow its installation instructions.
- Google Colab: Use the model in Google Colab.
  - Change the runtime type to T4 GPU and run all cells.
  - Open the public URL provided.
- Jupyter Notebook: Run the model using the Spanish_F5.ipynb notebook.
Cloud GPUs
For faster execution, consider running the model on cloud GPUs, such as those offered by Google Colab or AWS.
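Before launching, it can help to confirm that a GPU is actually visible, for example after switching a Colab runtime to T4 GPU. This rough sketch assumes an NVIDIA GPU whose driver provides the `nvidia-smi` tool on the PATH. The function name `nvidia_gpu_names` is hypothetical. The function returns an empty list when no GPU is found, and it does not check whether PyTorch itself can use the GPU.

```python
import shutil
import subprocess


def nvidia_gpu_names() -> list[str]:
    """Return GPU names reported by nvidia-smi, or [] if none are visible."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []  # no NVIDIA driver tools on PATH
    try:
        out = subprocess.run(
            [exe, "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
            timeout=10,
        ).stdout
    except (subprocess.SubprocessError, OSError):
        return []  # nvidia-smi failed, timed out, or could not be started
    # One GPU name per non-empty output line.
    return [line.strip() for line in out.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = nvidia_gpu_names()
    print(gpus if gpus else "No NVIDIA GPU detected; expect slow CPU inference.")
```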
License
The model is released under the CC0-1.0 license, a public-domain dedication that allows free use, modification, and distribution.