F5-TTS-Russian

hotstone228

F5-TTS-Russian Model

Introduction

The F5-TTS-Russian model is a finetuned Text-to-Speech (TTS) model for the Russian language. It is based on the SWivid/F5-TTS model and was further trained on Russian speech data to improve its Russian output.

Architecture

  • Base Model: SWivid/F5-TTS
  • Language Support: Russian

Training

The model was trained for a total of 250,000 steps using the following configuration:

  • Learning Rate: 1e-05
  • Batch Size per GPU: 4500 (frame-based)
  • Maximum Samples: 64
  • Gradient Accumulation Steps: 1
  • Max Gradient Norm: 1
  • Epochs: 144
  • Warmup Updates: 5838
  • Updates to Save: 11676
  • Steps for Last Checkpoint: 2918
  • Finetuned: True
  • Tokenizer: Character-based
  • Mixed Precision: FP16
  • Optimizer: bnb_optimizer
  • Logger: wandb
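To make two of these settings concrete, here is a minimal sketch of what a frame-based batch budget (4500 frames, at most 64 samples) and a warmup of 5838 updates could look like. This is not the training code of the base repository: the function names `pack_by_frames` and `warmup_lr` are hypothetical, the greedy packing strategy is an assumption, and the linear warmup shape is assumed rather than stated in the configuration above.

```python
from typing import List


def pack_by_frames(
    frame_lengths: List[int],
    max_frames: int = 4500,   # "Batch Size per GPU" from the list above
    max_samples: int = 64,    # "Maximum Samples" from the list above
) -> List[List[int]]:
    """Greedily group sample indices so each batch respects both limits.

    Hypothetical illustration only; the real sampler may differ.
    """
    # Sort by length so clips of similar duration share a batch (less padding).
    order = sorted(range(len(frame_lengths)), key=lambda i: frame_lengths[i])
    batches: List[List[int]] = []
    current: List[int] = []
    current_frames = 0
    for idx in order:
        n = frame_lengths[idx]
        if n > max_frames:
            continue  # a single clip above the frame budget is skipped
        over_frames = current_frames + n > max_frames
        over_samples = len(current) >= max_samples
        if current and (over_frames or over_samples):
            batches.append(current)
            current, current_frames = [], 0
        current.append(idx)
        current_frames += n
    if current:
        batches.append(current)
    return batches


def warmup_lr(step: int, peak_lr: float = 1e-5, warmup_updates: int = 5838) -> float:
    """Learning rate during warmup, assuming a linear ramp from 0 to peak_lr.

    Only the warmup phase is modeled; behavior after warmup is not covered here.
    """
    if step >= warmup_updates:
        return peak_lr
    return peak_lr * step / warmup_updates


if __name__ == "__main__":
    lengths = [1000, 2000, 1500, 3000, 5000, 500]
    print(pack_by_frames(lengths))
    print(warmup_lr(2919))
```

With a frame budget, the number of clips per batch varies: many short clips or a few long ones can fill the same 4500 frames, while the sample cap keeps batches of very short clips from growing without bound.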

The training data included mozilla-foundation/common_voice_17_0, among other datasets.

Guide: Running Locally

To run the F5-TTS-Russian model locally, consider the following steps:

  1. Clone the Repository: Clone the base repository (SWivid/F5-TTS) from GitHub to your local machine.

  2. Set Up Environment: Install the necessary dependencies specified in the repository's README.md or requirements.txt.

  3. Download Model Weights: Ensure that you have access to the finetuned model weights. These should be available in the repository or through associated links.

  4. Configure Settings: Adjust the configuration settings if needed, based on your hardware capabilities and specific use cases.

  5. Run the Model: Execute the script to generate TTS outputs from Russian text input.
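The steps above can be sketched in Python as follows. Several names here are assumptions rather than facts from this page: the Hugging Face repository id `hotstone228/F5-TTS-Russian` is inferred from the author and model names, the file names `model.safetensors` and `vocab.txt` are placeholders, and the `f5-tts_infer-cli` command with its `--ckpt_file`, `--vocab_file`, `--ref_audio`, `--ref_text`, and `--gen_text` flags should be checked against the base repository's README before use. Additional flags, such as a model or config selector, may be required.

```python
import shlex
from typing import List

# Assumed repository id, inferred from the author and model names; verify before use.
ASSUMED_REPO_ID = "hotstone228/F5-TTS-Russian"


def download_weights(repo_id: str = ASSUMED_REPO_ID,
                     local_dir: str = "ckpts/f5-tts-russian") -> str:
    """Download the model files and return the local directory.

    Requires `pip install huggingface_hub`; imported lazily so the rest of
    this sketch works without it.
    """
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


def build_infer_command(ckpt_file: str, vocab_file: str,
                        ref_audio: str, ref_text: str,
                        gen_text: str) -> List[str]:
    """Assemble an inference command line (flag names are assumptions)."""
    return [
        "f5-tts_infer-cli",
        "--ckpt_file", ckpt_file,    # finetuned checkpoint
        "--vocab_file", vocab_file,  # character vocabulary used in training
        "--ref_audio", ref_audio,    # short reference clip of the target voice
        "--ref_text", ref_text,      # transcript of the reference clip
        "--gen_text", gen_text,      # Russian text to synthesize
    ]


# Placeholder paths; replace them with the files you actually downloaded.
cmd = build_infer_command(
    ckpt_file="ckpts/f5-tts-russian/model.safetensors",
    vocab_file="ckpts/f5-tts-russian/vocab.txt",
    ref_audio="reference.wav",
    ref_text="Это пример записи голоса.",
    gen_text="Привет, мир!",
)
print(shlex.join(cmd))  # inspect the command, then run it with subprocess.run(cmd)
```

Building the command as a list instead of a single string avoids shell-quoting problems with Cyrillic text and spaces when it is later passed to `subprocess.run`.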

For better performance, especially with larger datasets, consider running on a GPU; cloud-based GPU services such as AWS, Google Cloud, or Azure are one option.

License

The F5-TTS-Russian model is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) license. Users may use, modify, and distribute the model for non-commercial purposes, provided they give proper attribution and distribute any derivative works under the same license.
