Introduction

hoyoTTS is a speech synthesis model that imitates the voices of characters from the games Genshin Impact and Honkai: Star Rail. It combines BERT for language understanding with VITS for speech synthesis to produce natural, fluent speech that matches each character's vocal characteristics. Version 1 (V1) has been optimized for voice naturalness, emotional expression, and synthesis accuracy.

Architecture

hoyoTTS pairs BERT, which handles language comprehension, with VITS, which handles voice synthesis. The combination is intended to yield character voices that are natural and expressive and that closely follow the vocal traits of the original game characters.

Training

In V1, the model was optimized to make its output more natural, improve emotional expression, and increase synthesis accuracy, so that the synthesized voices more closely resemble the original character voices from the games.

Guide: Running Locally

  1. Clone the Repository:

    git clone git@hf.co:Genius-Society/hoyoTTS
    cd hoyoTTS
    
  2. Download the Model:
    Use the modelscope library to download the model:

    from modelscope import snapshot_download
    model_dir = snapshot_download('Genius-Society/hoyoTTS')
    
  3. Cloud GPUs:
    The model is computationally demanding, so a cloud GPU service such as AWS, Google Cloud, or Azure is recommended.
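
The download and GPU steps above can be checked with a short script. The following is a minimal sketch, not part of the hoyoTTS repository: the helper names `has_nvidia_gpu` and `list_model_files` are hypothetical, and looking for `nvidia-smi` on the PATH is only a rough proxy for a usable NVIDIA GPU. The only call taken from the guide is `snapshot_download('Genius-Society/hoyoTTS')`.

```python
import shutil
from pathlib import Path


def has_nvidia_gpu() -> bool:
    """Return True if `nvidia-smi` is on PATH (a rough proxy for an NVIDIA GPU)."""
    return shutil.which("nvidia-smi") is not None


def list_model_files(model_dir: str) -> list[tuple[str, int]]:
    """Return (relative path, size in bytes) for every file under model_dir, sorted by path."""
    root = Path(model_dir)
    return sorted(
        (p.relative_to(root).as_posix(), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    )


if __name__ == "__main__":
    # Requires `pip install modelscope`; the snapshot is downloaded on first run.
    from modelscope import snapshot_download

    model_dir = snapshot_download("Genius-Society/hoyoTTS")
    for rel_path, size in list_model_files(model_dir):
        print(f"{size:>12}  {rel_path}")
    print("NVIDIA GPU tooling found:", has_nvidia_gpu())
```

Running the script prints each downloaded file with its size, which makes it easy to spot an incomplete download before starting a GPU instance.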

License

This model is distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (cc-by-nc-nd-4.0).
