F5-TTS-Faster

Introduction

F5-TTS-Faster is an accelerated inference project for the F5-TTS text-to-speech model. It uses ONNX and TensorRT-LLM to optimize performance, achieving a roughly 4x speedup in practice (see Training below). The project provides model weights in several formats: torch, ONNX, and trtllm.

Architecture

The project accelerates F5-TTS inference by exporting the model into three parts via ONNX: a frontend, the Transformer backbone, and a decode stage. The Transformer is rebuilt with TensorRT-LLM for acceleration, while the frontend and decode parts continue to run under ONNX inference. Execution providers such as CUDAExecutionProvider and OpenVINOExecutionProvider can be specified for the ONNX parts for further optimization.
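
As a minimal sketch of the provider selection, assuming the exported frontend and decode graphs are available as local `.onnx` files (the file names here are placeholders for whatever your export step produces):

```python
# Minimal sketch: bind the ONNX parts to an execution provider.
# File names are placeholders, not the project's actual artifact names.
import onnxruntime as ort

# Ordered by preference; ONNX Runtime falls back to later entries if the
# first provider is unavailable in the installed build.
providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
# providers = ["OpenVINOExecutionProvider", "CPUExecutionProvider"]  # alternative

frontend = ort.InferenceSession("f5_frontend.onnx", providers=providers)
decode = ort.InferenceSession("f5_decode.onnx", providers=providers)

# Confirm which provider was actually bound.
print(frontend.get_providers())
```

Note that the CUDA provider ships with the onnxruntime-gpu package, while the OpenVINO provider requires the separate onnxruntime-openvino build.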

Training

This project does not cover training; it focuses on inference acceleration. It provides a workflow for adapting the F5-TTS model to run faster on specific hardware: on an NVIDIA GeForce RTX 3090, inference time dropped from 3.2 seconds to 0.72 seconds.
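
To check a claim like this on your own hardware, a simple wall-clock benchmark is enough. The `run_inference` callable below is a hypothetical wrapper around one full synthesis pass, not part of the project's API:

```python
# Minimal timing sketch. run_inference is a hypothetical callable that
# wraps one full synthesis pass; substitute your own pipeline entry point.
import time

def benchmark(run_inference, warmup: int = 3, iters: int = 10) -> float:
    for _ in range(warmup):      # warm-up passes: CUDA init, kernel caches
        run_inference()
    start = time.perf_counter()
    for _ in range(iters):
        run_inference()
    mean_s = (time.perf_counter() - start) / iters
    print(f"mean latency: {mean_s:.2f} s")
    return mean_s
```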

Guide: Running Locally

To run the F5-TTS-Faster project locally, follow these steps:

  1. Export the F5-TTS model to ONNX, splitting it into the three parts described above (frontend, Transformer, decode).
  2. Build the Transformer part as a TensorRT-LLM engine to accelerate it, keeping the frontend and decode parts on ONNX.
  3. Choose an execution provider, such as CUDAExecutionProvider or OpenVINOExecutionProvider, for the ONNX Runtime sessions.
  4. Test the setup on a suitable GPU, such as an NVIDIA GeForce RTX 3090, to verify the performance improvement; a pipeline sketch follows this list.
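
The sketch below shows how the three parts fit together at inference time. It is an illustration under stated assumptions: the file names and the decode input name are placeholders, and `run_trtllm_transformer()` is a hypothetical wrapper, since the actual TensorRT-LLM call depends on how the engine was built:

```python
# Pipeline sketch: frontend (ONNX) -> Transformer (TensorRT-LLM) -> decode (ONNX).
# File names, tensor names, and run_trtllm_transformer are hypothetical.
import onnxruntime as ort

providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
frontend = ort.InferenceSession("f5_frontend.onnx", providers=providers)
decode = ort.InferenceSession("f5_decode.onnx", providers=providers)

def synthesize(frontend_inputs: dict, run_trtllm_transformer):
    # 1) Frontend: preprocess text / reference audio into model features.
    feats = frontend.run(None, frontend_inputs)
    # 2) Transformer backbone: the TensorRT-LLM engine call (hypothetical wrapper).
    hidden = run_trtllm_transformer(feats)
    # 3) Decode: map hidden states back to a waveform ("hidden" is an assumed name).
    (audio,) = decode.run(None, {"hidden": hidden})
    return audio
```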

If your local hardware is limited, consider cloud services that offer access to high-performance GPUs.

License

F5-TTS-Faster is released under the MIT License, allowing for wide usage and modification.
