Kokoro engine settings
About Kokoro TTS
Kokoro TTS is a neural text-to-speech engine based on the StyleTTS 2 architecture, which uses transformer-based text processing. Audio is generated by iSTFTNet, a vocoder that converts predicted speech features into natural-sounding audio. Kokoro runs locally and does not require an Internet connection.
System Requirements
The following are recommended:
- A dual-core CPU
- 8 GB of RAM
- 500 MB of storage
- A CUDA-compatible NVIDIA GPU (optional, for GPU acceleration)
Settings
Voice Groups
The Kokoro TTS settings page displays voices organized into 4 voice groups with 41 voices total. All voices come pre-installed with Assistivox AI and do not require downloading.
Voice Group Interface
- Expanding Groups: Click on a voice group to expand and show the voices within that group
- Voice Selection: Click on any voice to select it as the active speaking voice
- Voice Preview: Clicking on a voice plays a test message in that voice for preview
Docker Configuration
Assistivox AI uses Docker to run the Kokoro TTS engine. You can change the Docker port that Kokoro TTS uses (default: 8880).
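If the engine does not respond after you change the port, it can help to confirm that something is listening on it. The following is a minimal Python sketch, not part of Assistivox AI: the function name is_port_open is hypothetical, and it assumes the Kokoro container publishes its port on the local machine (127.0.0.1). It only tests that a TCP connection can be opened; it does not verify that the listener is Kokoro TTS.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 8880, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    port = 8880  # default Kokoro TTS port; replace with the value from your settings
    state = "reachable" if is_port_open(port=port) else "not reachable"
    print(f"Port {port} on 127.0.0.1 is {state}")
```

A "not reachable" result usually means the container is not running, or the port is set to a different value than the one being tested.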
GPU Settings
When a CUDA-compatible GPU is detected, you can enable or disable GPU acceleration.
Note: GPU acceleration requires a CUDA-compatible NVIDIA graphics card. The system will automatically detect compatible hardware and enable the GPU option when available.
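If the GPU option does not appear, you can check independently whether an NVIDIA GPU is visible to the operating system. The sketch below is an assumption-laden illustration and is not how Assistivox AI performs its own detection: the function name nvidia_gpu_names is hypothetical, and it relies on the nvidia-smi tool that ships with the NVIDIA driver being on the PATH.

```python
import shutil
import subprocess
from typing import List


def nvidia_gpu_names() -> List[str]:
    """Return the GPU lines printed by `nvidia-smi -L`, or [] if none can be found."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []  # NVIDIA driver tools are not installed or not on PATH
    try:
        result = subprocess.run([exe, "-L"], capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []  # the tool ran but could not talk to the driver
    # `nvidia-smi -L` prints one line per device, each starting with "GPU <index>:"
    return [line.strip() for line in result.stdout.splitlines()
            if line.strip().startswith("GPU ")]


if __name__ == "__main__":
    gpus = nvidia_gpu_names()
    if gpus:
        print("NVIDIA GPU(s) found:")
        for gpu in gpus:
            print(" ", gpu)
    else:
        print("No NVIDIA GPU detected; Kokoro TTS will run on the CPU.")
```

An empty result means only that this check found no GPU; the GPU setting in Assistivox AI depends on the application's own detection.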
Definitions
- Transformer-based text processing: An AI method that helps understand the structure and context of text.
- Decoder-only: The model directly generates audio features from text, simplifying and speeding up the process.
- Vocoder: A component that turns intermediate audio features into actual sound waves. In Kokoro, the vocoder (iSTFTNet) is optimized for quality and efficiency.