Piper engine settings

About Piper TTS

Piper TTS is an offline text-to-speech engine that uses a neural network to generate realistic-sounding speech. Its architecture captures natural prosody (intonation and rhythm) and acoustics more accurately than older concatenative or formant synthesis systems. Most Piper TTS voices are trained on standard speech datasets such as LJSpeech, which contains approximately 13,000 recorded English phrases, or roughly 24 hours of speech in total.

System Requirements

Piper is designed to produce speech in real time on modest hardware. For best performance, the following is recommended:

A quad-core CPU. A GPU is not required.
8 GB of RAM.

Each voice model requires approximately 100 MB of storage space. Standard voices range from 80 MB to 120 MB depending on quality settings.
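
As a rough illustration, the CPU and memory recommendations above can be checked from a script. This is a minimal sketch and not part of Piper: the function names and the 10% memory tolerance are assumptions made for this example. Note that os.cpu_count() reports logical CPUs, which can be higher than the number of physical cores, and that the memory lookup only works on POSIX systems that expose these sysconf names.

```python
import os

RECOMMENDED_CORES = 4      # "quad-core CPU" from the recommendation above
RECOMMENDED_RAM_GB = 8     # "8 GB of RAM" from the recommendation above
RAM_TOLERANCE = 0.9        # assumption: systems usually report slightly less
                           # than the nominal RAM size, so allow a 10% margin


def total_ram_gb():
    """Return installed RAM in GiB, or None if it cannot be determined."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing on Windows, and some platforms
        # do not support these names.
        return None
    return page_size * page_count / 1024**3


def check_hardware(cores, ram_gb):
    """Compare detected values to the recommendations; None means unknown."""
    min_ram_gb = RECOMMENDED_RAM_GB * RAM_TOLERANCE
    return {
        "cpu_ok": None if cores is None else cores >= RECOMMENDED_CORES,
        "ram_ok": None if ram_gb is None else ram_gb >= min_ram_gb,
    }


if __name__ == "__main__":
    # os.cpu_count() counts logical CPUs and may return None.
    print(check_hardware(os.cpu_count(), total_ram_gb()))
```

A result of None for either entry means the value could not be detected, not that the system falls short of the recommendation.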

Settings

The Piper TTS settings page lists the voices that you currently have installed. To use a voice:

Select a Voice: Click any voice in the list to make it the active Piper voice.
Voice Preview: Clicking a voice also reads a short test text in that voice, so you can hear how it sounds.
Active Voice Indicator: The currently selected voice is highlighted in the interface.

Voice Downloads

If any voices are not yet installed, a "Download Piper Voice Pack" button appears. Click this button to download additional voices and expand your selection. A total of 19 voices are available for download.
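
To get a feel for the disk space involved, the per-voice sizes given under System Requirements can be multiplied by the 19 available voices. The sketch below is only an estimate under stated assumptions: it assumes every voice falls within the 80 MB to 120 MB range and that the installed size matches those figures. The function names are made up for this example and are not part of Piper.

```python
import shutil

VOICE_COUNT = 19        # voices available for download (from this page)
SIZE_MB_LOW = 80        # smallest standard voice
SIZE_MB_TYPICAL = 100   # approximate size of one voice
SIZE_MB_HIGH = 120      # largest standard voice


def estimate_pack_mb(voices=VOICE_COUNT):
    """Return (low, typical, high) storage estimates in MB."""
    return (
        voices * SIZE_MB_LOW,
        voices * SIZE_MB_TYPICAL,
        voices * SIZE_MB_HIGH,
    )


def free_space_mb(path="."):
    """Return free disk space at the given path in MB (1 MB = 1,000,000 bytes)."""
    return shutil.disk_usage(path).free / 1_000_000


if __name__ == "__main__":
    low, typical, high = estimate_pack_mb()
    print(f"Estimated space for all voices: {low} to {high} MB (about {typical} MB)")
    if free_space_mb() < high:
        print("Free space may be too low to install every voice.")
```

For all 19 voices this works out to between 1,520 MB and 2,280 MB, or about 1.9 GB at the typical size.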