Vosk engine settings

About Vosk

Vosk is an offline speech recognition engine that converts spoken words into text locally on your device, without requiring an internet connection. It processes audio streams in real time with very low latency, so words appear on screen as you speak them. Vosk runs efficiently on modest hardware and supports continuous dictation for extended periods.
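The streaming behavior described above can be sketched in Python. This is a minimal illustration, not Assistivox AI's actual implementation: the function name stream_transcripts is hypothetical, and the recognizer argument is assumed to be any object that exposes the AcceptWaveform, PartialResult, Result, and FinalResult methods of Vosk's KaldiRecognizer, each returning the JSON strings that Vosk produces.

```python
import json
from typing import Iterable, Iterator, Tuple


def stream_transcripts(recognizer, chunks: Iterable[bytes]) -> Iterator[Tuple[str, str]]:
    """Yield ("partial", text) while an utterance is in progress and
    ("final", text) once the recognizer decides the utterance has ended.

    `recognizer` is assumed to behave like vosk.KaldiRecognizer.
    """
    for chunk in chunks:
        if recognizer.AcceptWaveform(chunk):
            # Utterance boundary detected: Result() holds the finished text.
            text = json.loads(recognizer.Result()).get("text", "")
            if text:
                yield ("final", text)
        else:
            # Still mid-utterance: PartialResult() holds the running guess.
            text = json.loads(recognizer.PartialResult()).get("partial", "")
            if text:
                yield ("partial", text)

    # Flush whatever audio is still buffered when the stream ends.
    text = json.loads(recognizer.FinalResult()).get("text", "")
    if text:
        yield ("final", text)
```

With the real library, the recognizer would typically be created as vosk.KaldiRecognizer(vosk.Model(model_path), 16000), where model_path is a placeholder for the model folder, and the chunks would be 16-bit mono PCM audio read from the microphone.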

Small Model (0.15)

This is the most compact Vosk model and comes pre-installed with Assistivox AI.

The Small Model is trained on general conversational speech and provides reliable accuracy for everyday dictation tasks. It requires minimal system resources and delivers real-time transcription suitable for personal writing, note-taking, and document creation.

System Requirements for Real-Time Dictation:
- RAM: 300 MB during operation
- CPU: Any dual-core processor
- Storage: 50-60 MB

Lgraph Model (0.22)

This is a more advanced model that uses optimized graph compilation for improved accuracy and vocabulary recognition compared to the Small Model.

The Lgraph Model is trained on significantly more speech data than the Small Model, including diverse speaking styles and expanded vocabulary. It provides higher accuracy for complex sentences, technical terms, and varied speech patterns, making it suitable for professional writing and detailed documentation work.

System Requirements for Real-Time Dictation:
- RAM: 1-2 GB during operation
- CPU: Quad-core processor recommended
- Storage: 150-200 MB

The Lgraph Model requires more processing power but delivers noticeably better transcription accuracy, especially for longer dictation sessions and technical vocabulary.
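As a rough illustration of how the two sets of requirements compare, the sketch below picks the most capable model a machine can comfortably run. The names ModelRequirements and suggest_model are hypothetical and not part of Assistivox AI or Vosk. The thresholds are assumptions taken from the figures listed in this guide, using the upper end of each quoted range to stay on the safe side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRequirements:
    name: str
    min_ram_mb: int     # RAM needed during operation
    min_cpu_cores: int  # cores suggested for real-time dictation
    storage_mb: int     # upper end of the quoted storage range


# Assumed thresholds, based on the requirements listed in this guide.
SMALL = ModelRequirements("Small", min_ram_mb=300, min_cpu_cores=2, storage_mb=60)
LGRAPH = ModelRequirements("Lgraph", min_ram_mb=2048, min_cpu_cores=4, storage_mb=200)


def suggest_model(free_ram_mb: int, cpu_cores: int, free_disk_mb: int) -> str:
    """Return the name of the most capable model the machine can run."""
    for model in (LGRAPH, SMALL):  # try the more accurate model first
        if (free_ram_mb >= model.min_ram_mb
                and cpu_cores >= model.min_cpu_cores
                and free_disk_mb >= model.storage_mb):
            return model.name
    # The Small model is pre-installed, so it is the fallback.
    return SMALL.name
```

For example, suggest_model(8000, 8, 1000) suggests the Lgraph model, while a dual-core machine with the same memory falls back to the Small model.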

Settings

The Vosk dictation settings page provides the following options:

Available Models: Both the Small and Lgraph models are listed, each with a selection checkbox
Download Option: If the Lgraph model is not installed, a "Download" button appears next to it
Model Activation: Click on any installed model to select it as the active dictation engine
Show Partial Text: Toggle whether gray partial (in-progress) text is displayed while you dictate
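The options above could be modeled roughly as follows. This is only a sketch of the behavior described on the settings page: the class VoskSettings, its field names, and the default values (including partial text being enabled) are assumptions for illustration and do not reflect how Assistivox AI actually stores its settings.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class VoskSettings:
    # The Small model ships pre-installed; Lgraph must be downloaded first.
    installed_models: Set[str] = field(default_factory=lambda: {"small"})
    active_model: str = "small"
    show_partial_text: bool = True  # assumed default

    def needs_download(self, model: str) -> bool:
        """True when the page should show a "Download" button for the model."""
        return model not in self.installed_models

    def mark_installed(self, model: str) -> None:
        """Record that a model has finished downloading."""
        self.installed_models.add(model)

    def activate(self, model: str) -> None:
        """Select an installed model as the active dictation engine."""
        if self.needs_download(model):
            raise ValueError(f"Model {model!r} must be downloaded before activation")
        self.active_model = model
```

In this sketch, activating a model that is not yet installed raises an error, mirroring the page's rule that only installed models can be selected.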