Faster Whisper engine settings
About Faster Whisper
Whisper is an automatic speech recognition system from OpenAI that transcribes spoken language into text using a transformer-based encoder-decoder architecture (in contrast to Vosk, which is built on the Kaldi toolkit and uses conventional acoustic and language models rather than a transformer). It processes audio by converting it into a log-Mel spectrogram, which is passed through the network's layers to extract features and generate text output. The Whisper models are trained on a large and diverse dataset, which lets them transcribe speech across many languages and accents and cope with noisy audio conditions.
Faster Whisper is a reimplementation of Whisper that uses the CTranslate2 inference engine to deliver up to 4 times faster performance with significantly reduced memory usage. Both implementations produce equivalent transcription results, but Faster Whisper is better suited to real-time dictation and resource-constrained environments.
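For readers who want to see how the engine is typically driven, here is a minimal sketch using the faster-whisper Python package; the model name, device, and audio file are illustrative placeholders, and the dictation application wires this up internally rather than requiring any code from the user.

```python
from faster_whisper import WhisperModel

# Load a Faster Whisper model; "small" can be replaced by any size listed below.
# compute_type="int8" reduces memory use on CPU-only machines.
model = WhisperModel("small", device="cpu", compute_type="int8")

# Transcribe an audio file; segments are generated lazily as they are decoded.
segments, info = model.transcribe("dictation.wav")

print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```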
Model Sizes for Dictation
Tiny Model
The most lightweight option for instant dictation with minimal system requirements.
Parameters: 39 million
Storage: 155 MB
Training: OpenAI's multilingual Whisper dataset (shared across all model sizes)
System Requirements for Real-Time Dictation:
- RAM: Less than 1 GB during operation
- CPU: Any modern dual-core processor
- GPU: Optional; any GPU with minimal VRAM
The Tiny model delivers the fastest dictation response with virtually no delay, making it ideal for quick note-taking and simple speech input. Best suited for clear speech in quiet environments, though accuracy may be reduced with background noise or varied speaking styles.
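As a rough idea of how such a low-footprint setup maps onto the engine, a CPU-only configuration might look like the sketch below; the int8 quantization and greedy decoding shown here are assumptions chosen to keep memory and latency low, not settings exposed on this page.

```python
from faster_whisper import WhisperModel

# Tiny model on CPU with int8 quantization to keep RAM use minimal.
model = WhisperModel("tiny", device="cpu", compute_type="int8", cpu_threads=2)

# beam_size=1 (greedy decoding) favors speed over accuracy, matching the
# quick note-taking use case described above.
segments, _ = model.transcribe("quick_note.wav", beam_size=1)
print(" ".join(segment.text.strip() for segment in segments))
```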
Base Model
A balanced option offering improved accuracy while maintaining fast dictation speeds.
Parameters: 74 million
Storage: 280 MB
Training: OpenAI's multilingual Whisper dataset (shared across all model sizes)
System Requirements for Real-Time Dictation:
- RAM: 1-2 GB during operation
- CPU: Dual-core processor or better
- GPU: Optional; a basic consumer GPU is sufficient
The Base model provides noticeably better accuracy than Tiny while remaining fast enough for smooth real-time dictation. Handles casual conversation and varied speaking paces with fewer transcription errors.
Small Model
The recommended choice for most dictation tasks, offering strong accuracy with reasonable resource requirements.
Parameters: 244 million
Storage: 900 MB
Training: OpenAI's multilingual Whisper dataset (shared across all model sizes)
System Requirements for Real-Time Dictation:
- RAM: 2-4 GB during operation
- CPU: Quad-core processor recommended
- GPU: Optional; a modern consumer GPU with 2+ GB VRAM
The Small model delivers significantly higher transcription precision than Base, making it suitable for professional writing, meetings, and dictation in moderately noisy environments. Provides an excellent balance between accuracy and speed for daily use.
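To illustrate what a typical everyday configuration with this model could look like, the sketch below enables voice activity filtering and a modest beam search; both are options of the faster-whisper library and are shown here as reasonable defaults rather than the application's actual internals.

```python
from faster_whisper import WhisperModel

# Small model: the recommended balance of accuracy and speed for daily dictation.
model = WhisperModel("small", device="cpu", compute_type="int8")

# vad_filter skips silent stretches, which helps in moderately noisy recordings;
# beam_size=5 trades a little speed for higher transcription precision.
segments, _ = model.transcribe("meeting.wav", beam_size=5, vad_filter=True)
for segment in segments:
    print(segment.text)
```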
Medium Model
Advanced accuracy for challenging dictation scenarios with higher hardware requirements.
Parameters: 769 million
Storage: 3 GB
Training: OpenAI's multilingual Whisper dataset (shared across all model sizes)
System Requirements for Real-Time Dictation:
- RAM: 2-6 GB during operation
- CPU: Powerful multi-core processor
- GPU: CUDA-capable GPU with 6+ GB VRAM recommended
The Medium model excels in noisy environments, with accented speech, and during rapid dictation. Its greater capacity improves handling of complex audio conditions while maintaining practical real-time performance on capable hardware.
Large Model (Large-v3)
Maximum accuracy for professional-grade dictation requiring the highest precision.
Parameters: 1.55 billion
Storage: 6 GB
Training: OpenAI's multilingual Whisper dataset, extended for Large-v3 with additional weakly labeled and pseudo-labeled audio
System Requirements for Real-Time Dictation:
- RAM: 3-10 GB during operation
- CPU: High-end multi-core processor
- GPU: Modern GPU with 8-10 GB VRAM strongly recommended
The Large model provides the best possible transcription accuracy, especially for challenging conditions like heavy accents, technical terminology, or poor audio quality. Requires powerful hardware to maintain real-time dictation speeds, but delivers professional-grade results for critical transcription needs.
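For completeness, a GPU-backed configuration for this model might look like the following; the float16 compute type is an assumption that roughly halves VRAM use compared with float32, and the file name is a placeholder.

```python
from faster_whisper import WhisperModel

# Large-v3 on a CUDA GPU; float16 reduces VRAM needs while keeping
# accuracy essentially unchanged compared with float32.
model = WhisperModel("large-v3", device="cuda", compute_type="float16")

segments, _ = model.transcribe("interview.wav", beam_size=5)
for segment in segments:
    print(f"[{segment.start:7.2f}s] {segment.text}")
```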
Settings
The Faster Whisper dictation settings page provides the following options:
- Available Models: All five model sizes (Tiny, Base, Small, Medium, Large) are listed with selection options
- Model Selection: Click on any installed model to select it as the active dictation engine
- Download Option: If a model is not installed, a "Download" button appears next to it for easy installation
- GPU Acceleration: Toggle GPU usage on or off when a CUDA-compatible GPU is detected
- Auto Sentence Format: Enable automatic capitalization and punctuation formatting for dictated text
Note: GPU acceleration requires a CUDA-compatible NVIDIA graphics card. The system will automatically detect compatible hardware and enable the GPU option when available.
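A sketch of how such a detection-and-fallback step could work is shown below; it relies on ctranslate2.get_cuda_device_count() from the CTranslate2 package that Faster Whisper is built on, though the application's own detection logic may differ.

```python
import ctranslate2
from faster_whisper import WhisperModel

# Use the GPU when CTranslate2 reports at least one usable CUDA device,
# otherwise fall back to CPU with int8 quantization.
if ctranslate2.get_cuda_device_count() > 0:
    model = WhisperModel("medium", device="cuda", compute_type="float16")
else:
    model = WhisperModel("medium", device="cpu", compute_type="int8")
```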