Language-centric Assistive Computing
Assistivox AI Language-centric Design
Information management involves two fundamental tasks: reading and writing. Traditional computing interfaces require visual interaction (pointing, clicking, reading text on screens). This design can exclude many users who have difficulty seeing (text, small fonts, small icons) or difficulty pointing and clicking. A language-centric interface offers users a way to interact with an application regardless of visual ability or fine motor control. Assistivox AI makes spoken language a central part of its design.
Reading: From Any Source to Spoken Language
Traditional screen readers are limited to text that is already digitized and properly formatted. This limitation creates a reliance on new authors and content creators to make suitable accommodations for digital content. The Assistivox AI philosophy is that a reader with vision challenges should be able to access any information that any other reader can access, regardless of whether it has been appropriately formatted.
Reading Aloud With Text-to-Speech
Multiple TTS engines provide options for different needs:
- Piper: Efficient processing for older hardware
- Kokoro: Higher quality voices for better listening experiences
Vision-to-Speech
Many documents (such as books and magazines) are not formatted as digital text. Assistivox AI offers document vision options:
- OCR engines (Tesseract, DocTR) extract text from scanned documents and images
- Document processing (Docling) handles complex PDF layouts, preserving structure
This creates a complete pipeline: visual content → text recognition → speech output. Users can "read" any supported document by having it spoken aloud, regardless of its original format.
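The pipeline above can be sketched as a composition of two stages. This is a minimal illustration, not Assistivox AI's actual code: `read_aloud`, `Recognizer`, and `Speaker` are hypothetical names, and the stand-in functions take the place of a real OCR engine and a real TTS engine so the sketch runs without either installed.

```python
from typing import Callable

# Hypothetical stage signatures; not taken from Assistivox AI.
Recognizer = Callable[[bytes], str]  # visual content -> recognized text
Speaker = Callable[[str], bytes]     # text -> audio data


def read_aloud(content: bytes, recognize: Recognizer, speak: Speaker) -> bytes:
    """Run visual content through text recognition, then speech output."""
    text = recognize(content).strip()
    if not text:
        # Nothing to speak: report it instead of producing silent audio.
        raise ValueError("no text recognized in the document")
    return speak(text)


# Stand-ins for real engines, used only to make the sketch runnable.
def fake_ocr(image: bytes) -> str:
    return image.decode("utf-8")


def fake_tts(text: str) -> bytes:
    return f"<audio:{text}>".encode("utf-8")


audio = read_aloud(b"  Chapter One  ", fake_ocr, fake_tts)
```

Keeping each stage behind a plain callable means an OCR engine or a TTS engine could be swapped without changing the pipeline itself, which matches the idea of offering several engines for each step.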
Writing: From Spoken Language to Text
Multiple speech-to-text engines and models offer dictation options that meet different accuracy needs and hardware constraints:
Dictation Engines
- Vosk: Lightweight processing for basic hardware
- Faster Whisper: Higher accuracy for more capable systems
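The trade-off between the two engines listed above could be expressed as a small selection rule. The sketch below is an assumption for illustration only: the function name `choose_dictation_engine`, its parameters, and the 8 GB cutoff are hypothetical and are not documented requirements of Vosk, Faster Whisper, or Assistivox AI.

```python
def choose_dictation_engine(ram_gb: float, has_gpu: bool = False) -> str:
    """Pick a dictation engine name from a rough hardware description."""
    # Hypothetical cutoff: 8 GB is an illustrative threshold, not a measured one.
    if has_gpu or ram_gb >= 8:
        return "Faster Whisper"  # higher accuracy for more capable systems
    return "Vosk"                # lightweight processing for basic hardware


older_laptop = choose_dictation_engine(ram_gb=4)
workstation = choose_dictation_engine(ram_gb=32, has_gpu=True)
```

In practice the user's own preference would likely override any automatic choice; the rule only suggests a reasonable default.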
Future Work
This is currently beta software focused on reading and writing. Future work will include:
Organization
Document management, file organization, and information structuring through voice commands and audio feedback.
Full Voice Navigation
Complete interface navigation through spoken commands, making visual elements optional rather than required.