Language-centric Assistive Computing

Assistivox AI Language-centric Design

Information management involves two fundamental tasks: reading and writing. Traditional computing interfaces require visual interaction (pointing, clicking, reading text on screens). This design can exclude many users who have difficulty seeing text, small fonts, or small icons, or who have difficulty pointing and clicking. A language-centric interface offers users a way to interact with an application regardless of visual ability or fine motor control. Assistivox AI makes spoken language a central part of its design.

Reading: From Any Source to Spoken Language

Traditional screen readers are limited to text that is already digitized and properly formatted. This limitation creates a reliance on new authors and content creators to make suitable accommodations for digital content. The Assistivox AI philosophy is that a reader with vision challenges should be able to access any information that any other reader can access, regardless of whether it has been appropriately formatted.

Reading Aloud With Text-to-Speech

Multiple TTS engines provide options for different needs:

- Piper: Efficient processing for older hardware
- Kokoro: Higher quality voices for better listening experiences

Vision-to-Speech

Many documents (such as books and magazines) are not formatted as digital text. Assistivox AI offers document vision options:

- OCR engines (Tesseract, DocTR) extract text from scanned documents and images
- Document processing (Docling) handles complex PDF layouts, preserving structure

This creates a complete pipeline: visual content → text recognition → speech output. Users can "read" any supported document by having it spoken aloud, regardless of its original format.
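The pipeline above can be sketched as two pluggable stages. This is a minimal illustration, not the Assistivox AI implementation: the names ReadAloudPipeline, TextRecognizer, SpeechSynthesizer, fake_ocr, and fake_tts are hypothetical, and the stand-in functions take the place of a real OCR engine and a real TTS engine so the sketch runs without either installed.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stage types: any OCR engine can be wrapped as a function from
# an image path to text, and any TTS engine as a function from text to audio.
TextRecognizer = Callable[[str], str]
SpeechSynthesizer = Callable[[str], bytes]


@dataclass
class ReadAloudPipeline:
    """Chains text recognition and speech synthesis (illustrative only)."""

    recognize: TextRecognizer
    synthesize: SpeechSynthesizer

    def read(self, image_paths: List[str]) -> List[bytes]:
        """Return one audio chunk per page that contains recognizable text."""
        audio_chunks = []
        for path in image_paths:
            text = self.recognize(path).strip()
            if not text:
                continue  # nothing to speak on a blank page
            audio_chunks.append(self.synthesize(text))
        return audio_chunks


# Stand-in engines (assumptions for this sketch, not real OCR or TTS calls).
def fake_ocr(path: str) -> str:
    pages = {"page1.png": "Hello world", "blank.png": "   "}
    return pages.get(path, "")


def fake_tts(text: str) -> bytes:
    # A real engine would return synthesized audio; here the bytes are the text.
    return text.encode("utf-8")


pipeline = ReadAloudPipeline(recognize=fake_ocr, synthesize=fake_tts)
chunks = pipeline.read(["page1.png", "blank.png"])
```

Keeping each stage behind a plain function signature means an engine can be swapped (for example, a lighter one on older hardware) without changing the rest of the pipeline.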

Writing: From Spoken Language to Text

Multiple speech-to-text engines and models offer dictation options that meet different accuracy needs and hardware constraints:

Dictation Engines

Vosk: Lightweight processing for basic hardware
Faster Whisper: Higher accuracy for more capable systems

Future Work

This is currently beta software focused on reading and writing. Future work will include:

Organization

Document management, file organization, and information structuring through voice commands and audio feedback.

Complete interface navigation through spoken commands, making visual elements optional rather than required.