
Assistivox AI (0.1.0)

Assistivox™ AI is a voice-enabled document productivity suite by Bill Hollingsworth designed for researchers, programmers, technical writers, and other technical workers who need visual or motor accessibility solutions. Assistivox AI transforms documents into accessible content using local AI models, enabling technical careers that traditional accessibility tools have not adequately supported.


Vision Problems Addressed

Document Accessibility Beyond Screen Readers

Traditional screen readers are limited to properly formatted digital text. Assistivox AI addresses the reality that much important content exists in visual formats—scanned documents, PDFs with complex layouts, images containing text, and poorly structured digital documents that defeat conventional accessibility tools.

Professional-Grade Text Processing

Assistivox AI provides the granular document navigation and editing capabilities needed for serious productivity work, going beyond basic screen readers to enable complex document creation, revision, and analysis through voice and AI assistance.

Local AI Processing

All AI models run entirely on your local machine. No user data ever leaves your computer or passes through cloud services. This ensures complete privacy while providing cutting-edge AI capabilities for vision assistance.

AI Technologies

Text-to-Speech Engines

Piper TTS: Efficient, high-quality speech synthesis optimized for older hardware
Kokoro TTS: Premium voice quality with advanced prosody for extended listening sessions

Document Vision Processing

Docling: Advanced PDF layout understanding with structure preservation
Tesseract OCR: Reliable text extraction for standard documents and images
DocTR OCR: High-accuracy text recognition optimized for complex document layouts

Speech Recognition

Vosk: Lightweight offline dictation that runs on modest hardware
Faster Whisper: High-accuracy speech-to-text with GPU acceleration support

System Requirements & Dependencies

Note: This is currently beta software requiring manual installation of several external dependencies to create the environment needed for this advanced AI suite. Future development will include an Assistivox Light edition with simplified installation requirements.

Required External Dependencies

System Audio (Required for speech features)

Linux:
- Debian/Ubuntu: sudo apt-get install portaudio19-dev python3-pyaudio
- Red Hat/Fedora/CentOS: sudo dnf install portaudio-devel python3-pyaudio
- Arch Linux: sudo pacman -S portaudio python-pyaudio
- openSUSE: sudo zypper install portaudio-devel python3-pyaudio

macOS:
- With Homebrew: brew install portaudio
- Without Homebrew: install Homebrew first from https://brew.sh, then run the command above
- MacPorts alternative: sudo port install portaudio

Windows:
- Usually included automatically with the PyAudio installation
- If issues occur, install the Microsoft Visual C++ Redistributable
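Once PortAudio and PyAudio are installed, one way to confirm that Python can actually reach the audio system is to open PortAudio through PyAudio and list the devices that can record. This is a minimal standalone sketch, not part of the Assistivox setup script; the function name `check_pyaudio` is hypothetical, while `PyAudio()`, `get_device_count()`, `get_device_info_by_index()` and `terminate()` are PyAudio's own calls.

```python
from __future__ import annotations

import importlib


def check_pyaudio() -> tuple[bool, str]:
    """Hypothetical helper: return (ok, message) for PyAudio/PortAudio."""
    try:
        # A missing module or a missing PortAudio shared library both fail here.
        pyaudio = importlib.import_module("pyaudio")
    except (ImportError, OSError) as exc:
        return False, f"PyAudio is not importable: {exc}"

    try:
        pa = pyaudio.PyAudio()  # initializes PortAudio
    except OSError as exc:
        return False, f"PortAudio failed to initialize: {exc}"

    try:
        input_devices = []
        for index in range(pa.get_device_count()):
            info = pa.get_device_info_by_index(index)
            # Devices with input channels can serve as microphones for dictation.
            if info.get("maxInputChannels", 0) > 0:
                input_devices.append(info.get("name", f"device {index}"))
    finally:
        pa.terminate()  # release PortAudio resources even if a query fails

    if not input_devices:
        return False, "PortAudio loaded, but no input (microphone) devices were found"
    return True, f"PortAudio OK, {len(input_devices)} input device(s): {input_devices}"


if __name__ == "__main__":
    ok, message = check_pyaudio()
    print(message)
```

Dictation needs at least one input device, which is why the check reports failure when PortAudio loads but finds no microphone.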

Tesseract OCR Engine (Required for document OCR)

Linux:
- Debian/Ubuntu: sudo apt-get install tesseract-ocr tesseract-ocr-eng
- Red Hat/Fedora/CentOS: sudo dnf install tesseract tesseract-langpack-eng
- Arch Linux: sudo pacman -S tesseract tesseract-data-eng
- openSUSE: sudo zypper install tesseract-ocr tesseract-ocr-traineddata-english

macOS:
- With Homebrew: brew install tesseract
- With MacPorts: sudo port install tesseract

Windows:
- Download the installer from https://github.com/UB-Mannheim/tesseract/wiki
- Important: check "Add Tesseract to PATH" during installation
- Or manually add the installation directory to the PATH environment variable
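Because the Windows instructions hinge on Tesseract being on PATH, and the Linux commands install the English language data separately, a short check of both can save debugging later. The sketch below is an assumption-level helper rather than anything shipped with Assistivox: the function names are hypothetical, and it relies only on the standard `tesseract --version` and `tesseract --list-langs` command-line options.

```python
from __future__ import annotations

import shutil
import subprocess


def _run_tesseract(*args: str) -> str | None:
    """Run tesseract with args; return its text output, or None if not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    result = subprocess.run([exe, *args], capture_output=True, text=True, check=False)
    # Older Tesseract releases print this information to stderr, newer ones to stdout.
    return result.stdout.strip() or result.stderr.strip()


def tesseract_version() -> str | None:
    """First line of `tesseract --version` (typically 'tesseract <version>'), or None."""
    output = _run_tesseract("--version")
    if not output:
        return None
    return output.splitlines()[0]


def tesseract_languages() -> list[str]:
    """Language codes reported by `tesseract --list-langs` (empty if unavailable)."""
    output = _run_tesseract("--list-langs")
    if not output:
        return []
    # Skip the header line ("List of available languages ...") and blank lines.
    return [
        line.strip()
        for line in output.splitlines()
        if line.strip() and not line.startswith("List of available languages")
    ]


if __name__ == "__main__":
    print("version:", tesseract_version() or "tesseract not found on PATH")
    print("English data installed:", "eng" in tesseract_languages())
```

If the version check fails on Windows right after installation, opening a new terminal is usually needed before an updated PATH is visible.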

Docker (Required for Kokoro TTS)

Linux:
- Debian/Ubuntu: follow https://docs.docker.com/engine/install/ubuntu/
- Red Hat/Fedora/CentOS: follow https://docs.docker.com/engine/install/fedora/
- Arch Linux: sudo pacman -S docker, then sudo systemctl enable --now docker
- All distributions: add your user to the docker group with sudo usermod -aG docker $USER, then log out and back in

macOS:
- Download Docker Desktop for Mac from https://www.docker.com/products/docker-desktop

Windows:
- Download Docker Desktop for Windows from https://www.docker.com/products/docker-desktop
- Requires Windows 10 or 11 with WSL 2 enabled
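Kokoro TTS depends on a running Docker daemon that your user is allowed to talk to, so two things are worth checking: whether `docker info` succeeds, and (on Linux) whether the current session already carries the docker group added by `usermod`. This is a minimal sketch under those assumptions; the function names are hypothetical and it is not how Assistivox itself starts Kokoro.

```python
from __future__ import annotations

import os
import platform
import shutil
import subprocess


def docker_daemon_status(timeout: float = 30.0) -> tuple[bool, str]:
    """Hypothetical helper: return (ok, message) for the Docker CLI and daemon."""
    exe = shutil.which("docker")
    if exe is None:
        return False, "docker CLI not found on PATH"
    try:
        # `docker info` contacts the daemon, so it fails if Docker is not running
        # or if the current user may not access the Docker socket.
        result = subprocess.run(
            [exe, "info"], capture_output=True, text=True, timeout=timeout, check=False
        )
    except subprocess.TimeoutExpired:
        return False, "`docker info` timed out; is the Docker daemon running?"
    if result.returncode == 0:
        return True, "Docker daemon reachable"
    return False, result.stderr.strip() or "`docker info` failed"


def in_docker_group() -> bool | None:
    """On Linux, whether this process already has the 'docker' group; None elsewhere."""
    if platform.system() != "Linux":
        return None
    import grp  # Unix-only module, so it is imported lazily

    try:
        docker_gid = grp.getgrnam("docker").gr_gid
    except KeyError:
        return False  # the docker group does not exist yet
    # Group membership is inherited from the login session, which is why
    # `usermod -aG docker $USER` only takes effect after logging out and in.
    return docker_gid in os.getgroups() or docker_gid == os.getegid()


if __name__ == "__main__":
    ok, message = docker_daemon_status()
    print(message)
    print("in docker group:", in_docker_group())
```

A "permission denied" message from `docker info` combined with `in_docker_group()` returning False typically means the logout/login step has not happened yet.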

Git and CMake (Fallback only)

These are only needed if Piper TTS binary downloads fail and source compilation is required:

Linux:
- Debian/Ubuntu: sudo apt-get install git build-essential cmake
- Red Hat/Fedora/CentOS: sudo dnf install git gcc-c++ cmake make
- Arch Linux: sudo pacman -S git base-devel cmake

macOS:
- Xcode Command Line Tools: xcode-select --install
- CMake via Homebrew: brew install cmake

Windows:
- Git: download from https://git-scm.com/download/win
- Visual Studio Build Tools: download from https://visualstudio.microsoft.com/downloads/
- CMake: download from https://cmake.org/download/
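If the Piper binary download does fail, it helps to know in advance which fallback tools are missing. The sketch below only checks that the tools named above can be found on PATH; it does not build anything, the function name is hypothetical, and treating `make` plus one of `c++`/`g++`/`clang++` as the Unix toolchain is an assumption based on what build-essential, base-devel and the Xcode Command Line Tools provide.

```python
from __future__ import annotations

import platform
import shutil


def missing_build_tools() -> list[str]:
    """Hypothetical helper: fallback build tools for Piper not found on PATH."""
    if platform.system() == "Windows":
        # MSVC's cl.exe is normally on PATH only inside a
        # "Developer Command Prompt for VS" shell.
        required = ["git", "cmake"]
        compilers = ["cl"]
    else:
        # build-essential / base-devel / Xcode CLT provide make and a C++ compiler.
        required = ["git", "cmake", "make"]
        compilers = ["c++", "g++", "clang++"]

    missing = [tool for tool in required if shutil.which(tool) is None]
    if not any(shutil.which(compiler) for compiler in compilers):
        missing.append("C++ compiler (" + " or ".join(compilers) + ")")
    return missing


if __name__ == "__main__":
    missing = missing_build_tools()
    print("All fallback build tools found" if not missing else "Missing: " + ", ".join(missing))
```

On Windows, a missing `cl` from a regular terminal does not necessarily mean the Build Tools are absent; running the check from a Developer Command Prompt gives a more accurate answer.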

CUDA Toolkit (Optional - GPU acceleration)

Download from https://developer.nvidia.com/cuda-downloads
Only needed if you have an NVIDIA GPU and want faster AI processing
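To decide whether the CUDA Toolkit is worth installing, you can first check whether an NVIDIA GPU and driver are present. `nvidia-smi` ships with the NVIDIA driver, so a GPU listed there shows the driver works, not that the CUDA Toolkit is installed. Faster Whisper runs on CTranslate2, whose `get_cuda_device_count()` reports how many CUDA devices it can use; this document does not say how Assistivox selects the GPU, so treat this as an illustrative sketch with hypothetical function names.

```python
from __future__ import annotations

import importlib
import shutil
import subprocess


def nvidia_gpu_names(timeout: float = 30.0) -> list[str]:
    """GPU names reported by nvidia-smi; empty if no NVIDIA driver is found."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=timeout, check=False,
        )
    except subprocess.TimeoutExpired:
        return []
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def ctranslate2_cuda_devices() -> int | None:
    """CUDA devices visible to CTranslate2 (Faster Whisper's backend), or None if not installed."""
    try:
        ctranslate2 = importlib.import_module("ctranslate2")
    except (ImportError, OSError):
        return None
    return ctranslate2.get_cuda_device_count()


if __name__ == "__main__":
    print("NVIDIA GPUs:", nvidia_gpu_names() or "none detected")
    print("CTranslate2 CUDA devices:", ctranslate2_cuda_devices())
```

An empty GPU list means GPU acceleration does not apply and the CUDA download can be skipped.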

Installation

Once you have installed the external dependencies for your platform:

Clone the repository:
```bash
git clone https://github.com/cynsight/assistivox-ai.git
cd assistivox-ai
```
Run the setup script:
```bash
python setup-assistivox.py
```

Installation requires 10 GB of free storage and typically takes 5 to 20 minutes, depending on internet speed. Setup automatically creates a desktop shortcut and an application icon.
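Since a download that runs out of disk space partway through is tedious to recover from, a quick free-space check before running setup can help. This sketch reads "10GB" as 10 × 10^9 bytes and checks the filesystem that holds the directory you pass (for example the cloned `assistivox-ai` folder); the document does not say where setup stores its models, so checking that location is an assumption, and the function name is hypothetical.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# The install notes ask for 10GB; this sketch reads that as 10 * 10**9 bytes.
REQUIRED_BYTES = 10 * 10**9


def has_space_for_install(
    target: str | Path = ".", required: int = REQUIRED_BYTES
) -> tuple[bool, float]:
    """Return (enough_space, free_gb) for the filesystem containing `target`."""
    free = shutil.disk_usage(target).free
    return free >= required, free / 10**9


if __name__ == "__main__":
    ok, free_gb = has_space_for_install(".")
    print(f"{free_gb:.1f} GB free:", "OK" if ok else "less than 10 GB, free up space first")
```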

The setup script will create a virtual environment, install Python dependencies, download basic AI models, and configure the application for first use.

Beta Software Notice

Assistivox AI is currently in beta development. The current version requires manual installation of external dependencies to support the full range of AI capabilities. This installation process is designed for users who need immediate access to professional-grade vision assistance tools.

Future Development: A simplified "Assistivox Light" edition is planned that will reduce external dependencies and provide easier installation for broader accessibility, focusing on core text editing with basic TTS and dictation capabilities.

Privacy & Security

100% Local Processing: All AI models run on your local machine
No Cloud Dependencies: No user data transmitted to external services
No Internet Required: Core functionality works completely offline once setup has downloaded the AI models
Privacy by Design: Your documents and voice data never leave your control

Getting Started

After installation, explore the documentation sections:
- Document Editor: core text editing and accessibility features
- Text-to-Speech Reader: advanced document reading with navigation
- Vision Processing: PDF and document AI extraction
- Settings: configure engines and voices for your needs