Assistivox AI (0.1.0)
Assistivox™ AI is a voice-enabled document productivity suite by Bill Hollingsworth designed for researchers, programmers, technical writers, and other technical workers who need visual or motor accessibility solutions. Assistivox AI transforms documents into accessible content using local AI models, enabling technical careers that traditional accessibility tools have not adequately supported.
Vision Problems Addressed
Document Accessibility Beyond Screen Readers
Traditional screen readers are limited to properly formatted digital text. Assistivox AI addresses the reality that much important content exists in visual formats—scanned documents, PDFs with complex layouts, images containing text, and poorly structured digital documents that defeat conventional accessibility tools.
Professional-Grade Text Processing
Assistivox AI provides the granular document navigation and editing capabilities needed for serious productivity work, going beyond basic screen readers to enable complex document creation, revision, and analysis through voice and AI assistance.
Local AI Processing
All AI models run entirely on your local machine. No user data ever leaves your computer or passes through cloud services. This ensures complete privacy while providing cutting-edge AI capabilities for vision assistance.
AI Technologies
Text-to-Speech Engines
- Piper TTS: Efficient, high-quality speech synthesis optimized for older hardware
- Kokoro TTS: Premium voice quality with advanced prosody for extended listening sessions
Document Vision Processing
- Docling: Advanced PDF layout understanding with structure preservation
- Tesseract OCR: Reliable text extraction for standard documents and images
- DocTR OCR: High-accuracy text recognition optimized for complex document layouts
Speech Recognition
- Vosk: Lightweight offline dictation for basic hardware requirements
- Faster Whisper: High-accuracy speech-to-text with GPU acceleration support
System Requirements & Dependencies
Note: This is currently beta software requiring manual installation of several external dependencies to create the environment needed for this advanced AI suite. Future development will include an Assistivox Light edition with simplified installation requirements.
Required External Dependencies
System Audio (Required for speech features)
Linux:
- Debian/Ubuntu: sudo apt-get install portaudio19-dev python3-pyaudio
- Red Hat/Fedora/CentOS: sudo dnf install portaudio-devel python3-pyaudio
- Arch Linux: sudo pacman -S portaudio python-pyaudio
- openSUSE: sudo zypper install portaudio-devel python3-pyaudio
macOS:
- With Homebrew: brew install portaudio
- Without Homebrew: Install Homebrew first at https://brew.sh, then run the above command
- MacPorts alternative: sudo port install portaudio
Windows:
- Usually included automatically with PyAudio installation
- If issues occur, install the Microsoft Visual C++ Redistributable
Tesseract OCR Engine (Required for document OCR)
Linux:
- Debian/Ubuntu: sudo apt-get install tesseract-ocr tesseract-ocr-eng
- Red Hat/Fedora/CentOS: sudo dnf install tesseract tesseract-langpack-eng
- Arch Linux: sudo pacman -S tesseract tesseract-data-eng
- openSUSE: sudo zypper install tesseract-ocr tesseract-ocr-traineddata-english
macOS:
- With Homebrew: brew install tesseract
- With MacPorts: sudo port install tesseract
Windows:
- Download installer from https://github.com/UB-Mannheim/tesseract/wiki
- Important: Check "Add Tesseract to PATH" during installation
- Or manually add the installation directory to the PATH environment variable
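Because a missing PATH entry is a common reason for OCR to fail after installing Tesseract, a quick check can confirm that the `tesseract` executable is reachable. The following is a minimal sketch, not part of Assistivox AI itself; the helper name `tesseract_version` is hypothetical, and it relies only on the standard `tesseract --version` command.

```python
import shutil
import subprocess


def tesseract_version():
    """Return the first line of `tesseract --version`, or None if unavailable.

    Hypothetical helper for illustration; not an Assistivox AI function.
    """
    exe = shutil.which("tesseract")
    if exe is None:
        return None  # executable is not on PATH
    try:
        result = subprocess.run(
            [exe, "--version"],
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Some Tesseract releases print the version banner to stderr,
    # so fall back to stderr when stdout is empty.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    version = tesseract_version()
    print(version or "tesseract not found on PATH")
```

If this prints "tesseract not found on PATH", revisit the PATH step for your platform and open a new terminal before trying again.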
Docker (Required for Kokoro TTS)
Linux:
- Debian/Ubuntu: Follow https://docs.docker.com/engine/install/ubuntu/
- Red Hat/Fedora/CentOS: Follow https://docs.docker.com/engine/install/fedora/
- Arch Linux: sudo pacman -S docker, then sudo systemctl enable --now docker
- Add user to docker group: sudo usermod -aG docker $USER (log out and back in for the change to take effect)
macOS:
- Download Docker Desktop for Mac from https://www.docker.com/products/docker-desktop
Windows:
- Download Docker Desktop for Windows from https://www.docker.com/products/docker-desktop
- Requires: Windows 10/11 with WSL2 enabled
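Installing Docker is not quite the same as having a usable Docker daemon: the service may be stopped, or on Linux the current user may not yet be in the docker group. The sketch below separates these cases by running `docker info`, which in current Docker CLI releases typically exits with a non-zero status when the daemon cannot be reached. The function name and the three status strings are assumptions made for illustration, not part of Assistivox AI.

```python
import shutil
import subprocess


def docker_status():
    """Classify the local Docker setup as 'missing', 'daemon-unreachable', or 'ok'.

    Hypothetical helper; the status labels are illustrative only.
    """
    exe = shutil.which("docker")
    if exe is None:
        return "missing"  # docker CLI is not on PATH
    try:
        result = subprocess.run(
            [exe, "info"],
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return "daemon-unreachable"
    # A non-zero exit code usually means the daemon is stopped or the
    # user lacks permission to talk to it.
    return "ok" if result.returncode == 0 else "daemon-unreachable"


if __name__ == "__main__":
    print(f"Docker status: {docker_status()}")
```

A result of "daemon-unreachable" on Linux is often resolved by starting the Docker service or by logging out and back in after the group change.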
Git and CMake (Fallback only)
These are only needed if Piper TTS binary downloads fail and source compilation is required:
Linux:
- Debian/Ubuntu: sudo apt-get install git build-essential cmake
- Red Hat/Fedora/CentOS: sudo dnf install git gcc-c++ cmake make
- Arch Linux: sudo pacman -S git base-devel cmake
macOS:
- Xcode Command Line Tools: xcode-select --install
- CMake via Homebrew: brew install cmake
Windows:
- Git: Download from https://git-scm.com/download/win
- Visual Studio Build Tools: Download from https://visualstudio.microsoft.com/downloads/
- CMake: Download from https://cmake.org/download/
CUDA Toolkit (Optional - GPU acceleration)
- Download from https://developer.nvidia.com/cuda-downloads
- Only needed if you have an NVIDIA GPU and want faster AI processing
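Since Git, CMake, and the CUDA Toolkit are fallback or optional, it can be useful to see which of them are already present before deciding what to install. The sketch below only looks for well-known executables on PATH (`git`, `cmake`, `nvcc` from the CUDA Toolkit, and `nvidia-smi` from the NVIDIA driver). The function name is hypothetical, and finding an executable suggests, but does not prove, that the tool is correctly configured.

```python
import shutil

# Executables associated with the fallback and optional dependencies.
OPTIONAL_TOOLS = ("git", "cmake", "nvcc", "nvidia-smi")


def find_optional_tools(names=OPTIONAL_TOOLS):
    """Map each tool name to its resolved path, or None if not on PATH.

    Hypothetical helper for illustration; not an Assistivox AI function.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_optional_tools().items():
        print(f"{name:12s} {path or 'not found'}")
```

Missing entries here are not errors: the build tools matter only if the Piper TTS binary download fails, and the CUDA tools matter only for GPU acceleration.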
Installation
Once you have installed the external dependencies for your platform:
- Clone the repository:
  git clone https://github.com/cynsight/assistivox-ai.git
  cd assistivox-ai
- Run the setup script:
  python setup-assistivox.py
Installation requires 10 GB of free storage and typically takes 5 to 20 minutes, depending on internet speed. Setup automatically creates a desktop shortcut and application icon.
The setup script will create a virtual environment, install Python dependencies, download basic AI models, and configure the application for first use.
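The free-storage requirement can be checked before starting setup. This is a minimal sketch using Python's standard library; the helper name and the choice to interpret 10 GB as 10 GiB are assumptions, and the setup script may or may not perform a similar check on its own.

```python
import shutil
from pathlib import Path

# Assumption: treat the documented 10 GB requirement as 10 GiB.
REQUIRED_BYTES = 10 * 1024**3


def has_enough_space(path=".", required=REQUIRED_BYTES):
    """Return True if the filesystem containing `path` has `required` bytes free.

    Hypothetical helper for illustration; not an Assistivox AI function.
    """
    usage = shutil.disk_usage(Path(path).resolve())
    return usage.free >= required


if __name__ == "__main__":
    free_gib = shutil.disk_usage(".").free / 1024**3
    verdict = "enough" if has_enough_space(".") else "not enough"
    print(f"{free_gib:.1f} GiB free: {verdict} for installation")
```

Run it from the directory where you plan to clone the repository so that the correct drive or partition is measured.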
Beta Software Notice
Assistivox AI is currently in beta development. The current version requires manual installation of external dependencies to support the full range of AI capabilities. This installation process is designed for users who need immediate access to professional-grade vision assistance tools.
Future Development: A simplified "Assistivox Light" edition is planned that will reduce external dependencies and provide easier installation for broader accessibility, focusing on core text editing with basic TTS and dictation capabilities.
Privacy & Security
- 100% Local Processing: All AI models run on your local machine
- No Cloud Dependencies: No user data transmitted to external services
- No Internet Required: Core functionality works completely offline
- Privacy by Design: Your documents and voice data never leave your control
Getting Started
After installation, explore the documentation sections:
- Document Editor - Core text editing and accessibility features
- Text-to-Speech Reader - Advanced document reading with navigation
- Vision Processing - PDF and document AI extraction
- Settings - Configure engines and voices for your needs