Text-to-Speech
Conjure includes text-to-speech (TTS) capabilities that let you hear text read aloud. This is useful for proofreading dictated text, reviewing documents, and improving accessibility.
Platform support
System TTS voices are currently available on Windows and Linux. macOS support is planned for a future release.
TTS Providers
System Voices
Your operating system's built-in speech synthesis. Free, works offline, and requires no configuration. Voice quality varies by platform:
- Windows -- Microsoft speech voices (e.g., David, Zira, Mark)
- Linux -- espeak or festival, depending on your distribution
OpenAI TTS
High-quality neural voices from OpenAI. Requires an OpenAI API key configured in Settings.
OpenAI TTS costs
OpenAI voices use API credits at approximately $15 per 1M characters. System voices and Edge TTS are free alternatives.
- Cost: approximately $15 per 1 million characters
- Voices: alloy, echo, fable, onyx, nova, shimmer
- Quality: significantly more natural than system voices
Edge TTS
Microsoft Edge's online TTS service. Free, high quality, wide language support. Requires an internet connection but no API key.
Speak Selected Text
The primary way to use TTS:
- Configure a Speak selected text hotkey in Settings > Input > Hotkey shortcuts
- Select text in any application (highlight it with your mouse or keyboard)
- Press the TTS hotkey
- Conjure reads the selected text aloud
Under the hood, Conjure uses the accessibility API to read the text selection. If that fails (some apps don't expose selections via accessibility), it falls back to copying the selection to the clipboard via Ctrl+C, reading the clipboard, then restoring your original clipboard contents.
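The fallback order described above can be sketched as follows. This is an illustration only, not Conjure's actual code: the function name and the four injected callables are hypothetical stand-ins for the platform accessibility and clipboard calls, and treating an unchanged clipboard as "nothing selected" is an assumed heuristic.

```python
from typing import Callable, Optional


def read_selection(
    read_via_accessibility: Callable[[], Optional[str]],  # hypothetical hook
    get_clipboard: Callable[[], str],                      # hypothetical hook
    set_clipboard: Callable[[str], None],                  # hypothetical hook
    send_copy: Callable[[], None],                         # simulates Ctrl+C
) -> Optional[str]:
    """Return the selected text, or None if nothing could be read."""
    # First attempt: ask the accessibility API for the selection.
    try:
        text = read_via_accessibility()
    except Exception:
        text = None
    if text:
        return text

    # Fallback: save the clipboard, copy the selection, read it back,
    # and always restore the original clipboard contents.
    saved = get_clipboard()
    try:
        send_copy()
        copied = get_clipboard()
    finally:
        set_clipboard(saved)

    # Assumed heuristic: an unchanged clipboard means nothing was selected.
    return copied if copied and copied != saved else None
```

Restoring the clipboard in a `finally` block means the user's original contents come back even if the copy step raises. A `None` result corresponds to the case where a toast notification would explain that no text could be read.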
TIP
If TTS doesn't speak after pressing the hotkey, you'll see a toast notification explaining what went wrong -- usually that no text was selected or the accessibility API couldn't read the selection.
Auto-Speak
You can configure Conjure to automatically speak certain outputs:
- Agent responses -- have the agent's replies read aloud automatically after each turn
- Dictation results -- hear your processed dictation read back to verify accuracy
Configure auto-speak behavior in Settings > Processing > Text-to-Speech.
The TTS Pill Overlay
When TTS is speaking, the dictation pill overlay changes to indicate the state:
| Indicator or action | Meaning |
|---|---|
| Purple pill with volume icon | TTS is actively speaking |
| "Speaking..." label | Audio is playing |
| Click the pill | Stops TTS playback immediately |
The TTS state is synced from the main app window to the overlay, so the pill accurately reflects playback state even when the main window is not focused.
Configuring TTS
TTS settings are found in Settings > Processing > Text-to-Speech:
- Provider -- choose between System, OpenAI, or Edge TTS
- Voice -- select from available voices for the chosen provider
- Speed -- adjust playback speed (slower for proofreading, faster for skimming)
- Auto-speak -- toggle automatic speaking for agent responses and dictation results
Cost Tracking
When using OpenAI TTS, character usage is tracked alongside your other API usage statistics. You can see TTS costs on the home page usage card and in per-key usage badges in Settings.
Estimated cost: ~$0.015 per 1,000 characters, or roughly $0.03-0.05 per page of text.
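To estimate the cost of a specific piece of text, the arithmetic behind these figures can be written as a small helper. This is a sketch that assumes the approximate $15 per 1 million characters rate quoted on this page; the function name is illustrative, and actual billing is determined by OpenAI.

```python
from decimal import Decimal

# Approximate rate quoted on this page; actual pricing may differ.
USD_PER_MILLION_CHARS = Decimal("15")


def estimate_tts_cost(char_count: int) -> Decimal:
    """Approximate OpenAI TTS cost in USD for a given number of characters."""
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    return Decimal(char_count) * USD_PER_MILLION_CHARS / Decimal(1_000_000)


if __name__ == "__main__":
    for chars in (1_000, 2_500, 100_000):
        print(f"{chars:>7} characters: about ${estimate_tts_cost(chars)}")
```

At this rate, 1,000 characters comes to about $0.015, and a page of 2,000 to 3,000 characters (an assumed page length) comes to roughly $0.03 to $0.045.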
Hotkey Considerations
WARNING
Avoid using Alt-based hotkeys for TTS on Windows. Alt key combinations can trigger application menu bars, causing the hotkey to open a menu instead of speaking text. Prefer Ctrl-based or function key combinations.
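If you keep your hotkeys in a list and want to check them against this advice, a check along these lines may help. It is a minimal sketch that assumes hotkeys are written as `+`-separated strings such as `Ctrl+Alt+S`; that format and the function name are assumptions, not Conjure's internal representation.

```python
def uses_alt_modifier(hotkey: str) -> bool:
    """Return True if a hotkey string such as "Ctrl+Alt+S" includes Alt."""
    # Normalize each part so " ALT " and "alt" are treated the same.
    parts = [part.strip().lower() for part in hotkey.split("+")]
    return "alt" in parts


if __name__ == "__main__":
    for combo in ("Alt+S", "Ctrl+Shift+S", "F9"):
        verdict = "avoid on Windows" if uses_alt_modifier(combo) else "ok"
        print(f"{combo}: {verdict}")
```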
