Text-to-Speech

Conjure includes text-to-speech (TTS) capabilities that let you hear text read aloud. This is useful for proofreading dictated text, reviewing documents, or as an accessibility aid.

Platform support

System TTS voices are currently available on Windows and Linux. macOS support is planned for a future release.

TTS Providers

System Voices

Your operating system's built-in speech synthesis. Free, works offline, and requires no configuration. Voice quality varies by platform:

Windows -- Microsoft speech voices (e.g., David, Zira, Mark)
Linux -- espeak or festival, depending on your distribution

OpenAI TTS

High-quality neural voices from OpenAI. Requires an OpenAI API key configured in Settings.

OpenAI TTS costs

OpenAI voices use API credits at approximately $15 per 1M characters. System voices and Edge TTS are free alternatives.

Cost: approximately $15 per 1 million characters
Voices: alloy, echo, fable, onyx, nova, shimmer
Quality: significantly more natural than system voices

Edge TTS

Microsoft Edge's online TTS service. Free, high quality, wide language support. Requires an internet connection but no API key.

Speak Selected Text

The primary way to use TTS:

Configure a "Speak selected text" hotkey in Settings > Input > Hotkey shortcuts
Select text in any application (highlight it with your mouse or keyboard)
Press the TTS hotkey
Conjure reads the selected text aloud

Under the hood, Conjure first reads the text selection through the accessibility API. If that fails (some apps don't expose their selection via accessibility), it falls back to copying the selection to the clipboard with Ctrl+C, reading the clipboard, and then restoring your original clipboard contents.
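The fallback order described above can be sketched in Python as follows. This is a minimal illustration, not Conjure's actual code: the four callables (`read_via_accessibility`, `get_clipboard`, `set_clipboard`, `send_copy_shortcut`) are hypothetical hooks standing in for platform-specific accessibility and clipboard calls, and only plain-text clipboard contents are saved and restored.

```python
from typing import Callable, Optional


def read_selected_text(
    read_via_accessibility: Callable[[], Optional[str]],
    get_clipboard: Callable[[], str],
    set_clipboard: Callable[[str], None],
    send_copy_shortcut: Callable[[], None],
) -> Optional[str]:
    """Return the selected text, or None if nothing could be read.

    All four callables are hypothetical hooks for platform-specific code.
    """
    # 1. Preferred path: ask the accessibility API for the selection.
    text = read_via_accessibility()
    if text:
        return text

    # 2. Fallback: copy the selection through the clipboard.
    saved = get_clipboard()
    try:
        set_clipboard("")     # clear first so a stale value isn't mistaken for the selection
        send_copy_shortcut()  # e.g. simulate Ctrl+C in the focused app
        copied = get_clipboard()
    finally:
        set_clipboard(saved)  # always restore the user's original clipboard
    return copied or None
```

Clearing the clipboard before the simulated copy keeps an old clipboard value from being read back as the selection, and the `finally` block restores the original contents even if copying fails. A real implementation would likely also need a short wait after sending Ctrl+C, since the target app updates the clipboard asynchronously.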

TIP

If TTS doesn't speak after pressing the hotkey, you'll see a toast notification explaining what went wrong -- usually that no text was selected or the accessibility API couldn't read the selection.

Auto-Speak

You can configure Conjure to automatically speak certain outputs:

Agent responses -- have the agent's replies read aloud automatically after each turn
Dictation results -- hear your processed dictation read back to verify accuracy

Configure auto-speak behavior in Settings > Processing > Text-to-Speech.

The TTS Pill Overlay

When TTS is speaking, the dictation pill overlay changes to indicate the state:

Indicator or action	Meaning or effect
Purple pill with volume icon	TTS is actively speaking
"Speaking..." label	Audio is playing
Click the pill	Stops TTS playback immediately

The TTS state is synced from the main app window to the overlay, so the pill accurately reflects playback state even when the main window is not focused.
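One way to picture this one-way sync is a small publish/subscribe sketch, assuming the main window owns the playback state and pushes every change to the overlay. The class and method names (`TtsStateBroadcaster`, `PillOverlay`, `set_speaking`) are hypothetical and do not describe Conjure's internal API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class TtsState:
    speaking: bool = False


@dataclass
class TtsStateBroadcaster:
    """Main-window side (hypothetical): owns the state and pushes every change."""

    state: TtsState = field(default_factory=TtsState)
    listeners: List[Callable[[TtsState], None]] = field(default_factory=list)

    def subscribe(self, listener: Callable[[TtsState], None]) -> None:
        self.listeners.append(listener)
        listener(self.state)  # new subscribers receive the current state right away

    def set_speaking(self, speaking: bool) -> None:
        self.state = TtsState(speaking=speaking)
        for listener in self.listeners:
            listener(self.state)


class PillOverlay:
    """Overlay side (hypothetical): renders whatever state it was last sent."""

    def __init__(self, broadcaster: TtsStateBroadcaster, stop_playback: Callable[[], None]) -> None:
        self.color = "default"
        self.label = ""
        self._stop_playback = stop_playback
        broadcaster.subscribe(self.render)

    def render(self, state: TtsState) -> None:
        if state.speaking:
            self.color, self.label = "purple", "Speaking..."
        else:
            self.color, self.label = "default", ""

    def on_click(self) -> None:
        # Ask the owner to stop; the owner then broadcasts the new state back.
        self._stop_playback()


broadcaster = TtsStateBroadcaster()
overlay = PillOverlay(broadcaster, stop_playback=lambda: broadcaster.set_speaking(False))
broadcaster.set_speaking(True)  # overlay now shows the purple "Speaking..." pill
overlay.on_click()              # playback stops and the pill returns to normal
```

Because the overlay only renders what it is sent, and receives the current state as soon as it subscribes, it does not depend on the main window having focus.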

Configuring TTS

TTS settings are found in Settings > Processing > Text-to-Speech:

Provider -- choose between System, OpenAI, or Edge TTS
Voice -- select from available voices for the chosen provider
Speed -- adjust playback speed (slower for proofreading, faster for skimming)
Auto-speak -- toggle automatic speaking for agent responses and dictation results
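The four settings above can be modeled as a small typed structure. This sketch is an assumption for illustration only: the field names, the `Provider` values, and treating speed as a multiplier where 1.0 is the normal rate are hypothetical. Only the OpenAI voice names come from this page; System and Edge voices depend on your OS and the online service, so they are not hard-coded.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Provider(Enum):
    SYSTEM = "system"
    OPENAI = "openai"
    EDGE = "edge"


# OpenAI voice names listed on this page.
OPENAI_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}


@dataclass
class TtsSettings:
    provider: Provider = Provider.SYSTEM
    voice: str = ""          # empty string = provider default (assumption)
    speed: float = 1.0       # assumed multiplier: 1.0 = normal rate
    auto_speak: bool = False

    def validate(self) -> List[str]:
        """Return a list of human-readable problems (empty if the settings look valid)."""
        problems: List[str] = []
        if self.provider is Provider.OPENAI and self.voice and self.voice not in OPENAI_VOICES:
            problems.append(f"unknown OpenAI voice: {self.voice!r}")
        if self.speed <= 0:
            problems.append("speed must be positive")
        return problems
```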

Cost Tracking

When using OpenAI TTS, character usage is tracked alongside your other API usage statistics. You can see TTS costs on the home page usage card and in per-key usage badges in Settings.

Estimated cost: ~$0.015 per 1,000 characters, or roughly $0.03-0.05 per page of text (assuming about 2,000 to 3,300 characters per page).
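The per-page figure follows directly from the per-character rate. A minimal sketch of the arithmetic, assuming the approximate $15 per 1M characters rate quoted above and treating each Python string character as one billed character (an approximation):

```python
# Approximate OpenAI TTS rate quoted on this page (USD per 1,000,000 characters).
OPENAI_TTS_USD_PER_MILLION_CHARS = 15.0


def estimate_openai_tts_cost(char_count: int) -> float:
    """Approximate USD cost of speaking `char_count` characters with OpenAI TTS."""
    return char_count * OPENAI_TTS_USD_PER_MILLION_CHARS / 1_000_000


text = "Hello from Conjure. " * 100  # 2,000 characters of sample text
print(f"${estimate_openai_tts_cost(len(text)):.3f}")  # prints $0.030
```

For example, a 2,000-character page works out to 2,000 × 15 / 1,000,000 = $0.03, and a 3,300-character page to about $0.05.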

Hotkey Considerations

WARNING

Avoid using Alt-based hotkeys for TTS on Windows. Alt key combinations can trigger application menu bars, causing the hotkey to open a menu instead of speaking text. Prefer Ctrl-based or function key combinations.
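The rule in this warning is simple enough to express as a check. The sketch below flags any hotkey whose modifiers include Alt; it assumes a hypothetical `"Modifier+Modifier+Key"` string format and is not Conjure's actual validation logic.

```python
from typing import Optional


def alt_hotkey_warning(hotkey: str) -> Optional[str]:
    """Return a warning for Alt-based hotkeys, or None if the hotkey looks safe.

    Assumes a hypothetical "Modifier+Modifier+Key" format, e.g. "Ctrl+Shift+S".
    """
    parts = [p.strip().lower() for p in hotkey.split("+")]
    modifiers = parts[:-1]  # everything before the final key
    if "alt" in modifiers:
        return (
            f"{hotkey!r} uses Alt, which can open application menus on Windows; "
            "prefer a Ctrl-based or function-key combination"
        )
    return None
```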
