
Text-to-Speech

Conjure includes text-to-speech (TTS) capabilities that let you hear text read aloud. This is useful for proofreading dictated text, reviewing documents, or accessibility.

Platform support

System TTS voices are currently available on Windows and Linux. macOS support is planned for a future release.

TTS Providers

System Voices

Your operating system's built-in speech synthesis. Free, works offline, and requires no configuration. Voice quality varies by platform:

  • Windows -- Microsoft speech voices (e.g., David, Zira, Mark)
  • Linux -- eSpeak or Festival, depending on your distribution
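To check that a Linux system voice works outside Conjure, you can call the synthesizer yourself. This is a minimal sketch, not how Conjure invokes it: it assumes the `espeak` or `espeak-ng` command is installed and uses its `-v` (voice), `-s` (words per minute), and `--stdin` options. The helper names `find_espeak`, `espeak_command`, and `speak_with_espeak` are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional


def find_espeak() -> Optional[str]:
    """Return the path of an installed espeak binary, or None if neither is found."""
    # Many current distributions package espeak-ng; older ones ship espeak.
    return shutil.which("espeak-ng") or shutil.which("espeak")


def espeak_command(binary: str, words_per_minute: int = 175, voice: str = "en") -> List[str]:
    """Build the argument list; the text itself is sent on stdin, not as an argument."""
    # Sending text on stdin avoids it being parsed as an option if it starts with "-".
    return [binary, "-v", voice, "-s", str(words_per_minute), "--stdin"]


def speak_with_espeak(text: str, words_per_minute: int = 175, voice: str = "en") -> None:
    """Speak text synchronously; raises if espeak is missing or exits with an error."""
    binary = find_espeak()
    if binary is None:
        raise FileNotFoundError("neither espeak-ng nor espeak is on PATH")
    subprocess.run(
        espeak_command(binary, words_per_minute, voice),
        input=text.encode("utf-8"),
        check=True,
    )
```

For example, `speak_with_espeak("Testing system voices", words_per_minute=140)` reads a test phrase at a slower rate, which suits proofreading.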

OpenAI TTS

High-quality neural voices from OpenAI. Requires an OpenAI API key configured in Settings.

OpenAI TTS costs

OpenAI voices consume API credits on your OpenAI account. System voices and Edge TTS are free alternatives.

  • Cost: approximately $15 per 1 million characters
  • Voices: alloy, echo, fable, onyx, nova, shimmer
  • Quality: significantly more natural than system voices
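For reference, this is roughly what a direct request to OpenAI's speech endpoint looks like. It is a sketch using only the Python standard library, not Conjure's own client. It assumes the `tts-1` model (the standard-quality model OpenAI bills at the rate above) and MP3 output; `build_speech_request` and `synthesize` are hypothetical names.

```python
import json
import urllib.request

SPEECH_URL = "https://api.openai.com/v1/audio/speech"
OPENAI_VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")


def build_speech_request(api_key: str, text: str, voice: str = "alloy",
                         speed: float = 1.0) -> urllib.request.Request:
    """Build, but do not send, a POST request asking OpenAI to synthesize text."""
    if voice not in OPENAI_VOICES:
        raise ValueError(f"unknown voice {voice!r}; expected one of {OPENAI_VOICES}")
    payload = {
        "model": "tts-1",          # assumption: the standard-quality TTS model
        "input": text,
        "voice": voice,
        "speed": speed,            # the endpoint accepts 0.25 to 4.0
        "response_format": "mp3",
    }
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(api_key: str, text: str, voice: str = "alloy", speed: float = 1.0) -> bytes:
    """Send the request and return raw MP3 bytes (needs network access and API credits)."""
    request = build_speech_request(api_key, text, voice, speed)
    with urllib.request.urlopen(request) as response:
        return response.read()
```

Splitting request construction from sending keeps the payload easy to inspect without spending credits.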

Edge TTS

Microsoft Edge's online TTS service. It is free, offers high-quality voices with wide language support, and requires an internet connection but no API key.

Speak Selected Text

The primary way to use TTS:

  1. Configure a "Speak selected text" hotkey in Settings > Input > Hotkey shortcuts
  2. Select text in any application (highlight it with your mouse or keyboard)
  3. Press the TTS hotkey
  4. Conjure reads the selected text aloud

Under the hood, Conjure first asks the operating system's accessibility API for the selected text. If that fails (some apps don't expose their selection through accessibility), it falls back to sending Ctrl+C to copy the selection, reading the copied text from the clipboard, and then restoring your original clipboard contents.
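The accessibility-then-clipboard fallback can be sketched as follows. The four callables stand in for platform code (the accessibility query, clipboard access, and a synthesized Ctrl+C) and are hypothetical; the sketch shows the order of operations and that the original clipboard is restored even if copying fails.

```python
from typing import Callable, Optional


def read_selected_text(
    read_via_accessibility: Callable[[], Optional[str]],
    get_clipboard: Callable[[], str],
    set_clipboard: Callable[[str], None],
    press_copy_shortcut: Callable[[], None],
) -> Optional[str]:
    """Return the selected text, or None if nothing could be read."""
    # 1. Preferred path: ask the accessibility API for the current selection.
    try:
        text = read_via_accessibility()
    except Exception:
        text = None  # treat an accessibility error like "selection not exposed"
    if text:
        return text

    # 2. Fallback: copy the selection through the clipboard.
    # This sketch only saves and restores plain text, not images or rich formats.
    original = get_clipboard()
    try:
        set_clipboard("")        # clear first so an empty result means "nothing selected"
        press_copy_shortcut()    # a real implementation would wait briefly for the app
        copied = get_clipboard()
    finally:
        set_clipboard(original)  # always restore the user's clipboard contents
    return copied or None
```

A `None` result corresponds to the "no text was selected" case, where a notification is more useful than silence.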

TIP

If TTS doesn't speak after pressing the hotkey, you'll see a toast notification explaining what went wrong -- usually that no text was selected or the accessibility API couldn't read the selection.

Auto-Speak

You can configure Conjure to automatically speak certain outputs:

  • Agent responses -- have the agent's replies read aloud automatically after each turn
  • Dictation results -- hear your processed dictation read back to verify accuracy

Configure auto-speak behavior in Settings > Processing > Text-to-Speech.

The TTS Pill Overlay

When TTS is speaking, the dictation pill overlay changes to indicate the state:

  • Purple pill with volume icon -- TTS is actively speaking
  • "Speaking..." label -- audio is playing
  • Click the pill -- stops TTS playback immediately

The TTS state is synced from the main app window to the overlay, so the pill accurately reflects playback state even when the main window is not focused.
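One way to picture this sync is a small publish/subscribe channel: the main window publishes each playback change, and the overlay redraws the pill from the latest state. Everything here (`TtsState`, `StateChannel`, `OverlayPill`, and the pill fields) is a hypothetical illustration, not Conjure's internal API. The non-speaking appearance is left as `None` because this page only describes the speaking state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class TtsState(Enum):
    IDLE = "idle"
    SPEAKING = "speaking"


@dataclass(frozen=True)
class TtsPill:
    color: str
    icon: str
    label: str


def pill_for(state: TtsState) -> Optional[TtsPill]:
    """Map playback state to the TTS pill; None means no TTS indicator is shown."""
    if state is TtsState.SPEAKING:
        return TtsPill(color="purple", icon="volume", label="Speaking...")
    return None


class StateChannel:
    """Carries TTS state from the main window to any subscribed overlay."""

    def __init__(self) -> None:
        self.state = TtsState.IDLE
        self._subscribers: List[Callable[[TtsState], None]] = []

    def subscribe(self, callback: Callable[[TtsState], None]) -> None:
        self._subscribers.append(callback)
        callback(self.state)  # new subscribers immediately receive the current state

    def publish(self, state: TtsState) -> None:
        self.state = state
        for callback in self._subscribers:
            callback(state)


class OverlayPill:
    """Overlay side: redraws from published state and stops playback on click."""

    def __init__(self, channel: StateChannel, stop_playback: Callable[[], None]) -> None:
        self.view: Optional[TtsPill] = None
        self._stop_playback = stop_playback
        channel.subscribe(self._on_state)

    def _on_state(self, state: TtsState) -> None:
        self.view = pill_for(state)

    def click(self) -> None:
        if self.view is not None:  # a click only stops playback while TTS is speaking
            self._stop_playback()
```

Because the overlay reads state from the channel rather than from whichever window has focus, it stays accurate while the main window is in the background. In this sketch, the stop callback would halt audio and then publish `TtsState.IDLE`.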

Configuring TTS

TTS settings are found in Settings > Processing > Text-to-Speech:

  • Provider -- choose between System, OpenAI, or Edge TTS
  • Voice -- select from available voices for the chosen provider
  • Speed -- adjust playback speed (slower for proofreading, faster for skimming)
  • Auto-speak -- toggle automatic speaking for agent responses
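The four settings map naturally onto a small record. The field names, the `Provider` enum, and the 0.25 to 4.0 speed range (borrowed from the limits of OpenAI's speech endpoint) are assumptions for illustration; Conjure's actual settings storage may differ. The two properties encode the provider trade-offs described earlier on this page.

```python
from dataclasses import dataclass
from enum import Enum


class Provider(Enum):
    SYSTEM = "system"
    OPENAI = "openai"
    EDGE = "edge"


@dataclass
class TtsSettings:
    provider: Provider = Provider.SYSTEM
    voice: str = ""                 # hypothetical: empty string means "provider default"
    speed: float = 1.0              # 1.0 is normal rate; lower for proofreading
    auto_speak_agent: bool = False  # read agent responses aloud after each turn

    def __post_init__(self) -> None:
        # Assumed bounds, mirroring the range OpenAI's speech endpoint accepts.
        if not 0.25 <= self.speed <= 4.0:
            raise ValueError(f"speed {self.speed} is outside 0.25-4.0")

    @property
    def requires_api_key(self) -> bool:
        # Only OpenAI needs a key; system voices and Edge TTS are free.
        return self.provider is Provider.OPENAI

    @property
    def works_offline(self) -> bool:
        # OpenAI and Edge TTS are online services; system voices run locally.
        return self.provider is Provider.SYSTEM
```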

Cost Tracking

When using OpenAI TTS, character usage is tracked alongside your other API usage statistics. You can see TTS costs on the home page usage card and in per-key usage badges in Settings.

Estimated cost: ~$0.015 per 1,000 characters, or roughly $0.03-0.05 per page of text (about 2,000-3,300 characters).
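The estimate is plain arithmetic on the character count. A minimal sketch, assuming the flat ~$15 per 1M characters rate above; `estimate_tts_cost` is a hypothetical helper, and passing `len(text)` only approximates how characters are billed.

```python
PRICE_PER_MILLION_CHARS_USD = 15.00  # approximate OpenAI TTS rate quoted above


def estimate_tts_cost(characters: int) -> float:
    """Approximate OpenAI TTS cost in USD for speaking `characters` characters once."""
    if characters < 0:
        raise ValueError("character count cannot be negative")
    return characters * PRICE_PER_MILLION_CHARS_USD / 1_000_000


print(f"${estimate_tts_cost(1_000):.3f}")  # prints $0.015
print(f"${estimate_tts_cost(2_000):.3f}")  # prints $0.030
```

Re-speaking the same text is billed again, so repeated proofreading passes multiply the cost.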

Hotkey Considerations

WARNING

Avoid using Alt-based hotkeys for TTS on Windows. Alt key combinations can trigger application menu bars, causing the hotkey to open a menu instead of speaking text. Prefer Ctrl-based or function key combinations.
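A settings screen could flag this automatically. The sketch below assumes hotkeys are written as `+`-separated key names such as `Ctrl+Shift+S`; that format and the `alt_hotkey_warning` helper are hypothetical, not Conjure's actual hotkey syntax.

```python
import sys
from typing import Optional


def alt_hotkey_warning(hotkey: str, platform: str = sys.platform) -> Optional[str]:
    """Return a warning for Alt-based hotkeys on Windows, or None if the hotkey looks safe."""
    keys = {part.strip().lower() for part in hotkey.split("+")}
    # sys.platform is "win32" on Windows, including 64-bit builds.
    if platform.startswith("win") and "alt" in keys:
        return (f"{hotkey} uses Alt, which can open application menu bars on Windows; "
                "prefer a Ctrl-based or function key combination.")
    return None
```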

Released under the AGPLv3 License.