Agent Mode

Agent mode turns Conjure into a voice-driven AI assistant that can interact with your computer. Instead of just transcribing speech to text, it interprets your request and uses tools to accomplish tasks.

What Agent Mode Does

In agent mode, you speak a request and the AI processes it as a conversation rather than a dictation. The agent can:

  • Paste text into your active application
  • Run terminal commands and return the output
  • Take screenshots and analyze them (with vision-capable models)
  • Read and write files on your system
  • Send keyboard shortcuts to control applications
  • Query accessibility information from the focused UI element

The agent maintains a conversation history that persists across app restarts, so you can refer back to earlier context.

Requires API key

Agent mode requires an LLM provider with chat/completion capabilities: either a cloud service (OpenAI, Claude, Groq, etc.) or a local one such as Ollama. Local whisper alone is not sufficient -- it only handles speech-to-text, not the AI reasoning that agent mode needs.

Setting Up Agent Mode

1. Configure an LLM Provider

Agent mode requires an LLM provider with chat/completion capabilities. Go to Settings > API Keys and add a key for one of the supported providers:

  • OpenAI (GPT-4o recommended for screenshot analysis)
  • Claude (Anthropic, excellent for complex reasoning)
  • Groq (fast, free tier available)
  • DeepSeek (very affordable)
  • Ollama (free, runs locally)
  • Any OpenAI-compatible endpoint

2. Enable Agent Mode

Agent mode is available from the dictation controls. You can:

  • Configure a dedicated agent mode dictation hotkey in Settings > Input > Hotkey shortcuts
  • Switch to agent mode from the mode selector

3. Start a Conversation

Hold the agent dictation hotkey and speak your request. For example:

  • "Paste the current date into this text field"
  • "Run git status in the terminal"
  • "Take a screenshot and tell me what you see"
  • "Read the file at C:\Users\me\notes.txt"
  • "Type Ctrl+S to save this file"

Power mode tools

Terminal commands, file read/write, and keyboard shortcuts run with your full user permissions. These tools can modify files, execute arbitrary commands, and send keystrokes to any application. Review the tool name and parameters of each invocation before you approve it.

Available Tools

Paste

Pastes text into the currently focused application. The agent uses this to write text, code, or any content into your active text field.

Run Terminal Command

Executes a shell command and returns its output. The agent can run any command your user account has permission to run, such as git status, ls, npm install, or python script.py.
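To make the idea of a tool concrete, here is a minimal sketch of how a terminal tool could be described to an OpenAI-compatible provider, and how the arguments of a tool call could be validated before anything is executed. The tool name run_terminal_command, its description, and the command parameter are assumptions for illustration, not Conjure's actual identifiers. Only the overall shape (a function definition with a JSON Schema, and arguments delivered as a JSON string) follows the OpenAI-compatible tool-calling format.

```typescript
// Hypothetical tool definition in the OpenAI-compatible "tools" format.
// The name and parameter are illustrative, not taken from Conjure's source.
const runTerminalCommandTool = {
  type: "function",
  function: {
    name: "run_terminal_command",
    description: "Execute a shell command and return its output.",
    parameters: {
      type: "object",
      properties: {
        command: { type: "string", description: "Shell command, e.g. git status" },
      },
      required: ["command"],
    },
  },
};

// OpenAI-compatible providers deliver tool-call arguments as a JSON string.
// Validate the string before passing anything to a shell.
function parseCommandArguments(argumentsJson: string): { command: string } {
  const parsed: unknown = JSON.parse(argumentsJson);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Tool arguments must be a JSON object");
  }
  const command = (parsed as { command?: unknown }).command;
  if (typeof command !== "string" || command.trim() === "") {
    throw new Error("Tool arguments must include a non-empty 'command' string");
  }
  return { command };
}
```

Validating arguments first means a malformed tool call fails with a clear error instead of reaching the shell.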

Take Screenshot

Captures the current screen as a PNG image. When using a vision-capable model (GPT-4o, Claude with vision), the agent can analyze the screenshot to understand what's on your screen. This enables requests like "what error am I looking at?" or "describe this UI."
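As an illustration of why a vision-capable model is needed, the sketch below shows how a PNG screenshot might be attached to a chat message. The two message shapes follow the public OpenAI-compatible and Anthropic message formats. The helper names are hypothetical, and whether Conjure builds its requests this way is an assumption.

```typescript
// Build a data URL from base64-encoded PNG bytes.
function toDataUrl(pngBase64: string): string {
  return `data:image/png;base64,${pngBase64}`;
}

// OpenAI-compatible format: the image travels as an "image_url" content part.
// Hypothetical helper, not part of Conjure's documented API.
function openAiVisionMessage(question: string, pngBase64: string) {
  return {
    role: "user" as const,
    content: [
      { type: "text" as const, text: question },
      { type: "image_url" as const, image_url: { url: toDataUrl(pngBase64) } },
    ],
  };
}

// Anthropic format: the image travels as a base64 "image" content block.
// Hypothetical helper, not part of Conjure's documented API.
function claudeVisionMessage(question: string, pngBase64: string) {
  return {
    role: "user" as const,
    content: [
      {
        type: "image" as const,
        source: {
          type: "base64" as const,
          media_type: "image/png" as const,
          data: pngBase64,
        },
      },
      { type: "text" as const, text: question },
    ],
  };
}
```

A model without vision support has no way to interpret the image part, which is why screenshot analysis depends on the model you select.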

Read File

Reads the contents of a file from disk. Useful for having the agent analyze configuration files, logs, or code.

Write File

Writes content to a file on disk. The agent can create new files or overwrite existing ones.

Send Keys

Sends keyboard shortcuts to the active application. For example, Ctrl+S to save, Ctrl+Z to undo, or Alt+Tab to switch windows.
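A shortcut such as Ctrl+S has to be split into modifiers and a final key before it can be sent. The sketch below shows one way to do that. The Shortcut type and the parseShortcut function are hypothetical, and the format Conjure actually accepts may differ.

```typescript
// Hypothetical representation of a shortcut; not Conjure's internal format.
type Modifier = "ctrl" | "alt" | "shift" | "meta";

interface Shortcut {
  modifiers: Modifier[];
  key: string;
}

const MODIFIERS: readonly string[] = ["ctrl", "alt", "shift", "meta"];

function isModifier(part: string): part is Modifier {
  return MODIFIERS.indexOf(part) !== -1;
}

// Split a string such as "Ctrl+S" into lowercase modifiers and a final key.
// Limitation of this sketch: the "+" key itself cannot be expressed.
function parseShortcut(input: string): Shortcut {
  const parts = input.split("+").map((p) => p.trim().toLowerCase());
  const key = parts.pop();
  if (key === undefined || key === "" || isModifier(key)) {
    throw new Error(`Shortcut "${input}" must end with a non-modifier key`);
  }
  const modifiers: Modifier[] = [];
  for (const part of parts) {
    if (!isModifier(part)) {
      throw new Error(`Unknown modifier "${part}" in "${input}"`);
    }
    // Ignore repeated modifiers such as "Ctrl+Ctrl+S".
    if (modifiers.indexOf(part) === -1) {
      modifiers.push(part);
    }
  }
  return { modifiers, key };
}
```

For example, parseShortcut("Alt+Tab") yields the modifier alt and the key tab, while parseShortcut("Ctrl+") throws because no final key is given.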

Get Accessibility Info

Queries the operating system's accessibility API to get information about the currently focused UI element -- its name, role, value, and state. This helps the agent understand context without taking a screenshot.

End Conversation

Signals that the conversation is complete. The agent calls this when it has finished fulfilling your request.

Conversation Persistence

Agent conversations are stored in a local SQLite database and survive app restarts. When you reopen Conjure, your previous conversation context is restored. You can:

  • Continue where you left off
  • Start a new conversation from the chat interface
  • Browse conversation history

Skip Tool Permissions

By default, the agent asks for your approval before executing potentially dangerous tools (terminal commands, file writes, and keystrokes). Each tool invocation shows a permission prompt with the tool name and parameters.
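The approval flow can be pictured as a small gate in front of each tool call. The sketch below is an assumption about how such a gate might look. The tool identifiers are hypothetical, and the set of gated tools is taken only from the three examples named above, so the real list may be longer.

```typescript
// Hypothetical tool identifiers; Conjure's internal names may differ.
type ToolName =
  | "paste"
  | "run_terminal_command"
  | "take_screenshot"
  | "read_file"
  | "write_file"
  | "send_keys"
  | "get_accessibility_info"
  | "end_conversation";

interface ToolCall {
  tool: ToolName;
  params: Record<string, string>;
}

// Assumption: only the three tools named in the text require approval.
const NEEDS_APPROVAL: Record<ToolName, boolean> = {
  paste: false,
  run_terminal_command: true,
  take_screenshot: false,
  read_file: false,
  write_file: true,
  send_keys: true,
  get_accessibility_info: false,
  end_conversation: false,
};

// Decide whether to show a permission prompt for this call.
function requiresApproval(call: ToolCall, skipPermissions: boolean): boolean {
  return !skipPermissions && NEEDS_APPROVAL[call.tool];
}

// Render the tool name and parameters for display in a prompt.
function formatPrompt(call: ToolCall): string {
  const params = Object.keys(call.params)
    .map((name) => `${name}=${JSON.stringify(call.params[name])}`)
    .join(", ");
  return `Allow ${call.tool}(${params})?`;
}
```

With this sketch, a call to run git status would produce the prompt text Allow run_terminal_command(command="git status")? unless permissions are skipped.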

Power users can enable "Dangerously skip permissions" in agent mode settings. This bypasses all approval prompts and lets the agent execute tools immediately.

INFO

The skip permissions toggle is stored in localStorage (conjure:skip-tool-permissions) and persists across restarts. Disable it when you're done with unattended workflows.
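If you want to check the toggle's state programmatically, a sketch like the following could work. The key name comes from the text above, but the stored value format is an assumption: this example treats the string "true" as enabled, which may not match what Conjure actually writes. The KeyValueStore interface and createMemoryStore helper are hypothetical and exist only so the example runs outside a browser.

```typescript
// Minimal subset of the Web Storage API. In a browser, window.localStorage
// satisfies this interface.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Key name as documented above.
const SKIP_KEY = "conjure:skip-tool-permissions";

// Assumption: the toggle is stored as the string "true" when enabled.
function isSkipEnabled(store: KeyValueStore): boolean {
  return store.getItem(SKIP_KEY) === "true";
}

// In-memory stand-in for localStorage, useful for trying the sketch in Node.
function createMemoryStore(): KeyValueStore {
  const data: { [key: string]: string } = {};
  return {
    getItem: (key) => {
      const value = data[key];
      return value === undefined ? null : value;
    },
    setItem: (key, value) => {
      data[key] = value;
    },
  };
}
```

In a browser context you would pass window.localStorage instead of the in-memory store. Changing the setting itself is best done through the toggle in agent mode settings.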

WARNING

Skipping tool permissions means the agent can run arbitrary commands, write files, and send keystrokes without asking. Only enable this if you trust your LLM provider and understand the risks.

Supported Providers

Agent mode's streamChat() function is implemented for 9 providers:

Provider            Implementation               Notes
OpenAI              openaiCompatibleStreamChat   Full tool use, vision with GPT-4o
Groq                openaiCompatibleStreamChat   Fast, free tier
Ollama              openaiCompatibleStreamChat   Local, free
OpenAI-Compatible   openaiCompatibleStreamChat   Any compatible endpoint
OpenRouter          openaiCompatibleStreamChat   Multi-model access
Azure OpenAI        openaiCompatibleStreamChat   Enterprise Azure deployments
DeepSeek            openaiCompatibleStreamChat   Very affordable
Gemini              openaiCompatibleStreamChat   Google's models
Claude              claudeStreamChat             Anthropic's models
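The provider-to-implementation mapping can be summarized as a lookup table. This is a sketch of the relationship only: the provider and implementation names are copied from the list above, while the Provider type, the STREAM_CHAT_IMPL constant, and the implementationFor function are hypothetical and do not reflect how Conjure actually dispatches calls.

```typescript
// The two implementation names listed for streamChat().
type StreamChatImpl = "openaiCompatibleStreamChat" | "claudeStreamChat";

// Provider display names as listed; Conjure's internal ids may differ.
type Provider =
  | "OpenAI"
  | "Groq"
  | "Ollama"
  | "OpenAI-Compatible"
  | "OpenRouter"
  | "Azure OpenAI"
  | "DeepSeek"
  | "Gemini"
  | "Claude";

// Eight providers share the OpenAI-compatible path; Claude has its own.
const STREAM_CHAT_IMPL: Record<Provider, StreamChatImpl> = {
  "OpenAI": "openaiCompatibleStreamChat",
  "Groq": "openaiCompatibleStreamChat",
  "Ollama": "openaiCompatibleStreamChat",
  "OpenAI-Compatible": "openaiCompatibleStreamChat",
  "OpenRouter": "openaiCompatibleStreamChat",
  "Azure OpenAI": "openaiCompatibleStreamChat",
  "DeepSeek": "openaiCompatibleStreamChat",
  "Gemini": "openaiCompatibleStreamChat",
  "Claude": "claudeStreamChat",
};

// Look up which implementation would serve a given provider.
function implementationFor(provider: Provider): StreamChatImpl {
  return STREAM_CHAT_IMPL[provider];
}
```

Typing the table as Record<Provider, StreamChatImpl> makes the compiler flag any provider that is added to the type but left out of the mapping.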

Tips

  • Be specific in your requests. "Paste the word hello" is better than "write something."
  • Use screenshots with vision-capable models to give the agent context about what you're looking at.
  • Start simple -- try paste and terminal commands before moving to file operations.
  • Check the conversation panel to see the agent's reasoning and tool calls.
  • End conversations when you're done with a task to keep context clean for the next request.

Released under the AGPLv3 License.