Agent Mode
Agent mode turns Conjure into a voice-driven AI assistant that can interact with your computer. Instead of just transcribing speech to text, it interprets your request and uses tools to accomplish tasks.
What Agent Mode Does
In agent mode, you speak a request and the AI processes it as a conversation rather than a dictation. The agent can:
- Paste text into your active application
- Run terminal commands and return the output
- Take screenshots and analyze them (with vision-capable models)
- Read and write files on your system
- Send keyboard shortcuts to control applications
- Query accessibility information from the focused UI element
The agent maintains a conversation history that persists across app restarts, so you can refer back to earlier context.
Requires an LLM provider
Agent mode requires an LLM provider with chat/completion capabilities: either a cloud provider (OpenAI, Claude, Groq, etc.), which needs an API key, or a local Ollama server. Local whisper alone is not sufficient. It only handles speech-to-text, not the AI reasoning that agent mode needs.
Setting Up Agent Mode
1. Configure an LLM Provider
Go to Settings > API Keys and set up one of the supported chat/completion providers:
- OpenAI (GPT-4o recommended for screenshot analysis)
- Claude (Anthropic, excellent for complex reasoning)
- Groq (fast, free tier available)
- DeepSeek (very affordable)
- Ollama (free, runs locally)
- Any OpenAI-compatible endpoint
2. Enable Agent Mode
Agent mode is available from the dictation controls. You can:
- Configure a dedicated agent mode dictation hotkey in Settings > Input > Hotkey shortcuts
- Switch to agent mode from the mode selector
3. Start a Conversation
Hold the agent dictation hotkey and speak your request. For example:
- "Paste the current date into this text field"
- "Run git status in the terminal"
- "Take a screenshot and tell me what you see"
- "Read the file at C:\Users\me\notes.txt"
- "Type Ctrl+S to save this file"
Power mode tools
Terminal commands, file read/write, and keyboard shortcuts run with your full user permissions. These tools can modify files, execute arbitrary commands, and send keystrokes to any application. Review tool invocations carefully.
Available Tools
Paste
Pastes text into the currently focused application. The agent uses this to write text, code, or any content into your active text field.
Run Terminal Command
Executes a shell command and returns the output. The agent can run any command your user account has permission for, such as git status, ls, npm install, or python script.py.
Take Screenshot
Captures the current screen as a PNG image. When using a vision-capable model (GPT-4o, Claude with vision), the agent can analyze the screenshot to understand what's on your screen. This enables requests like "what error am I looking at?" or "describe this UI."
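As a rough illustration of how a PNG screenshot can be handed to a vision-capable model, the sketch below builds a user message in the OpenAI-style chat format, where an image is passed as a base64 data URL. The function name `buildScreenshotMessage` and the type names are hypothetical, and whether Conjure builds its requests this way is an assumption.

```typescript
// Content parts in the OpenAI-style chat format: plain text or an image URL.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface UserMessage {
  role: "user";
  content: ContentPart[];
}

// Hypothetical helper: pair the spoken question with a base64-encoded PNG.
function buildScreenshotMessage(question: string, pngBase64: string): UserMessage {
  return {
    role: "user",
    content: [
      { type: "text", text: question },
      // The image travels inline as a data URL rather than as a separate upload.
      { type: "image_url", image_url: { url: `data:image/png;base64,${pngBase64}` } },
    ],
  };
}
```

Other providers use different image formats, so a real client would need one message builder per API family.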
Read File
Reads the contents of a file from disk. Useful for having the agent analyze configuration files, logs, or code.
Write File
Writes content to a file on disk. The agent can create new files or overwrite existing ones.
Send Keys
Sends keyboard shortcuts to the active application. For example, Ctrl+S to save, Ctrl+Z to undo, or Alt+Tab to switch windows.
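To make the shortcut notation concrete, here is a minimal sketch that parses a string such as "Ctrl+S" into modifiers plus a single key. The `KeyChord` type and `parseKeyChord` function are hypothetical and are not Conjure's actual API; the accepted modifier spellings are also assumptions.

```typescript
// Hypothetical shape for a parsed shortcut.
interface KeyChord {
  ctrl: boolean;
  alt: boolean;
  shift: boolean;
  meta: boolean;
  key: string; // the non-modifier key; single letters are upper-cased
}

// Parse a "+"-separated shortcut such as "Ctrl+S" or "Alt+Tab".
function parseKeyChord(shortcut: string): KeyChord {
  const parts = shortcut
    .split("+")
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  const chord: KeyChord = { ctrl: false, alt: false, shift: false, meta: false, key: "" };
  for (const part of parts) {
    switch (part.toLowerCase()) {
      case "ctrl":
      case "control":
        chord.ctrl = true;
        break;
      case "alt":
        chord.alt = true;
        break;
      case "shift":
        chord.shift = true;
        break;
      case "meta":
      case "win":
      case "cmd":
        chord.meta = true;
        break;
      default:
        // Anything that is not a modifier is the key itself; allow only one.
        if (chord.key !== "") {
          throw new Error(`More than one non-modifier key in "${shortcut}"`);
        }
        chord.key = part.length === 1 ? part.toUpperCase() : part;
    }
  }
  if (chord.key === "") {
    throw new Error(`No non-modifier key in "${shortcut}"`);
  }
  return chord;
}
```

This sketch does not handle a literal "+" key; a fuller parser would need an escape or a named alias for it.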
Get Accessibility Info
Queries the operating system's accessibility API to get information about the currently focused UI element -- its name, role, value, and state. This helps the agent understand context without taking a screenshot.
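The four fields mentioned above could be modelled roughly as follows. This is only a sketch: the `AccessibilityInfo` shape and the `describeElement` helper are hypothetical, and the actual structure returned by the tool may differ.

```typescript
// Hypothetical shape based on the fields named above: name, role, value, state.
interface AccessibilityInfo {
  name: string;
  role: string;
  value?: string;
  state: string[];
}

// Turn the element info into one compact line that can be given to the model as context.
function describeElement(info: AccessibilityInfo): string {
  const parts = [`${info.role} "${info.name}"`];
  if (info.value !== undefined && info.value !== "") {
    parts.push(`value: "${info.value}"`);
  }
  if (info.state.length > 0) {
    parts.push(`state: ${info.state.join(", ")}`);
  }
  return parts.join("; ");
}
```

A short text summary like this is much smaller than a screenshot, which is why it can be a cheaper way to give the agent context.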
End Conversation
Signals that the conversation is complete. The agent calls this when it has finished fulfilling your request.
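The tools above can be pictured as a small set of typed calls, with End Conversation acting as the stop signal. The sketch below is an illustration only: the tool identifiers, argument names, and the `runToolCalls` function are all hypothetical and are not taken from Conjure's source.

```typescript
// Hypothetical identifiers and argument shapes, one per tool listed above.
type ToolCall =
  | { tool: "paste"; text: string }
  | { tool: "run_terminal_command"; command: string }
  | { tool: "take_screenshot" }
  | { tool: "read_file"; path: string }
  | { tool: "write_file"; path: string; content: string }
  | { tool: "send_keys"; keys: string }
  | { tool: "get_accessibility_info" }
  | { tool: "end_conversation" };

// Executes one call and returns text to feed back to the model.
type ToolExecutor = (call: ToolCall) => string;

// Run calls in order and stop as soon as the agent signals it is done.
function runToolCalls(
  calls: ToolCall[],
  execute: ToolExecutor,
): { results: string[]; ended: boolean } {
  const results: string[] = [];
  for (const call of calls) {
    if (call.tool === "end_conversation") {
      // Calls after the end signal are deliberately ignored.
      return { results, ended: true };
    }
    results.push(execute(call));
  }
  return { results, ended: false };
}
```

Modelling the calls as a discriminated union lets the compiler check that each tool receives the arguments it expects.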
Conversation Persistence
Agent conversations are stored in a local SQLite database and survive app restarts. When you reopen Conjure, your previous conversation context is restored. You can:
- Continue where you left off
- Start a new conversation from the chat interface
- Browse conversation history
Skip Tool Permissions
By default, the agent asks for your approval before executing potentially dangerous tools (terminal commands, file writes, and keyboard shortcuts). Each such invocation shows a permission prompt with the tool name and parameters.
Power users can enable "Dangerously skip permissions" in agent mode settings. This bypasses all approval prompts and lets the agent execute tools immediately.
INFO
The skip permissions toggle is stored in localStorage (conjure:skip-tool-permissions) and persists across restarts. Disable it when you're done with unattended workflows.
WARNING
Skipping tool permissions means the agent can run arbitrary commands, write files, and send keystrokes without asking. Only enable this if you trust your LLM provider and understand the risks.
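The approval behaviour described in this section can be sketched as a small gate. The localStorage key is the one documented above; everything else is an assumption made for illustration, including the stored value format ("true"), the tool identifiers, and the function names.

```typescript
// Key documented above; the stored value format ("true") is an assumption.
const SKIP_PERMISSIONS_KEY = "conjure:skip-tool-permissions";

// Minimal read-only subset of the Web Storage API, so the sketch works without a browser.
interface ReadableStore {
  getItem(key: string): string | null;
}

// True only when the toggle has been explicitly stored as "true".
function isSkipEnabled(store: ReadableStore): boolean {
  return store.getItem(SKIP_PERMISSIONS_KEY) === "true";
}

// Hypothetical identifiers for the tools described above as potentially dangerous.
const DANGEROUS_TOOLS: ReadonlySet<string> = new Set([
  "run_terminal_command",
  "write_file",
  "send_keys",
]);

// A prompt is needed for dangerous tools unless the user opted out.
function needsApproval(tool: string, skipPermissions: boolean): boolean {
  return !skipPermissions && DANGEROUS_TOOLS.has(tool);
}
```

In a browser context this could be called as `needsApproval("write_file", isSkipEnabled(window.localStorage))`. Treating a missing or unrecognized value as "not skipped" keeps the safer behaviour as the default.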
Supported Providers
Agent mode's streamChat() function is implemented for 9 providers:
| Provider | Implementation | Notes |
|---|---|---|
| OpenAI | openaiCompatibleStreamChat | Full tool use, vision with GPT-4o |
| Groq | openaiCompatibleStreamChat | Fast, free tier |
| Ollama | openaiCompatibleStreamChat | Local, free |
| OpenAI-Compatible | openaiCompatibleStreamChat | Any compatible endpoint |
| OpenRouter | openaiCompatibleStreamChat | Multi-model access |
| Azure OpenAI | openaiCompatibleStreamChat | Enterprise Azure deployments |
| DeepSeek | openaiCompatibleStreamChat | Very affordable |
| Gemini | openaiCompatibleStreamChat | Google's models |
| Claude | claudeStreamChat | Anthropic's models |
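The table amounts to a lookup from provider to implementation, which could be expressed as below. The implementation names come from the table; the provider identifiers, the `ImplementationName` type, and `implementationFor` are hypothetical and only illustrate the mapping.

```typescript
// Hypothetical provider identifiers, one per row of the table above.
type ProviderId =
  | "openai"
  | "groq"
  | "ollama"
  | "openai-compatible"
  | "openrouter"
  | "azure-openai"
  | "deepseek"
  | "gemini"
  | "claude";

// The two implementation names listed in the table.
type ImplementationName = "openaiCompatibleStreamChat" | "claudeStreamChat";

// Record<ProviderId, ...> makes the compiler reject a missing or misspelled provider.
const IMPLEMENTATION_BY_PROVIDER: Record<ProviderId, ImplementationName> = {
  "openai": "openaiCompatibleStreamChat",
  "groq": "openaiCompatibleStreamChat",
  "ollama": "openaiCompatibleStreamChat",
  "openai-compatible": "openaiCompatibleStreamChat",
  "openrouter": "openaiCompatibleStreamChat",
  "azure-openai": "openaiCompatibleStreamChat",
  "deepseek": "openaiCompatibleStreamChat",
  "gemini": "openaiCompatibleStreamChat",
  "claude": "claudeStreamChat",
};

function implementationFor(provider: ProviderId): ImplementationName {
  return IMPLEMENTATION_BY_PROVIDER[provider];
}
```

Eight of the nine providers share one implementation, so supporting another OpenAI-compatible service would mostly be a matter of adding one more entry.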
Tips
- Be specific in your requests. "Paste the word hello" is better than "write something."
- Use screenshots with vision-capable models to give the agent context about what you're looking at.
- Start simple -- try paste and terminal commands before moving to file operations.
- Check the conversation panel to see the agent's reasoning and tool calls.
- End conversations when you're done with a task to keep context clean for the next request.
