AI Agent Integration
Why tauri-pilot for AI agents
No existing tool lets AI agents interact with Tauri app UIs. The gap exists because:
- Playwright doesn’t work — Playwright drives standalone browser processes (Chromium, Firefox, and its own bundled WebKit build). Tauri embeds the system WebView directly (WebKitGTK on Linux, WebKit on macOS), so there is no browser process for Playwright to attach to.
- tauri-pilot speaks a protocol optimized for LLM consumption — the accessibility tree output is text-based, compact, and structured to be read directly by a language model.
- Refs map to UI elements unambiguously: @e3 is a stable handle within a snapshot, removing the need for CSS selectors or XPath expressions.
The snapshot → interact → verify workflow
The core loop for any AI agent using tauri-pilot is: take a snapshot to discover elements, interact using refs, then verify the result.
# Step 1: Get the accessibility tree
tauri-pilot snapshot -i

# Output:
# - heading "PR Dashboard" [ref=e1]
# - textbox "Search PRs" [ref=e2] value=""
# - button "Refresh" [ref=e3]
# - list "PR List" [ref=e4]
# - listitem "fix: resolve memory leak #142" [ref=e5]

# Step 2: Interact using refs
tauri-pilot click @e3

# Step 3: See what changed (instead of re-reading the full tree)
tauri-pilot diff

The diff command compares the current page with the last snapshot and returns only added, removed, and changed elements. This saves significant tokens: a typical diff after a click is 2-5 lines, versus 50-100 for a full re-snapshot.
The -i flag filters to interactive elements only, reducing noise in the output.
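The loop above depends on picking the right ref out of the snapshot text. The following is a minimal bash sketch of how a wrapper script might do that, assuming snapshot lines keep the `- role "name" [ref=eN]` shape shown in the sample output. `ref_for` is a hypothetical helper, not part of tauri-pilot.

```shell
# ref_for ROLE NAME: read snapshot text on stdin and print the matching ref
# as @eN. Hypothetical helper; assumes lines shaped like:
#   - button "Refresh" [ref=e3]
# Exits non-zero when no line matches, so callers can stop early.
ref_for() {
  local role=$1 name=$2
  awk -v role="$role" -v name="$name" '
    {
      line = $0
      sub(/^[ \t]*- /, "", line)             # drop indentation and the bullet
      prefix = role " \"" name "\" [ref="
      if (index(line, prefix) == 1) {
        rest = substr(line, length(prefix) + 1)
        sub(/\].*/, "", rest)                # keep only the ref id
        print "@" rest
        found = 1
        exit
      }
    }
    END { exit(found ? 0 : 1) }
  '
}

# Usage (requires a running app):
#   ref=$(tauri-pilot snapshot -i | ref_for button "Refresh") && tauri-pilot click "$ref"
```

Matching on role and name together keeps the lookup independent of ref numbering, which changes from one snapshot to the next.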
Structured output with --json
Use the --json flag to get machine-parseable output when the agent needs to process responses programmatically.

tauri-pilot snapshot --json
tauri-pilot text @e1 --json
tauri-pilot url --json

JSON output is useful when the agent needs to extract specific values, compare states, or pass data between steps without text parsing.
Example: Automated UI testing workflow
# 1. Health check
tauri-pilot ping

# 2. Navigate to a page
tauri-pilot navigate "http://localhost:1420/settings"

# 3. Wait for page load
tauri-pilot wait --selector ".settings-form"

# 4. Snapshot to discover elements
tauri-pilot snapshot -i

# 5. Fill a form
tauri-pilot fill @e2 "new-value"

# 6. Submit
tauri-pilot click @e5

# 7. Verify success
tauri-pilot wait --selector ".success-toast"
tauri-pilot snapshot -i
tauri-pilot assert text @e1 "Settings saved"
tauri-pilot assert url "/settings"

Best practices for snapshot parsing
- Always take a fresh snapshot before interacting: refs reset on each snapshot. @e1 in one snapshot may refer to a different element in the next.
- Use diff instead of re-snapshotting: after an interaction, tauri-pilot diff returns only what changed. This is much cheaper than re-reading the full tree.
- Use -i to filter interactive elements: this reduces output size and makes the tree easier to parse.
- Use -s to scope to a section: tauri-pilot snapshot -s "#sidebar" limits the tree to a subtree, further reducing noise.
- Use wait before snapshot: after navigation or interaction, wait for the page to settle before taking a snapshot to avoid acting on stale state.
- Use assert for verification: tauri-pilot assert text @e1 "Dashboard" returns exit 0 on match, exit 1 on mismatch. This replaces the three-step text @e1 → parse → compare pattern, saving a round-trip and the tokens spent on parsing.
- Save snapshots for multi-step workflows: tauri-pilot snapshot --save before.snap followed by tauri-pilot diff --ref before.snap lets you compare against any saved point in time.
- Record agent workflows for regression testing: wrap your interaction sequence in record start and record stop --output test.json to capture actions as a replayable script. Use replay test.json to re-run the same sequence, or replay test.json --export sh to generate a standalone shell script.
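Because assert reports its result through the exit status, a script can run several checks and summarize them without parsing any output. The following is a minimal bash sketch that relies only on the exit-code behavior described above. `check` and `failures` are hypothetical names, not part of tauri-pilot.

```shell
# check CMD [ARGS...]: run one command and record whether it exited 0.
# Hypothetical helper; it only inspects the exit status of the command.
failures=0
check() {
  if "$@"; then
    printf 'ok: %s\n' "$*"
  else
    printf 'FAIL: %s\n' "$*"
    failures=$((failures + 1))
  fi
}

# Usage (requires a running app):
#   check tauri-pilot assert text @e1 "Dashboard"
#   check tauri-pilot assert visible @e3
#   [ "$failures" -eq 0 ] || echo "$failures check(s) failed"
```

Running every check before reporting gives the agent the full list of failures in one pass. To stop at the first failure instead, chain the assert commands with && directly.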
# Assert examples: one-step verification
tauri-pilot assert text @e1 "Dashboard"      # exact text match
tauri-pilot assert visible @e3               # element is visible
tauri-pilot assert value @e2 "workspace"     # input value
tauri-pilot assert count ".list-item" 5      # element count
tauri-pilot assert contains @e1 "error"      # partial text match
tauri-pilot assert url "/dashboard"          # URL substring

# Scoped snapshot example
tauri-pilot snapshot -i -s "#main-content"

Multi-window apps
For Tauri apps with multiple windows, use tauri-pilot windows to discover available windows and --window <label> to target a specific one:
# List all windows
tauri-pilot windows
# main      http://localhost:1420/          My App
# settings  http://localhost:1420/settings  Settings

# Target a specific window
tauri-pilot --window settings snapshot -i
tauri-pilot --window settings fill @e2 "new-value"

# Or set via environment variable
export TAURI_PILOT_WINDOW=settings
tauri-pilot snapshot -i

Without --window, all commands target the main window (or the first available). The window flag works with every command including --follow mode and ipc.
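To repeat a step across every window, a script can read the labels from the first column of the windows listing. The following is a minimal bash sketch, assuming the listing prints one window per line with the label first, as in the sample above. `window_labels` is a hypothetical helper, not part of tauri-pilot.

```shell
# window_labels: read a windows listing on stdin and print one label per line.
# Assumes the label is the first whitespace-separated field of each line.
window_labels() {
  awk 'NF > 0 { print $1 }'
}

# Usage (requires a running app):
#   tauri-pilot windows | window_labels | while read -r label; do
#     tauri-pilot --window "$label" snapshot -i
#   done
```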
Integration with Claude Code
tauri-pilot is designed to be used as a tool by Claude Code directly:
- Claude Code can read the snapshot output directly — it is an accessibility tree, the same representation Claude Code uses internally for UI reasoning.
- The ref system (@e1) maps directly to how Claude Code thinks about UI elements as discrete, addressable targets.
- Combine with --json for structured data extraction when the agent needs to compare values or branch on state.
A typical Claude Code session using tauri-pilot looks like this:
# Claude Code calls these as shell commands during a task
tauri-pilot ping                          # verify the app is running
tauri-pilot snapshot -i                   # discover what's on screen
tauri-pilot fill @e2 "search query"       # interact
tauri-pilot click @e3                     # submit
tauri-pilot wait --selector ".results"    # wait for response
tauri-pilot diff                          # see what changed (token-efficient)
tauri-pilot snapshot -i                   # refresh refs
tauri-pilot assert text @e1 "Results"     # verify in one step

No custom integration code is needed: tauri-pilot is a CLI that Claude Code can invoke directly.
MCP Server
Agents with native Model Context Protocol support can use tauri-pilot as a stdio MCP server:
{
  "mcpServers": {
    "tauri-pilot": {
      "command": "tauri-pilot",
      "args": ["mcp"]
    }
  }
}

The MCP server exposes the same app-facing commands as structured tools:
snapshot, click, fill, logs, network, eval, ipc, assert_*, and
more. Tool calls return structured JSON content, so agents do not need to parse
terminal output.
Use global flags before mcp when an agent should target a specific app socket or
window:
{
  "mcpServers": {
    "tauri-pilot": {
      "command": "tauri-pilot",
      "args": ["--socket", "/tmp/tauri-pilot-myapp.sock", "--window", "main", "mcp"]
    }
  }
}