AI Agent Integration
Why tauri-pilot for AI agents
Before tauri-pilot, no tool let AI agents interact with Tauri app UIs. Three points explain the gap and how tauri-pilot closes it:
- Playwright doesn't work: on Linux, Tauri renders through WebKitGTK rather than Chromium, and Playwright has no driver for WebKitGTK.
- tauri-pilot speaks a protocol optimized for LLM consumption — the accessibility tree output is text-based, compact, and structured to be read directly by a language model.
- Refs map to UI elements unambiguously: @e3 is a stable handle within a snapshot, removing the need for CSS selectors or XPath expressions.
The snapshot → interact → verify workflow
The core loop for any AI agent using tauri-pilot is: take a snapshot to discover elements, interact using refs, then verify the result.

    # Step 1: Get the accessibility tree
    tauri-pilot snapshot -i

    # Output:
    # - heading "PR Dashboard" [ref=e1]
    # - textbox "Search PRs" [ref=e2] value=""
    # - button "Refresh" [ref=e3]
    # - list "PR List" [ref=e4]
    # - listitem "fix: resolve memory leak #142" [ref=e5]

    # Step 2: Interact using refs
    tauri-pilot click @e3

    # Step 3: See what changed (instead of re-reading the full tree)
    tauri-pilot diff

The diff command compares the current page with the last snapshot and returns only added, removed, and changed elements. This saves significant tokens: a typical diff after a click is 2-5 lines, versus 50-100 for a full re-snapshot.
The -i flag filters to interactive elements only, reducing noise in the output.
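
Since refs change between snapshots, an agent script may prefer to look a ref up by role and name instead of hardcoding it. The following is a minimal sketch, assuming snapshot lines follow the `- role "name" [ref=eN]` layout shown in the example output; `ref_for` is a hypothetical helper, not part of tauri-pilot.

```shell
# Hypothetical helper: print the ref (for example "e3") of the first snapshot
# line that matches the given role and accessible name.
# Reads the snapshot text on stdin.
ref_for() {
  role=$1
  name=$2
  # -F treats the pattern as a fixed string; -- stops option parsing because
  # the pattern starts with a dash.
  grep -F -- "- $role \"$name\" [ref=" \
    | head -n 1 \
    | sed -n 's/.*\[ref=\([^]]*\)].*/\1/p'
}

# Sample snapshot text copied from the example output (assumed format).
snapshot='- heading "PR Dashboard" [ref=e1]
- textbox "Search PRs" [ref=e2] value=""
- button "Refresh" [ref=e3]'

ref=$(printf '%s\n' "$snapshot" | ref_for button "Refresh")
echo "@$ref"   # prints "@e3"
```

Against a running app, the same helper could be fed from the CLI, for example `tauri-pilot snapshot -i | ref_for button "Refresh"`, and the result passed to `tauri-pilot click "@$ref"`.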
Structured output with --json
Use the --json flag to get machine-parseable output when the agent needs to process responses programmatically.

    tauri-pilot snapshot --json
    tauri-pilot text @e1 --json
    tauri-pilot url --json

JSON output is useful when the agent needs to extract specific values, compare states, or pass data between steps without text parsing.
Example: Automated UI testing workflow

    # 1. Health check
    tauri-pilot ping

    # 2. Navigate to a page
    tauri-pilot navigate "http://localhost:1420/settings"

    # 3. Wait for page load
    tauri-pilot wait --selector ".settings-form"

    # 4. Snapshot to discover elements
    tauri-pilot snapshot -i

    # 5. Fill a form
    tauri-pilot fill @e2 "new-value"

    # 6. Submit
    tauri-pilot click @e5

    # 7. Verify success
    tauri-pilot wait --selector ".success-toast"
    tauri-pilot snapshot -i
    tauri-pilot assert text @e1 "Settings saved"
    tauri-pilot assert url "/settings"

Best practices for snapshot parsing
- Always take a fresh snapshot before interacting: refs reset on each snapshot, so @e1 in one snapshot may refer to a different element in the next.
- Use diff instead of re-snapshotting: after an interaction, tauri-pilot diff returns only what changed. This is much cheaper than re-reading the full tree.
- Use -i to filter interactive elements: this reduces output size and makes the tree easier to parse.
- Use -s to scope to a section: tauri-pilot snapshot -s "#sidebar" limits the tree to a subtree, further reducing noise.
- Use wait before snapshot: after navigation or interaction, wait for the page to settle before taking a snapshot to avoid acting on stale state.
- Use assert for verification: tauri-pilot assert text @e1 "Dashboard" returns exit 0 on match, exit 1 on mismatch. This replaces the three-step text @e1 → parse → compare pattern, saving a round trip and the tokens spent on parsing.
- Save snapshots for multi-step workflows: tauri-pilot snapshot --save before.snap followed by tauri-pilot diff --ref before.snap lets you compare against any saved point in time.
- Record agent workflows for regression testing: wrap your interaction sequence in record start / record stop --output test.json to capture actions as a replayable script. Use replay test.json to re-run the same sequence, or replay test.json --export sh to generate a standalone shell script.
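
Because assert reports its result through the exit status, a script can branch on it without reading any output. The sketch below illustrates that pattern; `check` is a hypothetical wrapper, and `true` and `false` stand in for real assert calls so the example runs without an app.

```shell
# Hypothetical wrapper: run a check command and report the outcome based on
# its exit status (0 = pass, non-zero = fail).
check() {
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS: $label"
  else
    echo "FAIL: $label"
    return 1
  fi
}

# Real usage would pass a tauri-pilot assert, for example:
#   check "dashboard heading" tauri-pilot assert text @e1 "Dashboard"
# Stand-in commands so the sketch runs on its own:
check "always passes" true
check "always fails" false || echo "stopping workflow"
```

An agent can stop at the first FAIL line and take a fresh snapshot to find out what went wrong, rather than re-reading the tree after every step.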

    # Assert examples: one-step verification
    tauri-pilot assert text @e1 "Dashboard"      # exact text match
    tauri-pilot assert visible @e3               # element is visible
    tauri-pilot assert value @e2 "workspace"     # input value
    tauri-pilot assert count ".list-item" 5      # element count
    tauri-pilot assert contains @e1 "error"      # partial text match
    tauri-pilot assert url "/dashboard"          # URL substring

    # Scoped snapshot example
    tauri-pilot snapshot -i -s "#main-content"

Multi-window apps
For Tauri apps with multiple windows, use tauri-pilot windows to discover available windows and --window <label> to target a specific one:

    # List all windows
    tauri-pilot windows
    # main      http://localhost:1420/          My App
    # settings  http://localhost:1420/settings  Settings

    # Target a specific window
    tauri-pilot --window settings snapshot -i
    tauri-pilot --window settings fill @e2 "new-value"

    # Or set via environment variable
    export TAURI_PILOT_WINDOW=settings
    tauri-pilot snapshot -i

Without --window, all commands target the main window (or the first available). The window flag works with every command including --follow mode and ipc.
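
When an agent needs to inspect every window in turn, the labels from the windows listing can drive a loop. This is a minimal sketch, assuming the label is the first whitespace-separated column as in the listing above; `window_labels` is a hypothetical helper, not part of tauri-pilot.

```shell
# Hypothetical helper: extract the window label (first column) from the
# output of `tauri-pilot windows`, read on stdin. Blank lines are skipped.
window_labels() {
  awk 'NF { print $1 }'
}

# Sample listing copied from the example above (assumed column layout).
sample='main      http://localhost:1420/          My App
settings  http://localhost:1420/settings  Settings'

printf '%s\n' "$sample" | window_labels

# Live usage against a running app might look like:
#   for w in $(tauri-pilot windows | window_labels); do
#     tauri-pilot --window "$w" snapshot -i
#   done
```

Refs are taken from whichever snapshot the agent read last, so it is safest to snapshot a window again immediately before interacting with it.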
Integration with Claude Code
tauri-pilot is designed to be used as a tool by Claude Code directly:
- Claude Code can read the snapshot output directly: it is a text accessibility tree, a compact representation that a language model can reason over without extra tooling.
- The ref system (@e1) maps directly to how Claude Code thinks about UI elements as discrete, addressable targets.
- Combine with --json for structured data extraction when the agent needs to compare values or branch on state.
A typical Claude Code session using tauri-pilot looks like this:

    # Claude Code calls these as shell commands during a task
    tauri-pilot ping                          # verify the app is running
    tauri-pilot snapshot -i                   # discover what's on screen
    tauri-pilot fill @e2 "search query"       # interact
    tauri-pilot click @e3                     # submit
    tauri-pilot wait --selector ".results"    # wait for response
    tauri-pilot diff                          # see what changed (token-efficient)
    tauri-pilot snapshot -i                   # refresh refs
    tauri-pilot assert text @e1 "Results"     # verify in one step

No custom integration code is needed: tauri-pilot is a CLI that Claude Code can invoke directly.