Merged
1 change: 1 addition & 0 deletions AGENTS.md
@@ -183,6 +183,7 @@ Command-only flags (like `find --first`) that do not flow to the platform layer
## React Native Verification
- After changing runtime code exercised through `bin/agent-device.mjs` or the daemon, run `pnpm build` and `pnpm clean:daemon` before manual device verification so snapshots use current `dist` output.
- For Android RN/Expo/dev-client apps connected to any local Metro port, `adb reverse tcp:<port> tcp:<port>` is harmless and should be run before opening the app or URL on the emulator/device.
+- In sandboxed agent environments, run manual `agent-device` CLI verification that starts the daemon outside the sandbox with escalation. The daemon binds localhost, and sandboxed runs can fail before any product code executes with `listen EPERM: operation not permitted 127.0.0.1` or repeated `Failed to start daemon`/metadata cleanup messages. Do not spend time debugging those as agent-device regressions; rerun the same command with escalation. Unit tests, typecheck, lint, and build can stay sandboxed unless they need platform devices or network/listener access.
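
The triage rule in the last bullet can be sketched as a small classifier. This is only an illustration: `classifyDaemonStartupFailure` and its return values are hypothetical names, not part of `agent-device`, and the matched strings are the two messages quoted in the bullet.

```typescript
// Hypothetical helper, not an agent-device API. It only recognizes the two
// failure signatures quoted in the guidance above.
type StartupFailureKind = "sandbox-listener" | "unknown";

function classifyDaemonStartupFailure(output: string): StartupFailureKind {
  // The daemon could not bind localhost: a sandbox restriction, not a product bug.
  const listenDenied = /listen EPERM: operation not permitted 127\.0\.0\.1/.test(output);

  // The guidance mentions *repeated* start failures, so require at least two.
  const startFailures = (output.match(/Failed to start daemon/g) ?? []).length;

  return listenDenied || startFailures >= 2 ? "sandbox-listener" : "unknown";
}
```

A result of `"sandbox-listener"` would mean "rerun the same command with escalation"; anything else still deserves normal debugging.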

## Manual Device Session Hygiene
- Treat every manually opened `agent-device` session as a resource that must be closed, including exploratory sessions and failed verification attempts.
44 changes: 29 additions & 15 deletions docs/adr/0004-ios-snapshot-backend-strategy.md
@@ -7,10 +7,13 @@ Accepted
## Context

Agent Device exposes iOS UI state through snapshots produced by the long-lived XCTest runner. The
-runner has two different snapshot needs:
+runner has three different snapshot needs:

-- rich diagnostics and selector disambiguation, where a recursive XCTest snapshot is useful because
-  it preserves hierarchy, static text, wrappers, scroll containers, and ancestry;
+- agent-facing regular context, where the important contract is the effective user-visible UI,
+  fixed controls such as tab bars, and scroll-hidden hints for content outside visible scroll
+  containers;
+- rich diagnostics and selector disambiguation, where a raw recursive XCTest snapshot is useful
+  because it preserves hierarchy, static text, wrappers, scroll containers, and ancestry;
- agent-facing compact interactive context, where the important contract is fast, bounded discovery
of visible controls and stable refs for the next action.

@@ -31,13 +34,22 @@ predictable.
Keep XCTest as the default iOS automation runner and split iOS snapshot capture into explicit
strategies:

-- **Full tree strategy**: use recursive XCTest snapshots for normal/full snapshots, raw snapshots,
-  diagnostics, and cases that need hierarchy. If XCTest reports a real AX serialization failure,
-  preserve that error instead of pretending the UI is empty.
+- **Regular visible strategy**: use recursive XCTest snapshots, but emit only the effective
+  user-visible tree plus visible ancestors and scroll-hidden hints. A node inside a scroll
+  container is user-visible only when it intersects both the app viewport and the nearest visible
+  scroll container. Offscreen descendants should be visited to set `hiddenContentAbove` /
+  `hiddenContentBelow`, not emitted as normal visible nodes. This strategy must not use an
+  arbitrary node-count cutoff: fixed controls that are later in traversal order, such as bottom tab
+  bars after long lists, are part of the visible UI contract.
+- **Raw diagnostic strategy**: use recursive XCTest snapshots for raw snapshots, diagnostics, and
+  cases that need hierarchy. Raw output is allowed to be noisy and large; if the transport cannot
+  carry the response, fail explicitly instead of silently truncating the tree at a hard node count.
+  If XCTest reports a real AX serialization failure, preserve that error instead of pretending the
+  UI is empty.
- **Compact interactive strategy**: for `snapshot -i -c`, use a bounded flat XCTest query strategy
that avoids recursive root snapshots and app/window property reads. It should prefer fast,
one-screen actionability over hierarchy fidelity and should return a sparse root quickly when
-  XCTest cannot enumerate controls.
+  XCTest cannot enumerate controls. Its bound is time-based, not a hidden fixed node budget.
- **Future simulator AX-service strategy**: treat Bluesky-class failures as evidence that XCTest is
not a complete semantic snapshot backend. A robust semantic fix should add a host-side simulator
accessibility backend, similar in role to `idb` accessibility commands or Argent's `ax-service`,
@@ -62,18 +74,20 @@ avoid those app/window reads.

## Consequences

-Compact interactive snapshots are allowed to be less complete than full snapshots, but they must be
-bounded and honest. They should never block for the full daemon snapshot timeout because one app has
-a pathological AX tree.
+Compact interactive snapshots are allowed to be less complete than regular or raw snapshots, but
+they must be bounded and honest. They should never block for the full daemon snapshot timeout
+because one app has a pathological AX tree.
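
One way to read "bounded and honest" together with the time-based bound from the Decision section is a deadline-driven collection loop that reports when it stopped early. The sketch below is TypeScript for illustration only; the real capture runs in the XCTest runner, and `collectWithinDeadline`, `budgetMs`, and `truncatedByDeadline` are hypothetical names.

```typescript
// Illustrative only: a time-bounded collector with no fixed node budget.
interface CompactResult<T> {
  nodes: T[];
  // "Honest": the caller can tell a complete result from a deadline cut.
  truncatedByDeadline: boolean;
}

function collectWithinDeadline<T>(
  candidates: readonly T[],
  budgetMs: number,
  now: () => number = Date.now, // injectable clock, assumed for testability
): CompactResult<T> {
  const deadline = now() + budgetMs;
  const nodes: T[] = [];
  for (const candidate of candidates) {
    // The only stop condition is elapsed time, never a node count.
    if (now() >= deadline) {
      return { nodes, truncatedByDeadline: true };
    }
    nodes.push(candidate);
  }
  return { nodes, truncatedByDeadline: false };
}
```

If the deadline expires before any control is collected, the result is an empty `nodes` array with `truncatedByDeadline: true`, which roughly corresponds to returning a sparse root quickly.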

-Full snapshots remain the right tool when hierarchy matters. They may still fail loudly on
-XCTest-broken trees; that failure is useful because retrying the same recursive capture is unlikely
-to reveal a different tree.
+Regular snapshots remain the right tool for agents and Maestro compatibility because they describe
+what a user can currently perceive and interact with. Raw snapshots remain the right tool when
+hierarchy matters. Both may still fail loudly on XCTest-broken trees; that failure is useful because
+retrying the same recursive capture is unlikely to reveal a different tree.
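
The visibility rule behind regular snapshots (a node counts as user-visible only when it intersects both the app viewport and the nearest visible scroll container, and offscreen descendants only set `hiddenContentAbove` / `hiddenContentBelow`) can be sketched as follows. This is a TypeScript illustration, not the runner's code: `Rect`, `intersects`, and `classifyScrollChildren` are hypothetical names, and it assumes a vertically scrolling container with a single level of children.

```typescript
interface Rect { x: number; y: number; width: number; height: number; }

// Strict overlap test: rectangles that only share an edge do not intersect.
function intersects(a: Rect, b: Rect): boolean {
  return a.x < b.x + b.width && b.x < a.x + a.width &&
         a.y < b.y + b.height && b.y < a.y + a.height;
}

interface ScrollClassification {
  visible: Rect[];
  hiddenContentAbove: boolean;
  hiddenContentBelow: boolean;
}

function classifyScrollChildren(
  viewport: Rect,
  scrollContainer: Rect,
  children: readonly Rect[],
): ScrollClassification {
  // Vertical extent of the container that is actually on screen.
  const top = Math.max(viewport.y, scrollContainer.y);
  const bottom = Math.min(viewport.y + viewport.height,
                          scrollContainer.y + scrollContainer.height);
  const result: ScrollClassification = {
    visible: [], hiddenContentAbove: false, hiddenContentBelow: false,
  };
  // Every child is visited; there is no node-count cutoff.
  for (const child of children) {
    if (intersects(child, viewport) && intersects(child, scrollContainer)) {
      result.visible.push(child);
    } else if (child.y + child.height <= top) {
      result.hiddenContentAbove = true; // visited, but not emitted
    } else if (child.y >= bottom) {
      result.hiddenContentBelow = true; // visited, but not emitted
    }
  }
  return result;
}
```

Because the loop never stops early, a fixed control that appears late in traversal order, such as a bottom tab bar after a long list, would still be reached by whatever code handles nodes outside the scroll container.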

A future AX-service backend is the correct place to regain Bluesky-class semantic coverage. It
should be added as a platform backend with its own lifecycle, protocol, normalization, timing
metrics, and fallback rules, not as another special case inside the XCTest runner.

When adding new iOS snapshot behavior, maintainers should first decide which strategy owns it. If a
-change tries to make compact snapshots rich by reintroducing recursive snapshots, or tries to make
-full snapshots fast by hiding XCTest failures, it is probably crossing strategy boundaries.
+change tries to make compact snapshots rich by reintroducing recursive snapshots, tries to make
+regular snapshots fast by dropping visible controls behind a node budget, or tries to make raw
+snapshots safe by silently truncating, it is probably crossing strategy boundaries.
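
To make "which strategy owns it" concrete, the three strategies and their contracts could be written down as data, so that a change crossing a boundary shows up as an edit to the table. Everything named here (`IosSnapshotStrategy`, `SnapshotRequest`, `chooseStrategy`, `contracts`) is a hypothetical sketch. Only the `snapshot -i -c` mapping comes from the ADR text; how raw snapshots are requested and what `-i` means without `-c` are assumptions.

```typescript
type IosSnapshotStrategy = "regular-visible" | "raw-diagnostic" | "compact-interactive";

// Hypothetical request shape; the real CLI flags may differ.
interface SnapshotRequest {
  raw?: boolean;         // assumed way to ask for a raw snapshot
  interactive?: boolean; // -i
  compact?: boolean;     // -c
}

function chooseStrategy(request: SnapshotRequest): IosSnapshotStrategy {
  if (request.raw) return "raw-diagnostic";
  if (request.interactive && request.compact) return "compact-interactive";
  return "regular-visible"; // assumption: everything else is a regular snapshot
}

// Contract per strategy, paraphrasing the Decision section.
const contracts: Record<IosSnapshotStrategy, {
  usesRecursiveSnapshot: boolean;
  bound: "none" | "time";
}> = {
  "regular-visible": { usesRecursiveSnapshot: true, bound: "none" },
  "raw-diagnostic": { usesRecursiveSnapshot: true, bound: "none" },
  "compact-interactive": { usesRecursiveSnapshot: false, bound: "time" },
};
```

Under this framing, setting `usesRecursiveSnapshot: true` for the compact strategy, or adding a node-count bound to either recursive strategy, is the kind of boundary crossing the paragraph above warns about.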