callstack · thymikee · Jun 8, 2026
diff --git a/AGENTS.md b/AGENTS.md
@@ -183,6 +183,7 @@ Command-only flags (like `find --first`) that do not flow to the platform layer
 ## React Native Verification
 - After changing runtime code exercised through `bin/agent-device.mjs` or the daemon, run `pnpm build` and `pnpm clean:daemon` before manual device verification so snapshots use current `dist` output.
 - For Android RN/Expo/dev-client apps connected to any local Metro port, `adb reverse tcp:<port> tcp:<port>` is harmless and should be run before opening the app or URL on the emulator/device.
+- In sandboxed agent environments, run any manual `agent-device` CLI verification that starts the daemon outside the sandbox, with escalation. The daemon binds to localhost, so sandboxed runs can fail before any product code executes, with `listen EPERM: operation not permitted 127.0.0.1` or repeated `Failed to start daemon`/metadata cleanup messages. Do not spend time debugging those as agent-device regressions; rerun the same command with escalation. Unit tests, typecheck, lint, and build can stay sandboxed unless they need platform devices or network/listener access.
 
 ## Manual Device Session Hygiene
 - Treat every manually opened `agent-device` session as a resource that must be closed, including exploratory sessions and failed verification attempts.

diff --git a/docs/adr/0004-ios-snapshot-backend-strategy.md b/docs/adr/0004-ios-snapshot-backend-strategy.md
@@ -7,10 +7,13 @@ Accepted
 ## Context
 
 Agent Device exposes iOS UI state through snapshots produced by the long-lived XCTest runner. The
-runner has two different snapshot needs:
+runner has three different snapshot needs:
 
-- rich diagnostics and selector disambiguation, where a recursive XCTest snapshot is useful because
-  it preserves hierarchy, static text, wrappers, scroll containers, and ancestry;
+- agent-facing regular context, where the important contract is the effective user-visible UI,
+  fixed controls such as tab bars, and scroll-hidden hints for content outside visible scroll
+  containers;
+- rich diagnostics and selector disambiguation, where a raw recursive XCTest snapshot is useful
+  because it preserves hierarchy, static text, wrappers, scroll containers, and ancestry;
 - agent-facing compact interactive context, where the important contract is fast, bounded discovery
   of visible controls and stable refs for the next action.
 
@@ -31,13 +34,22 @@ predictable.
 Keep XCTest as the default iOS automation runner and split iOS snapshot capture into explicit
 strategies:
 
-- **Full tree strategy**: use recursive XCTest snapshots for normal/full snapshots, raw snapshots,
-  diagnostics, and cases that need hierarchy. If XCTest reports a real AX serialization failure,
-  preserve that error instead of pretending the UI is empty.
+- **Regular visible strategy**: use recursive XCTest snapshots, but emit only the effective
+  user-visible tree plus visible ancestors and scroll-hidden hints. A node inside a scroll
+  container is user-visible only when it intersects both the app viewport and the nearest visible
+  scroll container. Offscreen descendants should be visited to set `hiddenContentAbove` /
+  `hiddenContentBelow`, not emitted as normal visible nodes. This strategy must not use an
+  arbitrary node-count cutoff: fixed controls that are later in traversal order, such as bottom tab
+  bars after long lists, are part of the visible UI contract.
+- **Raw diagnostic strategy**: use recursive XCTest snapshots for raw snapshots, diagnostics, and
+  cases that need hierarchy. Raw output is allowed to be noisy and large; if the transport cannot
+  carry the response, fail explicitly instead of silently truncating the tree at a hard node count.
+  If XCTest reports a real AX serialization failure, preserve that error instead of pretending the
+  UI is empty.
 - **Compact interactive strategy**: for `snapshot -i -c`, use a bounded flat XCTest query strategy
   that avoids recursive root snapshots and app/window property reads. It should prefer fast,
   one-screen actionability over hierarchy fidelity and should return a sparse root quickly when
-  XCTest cannot enumerate controls.
+  XCTest cannot enumerate controls. Its bound is time-based, not a hidden fixed node budget.
 - **Future simulator AX-service strategy**: treat Bluesky-class failures as evidence that XCTest is
   not a complete semantic snapshot backend. A robust semantic fix should add a host-side simulator
   accessibility backend, similar in role to `idb` accessibility commands or Argent's `ax-service`,
@@ -62,18 +74,20 @@ avoid those app/window reads.
 
 ## Consequences
 
-Compact interactive snapshots are allowed to be less complete than full snapshots, but they must be
-bounded and honest. They should never block for the full daemon snapshot timeout because one app has
-a pathological AX tree.
+Compact interactive snapshots are allowed to be less complete than regular or raw snapshots, but
+they must be bounded and honest. They should never block for the full daemon snapshot timeout
+because one app has a pathological AX tree.
 
-Full snapshots remain the right tool when hierarchy matters. They may still fail loudly on
-XCTest-broken trees; that failure is useful because retrying the same recursive capture is unlikely
-to reveal a different tree.
+Regular snapshots remain the right tool for agents and Maestro compatibility because they describe
+what a user can currently perceive and interact with. Raw snapshots remain the right tool when
+hierarchy matters. Both may still fail loudly on XCTest-broken trees; that failure is useful because
+retrying the same recursive capture is unlikely to reveal a different tree.
 
 A future AX-service backend is the correct place to regain Bluesky-class semantic coverage. It
 should be added as a platform backend with its own lifecycle, protocol, normalization, timing
 metrics, and fallback rules, not as another special case inside the XCTest runner.
 
 When adding new iOS snapshot behavior, maintainers should first decide which strategy owns it. If a
-change tries to make compact snapshots rich by reintroducing recursive snapshots, or tries to make
-full snapshots fast by hiding XCTest failures, it is probably crossing strategy boundaries.
+change tries to make compact snapshots rich by reintroducing recursive snapshots, tries to make
+regular snapshots fast by dropping visible controls behind a node budget, or tries to make raw
+snapshots safe by silently truncating, it is probably crossing strategy boundaries.
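
Illustrative note, not part of the patch: the regular visible strategy described in the ADR can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the runner's implementation. The `RawNode` and `VisibleNode` shapes and the `projectVisible` function are hypothetical names introduced here; only `hiddenContentAbove` and `hiddenContentBelow` come from the ADR text. The sketch also assumes that "visible ancestors" means a node is kept when it is visible itself or has an emitted descendant, and that hints are only tracked vertically.

```typescript
// Hypothetical node shapes for illustration; the real runner's types are not shown in this change.
type Rect = { x: number; y: number; width: number; height: number };

interface RawNode {
  label: string;
  frame: Rect;
  isScrollContainer?: boolean;
  children?: RawNode[];
}

interface VisibleNode {
  label: string;
  hiddenContentAbove?: boolean;
  hiddenContentBelow?: boolean;
  children: VisibleNode[];
}

interface ScrollHints {
  above: boolean;
  below: boolean;
}

// True when the two rectangles overlap with non-zero area.
function intersects(a: Rect, b: Rect): boolean {
  return (
    a.x < b.x + b.width &&
    b.x < a.x + a.width &&
    a.y < b.y + b.height &&
    b.y < a.y + a.height
  );
}

// Walks the whole tree with no node-count cutoff. A node is visible only if it
// intersects the viewport and, when inside one, the nearest visible scroll container.
function projectVisible(
  node: RawNode,
  viewport: Rect,
  clip: Rect | null = null,
  hints: ScrollHints | null = null,
): VisibleNode | null {
  const visible =
    intersects(node.frame, viewport) &&
    (clip === null || intersects(node.frame, clip));

  // Hidden descendants are still visited so they can set hints on the container.
  if (!visible && clip !== null && hints !== null) {
    if (node.frame.y + node.frame.height <= clip.y) hints.above = true;
    if (node.frame.y >= clip.y + clip.height) hints.below = true;
  }

  // A visible scroll container becomes the clip for its own descendants.
  const ownsScroll = visible && node.isScrollContainer === true;
  const childClip = ownsScroll ? node.frame : clip;
  const childHints: ScrollHints | null = ownsScroll
    ? { above: false, below: false }
    : hints;

  const children: VisibleNode[] = [];
  for (const child of node.children ?? []) {
    const projected = projectVisible(child, viewport, childClip, childHints);
    if (projected !== null) children.push(projected);
  }

  // Drop nodes that are neither visible nor ancestors of something visible.
  if (!visible && children.length === 0) return null;

  const out: VisibleNode = { label: node.label, children };
  if (ownsScroll && childHints !== null) {
    if (childHints.above) out.hiddenContentAbove = true;
    if (childHints.below) out.hiddenContentBelow = true;
  }
  return out;
}
```

Because the walk never stops at a node budget, a tab bar that appears after a long list in traversal order is still emitted, while the list's offscreen rows only contribute to the hint flags on the list node.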