Skip to content

feat: seed Orchard shielded pool at genesis with fast, observable sync#3732

Merged
QuantumExplorer merged 46 commits into
v3.1-devfrom
test/sheilded_test_data
May 31, 2026
Merged

feat: seed Orchard shielded pool at genesis with fast, observable sync#3732
QuantumExplorer merged 46 commits into
v3.1-devfrom
test/sheilded_test_data

Conversation

@shumkov
Copy link
Copy Markdown
Collaborator

@shumkov shumkov commented May 25, 2026

Issue being fixed or feature implemented

Continues the work from #3714 / #3769: make it possible to exercise wallet shielded-note sync at scale (≈1,000,000 notes) on a local devnet, and make that sync both fast and observable.

When drive-abci is built with --cfg create_sdk_test_data, the Orchard shielded pool is pre-populated at genesis with ≈1,000,000 notes (a handful owned across two deterministic ZIP-32 test wallets). Standing that data up surfaced two further needs that this PR also addresses: cold sync of a million notes had to be made fast enough to iterate on, and the long-running sync needed real progress feedback in the wallet UI.

What was done?

Genesis seeding of the shielded pool (drive-abci, dashmate, Dockerfile, scripts/)

  • InitChain pre-populates the Orchard pool under create_sdk_test_data. Filler notes (random-valid cmx + opaque ciphertext) plus a few real owned notes encrypted to fixed ZIP-32 test wallets, all driven by a single seeded RNG for byte-identical re-runs.
  • Bulk-seed path rewritten onto GroveDB append_many_raw with a single end-of-bake commit_mmr, batched for fast bake at N=1M.
  • Pre-baked snapshot: a snapshot-bake CLI dumps the seeded shielded subtree to a file at image-build time; InitChain applies it via the DRIVE_SHIELDED_SNAPSHOT env var (fast genesis), with a runtime fallback to live seeding when the snapshot is absent.
  • Allow create_sdk_test_data on Devnet, not just Regtest.
  • dashmate gains a generic buildArgs map on the docker-build schema (with a config migration to backfill it); setup_local_network.sh sets SDK_TEST_DATA=true and CARGO_BUILD_PROFILE=release; env generation forwards build args into compose.

Faster note sync (platform-wallet, rs-sdk)

  • Multi-chunk fetch (max_query_chunks, v1 = 4) — up to ~8192 notes per request, ~122 requests for 1M notes.
  • WAL + synchronous=NORMAL + temp_store=MEMORY PRAGMAs on the file-backed commitment-tree store, removing the per-cmx fsync bottleneck.
  • Pull-based streaming sync that interleaves SDK network fetch with wallet tree-append, fetching up to max_concurrent = 16 chunks concurrently across the network's nodes.

Observable sync — dual progress bars (platform-wallet, platform-wallet-ffi, rs-sdk, swift-sdk)

  • Two distinct progress signals: downloaded (per-chunk network progress) and checked (cumulative leaves committed to the local Pallas/Orchard tree), each reported against the on-chain total as denominator.
  • The on-chain total note count is extracted "for free" from the note-fetch proof itself (the CommitmentTree parent element is already in the proof), so no extra round-trip is needed for the progress denominator.
  • New FFI EventHandlerCallbacks slots plumb both signals to the host; the Swift example app renders a dual ProgressView and preserves cold-sync timing (elapsed / last / longest).

Devnet wiring (swift-sdk, rs-sdk-ffi)

  • Overridable DAPI + quorum URLs and devnet support so the wallet can target a local seeded devnet, with DAPI fan-out auto-discovery.

Correctness fixes (platform-wallet)

  • "Clear" now performs a true cold reset: it empties the shared commitment tree (not just per-subwallet state), so a post-clear resync rebuilds from position 0 instead of gate-skipping into a still-full tree.
  • clear() is now fallible and all-or-nothing: it resets the store first and returns early on failure, only dropping the in-memory account/persister registries once both store resets succeed — so a failed clear can't silently turn future syncs into no-ops while the host keeps its state. The failure propagates through the FFI so the host doesn't wipe its own state on a Rust-side error.

Proof verification (drive, rs-drive-proof-verifier)

  • verify_shielded_encrypted_notes returns the on-chain total_count extracted from the same proof, with an explicit root-hash equality check between the note sub-proof and the count sub-proof.

How Has This Been Tested?

  • drive-abci shielded genesis tests: cargo test -p drive-abci --lib create_genesis_state::test::shielded (with --cfg create_sdk_test_data) — wallet derivation, note generator, determinism, per-wallet decrypt, cross-wallet privacy, aggregate balance, and Drive integration.
  • drive proof-verifier tests: cargo test -p drive --lib verify::shielded::verify_shielded_encrypted_notestotal_count extraction across full and subset verification modes.
  • platform-wallet coordinator clear() tests: success path (tree emptied + registries dropped) and the failure-path regression guard (forced store-reset failure leaves registries populated and returns Err).
  • dashmate config schema-walker unit tests.
  • cargo fmt --all and cargo clippy clean across the touched crates.
  • On device: a cold sync of ≈1,000,000 notes against an SDK_TEST_DATA-seeded devnet recovers the expected per-wallet balances, with the dual progress bars climbing coherently (Checked ≤ Downloaded ≤ total) and "Clear" triggering a true cold rebuild from 0.
  • Live-network/at-scale runs ship as examples/ binaries and --ignored tests under the shielded feature rather than in the default CI path.

Breaking Changes

No consensus-breaking changes. Some non-consensus SDK/FFI surface changed (the note-fetch result now carries total_count; new shielded progress callbacks on EventHandlerCallbacks; coordinator.clear() is now fallible). These are pre-release internal/host-facing API changes and do not warrant a ! per the repo's breaking-change convention.

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have added "!" to the title and described breaking changes in the corresponding section if my code contains any
  • I have made corresponding changes to the documentation if needed

For repository code-owners and collaborators only

  • I have assigned this pull request to a milestone

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Pre-baked shielded-pool snapshots with a bake CLI for fast genesis and runtime auto-detection/fallback when absent.
    • Streaming shielded-note sync with per-chunk progress, preserved on-chain total-count, and progress callbacks for hosts.
    • Host-visible shielded sync progress/state and timing exposed to UI (live scanned/committed counts, elapsed, last/longest run).
  • Documentation

    • Detailed genesis snapshot design and shielded-seeder performance guides.
  • Chores

    • Docker build-arg and local devnet build guidance for enabling SDK test data and build-profile control.

…A devnet genesis

Pre-populates the shielded pool with 500_000 Orchard notes (8 owned by two
deterministic test wallets) when a local devnet binary is built with
`--cfg=create_sdk_test_data`. Closes the gap for benchmarking wallet sync at
scale without paying per-note Halo 2 proof time at chain bring-up.

Seeder (drive-abci):
- `create_data_for_shielded_pool` runs inside `create_sdk_test_data` and
  emits 500k `ShieldedPoolOperationType::InsertNote` ops through the
  production `commitment_tree_insert_op` path. GroveDB's
  `preprocess_commitment_tree_ops` batches them into a single
  Sinsemilla-frontier load / `append_with_mem_buffer` loop / Merk
  propagation.
- Two-tier note generator: filler (random valid Pallas-base `cmx` + 216
  bytes of opaque ciphertext) + owned (real `Note::from_parts` encrypted
  to a fixed ZIP-32-derived address via `OrchardNoteEncryption<DashMemo>`).
- Genesis-time anchor recording at `block_height=1` matches production's
  end-of-block-1 anchor — single recorded anchor suffices for spends via
  the wallet's one-checkpoint-at-post-sync-tree-size invariant.
- Determinism: single `StdRng::seed_from_u64(0xDEAD_BEEF)` threaded
  through every loop; `seed_a`/`seed_b` test wallets derived via
  `SpendingKey::from_zip32_seed(seed, coin_type=1, account=0)`.

dashmate config plumbing:
- New `buildArgs: Record<string,string>` field on `dockerBuild` schema —
  generic per-image build-arg map. Dashmate becomes the single source of
  truth for `SDK_TEST_DATA` and `CARGO_BUILD_PROFILE`; shell-env
  passthrough is dropped.
- `scripts/setup_local_network.sh` writes
  `buildArgs.SDK_TEST_DATA="true"` + `CARGO_BUILD_PROFILE="release"` to
  each `local_N` after `dashmate setup local`. Release profile is
  mandatory — debug-Sinsemilla pushes InitChain past tenderdash's timeout.
  (Marked TODO/temporary in the script — removable once Option B
  precomputed-snapshot lands, or N drops low enough for debug seeding.)
- `generateEnvsFactory` flattens both `platform.drive.abci.docker.build.buildArgs`
  and `platform.dapi.rsDapi.docker.build.buildArgs` into the
  docker-compose env so `${KEY}` substitution in the compose `build.args`
  blocks picks them up.

`dashmate config set` bug fix:
- The old `config.get(path)` pre-check rejected legal sets of new keys
  inside `additionalProperties: <schema>` maps (e.g. `…buildArgs.X`).
  Replaced with `Config.isSchemaPathAllowed(path)` which walks the JSON
  schema descending through `properties`, `additionalProperties` value
  schemas, and `$ref` references. 15 unit tests pin the walker.

Tests:
- 23 in-process tests in `rs-drive-abci`: wallet derivation, note
  generator (filler + owned + ρ uniqueness + ciphertext layout +
  determinism + per-wallet decrypt + cross-wallet privacy + aggregate
  balance), Drive-level integration (count + anchor + cross-platform
  byte-identical determinism).
- 15 dashmate unit tests for the schema walker.
- One `#[ignore]`-d functional test in `rs-platform-wallet` that drives
  the full `PlatformWalletManager → bind_shielded → coordinator.sync →
  shielded_balances` flow against a live SDK_TEST_DATA devnet.

Cost (release profile, 500_000 notes, Apple Silicon Docker Desktop):
  ~3h 41m wall-clock for the seeder. CPU work is ~95s; the rest is
  GroveDB writes through the macOS Docker VM. See
  `docs/shielded-seeder-performance.md` for the breakdown and the
  Option-B follow-up.

Refs #3714.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds an offline shielded-pool snapshot bake/apply pipeline, Docker/dashmate build-arg plumbing, streaming SDK shielded sync with progress events, proof total_count extraction, wallet event/FFI progress plumbing, store reset and tuned SQLite open, SwiftUI timing/progress UI, docs, examples, and benchmarks.

Changes

Shielded Genesis Snapshot & Progress Tracking

Layer / File(s) Summary
End-to-end snapshot, streaming sync, and progress plumbing
Dockerfile, packages/rs-drive-abci/src/shielded_snapshot/*, packages/rs-drive-abci/src/main.rs, packages/rs-drive-abci/Cargo.toml, packages/rs-drive-abci/src/execution/platform_events/initialization/create_genesis_state/test/shielded.rs, packages/rs-drive-abci/src/execution/platform_events/initialization/create_genesis_state/test/mod.rs, docs/genesis-snapshot-design.md, docs/shielded-seeder-performance.md, packages/dashmate/*, packages/dashmate/src/*, scripts/setup_local_network.sh, packages/rs-sdk/src/platform/shielded/notes_sync/*, packages/rs-sdk/src/platform/shielded/*, packages/rs-drive/src/verify/shielded/*, packages/rs-drive-proof-verifier/src/*, packages/rs-platform-wallet/src/wallet/shielded/*, packages/rs-platform-wallet/src/manager/*, packages/rs-platform-wallet-ffi/src/*, packages/swift-sdk/*, packages/wasm-sdk/src/queries/shielded.rs, packages/rs-platform-wallet/examples/*, tests/benches/*
Implements snapshot dump/apply and SnapshotBake CLI; adds Dockerfile bake stage and compose overrides; forwards dashmate buildArgs into compose builds; adds streaming shielded sync (reorder buffer, stream driver, fetch_chunk returns total_count), progress callback plumbing across SDK → coordinator → PlatformEventManager → FFI → Swift, proof verifier and Drive changes to extract/return on-chain total_count, rs-drive-abci snapshot module (dump/apply, header/checksum/ingest/verify), deterministic shielded seeding and tests, tuned SQLite open and reset_commitment_tree, benchmarks/examples, and documentation/migrations.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Suggested labels

ready for final review

Suggested reviewers

  • QuantumExplorer
  • thepastaclaw

Poem

🐰 In docker kitchens snapshots bake,
SSTs and checksums never break.
Streams report each scanned delight,
Swift timers glow through day and night.
Hops of joy — the seed takes flight.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch test/sheilded_test_data

@github-actions github-actions Bot added this to the v3.1.0 milestone May 25, 2026
…t InitChain

Cuts SDK_TEST_DATA shielded-pool seeding from ~3h41m (Docker on macOS at
N=500k) or ~65 min (native release at N=500k) to ~134 ms at InitChain by
moving the seed work to a one-shot bake during docker image build and
loading the result via `IngestExternalFile` + parent-Merk leaf patch.

End-to-end proven on local devnet: bake-inside-Dockerfile produces a
snapshot file with `combined_root=b682f38442c8...8144e3c3` byte-for-byte
identical to a local macOS bake; InitChain applies it in 134 ms; wallet-A
sync against the snapshot-loaded chain recovers the expected 400 000
balance (4 owned × 100 000) — proving proof verification works against
snapshot-loaded state.

Components
----------
- `packages/rs-drive-abci/src/shielded_snapshot/mod.rs` (NEW):
  - `dump_shielded_subtree(grove, w)` — writes one SST per CF + header +
    blake3 checksum.
  - `apply_shielded_snapshot(grove, r, txn)` — validates header,
    `ingest_subtree_sst`, cross-validates `combined_root` against the
    reconstructed CommitmentTree, patches parent Merk leaf via
    `replace_commitment_tree_subtree_root`. No fallback — any mismatch
    is FATAL so the operator notices.
- `drive-abci snapshot-bake --out <path>` subcommand:
  - Self-contained: opens fresh tempdir, runs `create_genesis_state`
    (which under `cfg(create_sdk_test_data)` seeds the shielded pool),
    then dumps the resulting subtree. Uses a NoopCoreRPC stub because
    genesis doesn't talk to Core.
- `DRIVE_SHIELDED_SNAPSHOT` env var read in `create_data_for_shielded_pool`:
  takes the snapshot fast-path when set, runs the runtime seeder
  otherwise. Failure during apply is fatal (no silent fallback).
- Dockerfile: new `bake-shielded-snapshot` stage runs `drive-abci
  snapshot-bake` against an in-container tempdir when `SDK_TEST_DATA=true`,
  embeds the snapshot at `/opt/dashmate/snapshots/shielded-pool.snap` in
  the runtime image, sets `ENV DRIVE_SHIELDED_SNAPSHOT=...`.
- `docs/genesis-snapshot-design.md` — design doc covering format,
  cross-validation, threat model, compatibility policy.

Side-effect changes
-------------------
- `ShieldedSeedConfig::sdk_test_data().total_notes`: 500_000 → 5_000 for
  fast iteration while the snapshot path is fresh. Bump back when needed.
- `scripts/setup_local_network.sh`: `CARGO_BUILD_PROFILE` set to `dev`
  for fast docker rebuilds during snapshot dev. Flip to `release` when
  going back to large N for stress testing.
- grovedb dep rev bumped to `04f2d4243872b65fbec33650e15d85571df385e1`
  (branch `feat/snapshot-apply-public-api` — DON'T MERGE; adds three
  public methods: `ingest_subtree_sst`, `replace_commitment_tree_subtree_root`,
  `raw_storage`).
- New deps in `rs-drive-abci`: `rocksdb` (SstFileWriter), `blake3`
  (snapshot checksum). Plus `grovedb-path` + `grovedb-storage` as dev-deps
  for the data-location test.

Tests
-----
- `dump_only_default_and_aux_cfs_under_shielded_subtree_prefix` — pins
  the CF layout the dump enumerates (default CF only, contrary to design's
  original §3 guess of default + aux).
- `snapshot_dump_apply_preserves_anchor` — in-process roundtrip; A seeds,
  dumps, B applies, asserts anchors match byte-for-byte.
- `bench_native_seed_full` — bake-feasibility benchmark; measures
  N=500k seed in release locally.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@shumkov shumkov changed the title feat(drive-abci,dashmate): seed Orchard shielded pool at SDK_TEST_DATA devnet genesis [DON'T MERGE] feat(drive-abci,dashmate): seed Orchard shielded pool at SDK_TEST_DATA devnet genesis May 25, 2026
shumkov and others added 12 commits May 26, 2026 10:44
…ake at N=1M

Builds on the Phase 1 snapshot bake + apply design with the
crypto-level optimization from grovedb PR #751: skip the Sinsemilla
frontier append for filler notes (which no wallet owns and nobody
needs spend proofs for), keeping the full Sinsemilla path only for
the 8 owned notes.

Drops bake time at N=1M from a projected 8+ hours (Phase 1, hit during
the prior overnight run) to ~12 minutes in-process on native macOS
release. Roundtrip test passes: anchor matches byte-for-byte between
the bake-side seed and the snapshot apply on a fresh DB.

Changes
-------
- Cherry-picked grovedb PR #751 onto our existing
  `feat/snapshot-apply-public-api` branch — exposes
  `append_raw_without_frontier` / `append_many_without_frontier` on
  `CommitmentTree` under the `test-seeding-ct` Cargo feature, plus an
  MMR-root cache that makes append O(1) instead of O(N).
  grovedb rev bumped: 04f2d424 → 60d121900.
- `rs-drive-abci/Cargo.toml`: enable `test-seeding-ct` feature on the
  grovedb-commitment-tree dep.
- `OwnedLayout::compute`: moved owned positions to the **tail**
  `[N-owned_count, N)` instead of striped. Forced by the
  frontier-less constraint (PR #751 hard-rejects a non-empty frontier
  in the no-frontier append path) — must bulk-seed ALL filler first
  while frontier is empty, then append owned through the regular path.
- `seed_shielded_pool_with_config`: replaced the
  `apply_drive_operations(InsertNote × N)` per-note path with:
    1. Open CommitmentTree directly via grovedb's
       `raw_storage().get_transactional_storage_context(...)`.
    2. Bulk-seed filler via `append_many_without_frontier(iter)` —
       blake3 only, no Pallas math.
    3. Per-note `append_raw` + `save` + `commit_mmr` for the 8 owned
       (mirrors grovedb's existing commitment_tree_insert pattern).
    4. Commit subtree's StorageBatch through the transaction.
    5. Patch parent Merk leaf via
       `replace_commitment_tree_subtree_root` using the combined_root
       from the final `append_raw`'s result (compute_current_state_root
       hit "Inconsistent store" after intermediate commit_mmr flushes;
       reading the result-returned roots avoids that).
    6. Post-bake assert `count == cfg.total_notes` — catches silent
       truncation from a panic mid-bake.
- Progress logging: emit `seed phase A progress` every 30s with
  appended/total/pct/elapsed/rate/ETA from inside the bulk iterator.
  Previously the seeder was silent between start and end, making
  N=1M bakes invisible.
- `docs/genesis-snapshot-design.md` §15 expanded with the F1
  constraint, the bake/apply asymmetry, anchor non-spendability
  caveat, F6 rejection-sampling-skip caveat, and updated test plan.
  §8 scoped to Phase 1; §10 test #6 (equivalence with runtime seeder)
  marked retired under Phase 2.

Consequences
------------
- Anchor recorded at height 1 reflects only the 8 owned cmx at
  frontier-positions 0..7 (Sinsemilla frontier has its own counter,
  independent of BulkAppendTree's total_count). Wallets attempting to
  construct spend proofs would fail. Devnet-only; gated by
  cfg(create_sdk_test_data) + SDK_TEST_DATA=true at image build.
- combined_root produced by a Phase-2 bake ≠ combined_root a
  runtime-seeded chain would produce. Cross-validation between bake
  and apply still works (both read the same stored frontier).
- Wallet sync still recovers the 8 owned notes correctly because sync
  uses chunk proofs authenticated by bulk_state_root (transitively
  authenticated by the grovedb root), not by the Sinsemilla anchor.

Performance
-----------
N=1M in-process roundtrip on native macOS release:
- seed (filler bulk + owned full): ~10 min
- dump (267 MB SST): ~30 s
- apply on fresh DB: ~1 min
- total: 12 min wall-clock

Anchors match byte-for-byte:
  d1f7ed699e0b0a7741ca91dbe7513abf7c5a53d418206ba7bde3b0a3dd974631

Phase 1 at N=1M (didn't complete after 8+ hours in docker on macOS)
projected to ~2-3 hours native release. Phase 2 is roughly 40×
faster end-to-end at this N.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…API change

bind_shielded now takes &[u8] instead of [u8; 32] after v3.1-dev merge
(part of IdentityManager refactor in PR #3651). One-character fix.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The original Regtest-only check made sense for local dashmate setups but
blocked the intended use case from issue #3714: stress-testing wallet sync
on real Dev networks with N=1M pre-seeded shielded notes via the snapshot
image. Real Devnet chains run with Network::Devnet, not Network::Regtest.

Mainnet + Testnet still rejected — they must never carry SDK test
fixtures (random identities + seeded shielded pool + the junk Sinsemilla
anchor that isn't a valid spend anchor).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the iOS-side measurement surface for shielded sync against a
pre-populated note pool (paloma devnet / local snapshot image):

- ShieldedService publishes lastSyncDuration + a 1Hz currentSyncElapsed
  ticker driven by the false→true edge of shieldedSyncIsSyncing. CoreContentView
  renders both in the ZK Shielded Sync Status section, with monospaced
  digits so the live ticker doesn't reflow.
- New FFI platform_wallet_manager_bind_shielded_with_raw_seed accepts a
  raw 32-byte ZIP-32 seed, bypassing the MnemonicResolver. Required to
  bind the chain-side test wallet A (`[0x73; 32]`) that the snapshot
  bake seeds — no BIP-39 mnemonic can produce that seed. Swift wrapper
  bindShieldedRawSeed mirrors the existing bindShielded shape.
- New orange "Bind Test Wallet A (Shielded)" debug button under the
  same Sync Status section hardcodes the seed and starts the sync loop.

All raw-seed / button code is tagged `TODO(shielded-snapshot-devnet-test)`
and is meant to be removed once SwiftExampleApp has a real test-wallet
import flow (tracked: #3714).

Spec: packages/swift-sdk/SwiftExampleApp/docs/shielded-sync-timing-spec.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…DAPI + quorum URLs

Devnet had no end-to-end Platform path on the iOS app — `SDK.init`
ignored the `platformDAPIAddresses` UserDefaults on non-regtest networks,
and even when addresses were provided, the trusted context provider
panicked on `Devnet` without an explicit quorum-list URL (no built-in
default exists; `quorums.devnet.<name>.networks.dash.org` is templated
on a devnet name we don't carry across FFI). Result: switching to
Devnet always showed Disconnected.

This wires it end-to-end:

- DashSDKConfig gains a `quorum_url: *const c_char` field. When set,
  `dash_sdk_create_trusted` constructs the TrustedHttpContextProvider
  via `new_with_url(...)` regardless of network — overrides the
  hardcoded network default so devnet (and non-default mainnet/testnet
  shards) work.
- SDK.init now reads `platformDAPIAddresses` and a new
  `platformQuorumURL` UserDefaults key and forwards both to the FFI
  on devnet + regtest unconditionally, and on mainnet/testnet when
  `useDockerSetup` is set. Helper `withOptionalCStrings` keeps the
  two-string lifetime contract clean.
- OptionsView gets a dedicated devnet branch with three text fields
  (SPV Peers, DAPI URL, Quorum URL); no toggle since all three are
  always custom on devnet. The picker's onChange force-enables
  `useLocalhostCore` and seeds the SPV peers default for devnet so
  the Sync tab's `startSpv()` path picks up peers without a hidden
  toggle.
- WalletManagerStore.activate compared cached managers by network
  only, returning the cached one even when AppState handed in a
  freshly-built SDK. That kept the manager pointing at the old SDK
  clone (with stale DAPI / quorum endpoints) so proof verification
  failed forever with "no available addresses to use". Now we cache
  the configured SDK handle and rebuild the manager when it doesn't
  match — the FFI `configure` is single-shot (precondition !isConfigured)
  so we can't swap in place.
- SDKLogger.log mirrors to NSLog in addition to stdout so
  `xcrun simctl spawn booted log stream` and Console.app capture the
  timing log lines even when no Xcode debugger is attached.

Token FFI test fixtures get the new `quorum_url: ptr::null()` field
to keep compiling.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…stics

Adds the diagnostic surface for the iOS balance-of-zero investigation
(P0.2) and the followup punch list that came out of the live test
session.

- `tests/shielded_sync_paloma.rs` — new integration test that binds
  wallet A (`SEED_A = [0x73; 32]`) against a remote SDK_TEST_DATA
  devnet via TrustedHttpContextProvider, fans the gRPC traffic across
  all 13 paloma masternodes' Platform gateways (`AddressList` picks
  randomly per request), runs one sync pass, asserts balance =
  EXPECTED_BALANCE_A = 400_000. Verified PASS against paloma in
  1299s: total_scanned=1_000_000, new_notes_total=4, balances={0: 400000}
  — confirms decryption + persistence work end-to-end at the Rust
  layer, isolates iOS balance=0 to iOS-specific persister callback /
  display path.
- `PlatformWalletPersistenceHandler.swift` — temporary NSLog
  instrumentation on `persistShieldedNotes` and
  `persistShieldedSyncedIndices` so the next iOS sync run surfaces
  whether the callbacks fire (via `simctl spawn booted log stream`).
  Both tagged `TODO(shielded-snapshot-devnet-test)` for removal once
  P0.2 closes. Uses the safer `NSLog("%@", message)` pattern after
  the multi-arg variadic form SIGBUSed (verified, May 2026).
- `docs/shielded-sync-devnet-followups.md` — punch list captured from
  the live test session (P0.1 cold-sync duration preservation, P0.2
  balance investigation, P1.1 auto-discover DAPI from /masternodes,
  P1.2 per-chunk progress, P1.3 node-count indicator). Status table
  + prioritized order of attack.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…-out

Two iOS UX improvements that came out of the live paloma devnet test
session — both Swift-only.

**P0.1 — Preserve longest-pass duration**

The existing "Last sync duration" row gets clobbered every time a sync
completes. After a 20-minute cold sync of paloma's 1M-note pool, a
subsequent 3-second steady-state pass would overwrite the headline
number, leaving no way to recover what the cold sync actually took.

ShieldedService now publishes `longestSyncDuration` alongside
`lastSyncDuration`. Set to `max(prev, elapsed)` at every completion;
reset on bind / reset / clearLocalState (the four sites the existing
`lastSyncDuration = nil` pattern already covers).

CoreContentView renders a "Longest pass: N.NN s" row beneath "Last
sync duration", but only when meaningfully longer than the most recent
pass (`longest > last + 0.05s`) so steady-state runs don't render two
identical lines.

**P1.1 — Auto-populate DAPI list from `/masternodes`**

Previously, devnet usage required pasting all DAPI URLs by hand. With
only one URL in the field the SDK funnels every gRPC request through
one masternode (`AddressList::pick_address` picks randomly per request);
on a 1M-note sync that becomes the bottleneck.

`SDK.discoverDAPIAddresses(quorumBase:)` synchronously hits
`{quorumURL}/masternodes`, filters `status == "ENABLED"`, and builds
a comma-separated `https://<ip>:<platformHTTPPort>,…` list. 5-second
URLSession timeout via DispatchSemaphore (init can't be async without
a deeper refactor; the call runs off the main thread inside
`AppState.switchNetwork`'s Task).

SDK init triggers discovery when:
- the network uses overrides (devnet, regtest, or `useDockerSetup`),
- a quorum URL is set,
- and `platformDAPIAddresses` is empty (manual entry wins).

Resolved list is written back to `platformDAPIAddresses` UserDefaults
so OptionsView displays the actual fanned-out URLs. Subsequent SDK
builds skip the network round-trip; clearing the field re-triggers
discovery on the next switch.

OptionsView shows a small "N nodes" label above the DAPI URL field —
"auto (from /masternodes)" when empty, "13 nodes" when populated.
TextField switches to vertical-axis with `lineLimit(1...4)` so the
13-URL list is readable.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Surface live cumulative-notes-scanned during a long shielded sync
pass so the iOS UI can render a ticking counter / `ProgressView`
instead of staring at a spinner for the 20+ min cold sync of a
1M-note pool. Previously `sync_shielded_notes` was a single async
call that returned only at completion — between start and end there
was no signal anything was happening.

Touches every layer:

- **rs-sdk** (`platform/shielded/notes_sync/types.rs`,
  `sync_shielded_notes.rs`): new `ProgressCallback` type
  (`Arc<dyn Fn(u64, u64) + Send + Sync>`). Added as an optional
  field on `ShieldedSyncConfig`; fired once per chunk inside the
  sliding-window fetch loop with `(cumulative_scanned,
  latest_block_height)`. Default `None` preserves prior behavior.
- **rs-platform-wallet** (`events.rs`, `wallet/shielded/coordinator.rs`,
  `wallet/shielded/sync.rs`, `manager/mod.rs`): new trait method
  `PlatformEventHandler::on_shielded_sync_progress`. Coordinator
  holds an optional progress handler in a `Mutex<Option<…>>`
  (ArcSwap can't store `dyn Fn` — needs `T: Sized`); installed by
  the manager in `configure_shielded` with a closure that forwards
  into the event manager. `sync_notes_across` reads the handler,
  wraps it in a `ShieldedSyncConfig.on_chunk_completed`, and passes
  to the SDK. Network-scoped event (no wallet_id) — a single
  coordinator pass covers every bound IVK.
- **rs-platform-wallet-ffi** (`event_handler.rs`): new
  `on_shielded_sync_progress_fn` slot on `EventHandlerCallbacks`
  (C-ABI-stable regardless of `shielded` feature). FFIEventHandler
  dispatches when shielded compiled in.
- **swift-sdk** (`PlatformWalletManagerAddressSync.swift`,
  `PlatformWalletManagerShieldedSync.swift`,
  `PlatformWalletManager.swift`): wire new
  `shieldedSyncProgressCallback` C trampoline. PlatformWalletManager
  publishes `currentShieldedSyncScanned` and
  `currentShieldedSyncBlockHeight` (cleared on every completion).
- **SwiftExampleApp** (`ShieldedService.swift`,
  `CoreContentView.swift`): ShieldedService republishes via
  `combineLatest` of the two manager publishers into
  `currentSyncScanned` / `currentSyncBlockHeight`. CoreContentView's
  "Syncing… elapsed" row gets a "Scanned this pass: N notes"
  sub-row + a linear `ProgressView` (indeterminate — chain commitment
  count isn't separately queried; absolute number is more useful
  than a fake bar). Both reset across all four bind/clear paths.
- **WalletManagerStore.swift**: drive-by fix for the stale-SDK
  rebuild path — `activeManager` is non-optional so `= nil` failed
  to compile; let the post-rebuild `activeManager = manager` line
  overwrite cleanly.

Release-mode paloma test (1M-note cold sync) baseline established
at 1022 s wall clock; per-chunk callback fires ~once per 2048 notes
inside the SDK fetch phase (~6 min in release).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
**P0.2 — wallet A balance=0 after sync, root-caused and fixed**

Rebinding shielded keys on a wallet that already has a persisted
watermark left the next sync starting at near-tree-tip, so the new
IVK never visited the positions where its owned notes lived.
`bind_shielded` calls `unregister_wallet` → `register_wallet` →
`restore_for_wallet`, and that last step rehydrates the watermark
from disk. For the mnemonic-bind path that's correct (same IVK
across restarts). For the raw-seed test path (different IVK) it
silently breaks scanning.

End-to-end validation against paloma (release-Rust integration test
+ debug iOS sim):

  Rust test:   total_scanned=1_000_000, decrypted_for_driver=4,
               balances={0: 400_000}, 1022 s
  iOS app:     Scanned 1,000,000, New 4, Spent 0,
               balance=0.000004 DASH (= 400_000 credits), 539.80 s

Fix: `NetworkShieldedCoordinator::force_rescan_subwallets(wallet_id,
accounts)` zeros the in-memory watermark for the bound subwallets.
Called from `platform_wallet_manager_bind_shielded_with_raw_seed`
right after `wallet.bind_shielded()` returns.

**Test-only scaffolding — strong DELETE-BEFORE-MERGE banner**

Every site that exists only to support the devnet sync-timing test
now carries an explicit `⚠️ TEST-ONLY CODE — DELETE BEFORE MERGE ⚠️`
header with the full 5-site cleanup checklist:

- `platform_wallet_manager_bind_shielded_with_raw_seed` FFI entry
- `NetworkShieldedCoordinator::force_rescan_subwallets` helper
- Swift `PlatformWalletManager.bindShieldedRawSeed` wrapper
- `ShieldedService.bindWithRawSeed`
- "Bind Test Wallet A (Shielded)" button HStack in CoreContentView
- ATS exception in Info.plist (auto-cleaned with the rest)

Tag: TODO(shielded-snapshot-devnet-test). Tracked: #3714.

**Devnet UX simplification**

Reduced the devnet user-input surface to a single field — the
quorum list service URL — by deriving everything else from
`{quorum}/masternodes`. New shared helper
`SDK.discoverActiveMasternodes` returns each ENABLED masternode's
SPV peer (`ip:CoreP2PPort` from the `address` field) and DAPI URL
(`https://ip:platformHTTPPort`). Both are fetched fresh on every
SDK init / SPV start — self-healing on node churn.

OptionsView devnet branch: drops the manual SPV Peers TextField
and DAPI URL TextField; only the Quorum URL field remains.
SDK.init: always auto-discovers DAPI from /masternodes on devnet,
no longer writes back to UserDefaults.
`CoreContentView.spvPeerOverride`: devnet branch fetches the same
endpoint and extracts each masternode's `address` field verbatim
(paloma reports `ip:20001`, not the canonical 29999 — using
masternode-reported port is correct).

**Bundled Info.plist + ATS exception**

iOS's App Transport Security blocks plain-HTTP requests by default,
which prevented the SDK init from reaching the HTTP-only paloma
quorum-list-server. Switched the SwiftExampleApp project from
auto-generated to a real `Info.plist` (at project root so it stays
out of `PBXFileSystemSynchronizedRootGroup`'s auto-resource
inclusion) with `NSAllowsArbitraryLoads = true`. Plist carries the
same bundle/version/scene-manifest keys Xcode would have synthesized
plus the ATS dict. Test-only, must be removed before any production
release.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…t_data

# Conflicts:
#	packages/rs-sdk-ffi/src/types.rs
@shumkov shumkov changed the title [DON'T MERGE] feat(drive-abci,dashmate): seed Orchard shielded pool at SDK_TEST_DATA devnet genesis feat(drive-abci,dashmate): seed Orchard shielded pool at SDK_TEST_DATA devnet genesis May 27, 2026
shumkov and others added 7 commits May 27, 2026 19:03
Deletes the entire raw-seed bind scaffolding now that P0.2 is resolved
and the iOS devnet sync-timing test has been validated end-to-end
(0.000004 DASH on paloma matching the chain-side bake spec).

Removed sites (the 5 banner-flagged deletions plus the P0.2 diagnostic
NSLogs):

- `platform_wallet_manager_bind_shielded_with_raw_seed` (rs-platform-wallet-ffi)
- `NetworkShieldedCoordinator::force_rescan_subwallets` (rs-platform-wallet)
- `PlatformWalletManager.bindShieldedRawSeed` (swift-sdk)
- `ShieldedService.bindWithRawSeed` (SwiftExampleApp)
- "Bind Test Wallet A (Shielded)" orange button HStack (CoreContentView)
- Diagnostic NSLogs on `persistShieldedNotes` and
  `persistShieldedSyncedIndices` (PlatformWalletPersistenceHandler)

What's kept (broader devnet path, not wallet-A-specific):

- `SDK.discoverActiveMasternodes` + auto-discovery in `init`
- Info.plist + ATS exception (needed for plain-HTTP /masternodes)
- Devnet OptionsView simplification (single Quorum URL input)
- All P0.1 / P1.1 / P1.2 timing + progress UI
- `rs-platform-wallet/tests/shielded_sync_paloma.rs` — the canonical
  Rust integration test that proves the fix; keeps SEED_A by design

Verified: Rust workspace + iOS framework + SwiftExampleApp all build
clean. The non-wallet-A devnet flow (mnemonic bind on devnet against
the auto-discovered DAPI nodes) is unaffected.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds two `#[ignore]`-gated benchmarks that drove the diagnostic work
behind the multi-chunk query change (#3756).

`tests/shielded_chunk_timing_bench.rs` — issues N sequential single-
chunk `ShieldedEncryptedNotes` fetches against a live SDK_TEST_DATA
devnet (paloma by default) and reports per-chunk wall-clock distribution
plus the per-chunk net/verify split (from `tracing::info!` inside
`fetch_with_metadata_and_proof`). The flat per-chunk cost across
count=64 vs count=2048 (1.57s vs 1.52s avg) is what proved the
bottleneck is fixed per-request overhead, not bandwidth — directly
justifying batching multiple chunks into one proof.

`tests/shielded_decrypt_bench.rs` — measures trial-decrypt throughput
over generated `ShieldedEncryptedNote` fixtures, single-threaded vs
rayon-parallel. Showed decrypt is <1% of cold-sync wall-clock
(~1.3s for 1M notes single-threaded on M-series), ruling out
decrypt as the target for the multi-chunk PR.

Run:
  cargo test -p platform-wallet --release --features shielded \
    --test shielded_chunk_timing_bench -- --ignored --nocapture
  cargo test -p platform-wallet --release --features shielded \
    --test shielded_decrypt_bench -- --ignored --nocapture

New dev-deps to keep the benches self-contained:
- drive-proof-verifier (for the wire types)
- rayon (for parallel decrypt comparison)

`rs-sdk-trusted-context-provider` was already a dev-dep for the
existing paloma sync test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Imports re-sorted (std first, then external, then internal — alphabetized
within each group) and over-wrapped function signatures collapsed back
to one line per `rustfmt`'s defaults. Pure formatting; no behaviour
change.

Surfaced when running cargo fmt while staging the chunk-timing /
decrypt benchmark commit — a handful of files in rs-drive-abci,
rs-platform-wallet, and rs-platform-wallet-ffi had drifted out of
fmt-clean state. Folding the cleanup into its own commit keeps the
bench commit minimal and gives the next person clean diffs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps grovedb pin to feat/snapshot-apply-public-api tip
(bdc0d6e9d7a5f7cec47b2701d08d6e35f558c7b8), which incorporates the
rewritten PR #751: the old `_without_frontier` API (which skipped the
Sinsemilla frontier entirely) is replaced with batched
`CommitmentTree::append_many_raw`. The old API produced an anchor that
didn't reflect the actual leaf set — wallets reconstructing the tree
locally got a root the chain had never recorded and spend proofs
failed. `append_many_raw` is byte-for-byte equivalent to N ×
`append_raw` (same frontier, same bulk MMR, same final roots) but the
Sinsemilla anchor and bulk state root are each computed once at the
end.

Replaces the two-phase seed in
`create_genesis_state/test/shielded.rs` (Phase A: bulk-seed filler via
`append_many_without_frontier`; Phase B: per-note `append_raw` for
owned, with per-note `save()` + `commit_mmr()`) with a single
`append_many_raw(chain(filler, owned))` call followed by one `save()`
at the end. The two-phase structure only existed to work around the
old API's frontier divergence; with the new API the split is
unnecessary. Owned notes still land at positions
`[filler.len(), filler.len() + owned.len())`, matching the previous
layout downstream test-data construction relies on.

Drops the `test-seeding-ct` feature flag (deleted in PR #751 — the
`open()` frontier/bulk consistency check is now unconditional).
…t_data

# Conflicts:
#	packages/rs-sdk-ffi/src/types.rs
#	packages/swift-sdk/Sources/SwiftDashSDK/SDK.swift
#	packages/swift-sdk/SwiftExampleApp/Info.plist
Two cleanups on top of the prior `append_many_raw` migration:

**1. Removed hardcoded test wallets.** Deleted
`shielded_test_wallets.rs` entirely (SEED_A/SEED_B and TestWallet),
along with `owned_count`/`owned_value` fields on ShieldedSeedConfig,
the OwnedLayout struct, generate_owned_note, and every test that
relied on test-wallet decryption (8 in `mod tests` + the try_decrypt
helper). The seeder is now a pure stress test: N random filler notes
with no special-position owned tier. Wallet-balance flows in tests
should go through real shielded-fund transitions post-genesis
(supported via PR #3753 / shield-from-asset-lock), not via the
seeder.

**2. Batched seed loop.** The single up-front allocation of all
`total_notes` followed by a single `append_many_raw` call is replaced
with a loop:

  - Generate 10_000 filler notes from the seeded RNG
  - `ct.append_many_raw(iter)`
  - `ct.save()` to persist the Sinsemilla frontier
  - `ct.commit_mmr()` to flush the MMR overlay (required between
    repeated `append_many_raw` calls — without it, the next batch
    hits "MMR get_root failed: Inconsistent store")
  - Log running anchor + bulk-state root for observability

Memory: peak allocation goes from O(total_notes) × ~280 bytes
(216 ciphertext + 32 cmx + 32 rho + Vec overhead) to O(10k) × 280
bytes ≈ 2.7 MB regardless of N. For N=1M that's ~280 MB → ~3 MB.

Determinism: `batched_generator_matches_single_call` test pins that
chained `generate_filler_batch` calls produce byte-identical output
to a single `generate_notes(cfg)` call sized to total_notes, so the
seeded GroveDB state is invariant to BATCH_SIZE.

Observability: each batch logs `sinsemilla_root` + `bulk_state_root`
in hex. Comparing roots between bake runs (or between batched and
non-batched paths) immediately shows where divergence happens.

Verified:
- `cargo check -p drive-abci` clean with and without
  `--cfg create_sdk_test_data`
- 12 tests pass (4 unit-tests + 6 integration + 2 new batched tests):
  generate_notes_*, batched_generator_matches_single_call,
  generated_cmx_values_are_unique, empty_config_*,
  seeded_pool_count_*, seeded_anchor_*, seeding_with_same_config_*,
  different_rng_seeds_*, seeding_zero_notes_*.
QuantumExplorer added a commit to dashpay/grovedb that referenced this pull request May 28, 2026
GroveDB's `StorageContext::get` reads straight from the transaction and never
consults its `StorageBatch` (see `storage/src/rocksdb_storage/storage_context/
context_tx.rs:296-310`). So as soon as `commit_mmr` drains the MMR overlay
into the batch, the freshly-flushed peaks become invisible to reads until
`commit_multi_context_batch` lands the batch in the tx — which only happens
at the very end of a session.

For chained `append_many_raw` calls on the same `CommitmentTree` (the
500k-note Drive seeder pattern in dashpay/platform#3732), the internal
`commit_mmr` at the end of batch N drained the overlay; batch N+1's first
compaction then read a peak via `MmrStore::element_at_position` → `ctx.get` →
tx (peak not there) → `MMR::push` raised `InconsistentStore`.

Make `commit_mmr` the caller's responsibility — same as `append_raw`, which
also never flushes on its own. The overlay now stays alive across chained
batches; the caller flushes it once, right before committing the surrounding
batch. The docs spell this out and explicitly warn against mid-session
`commit_mmr`.

Updated the seeding bench to call `commit_mmr` explicitly before dropping
`ct`. The byte-for-byte equivalence, spend-usable anchor, and cost-accounting
tests are unaffected (they compare in-memory frontier/state-root values that
don't depend on commit ordering).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
shumkov pushed a commit to dashpay/grovedb that referenced this pull request May 28, 2026
GroveDB's `StorageContext::get` reads straight from the transaction and never
consults its `StorageBatch` (see `storage/src/rocksdb_storage/storage_context/
context_tx.rs:296-310`). So as soon as `commit_mmr` drains the MMR overlay
into the batch, the freshly-flushed peaks become invisible to reads until
`commit_multi_context_batch` lands the batch in the tx — which only happens
at the very end of a session.

For chained `append_many_raw` calls on the same `CommitmentTree` (the
500k-note Drive seeder pattern in dashpay/platform#3732), the internal
`commit_mmr` at the end of batch N drained the overlay; batch N+1's first
compaction then read a peak via `MmrStore::element_at_position` → `ctx.get` →
tx (peak not there) → `MMR::push` raised `InconsistentStore`.

Make `commit_mmr` the caller's responsibility — same as `append_raw`, which
also never flushes on its own. The overlay now stays alive across chained
batches; the caller flushes it once, right before committing the surrounding
batch. The docs spell this out and explicitly warn against mid-session
`commit_mmr`.

Updated the seeding bench to call `commit_mmr` explicitly before dropping
`ct`. The byte-for-byte equivalence, spend-usable anchor, and cost-accounting
tests are unaffected (they compare in-memory frontier/state-root values that
don't depend on commit ordering).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
shumkov added 2 commits May 28, 2026 17:46
Multi-batch `append_many_raw` was failing on the second call with
"MMR get_root failed: Inconsistent store". Root cause is upstream
grovedb behavior: `append_many_raw`'s internal MMR overlay
accumulates compaction state across consecutive calls and is only
safe to flush once at the end of the whole bake. Calling
`ct.commit_mmr()` between batches corrupted the in-memory overlay
so the next call read inconsistent state.

Fixes:

1. **Bump grovedb pin** to `5eb7a5380a6e974513343352acfd6b30a8c1f87c`
   — picks up upstream commit `1340db71 fix(commitment-tree):
   don't commit_mmr inside append_many_raw`.

2. **Restructure the seeder loop** in `shielded.rs`: remove the
   per-batch `ct.commit_mmr()` introduced earlier, call it exactly
   once after the loop completes (immediately before
   `compute_commitment_tree_state_root`). Keep per-batch
   `ct.save()` — it persists the Sinsemilla frontier and is cheap
   (gives durable mid-bake checkpoints + keeps the per-batch root
   logs accurate).

3. **Bump the round-trip test** `snapshot_dump_apply_preserves_anchor`
   from N=5_000 (single-batch path) to N=30_000 (three batches
   at `BATCH_SIZE = 10_000`) so it actually pins the inter-batch
   MMR-overlay handoff that this fix targets.
QuantumExplorer added a commit to dashpay/grovedb that referenced this pull request May 28, 2026
…ntier API (#751)

* feat(commitment-tree): frontier-less bulk seeding for sync/scale testing

Add `test-seeding`-gated `append_raw_without_frontier` and
`append_many_without_frontier` on `CommitmentTree`. These populate the
underlying BulkAppendTree at blake3 speed WITHOUT updating the Sinsemilla
frontier, so a devnet shielded pool can be pre-loaded with a large N of
filler notes without paying the per-note Pallas/Sinsemilla hashing (or the
full Drive insert path).

A frontier-less seeded tree has no valid Orchard anchor (so seeded notes
aren't spendable), but BulkAppendTree chunk proofs — authenticated by the
blake3 bulk state root, not the frontier — still verify, which is exactly
what client wallet sync exercises. Under `test-seeding`, `CommitmentTree::open`
tolerates the empty-frontier/populated-bulk shape so seeded state round-trips;
production behavior is unchanged when the feature is off. The grovedb crate
forwards this via `commitment_tree_test_seeding`.

For devnet/benchmark seeding only — never enable in production.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(commitment-tree): reject frontier-less seeding on a non-empty frontier

Guard both `append_raw_without_frontier` and `append_many_without_frontier`:
advancing the bulk tree while the frontier already has leaves would leave
`frontier_size < total_count`, a mismatch `open` cannot tolerate (it only
accepts an empty frontier), producing unreopenable state. Reject it instead.

Addresses CodeRabbit review on #751.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(commitment-tree): rename feature to test-seeding-ct; drop open() check under it

Rename the seeding feature `test-seeding` -> `test-seeding-ct` (and the grovedb
forwarding feature to match). Under the feature, drop the frontier/bulk
consistency check in `CommitmentTree::open` entirely rather than tolerating only
the empty-frontier case. This lets a frontier-less-seeded tree have real,
frontier-tracked notes added on top via the normal `append` path and still
reopen at any `(frontier_size, total_count)` pair. Production behavior is
unchanged when the feature is off.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(commitment-tree): cover frontier-less seeding error paths for patch coverage

Add fault-injection to the test storage mock (toggleable get/put failures) and
three tests covering the previously-uncovered error branches in the
frontier-less seeding methods: the wrapped bulk-append storage error, per-entry
error propagation out of the bulk loop, and the empty-input state-root
recomputation failure. The remaining commit_mmr flush error is annotated
codecov:ignore (unreachable via the seeding API, which always flushes in-call).

Raises patch coverage above the 90% gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf(bulk-append-tree): cache MMR root so append is O(1), not O(N) per call

`BulkAppendTree::append` called `get_mmr_root()` on every non-compaction
append, and `get_mmr_root()` clones the `mmr_overlay` — which accumulates the
chunk leaf nodes (each carrying a ~573 KB blob) across all compaction cycles
until the session-end flush. Cloning that growing, blob-bearing overlay on
every append made bulk seeding O(N^2): a 1M frontier-less seed degraded from
~2400 notes/s to sub-1100 and never finished in reasonable time.

The MMR is only mutated on compaction (every `epoch_size` appends), so its root
is unchanged in between. Cache it (`last_mmr_root`), refreshing only on
compaction. `from_state` keeps it lazy (`None`) because a restored MMR may not
be readable until the first append; the first append computes it once, exactly
as before. Bulk seeding is now linear at a constant ~2435 notes/s (1M in
~6.85 min vs. 30+ min / non-terminating before).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* bench(commitment-tree): add 1M frontier-less seeding throughput benchmark

Standalone (harness=false) bench that seeds N random filler notes into a real
RocksDB-backed CommitmentTree via append_many_without_frontier and reports
wall-clock time, splitting compute from the disk commit. Defaults to 1M;
override with SEED_N. Gated on test-seeding-ct.

  cargo bench -p grovedb-commitment-tree --bench seeding --features test-seeding-ct

Measured 1M random notes in ~6.85 min (linear ~2435 notes/s) after the MMR-root
caching fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(commitment-tree): batched append_many_raw — replace _without_frontier API

Per-leaf `CommitmentTree::append_raw` was slow on bulk seeds not because of
Sinsemilla-per-leaf (amortized ~1 hash via the carry chain) but because
`CommitmentFrontier::append` called `self.root_hash()` after every insert,
walking the depth-32 Sinsemilla path for ~32 hashes per leaf. For 1M leaves
that's ~32M extra hashes. The upstream `incrementalmerkletree::Frontier`
already separates append (cheap) from root (expensive); our wrapper conflated
them.

The earlier `_without_frontier` shape side-stepped this by skipping the
frontier entirely, producing a Sinsemilla anchor that reflected only a handful
of cmx — wallets reconstructing the tree locally got a root the chain had
never recorded, and spend proofs failed. This commit fixes the actual problem.

Added:
* `CommitmentFrontier::append_no_root(cmx)` — upstream `Frontier::append`,
  no depth-32 walk; carry-chain cost only.
* `CommitmentFrontier::root_hash_with_cost()` — pure accessor that
  attributes the deferred 32-hash depth walk for batched callers.
* `BulkAppendTree::append_no_state_root(value)` — `append` minus the
  per-leaf `compute_state_root` blake3. `append` now delegates to it +
  one final `compute_current_state_root`.
* `BulkAppendTree::append_many<I: IntoIterator<Item = Vec<u8>>>` — batched
  bulk-tree appends, one state-root computation at the end.
* `CommitmentTree::append_many_raw<I: IntoIterator<Item = ([u8;32],[u8;32],
  Vec<u8>)>>` — byte-for-byte equivalent to N × `append_raw` (same
  dense-buffer, MMR, `CommitmentFrontier::serialize()`, final state and
  Sinsemilla roots) but the Sinsemilla anchor and bulk state root are each
  computed exactly once at the end. Flushes the MMR overlay before return.
* `compute_current_state_root` now uses the `last_mmr_root` cache when
  populated (O(1) on the hot path).

Removed:
* `test-seeding-ct` Cargo feature in `grovedb-commitment-tree` and the
  forwarding feature in `grovedb`.
* `append_raw_without_frontier`, `append_many_without_frontier`,
  `FrontierLessAppendResult`, `BulkSeedSummary`, and their lib.rs re-exports.
* The `#[cfg(not(feature = "test-seeding-ct"))]` gating around the
  frontier/bulk consistency check in `CommitmentTree::open` — the check is
  now unconditional.
* Fault-injection mock state and storage-fault tests that only existed to
  cover the deleted branches.

Tests:
* `append_many_raw_byte_for_byte_matches_per_leaf` — for N ∈ {0,1,2,3,100,
  2048,10_000}, the per-leaf and batched paths produce identical
  `frontier.root_hash()`, `CommitmentFrontier::serialize()`, bulk state root,
  and `total_count`.
* `append_many_raw_anchor_is_spend_usable` — independent Sinsemilla
  auth-path recomputation; verifies `orchard::MerklePath::from_parts(pos,
  path).root(cmx) == ct.anchor()`.
* `append_no_root_cost_omits_per_leaf_depth_walk` — confirms the savings
  are exactly the per-leaf depth walks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(commitment-tree): add --features server to seeding bench example commands

The bench is gated by required-features = ["server"], so the example
invocations in the file-level docs need --features server or they run the
fallback `main` and skip the actual benchmark.

Addresses CodeRabbit review on #751.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(commitment-tree): don't commit_mmr inside append_many_raw

GroveDB's `StorageContext::get` reads straight from the transaction and never
consults its `StorageBatch` (see `storage/src/rocksdb_storage/storage_context/
context_tx.rs:296-310`). So as soon as `commit_mmr` drains the MMR overlay
into the batch, the freshly-flushed peaks become invisible to reads until
`commit_multi_context_batch` lands the batch in the tx — which only happens
at the very end of a session.

For chained `append_many_raw` calls on the same `CommitmentTree` (the
500k-note Drive seeder pattern in dashpay/platform#3732), the internal
`commit_mmr` at the end of batch N drained the overlay; batch N+1's first
compaction then read a peak via `MmrStore::element_at_position` → `ctx.get` →
tx (peak not there) → `MMR::push` raised `InconsistentStore`.

Make `commit_mmr` the caller's responsibility — same as `append_raw`, which
also never flushes on its own. The overlay now stays alive across chained
batches; the caller flushes it once, right before committing the surrounding
batch. The docs spell this out and explicitly warn against mid-session
`commit_mmr`.

Updated the seeding bench to call `commit_mmr` explicitly before dropping
`ct`. The byte-for-byte equivalence, spend-usable anchor, and cost-accounting
tests are unaffected (they compare in-memory frontier/state-root values that
don't depend on commit ordering).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(commitment-tree): cover append_many_raw + frontier error paths

Address patch-coverage failures on PR #751:

* Drop unused `BulkAppendTree::append_many` + `AppendManyResult`. The user
  spec said "or the equivalent" and `CommitmentTree::append_many_raw` goes
  through `append_no_state_root` in a loop — `append_many` had no real caller
  and was inflating the patch as dead code.

* Re-add fault-injection toggles to the test storage mock and exercise:
  - `append_many_raw_rejects_invalid_cmx_mid_batch`
  - `append_many_raw_rejects_wrong_payload_size_mid_batch`
  - `append_many_raw_surfaces_bulk_storage_error`
  - `frontier_append_no_root_rejects_invalid_cmx`
  - `commit_mmr_success` (drives the bulk-tree `commit_mmr` flush path
    from the commitment-tree side, where `append_many_raw` purposely
    leaves the overlay in place per #751's chained-batch fix).

* Annotate genuinely unreachable error branches with `codecov:ignore`:
  TreeFull on both `append` and `append_no_root` (requires 2^32 leaves),
  the frontier-append-error branch in `append_many_raw` (cmx already
  pre-validated upstream), and the end-of-batch state-root error branch
  (the dense-tree write-through cache + in-session MMR cache mean a
  fault-injecting mock can't trip it from inside `append_many_raw`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add coordinator tests for both halves of the clear() ordering fix:
- success path empties the commitment tree and drops the account /
  persister registries
- failure path (store reset forced to fail by removing the SQLite dir)
  returns Err and leaves the registries populated, so a failed clear
  can't silently turn future syncs into no-ops

Regression guard for the partial-clear bug fixed in 115024f.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
packages/rs-platform-wallet/src/wallet/shielded/coordinator.rs (1)

880-905: 💤 Low value

Platform-specific test behavior.

This test relies on Unix file handle semantics: remove_dir_all unlinks the directory but the already-open SQLite handle continues working on the orphaned inode. When reset_commitment_tree tries to reopen the path, it fails because the path no longer exists.

On Windows, remove_dir_all would likely fail while files are open, causing this test to behave differently or fail.

Consider adding a brief comment noting this is Unix-specific, or gating the test with #[cfg(unix)] if Windows CI is expected.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/rs-platform-wallet/src/wallet/shielded/coordinator.rs` around lines
880 - 905, The test clear_failure_preserves_registries relies on Unix-specific
semantics where removing the directory (std::fs::remove_dir_all) unlinks the
path while an open SQLite handle keeps working, causing reset_commitment_tree's
subsequent Connection::open(path) to fail; update the test by either adding a
clear comment stating it's Unix-specific and why, or gate the test with
#[cfg(unix)] (or an appropriate platform check) so Windows CI doesn't run it,
referencing the test name clear_failure_preserves_registries, the use of
std::fs::remove_dir_all, reset_commitment_tree and coordinator.clear() so the
change is applied in the correct place.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@packages/rs-platform-wallet/src/wallet/shielded/coordinator.rs`:
- Around line 880-905: The test clear_failure_preserves_registries relies on
Unix-specific semantics where removing the directory (std::fs::remove_dir_all)
unlinks the path while an open SQLite handle keeps working, causing
reset_commitment_tree's subsequent Connection::open(path) to fail; update the
test by either adding a clear comment stating it's Unix-specific and why, or
gate the test with #[cfg(unix)] (or an appropriate platform check) so Windows CI
doesn't run it, referencing the test name clear_failure_preserves_registries,
the use of std::fs::remove_dir_all, reset_commitment_tree and
coordinator.clear() so the change is applied in the correct place.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 59abb91d-622e-42f6-8ddd-08d20f32c922

📥 Commits

Reviewing files that changed from the base of the PR and between f9f5f42 and ef371a9.

📒 Files selected for processing (1)
  • packages/rs-platform-wallet/src/wallet/shielded/coordinator.rs

The failure-injection in clear_failure_preserves_registries depends on
POSIX unlink-while-open semantics (remove the dir, the open SQLite handle
survives on the orphaned inode, but a fresh Connection::open at the
missing path fails). Windows can't remove a directory with open files,
so gate the test with #[cfg(unix)]; the success-path test is unaffected.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@QuantumExplorer QuantumExplorer changed the title feat(drive-abci,dashmate): seed Orchard shielded pool at SDK_TEST_DATA devnet genesis feat: seed Orchard shielded pool at genesis with fast, observable sync May 30, 2026
Copy link
Copy Markdown
Collaborator

@thepastaclaw thepastaclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Latest delta (ef371a9..3fdc1b5) is a single test-gating change (#[cfg(unix)] + explanatory comment) on the shielded coordinator's clear-failure test — no production code touched. Carried-forward prior findings: none (the ef371a9 review had none, and the test-gate delta cannot resurrect anything). Cumulative re-review surfaced two would-be blocking findings from codex agents, but both fail verification: (1) the EventHandlerCallbacks ABI-growth concern is explicitly documented as an accepted in-tree-only limitation (event_handler.rs:18-35, committed in f9f5f42); (2) the clear() 'data loss' narrative misreads the doc — 'in-memory state' refers to accounts/persisters, and the best-effort cleanup with retry-on-error is the documented design, with the subwallet BTreeMap being volatile working state rather than authoritative persistence. Prior findings status: none to reconcile (STILL VALID/FIXED/OUTDATED/DEFERRED N/A).

Remove two working-notes markdown files that slipped into the branch:
- docs/shielded-seeder-performance.md — stale (still cited 500k notes;
  superseded by the ~1M snapshot bake) and only a tracking doc
- SwiftExampleApp/docs/shielded-sync-devnet-followups.md — a live-session
  punch list embedding devnet infrastructure IPs

Scrub the now-dangling "see docs/shielded-seeder-performance.md" pointers
from the dashmate buildArgs comments and the snapshot design doc; the
inline "release required for SDK_TEST_DATA seeding" note already states
the actionable fact.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@thepastaclaw thepastaclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Latest delta (3fdc1b5..0cfb3a5) is documentation-only cleanup. Cumulative re-review at head surfaces one valid user-visible regression flagged by codex-general in the dual-progress reporting: on resumed/multi-subwallet syncs the new tree-progress ('checked') counter can begin ahead of the SDK's 'downloaded' counter because their baselines diverge (pre-stream tree_size vs aligned_start), contradicting the PR's documented Checked ≤ Downloaded ≤ total invariant. No security, consensus, or FFI defects.

🔴 1 blocking

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `packages/rs-platform-wallet/src/wallet/shielded/sync.rs`:
- [BLOCKING] packages/rs-platform-wallet/src/wallet/shielded/sync.rs:318-376: Resumed shielded sync reports `checked` ahead of `downloaded`, breaking the documented progress invariant
  `leaves_committed` is seeded with the pre-stream `tree_size` (sync.rs:323) and emitted as the 'checked' value via `on_tree_progress(leaves_committed, total_target)` on every batch (sync.rs:374-376). The SDK's 'downloaded' callback, by contrast, reports `state.start_index + state.cumulative_scanned` (sync_shielded_notes.rs:312-318), where `state.start_index == aligned_start` and `aligned_start = (min(watermarks) / CHUNK_SIZE) * CHUNK_SIZE` (sync.rs:221) — i.e. the rewound MIN across subwallets, not the current shared `tree_size`.

  Whenever `tree_size > aligned_start` — which is the normal case for a multi-subwallet bind (subwallet A is synced to tree_size, subwallet B binds with watermark 0, so aligned_start collapses to 0), or any resume where the shared tree has been advanced past this subwallet's chunk-aligned start by a prior pass — the very first emitted progress event has `checked = tree_size` while `downloaded = aligned_start + chunk` is much smaller. The gate `global_pos < tree_size` (sync.rs:355-357) keeps `leaves_committed` pinned at `tree_size` until the stream catches up, so the inequality `checked > downloaded` persists for most of the pass.

  This directly violates the `Checked ≤ Downloaded ≤ total` invariant the PR advertises and the comment at sync_shielded_notes.rs:313-314 ("so the 'Downloaded' bar shares the 'Checked' baseline") explicitly relies on, and will render the dual progress bars incoherently on every non-cold sync. Fix by aligning the two baselines — e.g. start `leaves_committed` at `aligned_start` (or the wallet's resume position) and only count delta appends, or have the SDK report `downloaded` from `tree_size` for the gate-skipped region.

Comment thread packages/rs-platform-wallet/src/wallet/shielded/sync.rs
The "downloaded" progress value the SDK reports is aligned_start + scanned,
where aligned_start is the rewound MIN watermark across subwallets and can
sit far below the shared tree's leaf count (a second subwallet binding at
watermark 0 collapses it to 0; a partial-chunk-tail resume rewinds below
the tail). The "checked" value is the absolute tree_size, so over the
gate-skipped re-scan region the unclamped download value read below checked
and broke the advertised Checked <= Downloaded <= total invariant on every
non-cold sync.

Clamp the forwarded download value up to the pre-stream tree_size in the
wallet consumer, which is the only place that knows both baselines.
Everything already in the tree was downloaded on a prior pass, so counting
it as downloaded is accurate; the clamp is a no-op past tree_size and for
cold syncs (tree_size == 0). Update the SDK comment that wrongly claimed
the raw value already shared the checked baseline.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@thepastaclaw thepastaclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Cumulative incremental review. Prior blocking finding (Checked > Downloaded on resumed multi-subwallet syncs) is FIXED at af24380 — the host-visible downloaded value is now clamped up to the pre-stream tree_size baseline before crossing into the host callback, restoring Checked ≤ Downloaded. One new in-scope suggestion in the latest delta: the Downloaded ≤ total half of the same advertised invariant can still be violated transiently because total_target is seeded at 0 and grown from batch.total_count.max(), while the numerators are floored at tree_size; a first batch proven by a lagging node with total_count < tree_size will read downloaded > total until a fresher chunk arrives. No new blockers; no new FFI, security, or consensus issues.

🟡 1 suggestion(s)

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `packages/rs-platform-wallet/src/wallet/shielded/sync.rs`:
- [SUGGESTION] packages/rs-platform-wallet/src/wallet/shielded/sync.rs:237-240: Seed `total_target` from `tree_size` so the denominator can't fall below the floored numerators on resumed syncs
  The clamp added at sync.rs:276-280 fixes `Checked ≤ Downloaded`, but the `Downloaded ≤ total` half of the advertised invariant can still be violated on a resume. `total_target` starts at 0 here and is updated via `total_target = total_target.max(batch.total_count)` at line 361. Meanwhile both numerators are now floored at `tree_size`: `leaves_committed` is initialized to `tree_size` (line 341) and the forwarded `downloaded` is `downloaded.max(tree_baseline)` where `tree_baseline = tree_size` (lines 277-279).

  The shielded note count is append-only, so a node that returns `batch.total_count < tree_size` is simply lagging behind the block height at which our prior sync committed `tree_size` leaves. Because the SDK explicitly allows different chunks to come back at different heights and only takes the max (sync_shielded_notes.rs:303-306), the first emitted callback in a resumed sync that races a slightly-behind node reports `checked = tree_size`, `downloaded = tree_size`, and `total = batch.total_count < tree_size` — both numerators briefly exceed the denominator until a fresher chunk arrives. Practical impact is a UI glitch (progress bar reads > 100%) rather than a correctness issue, but it directly contradicts the `Checked ≤ Downloaded ≤ total` contract the PR documents.

  Seeding `total_target` from `tree_size` is safe because the on-chain count is monotonic and can never legitimately be below what we've already committed locally.

Comment thread packages/rs-platform-wallet/src/wallet/shielded/sync.rs Outdated
…<= total

With both progress numerators floored at tree_size (the download clamp and
leaves_committed's seed), a total_target that starts at 0 and grows via
max(batch.total_count) can momentarily sit below the numerators when the
first batch is proven by a lagging node returning total_count < tree_size,
making the bar read > 100%. Seed total_target from tree_size: the on-chain
shielded count is append-only so the real total can never be below what we
have already committed locally. Cold sync (tree_size == 0) keeps the host's
indeterminate handling until the first batch lands.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@thepastaclaw thepastaclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Incremental review against prior commit af24380. The single carried-forward prior finding (sync.rs:237-240 progress-denominator seeded from 0) is FIXED at head 2b312ab: total_target is now seeded from tree_size at sync.rs:247, with an expanded comment explaining the rationale, and is still raised via .max(batch.total_count) at sync.rs:368. The latest delta introduces no new in-scope findings; the only candidate (Codex FFI ptr::read on growing vtables) is a pre-existing structural FFI pattern unrelated to this PR's stated goal.

…k_test_data

The shielded-pool snapshot bake/apply module is pure test-data tooling: the
bake reads a pool seeded by create_sdk_test_data (it dumps an empty subtree
without that cfg) and the apply runs only from that seeder's fast-path. It
has no purpose in a production drive-abci binary, yet the module and the
`snapshot-bake` CLI command were always compiled in.

Gate `pub mod shielded_snapshot` on `cfg(any(create_sdk_test_data, test))`
(the `test` arm keeps the genesis seeder's test module compiling under
`cargo test`), and gate the `SnapshotBake` command, its dispatch + config
branch, the `snapshot_bake` fn, and the bake-only `NoopCoreRPC` stub on
`cfg(create_sdk_test_data)`. The `grovedb-commitment-tree` dep stays (it is
also used by production shield/unshield validation). Verified clean builds
with and without the cfg, and under `cargo check --tests`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@thepastaclaw thepastaclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Reconciled the prior no-issue review at 2b312ab against current head 57af181: there are no carried-forward prior findings. The latest delta (gating shielded snapshot tooling behind create_sdk_test_data) is a clean, scope-correct refactor with no defects. Cumulative review of the current head surfaces one in-scope FFI lifecycle suggestion — platform_wallet_manager_destroy does not stop the new shielded sync thread, relying on the in-tree Swift wrapper's deinit ordering for soundness — and one minor doc-consistency nitpick. The codex coordinator clear() finding does not hold up against the documented contract and the regression test that was added with it.

🟡 1 suggestion(s)

1 additional finding(s) omitted (not in diff).

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `packages/rs-platform-wallet-ffi/src/manager.rs`:
- [SUGGESTION] packages/rs-platform-wallet-ffi/src/manager.rs:351-358: platform_wallet_manager_destroy does not stop shielded sync / identity sync / event adapter
  `platform_wallet_manager_destroy` only calls `manager.platform_address_sync().stop()` and then drops the manager. The full lifecycle stop sequence lives in `PlatformWalletManager::shutdown` (manager/mod.rs:295), which also stops `identity_sync_manager`, `shielded_sync_manager`, and cancels/awaits the `event_adapter` task. The shielded sync background thread (added in this PR series) can therefore still be alive after destroy returns and can fire `on_shielded_sync_*` callbacks against an `EventHandlerCallbacks.context` pointer whose Swift owner has already been freed by a host that follows only the exported destroy contract. In practice the in-tree Swift wrapper mitigates this by stopping shielded sync in `PlatformWalletManager.deinit` before calling destroy, so the deployed consumer is safe — but the C ABI itself is unsound by convention. Calling `manager.shutdown().await` from destroy (block_on'd on the FFI runtime, mirroring the other entrypoints) closes the gap without changing the Swift call sequence.

platform_wallet_manager_destroy only stopped the platform-address sync
manager, leaving the identity sync manager, the shielded sync manager, and
the wallet-event adapter task alive after destroy returned. Any of those can
fire a callback through the host-owned EventHandlerCallbacks.context pointer,
which the host may free once destroy returns — an unsound C ABI contract even
though the in-tree Swift wrapper mitigates it by stopping shielded sync in
deinit first.

Run the full idempotent PlatformWalletManager::shutdown() to completion on
the FFI's multi-threaded runtime before dropping the manager, so no
background task can outlive it. shutdown() is idempotent, so this does not
conflict with the existing Swift deinit sequence.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@QuantumExplorer
Copy link
Copy Markdown
Member

Re: the review-summary suggestion that platform_wallet_manager_destroy doesn't stop shielded/identity sync or the event-adapter task — addressed in b635ffe. destroy now runs the full idempotent PlatformWalletManager::shutdown() via runtime().block_on(manager.shutdown()) before dropping the manager, so no background task can outlive it and fire a callback against a freed EventHandlerCallbacks.context. The FFI runtime() is multi-threaded, so block_on awaiting the event-adapter join task can't deadlock. No Swift change — shutdown() is idempotent, so it composes with the existing deinit sequence.

(This finding had no inline thread since manager.rs wasn't in the diff at review time; noting the disposition here instead.)

Ports the fix from #3733 into this branch. Replaces the bypassable boolean
`suppressShieldedCompletionEvents` gate with a lock-guarded monotonic
`ShieldedSyncGenerationCounter`: the FFI completion callback snapshots the
generation on the callback thread before enqueueing onto the main actor,
stop/clear bump the generation, and handleShieldedSyncCompleted drops any
event whose snapshot no longer matches. This closes the same-actor-turn
stop->restart race where a stale completion from a prior run leaked into the
new run (a boolean re-opened by start before the queued task ran).

Reconciled with this branch's dual-progress work: the resetCurrentShielded
Progress() calls in handleShieldedSyncCompleted/stop/clear are preserved.
Adds ShieldedSyncGenerationTests covering the guard. Supersedes #3733.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@thepastaclaw thepastaclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Cumulative incremental review of PR #3732 at head b635ffe vs prior review at 57af181. The only latest delta is the destroy-path change in packages/rs-platform-wallet-ffi/src/manager.rs that now calls runtime().block_on(manager.shutdown()) instead of stopping only platform-address sync. Prior finding classification: STILL VALID — the new call invokes the exact code the prior reviewer suggested, but PlatformWalletManager::shutdown() only calls cancel-only stop() on the three sync managers and joins just the event-adapter task. An in-flight identity-sync or shielded-sync pass that started before destroy() can keep running and fire persister callbacks (identity_sync.rs:581 calls self.persister.store() directly, bypassing the event adapter), so the same FFI use-after-free window the prior review called out is not actually closed. Codex's separate shielded coordinator clear() finding is a false positive (registries are preserved on failure per the documented contract). One carried-forward blocking finding; no new in-scope findings in the latest delta.

🔴 1 blocking

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `packages/rs-platform-wallet-ffi/src/manager.rs`:
- [BLOCKING] packages/rs-platform-wallet-ffi/src/manager.rs:351-366: [carried-forward, STILL VALID] manager_destroy still returns before in-flight sync passes have drained their FFI callbacks
  The new `runtime().block_on(manager.shutdown())` matches the prior review's suggested patch verbatim, but `PlatformWalletManager::shutdown()` (packages/rs-platform-wallet/src/manager/mod.rs:295-307) does not provide a real quiescence barrier for the sync managers. `PlatformAddressSyncManager::stop()` (platform_address_sync.rs:226-236), `IdentitySyncManager::stop()` (identity_sync.rs:431-441), and `ShieldedSyncManager::stop()` (shielded_sync.rs:278-295, which explicitly documents itself as 'Cancel-only: this requests cancellation and returns immediately. A pass already inside `sync_now` / `coordinator.sync()` keeps running to completion (including its persister-callback fan-out)') only cancel the loop's `CancellationToken` and return. `shutdown()` then awaits only `event_adapter_join`, not the sync-manager background-task handles, so a pass that was already inside `sync_now()` when `stop()` was called can finish after `shutdown()` returns. The platform-address summary callback at platform_address_sync.rs:282-283 goes through `event_manager`, which is at least drained via the event-adapter join, but identity sync calls `self.persister.store(sentinel, cs.into())` directly at identity_sync.rs:581 (and the shielded coordinator persister fan-out is likewise direct), bypassing the adapter. Per the Swift side (PlatformWalletManager.deinit calls `platform_wallet_manager_destroy(handle)` and then releases the `eventHandler`/`persistenceHandler` objects, which are passed to Rust via `Unmanaged.passUnretained(...).toOpaque()`), once `destroy` returns the host can free the persister context while an in-flight identity-sync or shielded-sync pass is still about to fire `persister.store(...)` through that pointer — exactly the use-after-free hazard the prior review flagged. To truly close the window, `shutdown()` needs to wait for in-flight passes to drain (use `ShieldedSyncManager::quiesce()` for shielded, and add equivalent quiesce/await-handle barriers for the address and identity sync managers) before returning.

Comment thread packages/rs-platform-wallet-ffi/src/manager.rs
manager_destroy -> shutdown() previously only cancelled the sync managers'
loops (cancel-only stop()) and awaited the event adapter. A pass already
inside sync_now() kept running and could call persister.store(...) (identity
sync) or fire on_platform_address_sync_completed / the shielded coordinator
persister fan-out directly — bypassing the event adapter — after destroy
returned and the Swift host freed the persister/event-handler context. That
is a use-after-free of a host-owned context pointer.

Add a quiescing-gate + quiesce() barrier to IdentitySyncManager and
PlatformAddressSyncManager mirroring ShieldedSyncManager::quiesce (set gate,
cancel, poll is_syncing until the in-flight pass fully drains incl. its
fan-out, reopen gate), and a quiescing check in each sync_now after the
is_syncing CAS. shutdown() now quiesce().await's all three managers before
cancelling+joining the event adapter. Also move PlatformAddressSyncManager's
is_syncing.store(false) to after the completion dispatch so the drain covers
the host callback (matches the shielded ordering). Adds 5 tests.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@thepastaclaw thepastaclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Cumulative incremental review of head 7b36f29. The latest delta is Swift-only: it replaces a boolean suppression flag with a monotonic generation counter snapshotted on the FFI thread, correctly closing the same-actor-turn restart race for the shielded completion event. However, the prior b635ffe blocking finding — manager_destroy returns before in-flight identity/shielded sync passes drain their FFI callbacks — remains STILL VALID: PlatformWalletManager::shutdown() still calls cancel-only stop() on all three sync managers instead of awaiting quiesce(), so identity_sync's direct persister.store(...) and shielded coordinator's persister fan-out can fire callbacks through Unmanaged.passUnretained Swift contexts after destroy returns and ARC releases them. One new minor finding in the latest delta: the two shielded progress callbacks do not snapshot the generation counter like the completion callback does, so a stale main-actor progress Task can repopulate the UI mirrors after stopShieldedSync/clearShielded reset them.

Reviewed commit: 7b36f29

_Note: Inline posting failed (command failed (1): python3 scripts/review_poster.py dashpay/platform 3732 7b36f29
STDOUT:

STDERR:
Traceback (most recent call last):
File "/Users/claw/.openclaw/workspace/scripts/review_poster.py", line 754, in
result = post_review(repo, pr_number, h), so I posted the same verified findings as a top-level review body._

🔴 1 blocking | 🟡 1 suggestion(s)

Verified findings

blocking: [carried-forward, STILL VALID] shutdown() is cancel-only — manager_destroy can return while in-flight sync passes still hold Swift-owned callback contexts

packages/rs-platform-wallet/src/manager/mod.rs (line 295)

Re-validation of the b635ffe prior blocking finding against current head 7b36f29: STILL VALID. The latest delta is Swift-only (a generation counter for the shielded completion event) and does not change the Rust shutdown contract.

Current code at packages/rs-platform-wallet/src/manager/mod.rs:295-307:

pub async fn shutdown(&self) {
    self.platform_address_sync_manager.stop();
    self.identity_sync_manager.stop();
    #[cfg(feature = "shielded")]
    self.shielded_sync_manager.stop();
    self.event_adapter_cancel.cancel();
    if let Some(handle) = self.event_adapter_join.lock().await.take() { ... }
}

All three stop() methods are explicitly cancel-only:

  • PlatformAddressSyncManager::stop() (platform_address_sync.rs:227-236) — takes the cancel token and returns.
  • IdentitySyncManager::stop() (identity_sync.rs:432-441) — takes the cancel token and returns.
  • ShieldedSyncManager::stop() (shielded_sync.rs:278-295) — docstring explicitly says **Cancel-only**: this requests cancellation and returns immediately. A pass already inside sync_now / coordinator.sync() keeps running to completion (including its persister-callback fan-out).

shutdown() awaits only event_adapter_join; it never awaits any sync-manager JoinHandle and never spins on is_syncing. ShieldedSyncManager::quiesce() (shielded_sync.rs:314-321) — the real barrier — exists and is correctly used by platform_wallet_manager_shielded_sync_stop / _clear (rs-platform-wallet-ffi/src/shielded_sync.rs:86-95) and by clear_shielded (mod.rs:280-286), but is NOT used here. Identity sync and platform-address sync have no quiesce equivalent at all, even though IdentitySyncManager::sync_now calls self.persister.store(...) directly (identity_sync.rs:575-587), bypassing the wallet-event adapter that shutdown does drain.

platform_wallet_manager_destroy (packages/rs-platform-wallet-ffi/src/manager.rs:351-366) runs runtime().block_on(manager.shutdown()) and its own doc-comment claims 'no task may be left alive to fire a callback against freed memory' — but shutdown() does not enforce that for identity/shielded passes already past their cancel check.

Swift PlatformWalletManager.deinit (packages/swift-sdk/Sources/SwiftDashSDK/PlatformWallet/PlatformWalletManager.swift:151-157) calls platform_wallet_manager_destroy(handle) and then ARC releases the eventHandler / persistenceHandler objects whose pointers were passed across the FFI via Unmanaged.passUnretained(...).toOpaque() (e.g. PlatformWalletPersistenceHandler.swift:924-927, PlatformWalletManagerAddressSync.swift:40-50). Once destroy returns and ARC frees those Swift objects, an in-flight identity-sync or shielded-sync pass can dereference the now-dangling context: *mut c_void on its next persistence callback — a cross-language use-after-free at the FFI boundary, not just a stale-UI issue.

The new Swift generation gate does not address this: it only filters the @MainActor-side completion hop after the FFI callback has already executed. The unsafe access (Unmanaged.fromOpaque(context).takeUnretainedValue() and the identity-sync direct persister fan-out which never goes through the Swift event handler at all) happens earlier on the FFI thread, before any @mainactor task would run.

Fix shape: make shutdown() a real quiescence barrier. Add quiesce() to IdentitySyncManager and PlatformAddressSyncManager mirroring ShieldedSyncManager::quiesce (raise a quiescing gate, cancel, await is_syncing falling), then have shutdown() await quiesce() on all three before tearing down the event adapter. After that, manager_destroy returning genuinely means 'no Rust task will fire another callback through the host context pointer', which is the invariant the FFI's own comment block claims to provide.

suggestion: Shielded progress and tree-progress callbacks lack the generation guard the completion callback now has

packages/swift-sdk/Sources/SwiftDashSDK/PlatformWallet/PlatformWalletManagerShieldedSync.swift (line 653)

The latest delta added a monotonic generation snapshot to shieldedSyncCompletedCallback (line 646) so a trailing main-actor hop after stopShieldedSync / clearShielded is dropped. The two progress callbacks immediately below — shieldedSyncProgressCallback (656-672) and shieldedTreeProgressCallback (680-696) — still just do Task { @MainActor [weak manager] in manager?.handleShieldedSyncProgress(...) } with no generation snapshot and no guard.

Rust-side platform_wallet_manager_shielded_sync_stop correctly uses quiesce() (rs-platform-wallet-ffi/src/shielded_sync.rs:86-95), so when stopShieldedSync() returns, no further Rust-side dispatches will happen. But a progress callback that fired during the quiesce window enqueues a Task onto the main actor; that task can land AFTER stopShieldedSync() calls resetCurrentShieldedProgress() (line 247) or clearShielded() does the same (line 279), re-publishing stale currentShielded* values from the just-stopped pass and undoing the reset on the UI.

This is the same same-turn race the completion fix was written to address, just on a different surface. Apply the same pattern: snapshot handler.manager?.shieldedSyncGeneration.current() in the FFI trampoline before enqueueing, then have the @mainactor handler drop if the snapshot mismatches the current generation.

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

- [BLOCKING] In `packages/rs-platform-wallet/src/manager/mod.rs`:295-307: [carried-forward, STILL VALID] shutdown() is cancel-only — manager_destroy can return while in-flight sync passes still hold Swift-owned callback contexts
  Re-validation of the b635ffe8 prior blocking finding against current head 7b36f29bb0: STILL VALID. The latest delta is Swift-only (a generation counter for the shielded completion event) and does not change the Rust shutdown contract.

Current code at packages/rs-platform-wallet/src/manager/mod.rs:295-307:

```rust
pub async fn shutdown(&self) {
    self.platform_address_sync_manager.stop();
    self.identity_sync_manager.stop();
    #[cfg(feature = "shielded")]
    self.shielded_sync_manager.stop();
    self.event_adapter_cancel.cancel();
    if let Some(handle) = self.event_adapter_join.lock().await.take() { ... }
}

All three stop() methods are explicitly cancel-only:

  • PlatformAddressSyncManager::stop() (platform_address_sync.rs:227-236) — takes the cancel token and returns.
  • IdentitySyncManager::stop() (identity_sync.rs:432-441) — takes the cancel token and returns.
  • ShieldedSyncManager::stop() (shielded_sync.rs:278-295) — docstring explicitly says **Cancel-only**: this requests cancellation and returns immediately. A pass already inside sync_now / coordinator.sync() keeps running to completion (including its persister-callback fan-out).

shutdown() awaits only event_adapter_join; it never awaits any sync-manager JoinHandle and never spins on is_syncing. ShieldedSyncManager::quiesce() (shielded_sync.rs:314-321) — the real barrier — exists and is correctly used by platform_wallet_manager_shielded_sync_stop / _clear (rs-platform-wallet-ffi/src/shielded_sync.rs:86-95) and by clear_shielded (mod.rs:280-286), but is NOT used here. Identity sync and platform-address sync have no quiesce equivalent at all, even though IdentitySyncManager::sync_now calls self.persister.store(...) directly (identity_sync.rs:575-587), bypassing the wallet-event adapter that shutdown does drain.

platform_wallet_manager_destroy (packages/rs-platform-wallet-ffi/src/manager.rs:351-366) runs runtime().block_on(manager.shutdown()) and its own doc-comment claims 'no task may be left alive to fire a callback against freed memory' — but shutdown() does not enforce that for identity/shielded passes already past their cancel check.

Swift PlatformWalletManager.deinit (packages/swift-sdk/Sources/SwiftDashSDK/PlatformWallet/PlatformWalletManager.swift:151-157) calls platform_wallet_manager_destroy(handle) and then ARC releases the eventHandler / persistenceHandler objects whose pointers were passed across the FFI via Unmanaged.passUnretained(...).toOpaque() (e.g. PlatformWalletPersistenceHandler.swift:924-927, PlatformWalletManagerAddressSync.swift:40-50). Once destroy returns and ARC frees those Swift objects, an in-flight identity-sync or shielded-sync pass can dereference the now-dangling context: *mut c_void on its next persistence callback — a cross-language use-after-free at the FFI boundary, not just a stale-UI issue.

The new Swift generation gate does not address this: it only filters the @MainActor-side completion hop after the FFI callback has already executed. The unsafe access (Unmanaged.fromOpaque(context).takeUnretainedValue() and the identity-sync direct persister fan-out which never goes through the Swift event handler at all) happens earlier on the FFI thread, before any @mainactor task would run.

Fix shape: make shutdown() a real quiescence barrier. Add quiesce() to IdentitySyncManager and PlatformAddressSyncManager mirroring ShieldedSyncManager::quiesce (raise a quiescing gate, cancel, await is_syncing falling), then have shutdown() await quiesce() on all three before tearing down the event adapter. After that, manager_destroy returning genuinely means 'no Rust task will fire another callback through the host context pointer', which is the invariant the FFI's own comment block claims to provide.

  • [SUGGESTION] In packages/swift-sdk/Sources/SwiftDashSDK/PlatformWallet/PlatformWalletManagerShieldedSync.swift:653-697: Shielded progress and tree-progress callbacks lack the generation guard the completion callback now has
    The latest delta added a monotonic generation snapshot to shieldedSyncCompletedCallback (line 646) so a trailing main-actor hop after stopShieldedSync / clearShielded is dropped. The two progress callbacks immediately below — shieldedSyncProgressCallback (656-672) and shieldedTreeProgressCallback (680-696) — still just do Task { @MainActor [weak manager] in manager?.handleShieldedSyncProgress(...) } with no generation snapshot and no guard.

Rust-side platform_wallet_manager_shielded_sync_stop correctly uses quiesce() (rs-platform-wallet-ffi/src/shielded_sync.rs:86-95), so when stopShieldedSync() returns, no further Rust-side dispatches will happen. But a progress callback that fired during the quiesce window enqueues a Task onto the main actor; that task can land AFTER stopShieldedSync() calls resetCurrentShieldedProgress() (line 247) or clearShielded() does the same (line 279), re-publishing stale currentShielded* values from the just-stopped pass and undoing the reset on the UI.

This is the same same-turn race the completion fix was written to address, just on a different surface. Apply the same pattern: snapshot handler.manager?.shieldedSyncGeneration.current() in the FFI trampoline before enqueueing, then have the @mainactor handler drop if the snapshot mismatches the current generation.

</details>

The shielded completion callback snapshots the sync generation on the FFI
thread and drops stale main-actor hops, but the two progress callbacks
(shieldedSyncProgressCallback / shieldedTreeProgressCallback) did not. A
progress Task enqueued just before stopShieldedSync()/clearShielded() could
land after resetCurrentShieldedProgress() and re-publish phantom
currentShielded* values over the just-reset UI mirrors.

Apply the same pattern: snapshot shieldedSyncGeneration in both trampolines
before enqueueing and pass it to handleShieldedSyncProgress /
handleShieldedTreeProgress, which now guard on it. Adds 4 tests covering
stale-dropped / current-published for both progress surfaces. (No memory
safety impact — the Rust side already quiesces on stop/clear; this is the
UI-coherence half of the same race.)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@QuantumExplorer
Copy link
Copy Markdown
Member

Re the review of 7b36f29 (both findings landed in the review body since inline posting failed — noting disposition here):

Blocking — shutdown() cancel-only: already fixed in 0cfe6c85, which post-dates the reviewed head 7b36f29b. shutdown() now quiesce().awaits all three sync managers (added a quiescing gate + quiesce() to IdentitySyncManager and PlatformAddressSyncManager mirroring ShieldedSyncManager::quiesce) before cancelling+joining the event adapter — exactly the "Fix shape" suggested. I also moved PlatformAddressSyncManager's is_syncing.store(false) to after its completion dispatch so the drain covers the host callback. So manager_destroy returning now genuinely means no Rust task will fire another callback through a host context pointer. (5 tests added there.)

Suggestion — progress callbacks lack the generation guard: fixed in be33d4128f. shieldedSyncProgressCallback / shieldedTreeProgressCallback now snapshot shieldedSyncGeneration on the FFI thread before enqueueing, and handleShieldedSyncProgress / handleShieldedTreeProgress guard on it, so a stale progress hop after stop/clear can't re-publish phantom currentShielded* over the reset mirrors. 4 tests added.

Copy link
Copy Markdown
Collaborator

@thepastaclaw thepastaclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The most recent commit (0cfe6c8) cleanly fixes the prior blocking FFI use-after-free on shutdown by adding quiesce() barriers to IdentitySyncManager and PlatformAddressSyncManager and wiring them into PlatformWalletManager::shutdown ahead of the event-adapter join. Two in-scope suggestions remain: the two shielded progress trampolines on the Swift side still lack the generation snapshot that this PR's earlier commit added to the completion trampoline (asymmetry introduced by this PR), and the new shielded-snapshot header is not injective over the parent leaf's flags. No in-scope blocking issues.

🟡 2 suggestion(s)

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `packages/swift-sdk/Sources/SwiftDashSDK/PlatformWallet/PlatformWalletManagerShieldedSync.swift`:
- [SUGGESTION] packages/swift-sdk/Sources/SwiftDashSDK/PlatformWallet/PlatformWalletManagerShieldedSync.swift:656-697: Shielded progress callbacks lack the generation guard the completion callback now has
  `shieldedSyncCompletedCallback` (line 638-650) snapshots `handler.manager?.shieldedSyncGeneration.current()` on the FFI thread before the @MainActor hop, and the @MainActor handler drops the event if the snapshot no longer matches — the same-actor-turn restart fix added by commit 7b36f29bb0. The two sibling trampolines `shieldedSyncProgressCallback` (656-672) and `shieldedTreeProgressCallback` (680-696) still do a bare `Task { @MainActor [weak manager] in manager?.handleShieldedSync...(...) }` with no generation snapshot, and `handleShieldedSyncProgress` / `handleShieldedTreeProgress` write to `currentShielded*` mirrors with no generation check either.

  The new Rust-side `ShieldedSyncManager::quiesce()` drains in-flight passes before `platform_wallet_manager_shielded_sync_stop` / `_clear` return, so this is no longer an FFI UAF — but a progress callback that fired and enqueued its main-actor Task before quiesce observed `is_syncing == false` can still land after `resetCurrentShieldedProgress()` runs (around lines 247 / 279), republishing stale `currentShielded*` values from the just-stopped pass and undoing the UI reset. This is the same boundary-crossing race the completion fix was written to address, on a different surface.

  Apply the same snapshot+guard pattern symmetrically: take `handler.manager?.shieldedSyncGeneration.current()` on the FFI thread before enqueueing the Task, pass it through to the @MainActor handler, and drop when the snapshot mismatches the current generation.

In `packages/rs-drive-abci/src/shielded_snapshot/mod.rs`:
- [SUGGESTION] packages/rs-drive-abci/src/shielded_snapshot/mod.rs:375-385: Snapshot header flags_byte is not injective over parent-leaf flags
  On dump (lines 375-385) the encoding maps `None`, `Some(vec![])`, and `Some(vec![0])` all to `flags_byte = 0`; on apply (lines 534-538) `0` decodes back to `None`. That loses information from the parent `Element::CommitmentTree` leaf's `Option<Vec<u8>>` flags during serialization, so `apply_shielded_snapshot()` cannot always reconstruct the original parent leaf bytes. If the source tree ever carries a one-byte zero flags vector or an empty flags vector, the snapshot-applied genesis will write a different parent leaf than the live-seeded tree, changing the parent Merk serialization even though the subtree checksum and `combined_root` cross-validation still pass — `combined_root` covers the rebuilt CommitmentTree, not the parent-leaf flags bytes.

  Impact in practice is bounded because (a) this tooling is now gated behind `create_sdk_test_data` and (b) the parent leaf's flags are realistically `None` or a single non-zero byte today, but the snapshot's stated contract is a faithful round-trip of the seeded pool. An explicit presence/length discriminator (e.g. a separate `flags_present: u8` plus the byte, or sentinel-encoded as `0xFF` for `None`) avoids overloading `0`. The existing `Inconsistent` error on `flags.len() > 1` already shows the format is intentionally narrow; this fix tightens the same narrow path without widening it.

Comment thread packages/rs-drive-abci/src/shielded_snapshot/mod.rs Outdated
Copy link
Copy Markdown
Collaborator

@thepastaclaw thepastaclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Incremental review at be33d41. The only delta vs 0cfe6c8 is Swift-only: both shielded progress trampolines (shieldedSyncProgressCallback, shieldedTreeProgressCallback) now snapshot shieldedSyncGeneration.current() on the FFI thread before the @mainactor hop and the handlers gate on the snapshot, mirroring the completion-trampoline pattern. Four new XCTest cases cover stale-drop and current-publish on both surfaces. The prior progress-callback finding is RESOLVED. One carried-forward suggestion on the shielded snapshot header's non-injective flags_byte encoding remains valid but is bounded to create_sdk_test_data-gated tooling. No new in-scope defects in the delta.

🟡 1 suggestion(s)

1 additional finding(s)

suggestion: Snapshot header `flags_byte` is not injective over parent-leaf flags

packages/rs-drive-abci/src/shielded_snapshot/mod.rs (line 375)

Verified at be33d41. The dump-side encoder at lines 375-385 collapses three distinct parent Element::CommitmentTree flag states — None, Some(vec![]), and Some(vec![0]) — all to flags_byte = 0. The apply-side decoder at lines 534-538 unconditionally maps 0 back to None, then feeds that into replace_commitment_tree_subtree_root at 540-554. The combined_root cross-check at line 523 only covers the rebuilt CommitmentTree subtree contents, not the parent-leaf flags bytes, so the existing validation does not catch a parent-Merk divergence introduced by this lossy round-trip.

Concrete impact: if a source pool ever carries Some(vec![]) or Some(vec![0]) for the parent leaf flags, a snapshot-booted devnet will write different parent-leaf bytes than a live-seeded one starting from the same subtree. That changes the parent Merk serialization (and therefore the platform state root) silently, despite every downstream cross-check passing. The blast radius is bounded today because (a) this tooling is gated behind cfg(create_sdk_test_data) and (b) the realistic parent-leaf flags today are None or a single non-zero byte — hence suggestion rather than blocking. But the snapshot's stated contract is a faithful round-trip of the seeded pool, and the existing Inconsistent error on flags.len() > 1 already shows the format is intentionally narrow.

A localized fix that preserves header width: bump FORMAT_VERSION and use a presence/length discriminator — e.g. sentinel-encode None as 0xFF (rejecting 0xFF as a valid flags byte), 0x00 for Some(vec![]), and 0x01..=0xFE (or a separate flags_present: u8 plus the byte) for Some([byte]). This eliminates the 0 overload before any code path can synthesize the lossy states for that leaf.

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

- [SUGGESTION] In `packages/rs-drive-abci/src/shielded_snapshot/mod.rs`:375-385: Snapshot header `flags_byte` is not injective over parent-leaf flags
  Verified at be33d412. The dump-side encoder at lines 375-385 collapses three distinct parent `Element::CommitmentTree` flag states — `None`, `Some(vec![])`, and `Some(vec![0])` — all to `flags_byte = 0`. The apply-side decoder at lines 534-538 unconditionally maps `0` back to `None`, then feeds that into `replace_commitment_tree_subtree_root` at 540-554. The `combined_root` cross-check at line 523 only covers the rebuilt CommitmentTree subtree contents, not the parent-leaf flags bytes, so the existing validation does not catch a parent-Merk divergence introduced by this lossy round-trip.

Concrete impact: if a source pool ever carries `Some(vec![])` or `Some(vec![0])` for the parent leaf flags, a snapshot-booted devnet will write different parent-leaf bytes than a live-seeded one starting from the same subtree. That changes the parent Merk serialization (and therefore the platform state root) silently, despite every downstream cross-check passing. The blast radius is bounded today because (a) this tooling is gated behind `cfg(create_sdk_test_data)` and (b) the realistic parent-leaf flags today are `None` or a single non-zero byte — hence suggestion rather than blocking. But the snapshot's stated contract is a faithful round-trip of the seeded pool, and the existing `Inconsistent` error on `flags.len() > 1` already shows the format is intentionally narrow.

A localized fix that preserves header width: bump `FORMAT_VERSION` and use a presence/length discriminator — e.g. sentinel-encode `None` as `0xFF` (rejecting `0xFF` as a valid flags byte), `0x00` for `Some(vec![])`, and `0x01..=0xFE` (or a separate `flags_present: u8` plus the byte) for `Some([byte])`. This eliminates the `0` overload before any code path can synthesize the lossy states for that leaf.

Top-level exact-SHA review posted because inline posting deduped the carried-forward open thread from the previous SHA; this records coverage for be33d41 without adding a duplicate inline thread.

…jective

The snapshot header encoded the parent Element::CommitmentTree flags as a
single byte, collapsing None, Some(vec![]), and Some(vec![0]) all to 0, and
apply decoded 0 back to None. So a parent leaf carrying Some(vec![0]) would
round-trip to None, writing different parent-Merk bytes (and platform state
root) than a live-seeded tree — undetected, since combined_root covers only
the rebuilt subtree, not the parent-leaf flags.

Encode flags as Option<u8> with a [present, byte] discriminator (None and
Some(0) now distinct on the wire), reject empty/multi-byte flags vecs loudly
at dump, and bump FORMAT_VERSION to 2. Snapshots are rebaked per image build
so no v1 compatibility is needed. Adds header round-trip + bad-discriminator
unit tests. Bounded impact (create_sdk_test_data-gated tooling) but restores
the snapshot's faithful-round-trip contract.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@thepastaclaw thepastaclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Cumulative incremental review at head b5db278. Prior be33d412 shielded_snapshot parent-leaf flags injectivity finding is FIXED: header now encodes Option<u8> as an injective [present, byte] pair, FORMAT_VERSION is bumped 1→2 (so v1 snapshots are rejected by FormatVersionMismatch rather than silently misread), ENCODED_LEN is 63, the dumper rejects empty/multi-byte parent-leaf flag vecs with ShieldedSnapshotError::Inconsistent, the decoder rejects invalid presence discriminators, and round-trip + bad-discriminator tests pin both paths. The latest delta is a single surgical commit; no new in-scope defects identified. Codex-general's dashmate buildArgs finding describes the documented intentional contract (the source comment at generateEnvsFactory.js:118-120 explicitly states drive-abci wins on collision because the namespace is shared by design), so it's dropped as not-a-defect. Codex-rust-quality's unwrap() finding is overstated as blocking: those calls live almost entirely in the offline dump_shielded_subtree bake CLI plus one InitChain site where startup panic is operationally equivalent to a typed error — preserved only as an internal follow-up.

@QuantumExplorer QuantumExplorer merged commit 40476d9 into v3.1-dev May 31, 2026
36 checks passed
@QuantumExplorer QuantumExplorer deleted the test/sheilded_test_data branch May 31, 2026 06:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants