feat: seed Orchard shielded pool at genesis with fast, observable sync#3732
Conversation
…A devnet genesis
Pre-populates the shielded pool with 500_000 Orchard notes (8 owned by two
deterministic test wallets) when a local devnet binary is built with
`--cfg=create_sdk_test_data`. Closes the gap for benchmarking wallet sync at
scale without paying per-note Halo 2 proof time at chain bring-up.
Seeder (drive-abci):
- `create_data_for_shielded_pool` runs inside `create_sdk_test_data` and
emits 500k `ShieldedPoolOperationType::InsertNote` ops through the
production `commitment_tree_insert_op` path. GroveDB's
`preprocess_commitment_tree_ops` batches them into a single
Sinsemilla-frontier load / `append_with_mem_buffer` loop / Merk
propagation.
- Two-tier note generator: filler (random valid Pallas-base `cmx` + 216
bytes of opaque ciphertext) + owned (real `Note::from_parts` encrypted
to a fixed ZIP-32-derived address via `OrchardNoteEncryption<DashMemo>`).
- Genesis-time anchor recording at `block_height=1` matches production's
end-of-block-1 anchor — single recorded anchor suffices for spends via
the wallet's one-checkpoint-at-post-sync-tree-size invariant.
- Determinism: single `StdRng::seed_from_u64(0xDEAD_BEEF)` threaded
through every loop; `seed_a`/`seed_b` test wallets derived via
`SpendingKey::from_zip32_seed(seed, coin_type=1, account=0)`.
dashmate config plumbing:
- New `buildArgs: Record<string,string>` field on `dockerBuild` schema —
generic per-image build-arg map. Dashmate becomes the single source of
truth for `SDK_TEST_DATA` and `CARGO_BUILD_PROFILE`; shell-env
passthrough is dropped.
- `scripts/setup_local_network.sh` writes
`buildArgs.SDK_TEST_DATA="true"` + `CARGO_BUILD_PROFILE="release"` to
each `local_N` after `dashmate setup local`. Release profile is
mandatory — debug-Sinsemilla pushes InitChain past tenderdash's timeout.
(Marked TODO/temporary in the script — removable once Option B
precomputed-snapshot lands, or N drops low enough for debug seeding.)
- `generateEnvsFactory` flattens both `platform.drive.abci.docker.build.buildArgs`
and `platform.dapi.rsDapi.docker.build.buildArgs` into the
docker-compose env so `${KEY}` substitution in the compose `build.args`
blocks picks them up.
`dashmate config set` bug fix:
- The old `config.get(path)` pre-check rejected legal sets of new keys
inside `additionalProperties: <schema>` maps (e.g. `…buildArgs.X`).
Replaced with `Config.isSchemaPathAllowed(path)` which walks the JSON
schema descending through `properties`, `additionalProperties` value
schemas, and `$ref` references. 15 unit tests pin the walker.
Tests:
- 23 in-process tests in `rs-drive-abci`: wallet derivation, note
generator (filler + owned + ρ uniqueness + ciphertext layout +
determinism + per-wallet decrypt + cross-wallet privacy + aggregate
balance), Drive-level integration (count + anchor + cross-platform
byte-identical determinism).
- 15 dashmate unit tests for the schema walker.
- One `#[ignore]`-d functional test in `rs-platform-wallet` that drives
the full `PlatformWalletManager → bind_shielded → coordinator.sync →
shielded_balances` flow against a live SDK_TEST_DATA devnet.
Cost (release profile, 500_000 notes, Apple Silicon Docker Desktop):
~3h 41m wall-clock for the seeder. CPU work is ~95s; the rest is
GroveDB writes through the macOS Docker VM. See
`docs/shielded-seeder-performance.md` for the breakdown and the
Option-B follow-up.
Refs #3714.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds an offline shielded-pool snapshot bake/apply pipeline, Docker/dashmate build-arg plumbing, streaming SDK shielded sync with progress events, proof total_count extraction, wallet event/FFI progress plumbing, store reset and tuned SQLite open, SwiftUI timing/progress UI, docs, examples, and benchmarks. ChangesShielded Genesis Snapshot & Progress Tracking
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Suggested labels
Suggested reviewers
Poem
✨ Finishing Touches🧪 Generate unit tests (beta)
|
…t InitChain
Cuts SDK_TEST_DATA shielded-pool seeding from ~3h41m (Docker on macOS at
N=500k) or ~65 min (native release at N=500k) to ~134 ms at InitChain by
moving the seed work to a one-shot bake during docker image build and
loading the result via `IngestExternalFile` + parent-Merk leaf patch.
End-to-end proven on local devnet: bake-inside-Dockerfile produces a
snapshot file with `combined_root=b682f38442c8...8144e3c3` byte-for-byte
identical to a local macOS bake; InitChain applies it in 134 ms; wallet-A
sync against the snapshot-loaded chain recovers the expected 400 000
balance (4 owned × 100 000) — proving proof verification works against
snapshot-loaded state.
Components
----------
- `packages/rs-drive-abci/src/shielded_snapshot/mod.rs` (NEW):
- `dump_shielded_subtree(grove, w)` — writes one SST per CF + header +
blake3 checksum.
- `apply_shielded_snapshot(grove, r, txn)` — validates header,
`ingest_subtree_sst`, cross-validates `combined_root` against the
reconstructed CommitmentTree, patches parent Merk leaf via
`replace_commitment_tree_subtree_root`. No fallback — any mismatch
is FATAL so the operator notices.
- `drive-abci snapshot-bake --out <path>` subcommand:
- Self-contained: opens fresh tempdir, runs `create_genesis_state`
(which under `cfg(create_sdk_test_data)` seeds the shielded pool),
then dumps the resulting subtree. Uses a NoopCoreRPC stub because
genesis doesn't talk to Core.
- `DRIVE_SHIELDED_SNAPSHOT` env var read in `create_data_for_shielded_pool`:
takes the snapshot fast-path when set, runs the runtime seeder
otherwise. Failure during apply is fatal (no silent fallback).
- Dockerfile: new `bake-shielded-snapshot` stage runs `drive-abci
snapshot-bake` against an in-container tempdir when `SDK_TEST_DATA=true`,
embeds the snapshot at `/opt/dashmate/snapshots/shielded-pool.snap` in
the runtime image, sets `ENV DRIVE_SHIELDED_SNAPSHOT=...`.
- `docs/genesis-snapshot-design.md` — design doc covering format,
cross-validation, threat model, compatibility policy.
Side-effect changes
-------------------
- `ShieldedSeedConfig::sdk_test_data().total_notes`: 500_000 → 5_000 for
fast iteration while the snapshot path is fresh. Bump back when needed.
- `scripts/setup_local_network.sh`: `CARGO_BUILD_PROFILE` set to `dev`
for fast docker rebuilds during snapshot dev. Flip to `release` when
going back to large N for stress testing.
- grovedb dep rev bumped to `04f2d4243872b65fbec33650e15d85571df385e1`
(branch `feat/snapshot-apply-public-api` — DON'T MERGE; adds three
public methods: `ingest_subtree_sst`, `replace_commitment_tree_subtree_root`,
`raw_storage`).
- New deps in `rs-drive-abci`: `rocksdb` (SstFileWriter), `blake3`
(snapshot checksum). Plus `grovedb-path` + `grovedb-storage` as dev-deps
for the data-location test.
Tests
-----
- `dump_only_default_and_aux_cfs_under_shielded_subtree_prefix` — pins
the CF layout the dump enumerates (default CF only, contrary to design's
original §3 guess of default + aux).
- `snapshot_dump_apply_preserves_anchor` — in-process roundtrip; A seeds,
dumps, B applies, asserts anchors match byte-for-byte.
- `bench_native_seed_full` — bake-feasibility benchmark; measures
N=500k seed in release locally.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ake at N=1M Builds on the Phase 1 snapshot bake + apply design with the crypto-level optimization from grovedb PR #751: skip the Sinsemilla frontier append for filler notes (which no wallet owns and nobody needs spend proofs for), keeping the full Sinsemilla path only for the 8 owned notes. Drops bake time at N=1M from a projected 8+ hours (Phase 1, hit during the prior overnight run) to ~12 minutes in-process on native macOS release. Roundtrip test passes: anchor matches byte-for-byte between the bake-side seed and the snapshot apply on a fresh DB. Changes ------- - Cherry-picked grovedb PR #751 onto our existing `feat/snapshot-apply-public-api` branch — exposes `append_raw_without_frontier` / `append_many_without_frontier` on `CommitmentTree` under the `test-seeding-ct` Cargo feature, plus an MMR-root cache that makes append O(1) instead of O(N). grovedb rev bumped: 04f2d424 → 60d121900. - `rs-drive-abci/Cargo.toml`: enable `test-seeding-ct` feature on the grovedb-commitment-tree dep. - `OwnedLayout::compute`: moved owned positions to the **tail** `[N-owned_count, N)` instead of striped. Forced by the frontier-less constraint (PR #751 hard-rejects a non-empty frontier in the no-frontier append path) — must bulk-seed ALL filler first while frontier is empty, then append owned through the regular path. - `seed_shielded_pool_with_config`: replaced the `apply_drive_operations(InsertNote × N)` per-note path with: 1. Open CommitmentTree directly via grovedb's `raw_storage().get_transactional_storage_context(...)`. 2. Bulk-seed filler via `append_many_without_frontier(iter)` — blake3 only, no Pallas math. 3. Per-note `append_raw` + `save` + `commit_mmr` for the 8 owned (mirrors grovedb's existing commitment_tree_insert pattern). 4. Commit subtree's StorageBatch through the transaction. 5. Patch parent Merk leaf via `replace_commitment_tree_subtree_root` using the combined_root from the final `append_raw`'s result (compute_current_state_root hit "Inconsistent store" after intermediate commit_mmr flushes; reading the result-returned roots avoids that). 6. Post-bake assert `count == cfg.total_notes` — catches silent truncation from a panic mid-bake. - Progress logging: emit `seed phase A progress` every 30s with appended/total/pct/elapsed/rate/ETA from inside the bulk iterator. Previously the seeder was silent between start and end, making N=1M bakes invisible. - `docs/genesis-snapshot-design.md` §15 expanded with the F1 constraint, the bake/apply asymmetry, anchor non-spendability caveat, F6 rejection-sampling-skip caveat, and updated test plan. §8 scoped to Phase 1; §10 test #6 (equivalence with runtime seeder) marked retired under Phase 2. Consequences ------------ - Anchor recorded at height 1 reflects only the 8 owned cmx at frontier-positions 0..7 (Sinsemilla frontier has its own counter, independent of BulkAppendTree's total_count). Wallets attempting to construct spend proofs would fail. Devnet-only; gated by cfg(create_sdk_test_data) + SDK_TEST_DATA=true at image build. - combined_root produced by a Phase-2 bake ≠ combined_root a runtime-seeded chain would produce. Cross-validation between bake and apply still works (both read the same stored frontier). - Wallet sync still recovers the 8 owned notes correctly because sync uses chunk proofs authenticated by bulk_state_root (transitively authenticated by the grovedb root), not by the Sinsemilla anchor. Performance ----------- N=1M in-process roundtrip on native macOS release: - seed (filler bulk + owned full): ~10 min - dump (267 MB SST): ~30 s - apply on fresh DB: ~1 min - total: 12 min wall-clock Anchors match byte-for-byte: d1f7ed699e0b0a7741ca91dbe7513abf7c5a53d418206ba7bde3b0a3dd974631 Phase 1 at N=1M (didn't complete after 8+ hours in docker on macOS) projected to ~2-3 hours native release. Phase 2 is roughly 40× faster end-to-end at this N. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…API change bind_shielded now takes &[u8] instead of [u8; 32] after v3.1-dev merge (part of IdentityManager refactor in PR #3651). One-character fix. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The original Regtest-only check made sense for local dashmate setups but blocked the intended use case from issue #3714: stress-testing wallet sync on real Dev networks with N=1M pre-seeded shielded notes via the snapshot image. Real Devnet chains run with Network::Devnet, not Network::Regtest. Mainnet + Testnet still rejected — they must never carry SDK test fixtures (random identities + seeded shielded pool + the junk Sinsemilla anchor that isn't a valid spend anchor). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the iOS-side measurement surface for shielded sync against a pre-populated note pool (paloma devnet / local snapshot image): - ShieldedService publishes lastSyncDuration + a 1Hz currentSyncElapsed ticker driven by the false→true edge of shieldedSyncIsSyncing. CoreContentView renders both in the ZK Shielded Sync Status section, with monospaced digits so the live ticker doesn't reflow. - New FFI platform_wallet_manager_bind_shielded_with_raw_seed accepts a raw 32-byte ZIP-32 seed, bypassing the MnemonicResolver. Required to bind the chain-side test wallet A (`[0x73; 32]`) that the snapshot bake seeds — no BIP-39 mnemonic can produce that seed. Swift wrapper bindShieldedRawSeed mirrors the existing bindShielded shape. - New orange "Bind Test Wallet A (Shielded)" debug button under the same Sync Status section hardcodes the seed and starts the sync loop. All raw-seed / button code is tagged `TODO(shielded-snapshot-devnet-test)` and is meant to be removed once SwiftExampleApp has a real test-wallet import flow (tracked: #3714). Spec: packages/swift-sdk/SwiftExampleApp/docs/shielded-sync-timing-spec.md Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…DAPI + quorum URLs Devnet had no end-to-end Platform path on the iOS app — `SDK.init` ignored the `platformDAPIAddresses` UserDefaults on non-regtest networks, and even when addresses were provided, the trusted context provider panicked on `Devnet` without an explicit quorum-list URL (no built-in default exists; `quorums.devnet.<name>.networks.dash.org` is templated on a devnet name we don't carry across FFI). Result: switching to Devnet always showed Disconnected. This wires it end-to-end: - DashSDKConfig gains a `quorum_url: *const c_char` field. When set, `dash_sdk_create_trusted` constructs the TrustedHttpContextProvider via `new_with_url(...)` regardless of network — overrides the hardcoded network default so devnet (and non-default mainnet/testnet shards) work. - SDK.init now reads `platformDAPIAddresses` and a new `platformQuorumURL` UserDefaults key and forwards both to the FFI on devnet + regtest unconditionally, and on mainnet/testnet when `useDockerSetup` is set. Helper `withOptionalCStrings` keeps the two-string lifetime contract clean. - OptionsView gets a dedicated devnet branch with three text fields (SPV Peers, DAPI URL, Quorum URL); no toggle since all three are always custom on devnet. The picker's onChange force-enables `useLocalhostCore` and seeds the SPV peers default for devnet so the Sync tab's `startSpv()` path picks up peers without a hidden toggle. - WalletManagerStore.activate compared cached managers by network only, returning the cached one even when AppState handed in a freshly-built SDK. That kept the manager pointing at the old SDK clone (with stale DAPI / quorum endpoints) so proof verification failed forever with "no available addresses to use". Now we cache the configured SDK handle and rebuild the manager when it doesn't match — the FFI `configure` is single-shot (precondition !isConfigured) so we can't swap in place. - SDKLogger.log mirrors to NSLog in addition to stdout so `xcrun simctl spawn booted log stream` and Console.app capture the timing log lines even when no Xcode debugger is attached. Token FFI test fixtures get the new `quorum_url: ptr::null()` field to keep compiling. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…stics
Adds the diagnostic surface for the iOS balance-of-zero investigation
(P0.2) and the followup punch list that came out of the live test
session.
- `tests/shielded_sync_paloma.rs` — new integration test that binds
wallet A (`SEED_A = [0x73; 32]`) against a remote SDK_TEST_DATA
devnet via TrustedHttpContextProvider, fans the gRPC traffic across
all 13 paloma masternodes' Platform gateways (`AddressList` picks
randomly per request), runs one sync pass, asserts balance =
EXPECTED_BALANCE_A = 400_000. Verified PASS against paloma in
1299s: total_scanned=1_000_000, new_notes_total=4, balances={0: 400000}
— confirms decryption + persistence work end-to-end at the Rust
layer, isolates iOS balance=0 to iOS-specific persister callback /
display path.
- `PlatformWalletPersistenceHandler.swift` — temporary NSLog
instrumentation on `persistShieldedNotes` and
`persistShieldedSyncedIndices` so the next iOS sync run surfaces
whether the callbacks fire (via `simctl spawn booted log stream`).
Both tagged `TODO(shielded-snapshot-devnet-test)` for removal once
P0.2 closes. Uses the safer `NSLog("%@", message)` pattern after
the multi-arg variadic form SIGBUSed (verified, May 2026).
- `docs/shielded-sync-devnet-followups.md` — punch list captured from
the live test session (P0.1 cold-sync duration preservation, P0.2
balance investigation, P1.1 auto-discover DAPI from /masternodes,
P1.2 per-chunk progress, P1.3 node-count indicator). Status table
+ prioritized order of attack.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…-out
Two iOS UX improvements that came out of the live paloma devnet test
session — both Swift-only.
**P0.1 — Preserve longest-pass duration**
The existing "Last sync duration" row gets clobbered every time a sync
completes. After a 20-minute cold sync of paloma's 1M-note pool, a
subsequent 3-second steady-state pass would overwrite the headline
number, leaving no way to recover what the cold sync actually took.
ShieldedService now publishes `longestSyncDuration` alongside
`lastSyncDuration`. Set to `max(prev, elapsed)` at every completion;
reset on bind / reset / clearLocalState (the four sites the existing
`lastSyncDuration = nil` pattern already covers).
CoreContentView renders a "Longest pass: N.NN s" row beneath "Last
sync duration", but only when meaningfully longer than the most recent
pass (`longest > last + 0.05s`) so steady-state runs don't render two
identical lines.
**P1.1 — Auto-populate DAPI list from `/masternodes`**
Previously, devnet usage required pasting all DAPI URLs by hand. With
only one URL in the field the SDK funnels every gRPC request through
one masternode (`AddressList::pick_address` picks randomly per request);
on a 1M-note sync that becomes the bottleneck.
`SDK.discoverDAPIAddresses(quorumBase:)` synchronously hits
`{quorumURL}/masternodes`, filters `status == "ENABLED"`, and builds
a comma-separated `https://<ip>:<platformHTTPPort>,…` list. 5-second
URLSession timeout via DispatchSemaphore (init can't be async without
a deeper refactor; the call runs off the main thread inside
`AppState.switchNetwork`'s Task).
SDK init triggers discovery when:
- the network uses overrides (devnet, regtest, or `useDockerSetup`),
- a quorum URL is set,
- and `platformDAPIAddresses` is empty (manual entry wins).
Resolved list is written back to `platformDAPIAddresses` UserDefaults
so OptionsView displays the actual fanned-out URLs. Subsequent SDK
builds skip the network round-trip; clearing the field re-triggers
discovery on the next switch.
OptionsView shows a small "N nodes" label above the DAPI URL field —
"auto (from /masternodes)" when empty, "13 nodes" when populated.
TextField switches to vertical-axis with `lineLimit(1...4)` so the
13-URL list is readable.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Surface live cumulative-notes-scanned during a long shielded sync pass so the iOS UI can render a ticking counter / `ProgressView` instead of staring at a spinner for the 20+ min cold sync of a 1M-note pool. Previously `sync_shielded_notes` was a single async call that returned only at completion — between start and end there was no signal anything was happening. Touches every layer: - **rs-sdk** (`platform/shielded/notes_sync/types.rs`, `sync_shielded_notes.rs`): new `ProgressCallback` type (`Arc<dyn Fn(u64, u64) + Send + Sync>`). Added as an optional field on `ShieldedSyncConfig`; fired once per chunk inside the sliding-window fetch loop with `(cumulative_scanned, latest_block_height)`. Default `None` preserves prior behavior. - **rs-platform-wallet** (`events.rs`, `wallet/shielded/coordinator.rs`, `wallet/shielded/sync.rs`, `manager/mod.rs`): new trait method `PlatformEventHandler::on_shielded_sync_progress`. Coordinator holds an optional progress handler in a `Mutex<Option<…>>` (ArcSwap can't store `dyn Fn` — needs `T: Sized`); installed by the manager in `configure_shielded` with a closure that forwards into the event manager. `sync_notes_across` reads the handler, wraps it in a `ShieldedSyncConfig.on_chunk_completed`, and passes to the SDK. Network-scoped event (no wallet_id) — a single coordinator pass covers every bound IVK. - **rs-platform-wallet-ffi** (`event_handler.rs`): new `on_shielded_sync_progress_fn` slot on `EventHandlerCallbacks` (C-ABI-stable regardless of `shielded` feature). FFIEventHandler dispatches when shielded compiled in. - **swift-sdk** (`PlatformWalletManagerAddressSync.swift`, `PlatformWalletManagerShieldedSync.swift`, `PlatformWalletManager.swift`): wire new `shieldedSyncProgressCallback` C trampoline. PlatformWalletManager publishes `currentShieldedSyncScanned` and `currentShieldedSyncBlockHeight` (cleared on every completion). - **SwiftExampleApp** (`ShieldedService.swift`, `CoreContentView.swift`): ShieldedService republishes via `combineLatest` of the two manager publishers into `currentSyncScanned` / `currentSyncBlockHeight`. CoreContentView's "Syncing… elapsed" row gets a "Scanned this pass: N notes" sub-row + a linear `ProgressView` (indeterminate — chain commitment count isn't separately queried; absolute number is more useful than a fake bar). Both reset across all four bind/clear paths. - **WalletManagerStore.swift**: drive-by fix for the stale-SDK rebuild path — `activeManager` is non-optional so `= nil` failed to compile; let the post-rebuild `activeManager = manager` line overwrite cleanly. Release-mode paloma test (1M-note cold sync) baseline established at 1022 s wall clock; per-chunk callback fires ~once per 2048 notes inside the SDK fetch phase (~6 min in release). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
**P0.2 — wallet A balance=0 after sync, root-caused and fixed**
Rebinding shielded keys on a wallet that already has a persisted
watermark left the next sync starting at near-tree-tip, so the new
IVK never visited the positions where its owned notes lived.
`bind_shielded` calls `unregister_wallet` → `register_wallet` →
`restore_for_wallet`, and that last step rehydrates the watermark
from disk. For the mnemonic-bind path that's correct (same IVK
across restarts). For the raw-seed test path (different IVK) it
silently breaks scanning.
End-to-end validation against paloma (release-Rust integration test
+ debug iOS sim):
Rust test: total_scanned=1_000_000, decrypted_for_driver=4,
balances={0: 400_000}, 1022 s
iOS app: Scanned 1,000,000, New 4, Spent 0,
balance=0.000004 DASH (= 400_000 credits), 539.80 s
Fix: `NetworkShieldedCoordinator::force_rescan_subwallets(wallet_id,
accounts)` zeros the in-memory watermark for the bound subwallets.
Called from `platform_wallet_manager_bind_shielded_with_raw_seed`
right after `wallet.bind_shielded()` returns.
**Test-only scaffolding — strong DELETE-BEFORE-MERGE banner**
Every site that exists only to support the devnet sync-timing test
now carries an explicit `⚠️ TEST-ONLY CODE — DELETE BEFORE MERGE ⚠️ `
header with the full 5-site cleanup checklist:
- `platform_wallet_manager_bind_shielded_with_raw_seed` FFI entry
- `NetworkShieldedCoordinator::force_rescan_subwallets` helper
- Swift `PlatformWalletManager.bindShieldedRawSeed` wrapper
- `ShieldedService.bindWithRawSeed`
- "Bind Test Wallet A (Shielded)" button HStack in CoreContentView
- ATS exception in Info.plist (auto-cleaned with the rest)
Tag: TODO(shielded-snapshot-devnet-test). Tracked: #3714.
**Devnet UX simplification**
Reduced the devnet user-input surface to a single field — the
quorum list service URL — by deriving everything else from
`{quorum}/masternodes`. New shared helper
`SDK.discoverActiveMasternodes` returns each ENABLED masternode's
SPV peer (`ip:CoreP2PPort` from the `address` field) and DAPI URL
(`https://ip:platformHTTPPort`). Both are fetched fresh on every
SDK init / SPV start — self-healing on node churn.
OptionsView devnet branch: drops the manual SPV Peers TextField
and DAPI URL TextField; only the Quorum URL field remains.
SDK.init: always auto-discovers DAPI from /masternodes on devnet,
no longer writes back to UserDefaults.
`CoreContentView.spvPeerOverride`: devnet branch fetches the same
endpoint and extracts each masternode's `address` field verbatim
(paloma reports `ip:20001`, not the canonical 29999 — using
masternode-reported port is correct).
**Bundled Info.plist + ATS exception**
iOS's App Transport Security blocks plain-HTTP requests by default,
which prevented the SDK init from reaching the HTTP-only paloma
quorum-list-server. Switched the SwiftExampleApp project from
auto-generated to a real `Info.plist` (at project root so it stays
out of `PBXFileSystemSynchronizedRootGroup`'s auto-resource
inclusion) with `NSAllowsArbitraryLoads = true`. Plist carries the
same bundle/version/scene-manifest keys Xcode would have synthesized
plus the ATS dict. Test-only, must be removed before any production
release.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…t_data # Conflicts: # Cargo.lock
…t_data # Conflicts: # packages/rs-sdk-ffi/src/types.rs
Deletes the entire raw-seed bind scaffolding now that P0.2 is resolved and the iOS devnet sync-timing test has been validated end-to-end (0.000004 DASH on paloma matching the chain-side bake spec). Removed sites (the 5 banner-flagged deletions plus the P0.2 diagnostic NSLogs): - `platform_wallet_manager_bind_shielded_with_raw_seed` (rs-platform-wallet-ffi) - `NetworkShieldedCoordinator::force_rescan_subwallets` (rs-platform-wallet) - `PlatformWalletManager.bindShieldedRawSeed` (swift-sdk) - `ShieldedService.bindWithRawSeed` (SwiftExampleApp) - "Bind Test Wallet A (Shielded)" orange button HStack (CoreContentView) - Diagnostic NSLogs on `persistShieldedNotes` and `persistShieldedSyncedIndices` (PlatformWalletPersistenceHandler) What's kept (broader devnet path, not wallet-A-specific): - `SDK.discoverActiveMasternodes` + auto-discovery in `init` - Info.plist + ATS exception (needed for plain-HTTP /masternodes) - Devnet OptionsView simplification (single Quorum URL input) - All P0.1 / P1.1 / P1.2 timing + progress UI - `rs-platform-wallet/tests/shielded_sync_paloma.rs` — the canonical Rust integration test that proves the fix; keeps SEED_A by design Verified: Rust workspace + iOS framework + SwiftExampleApp all build clean. The non-wallet-A devnet flow (mnemonic bind on devnet against the auto-discovered DAPI nodes) is unaffected. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds two `#[ignore]`-gated benchmarks that drove the diagnostic work behind the multi-chunk query change (#3756). `tests/shielded_chunk_timing_bench.rs` — issues N sequential single- chunk `ShieldedEncryptedNotes` fetches against a live SDK_TEST_DATA devnet (paloma by default) and reports per-chunk wall-clock distribution plus the per-chunk net/verify split (from `tracing::info!` inside `fetch_with_metadata_and_proof`). The flat per-chunk cost across count=64 vs count=2048 (1.57s vs 1.52s avg) is what proved the bottleneck is fixed per-request overhead, not bandwidth — directly justifying batching multiple chunks into one proof. `tests/shielded_decrypt_bench.rs` — measures trial-decrypt throughput over generated `ShieldedEncryptedNote` fixtures, single-threaded vs rayon-parallel. Showed decrypt is <1% of cold-sync wall-clock (~1.3s for 1M notes single-threaded on M-series), ruling out decrypt as the target for the multi-chunk PR. Run: cargo test -p platform-wallet --release --features shielded \ --test shielded_chunk_timing_bench -- --ignored --nocapture cargo test -p platform-wallet --release --features shielded \ --test shielded_decrypt_bench -- --ignored --nocapture New dev-deps to keep the benches self-contained: - drive-proof-verifier (for the wire types) - rayon (for parallel decrypt comparison) `rs-sdk-trusted-context-provider` was already a dev-dep for the existing paloma sync test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Imports re-sorted (std first, then external, then internal — alphabetized within each group) and over-wrapped function signatures collapsed back to one line per `rustfmt`'s defaults. Pure formatting; no behaviour change. Surfaced when running cargo fmt while staging the chunk-timing / decrypt benchmark commit — a handful of files in rs-drive-abci, rs-platform-wallet, and rs-platform-wallet-ffi had drifted out of fmt-clean state. Folding the cleanup into its own commit keeps the bench commit minimal and gives the next person clean diffs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps grovedb pin to feat/snapshot-apply-public-api tip (bdc0d6e9d7a5f7cec47b2701d08d6e35f558c7b8), which incorporates the rewritten PR #751: the old `_without_frontier` API (which skipped the Sinsemilla frontier entirely) is replaced with batched `CommitmentTree::append_many_raw`. The old API produced an anchor that didn't reflect the actual leaf set — wallets reconstructing the tree locally got a root the chain had never recorded and spend proofs failed. `append_many_raw` is byte-for-byte equivalent to N × `append_raw` (same frontier, same bulk MMR, same final roots) but the Sinsemilla anchor and bulk state root are each computed once at the end. Replaces the two-phase seed in `create_genesis_state/test/shielded.rs` (Phase A: bulk-seed filler via `append_many_without_frontier`; Phase B: per-note `append_raw` for owned, with per-note `save()` + `commit_mmr()`) with a single `append_many_raw(chain(filler, owned))` call followed by one `save()` at the end. The two-phase structure only existed to work around the old API's frontier divergence; with the new API the split is unnecessary. Owned notes still land at positions `[filler.len(), filler.len() + owned.len())`, matching the previous layout downstream test-data construction relies on. Drops the `test-seeding-ct` feature flag (deleted in PR #751 — the `open()` frontier/bulk consistency check is now unconditional).
…t_data # Conflicts: # packages/rs-sdk-ffi/src/types.rs # packages/swift-sdk/Sources/SwiftDashSDK/SDK.swift # packages/swift-sdk/SwiftExampleApp/Info.plist
Two cleanups on top of the prior `append_many_raw` migration: **1. Removed hardcoded test wallets.** Deleted `shielded_test_wallets.rs` entirely (SEED_A/SEED_B and TestWallet), along with `owned_count`/`owned_value` fields on ShieldedSeedConfig, the OwnedLayout struct, generate_owned_note, and every test that relied on test-wallet decryption (8 in `mod tests` + the try_decrypt helper). The seeder is now a pure stress test: N random filler notes with no special-position owned tier. Wallet-balance flows in tests should go through real shielded-fund transitions post-genesis (supported via PR #3753 / shield-from-asset-lock), not via the seeder. **2. Batched seed loop.** The single up-front allocation of all `total_notes` followed by a single `append_many_raw` call is replaced with a loop: - Generate 10_000 filler notes from the seeded RNG - `ct.append_many_raw(iter)` - `ct.save()` to persist the Sinsemilla frontier - `ct.commit_mmr()` to flush the MMR overlay (required between repeated `append_many_raw` calls — without it, the next batch hits "MMR get_root failed: Inconsistent store") - Log running anchor + bulk-state root for observability Memory: peak allocation goes from O(total_notes) × ~280 bytes (216 ciphertext + 32 cmx + 32 rho + Vec overhead) to O(10k) × 280 bytes ≈ 2.7 MB regardless of N. For N=1M that's ~280 MB → ~3 MB. Determinism: `batched_generator_matches_single_call` test pins that chained `generate_filler_batch` calls produce byte-identical output to a single `generate_notes(cfg)` call sized to total_notes, so the seeded GroveDB state is invariant to BATCH_SIZE. Observability: each batch logs `sinsemilla_root` + `bulk_state_root` in hex. Comparing roots between bake runs (or between batched and non-batched paths) immediately shows where divergence happens. Verified: - `cargo check -p drive-abci` clean with and without `--cfg create_sdk_test_data` - 12 tests pass (4 unit-tests + 6 integration + 2 new batched tests): generate_notes_*, batched_generator_matches_single_call, generated_cmx_values_are_unique, empty_config_*, seeded_pool_count_*, seeded_anchor_*, seeding_with_same_config_*, different_rng_seeds_*, seeding_zero_notes_*.
GroveDB's `StorageContext::get` reads straight from the transaction and never consults its `StorageBatch` (see `storage/src/rocksdb_storage/storage_context/ context_tx.rs:296-310`). So as soon as `commit_mmr` drains the MMR overlay into the batch, the freshly-flushed peaks become invisible to reads until `commit_multi_context_batch` lands the batch in the tx — which only happens at the very end of a session. For chained `append_many_raw` calls on the same `CommitmentTree` (the 500k-note Drive seeder pattern in dashpay/platform#3732), the internal `commit_mmr` at the end of batch N drained the overlay; batch N+1's first compaction then read a peak via `MmrStore::element_at_position` → `ctx.get` → tx (peak not there) → `MMR::push` raised `InconsistentStore`. Make `commit_mmr` the caller's responsibility — same as `append_raw`, which also never flushes on its own. The overlay now stays alive across chained batches; the caller flushes it once, right before committing the surrounding batch. The docs spell this out and explicitly warn against mid-session `commit_mmr`. Updated the seeding bench to call `commit_mmr` explicitly before dropping `ct`. The byte-for-byte equivalence, spend-usable anchor, and cost-accounting tests are unaffected (they compare in-memory frontier/state-root values that don't depend on commit ordering). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GroveDB's `StorageContext::get` reads straight from the transaction and never consults its `StorageBatch` (see `storage/src/rocksdb_storage/storage_context/ context_tx.rs:296-310`). So as soon as `commit_mmr` drains the MMR overlay into the batch, the freshly-flushed peaks become invisible to reads until `commit_multi_context_batch` lands the batch in the tx — which only happens at the very end of a session. For chained `append_many_raw` calls on the same `CommitmentTree` (the 500k-note Drive seeder pattern in dashpay/platform#3732), the internal `commit_mmr` at the end of batch N drained the overlay; batch N+1's first compaction then read a peak via `MmrStore::element_at_position` → `ctx.get` → tx (peak not there) → `MMR::push` raised `InconsistentStore`. Make `commit_mmr` the caller's responsibility — same as `append_raw`, which also never flushes on its own. The overlay now stays alive across chained batches; the caller flushes it once, right before committing the surrounding batch. The docs spell this out and explicitly warn against mid-session `commit_mmr`. Updated the seeding bench to call `commit_mmr` explicitly before dropping `ct`. The byte-for-byte equivalence, spend-usable anchor, and cost-accounting tests are unaffected (they compare in-memory frontier/state-root values that don't depend on commit ordering). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Multi-batch `append_many_raw` was failing on the second call with "MMR get_root failed: Inconsistent store". Root cause is upstream grovedb behavior: `append_many_raw`'s internal MMR overlay accumulates compaction state across consecutive calls and is only safe to flush once at the end of the whole bake. Calling `ct.commit_mmr()` between batches corrupted the in-memory overlay so the next call read inconsistent state. Fixes: 1. **Bump grovedb pin** to `5eb7a5380a6e974513343352acfd6b30a8c1f87c` — picks up upstream commit `1340db71 fix(commitment-tree): don't commit_mmr inside append_many_raw`. 2. **Restructure the seeder loop** in `shielded.rs`: remove the per-batch `ct.commit_mmr()` introduced earlier, call it exactly once after the loop completes (immediately before `compute_commitment_tree_state_root`). Keep per-batch `ct.save()` — it persists the Sinsemilla frontier and is cheap (gives durable mid-bake checkpoints + keeps the per-batch root logs accurate). 3. **Bump the round-trip test** `snapshot_dump_apply_preserves_anchor` from N=5_000 (single-batch path) to N=30_000 (three batches at `BATCH_SIZE = 10_000`) so it actually pins the inter-batch MMR-overlay handoff that this fix targets.
…ntier API (#751) * feat(commitment-tree): frontier-less bulk seeding for sync/scale testing Add `test-seeding`-gated `append_raw_without_frontier` and `append_many_without_frontier` on `CommitmentTree`. These populate the underlying BulkAppendTree at blake3 speed WITHOUT updating the Sinsemilla frontier, so a devnet shielded pool can be pre-loaded with a large N of filler notes without paying the per-note Pallas/Sinsemilla hashing (or the full Drive insert path). A frontier-less seeded tree has no valid Orchard anchor (so seeded notes aren't spendable), but BulkAppendTree chunk proofs — authenticated by the blake3 bulk state root, not the frontier — still verify, which is exactly what client wallet sync exercises. Under `test-seeding`, `CommitmentTree::open` tolerates the empty-frontier/populated-bulk shape so seeded state round-trips; production behavior is unchanged when the feature is off. The grovedb crate forwards this via `commitment_tree_test_seeding`. For devnet/benchmark seeding only — never enable in production. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(commitment-tree): reject frontier-less seeding on a non-empty frontier Guard both `append_raw_without_frontier` and `append_many_without_frontier`: advancing the bulk tree while the frontier already has leaves would leave `frontier_size < total_count`, a mismatch `open` cannot tolerate (it only accepts an empty frontier), producing unreopenable state. Reject it instead. Addresses CodeRabbit review on #751. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(commitment-tree): rename feature to test-seeding-ct; drop open() check under it Rename the seeding feature `test-seeding` -> `test-seeding-ct` (and the grovedb forwarding feature to match). Under the feature, drop the frontier/bulk consistency check in `CommitmentTree::open` entirely rather than tolerating only the empty-frontier case. This lets a frontier-less-seeded tree have real, frontier-tracked notes added on top via the normal `append` path and still reopen at any `(frontier_size, total_count)` pair. Production behavior is unchanged when the feature is off. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(commitment-tree): cover frontier-less seeding error paths for patch coverage Add fault-injection to the test storage mock (toggleable get/put failures) and three tests covering the previously-uncovered error branches in the frontier-less seeding methods: the wrapped bulk-append storage error, per-entry error propagation out of the bulk loop, and the empty-input state-root recomputation failure. The remaining commit_mmr flush error is annotated codecov:ignore (unreachable via the seeding API, which always flushes in-call). Raises patch coverage above the 90% gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * perf(bulk-append-tree): cache MMR root so append is O(1), not O(N) per call `BulkAppendTree::append` called `get_mmr_root()` on every non-compaction append, and `get_mmr_root()` clones the `mmr_overlay` — which accumulates the chunk leaf nodes (each carrying a ~573 KB blob) across all compaction cycles until the session-end flush. Cloning that growing, blob-bearing overlay on every append made bulk seeding O(N^2): a 1M frontier-less seed degraded from ~2400 notes/s to sub-1100 and never finished in reasonable time. The MMR is only mutated on compaction (every `epoch_size` appends), so its root is unchanged in between. Cache it (`last_mmr_root`), refreshing only on compaction. `from_state` keeps it lazy (`None`) because a restored MMR may not be readable until the first append; the first append computes it once, exactly as before. Bulk seeding is now linear at a constant ~2435 notes/s (1M in ~6.85 min vs. 30+ min / non-terminating before). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * bench(commitment-tree): add 1M frontier-less seeding throughput benchmark Standalone (harness=false) bench that seeds N random filler notes into a real RocksDB-backed CommitmentTree via append_many_without_frontier and reports wall-clock time, splitting compute from the disk commit. Defaults to 1M; override with SEED_N. Gated on test-seeding-ct. cargo bench -p grovedb-commitment-tree --bench seeding --features test-seeding-ct Measured 1M random notes in ~6.85 min (linear ~2435 notes/s) after the MMR-root caching fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(commitment-tree): batched append_many_raw — replace _without_frontier API Per-leaf `CommitmentTree::append_raw` was slow on bulk seeds not because of Sinsemilla-per-leaf (amortized ~1 hash via the carry chain) but because `CommitmentFrontier::append` called `self.root_hash()` after every insert, walking the depth-32 Sinsemilla path for ~32 hashes per leaf. For 1M leaves that's ~32M extra hashes. The upstream `incrementalmerkletree::Frontier` already separates append (cheap) from root (expensive); our wrapper conflated them. The earlier `_without_frontier` shape side-stepped this by skipping the frontier entirely, producing a Sinsemilla anchor that reflected only a handful of cmx — wallets reconstructing the tree locally got a root the chain had never recorded, and spend proofs failed. This commit fixes the actual problem. Added: * `CommitmentFrontier::append_no_root(cmx)` — upstream `Frontier::append`, no depth-32 walk; carry-chain cost only. * `CommitmentFrontier::root_hash_with_cost()` — pure accessor that attributes the deferred 32-hash depth walk for batched callers. * `BulkAppendTree::append_no_state_root(value)` — `append` minus the per-leaf `compute_state_root` blake3. `append` now delegates to it + one final `compute_current_state_root`. * `BulkAppendTree::append_many<I: IntoIterator<Item = Vec<u8>>>` — batched bulk-tree appends, one state-root computation at the end. * `CommitmentTree::append_many_raw<I: IntoIterator<Item = ([u8;32],[u8;32], Vec<u8>)>>` — byte-for-byte equivalent to N × `append_raw` (same dense-buffer, MMR, `CommitmentFrontier::serialize()`, final state and Sinsemilla roots) but the Sinsemilla anchor and bulk state root are each computed exactly once at the end. Flushes the MMR overlay before return. * `compute_current_state_root` now uses the `last_mmr_root` cache when populated (O(1) on the hot path). Removed: * `test-seeding-ct` Cargo feature in `grovedb-commitment-tree` and the forwarding feature in `grovedb`. * `append_raw_without_frontier`, `append_many_without_frontier`, `FrontierLessAppendResult`, `BulkSeedSummary`, and their lib.rs re-exports. * The `#[cfg(not(feature = "test-seeding-ct"))]` gating around the frontier/bulk consistency check in `CommitmentTree::open` — the check is now unconditional. * Fault-injection mock state and storage-fault tests that only existed to cover the deleted branches. Tests: * `append_many_raw_byte_for_byte_matches_per_leaf` — for N ∈ {0,1,2,3,100, 2048,10_000}, the per-leaf and batched paths produce identical `frontier.root_hash()`, `CommitmentFrontier::serialize()`, bulk state root, and `total_count`. * `append_many_raw_anchor_is_spend_usable` — independent Sinsemilla auth-path recomputation; verifies `orchard::MerklePath::from_parts(pos, path).root(cmx) == ct.anchor()`. * `append_no_root_cost_omits_per_leaf_depth_walk` — confirms the savings are exactly the per-leaf depth walks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(commitment-tree): add --features server to seeding bench example commands The bench is gated by required-features = ["server"], so the example invocations in the file-level docs need --features server or they run the fallback `main` and skip the actual benchmark. Addresses CodeRabbit review on #751. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(commitment-tree): don't commit_mmr inside append_many_raw GroveDB's `StorageContext::get` reads straight from the transaction and never consults its `StorageBatch` (see `storage/src/rocksdb_storage/storage_context/ context_tx.rs:296-310`). So as soon as `commit_mmr` drains the MMR overlay into the batch, the freshly-flushed peaks become invisible to reads until `commit_multi_context_batch` lands the batch in the tx — which only happens at the very end of a session. For chained `append_many_raw` calls on the same `CommitmentTree` (the 500k-note Drive seeder pattern in dashpay/platform#3732), the internal `commit_mmr` at the end of batch N drained the overlay; batch N+1's first compaction then read a peak via `MmrStore::element_at_position` → `ctx.get` → tx (peak not there) → `MMR::push` raised `InconsistentStore`. Make `commit_mmr` the caller's responsibility — same as `append_raw`, which also never flushes on its own. The overlay now stays alive across chained batches; the caller flushes it once, right before committing the surrounding batch. The docs spell this out and explicitly warn against mid-session `commit_mmr`. Updated the seeding bench to call `commit_mmr` explicitly before dropping `ct`. The byte-for-byte equivalence, spend-usable anchor, and cost-accounting tests are unaffected (they compare in-memory frontier/state-root values that don't depend on commit ordering). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(commitment-tree): cover append_many_raw + frontier error paths Address patch-coverage failures on PR #751: * Drop unused `BulkAppendTree::append_many` + `AppendManyResult`. The user spec said "or the equivalent" and `CommitmentTree::append_many_raw` goes through `append_no_state_root` in a loop — `append_many` had no real caller and was inflating the patch as dead code. * Re-add fault-injection toggles to the test storage mock and exercise: - `append_many_raw_rejects_invalid_cmx_mid_batch` - `append_many_raw_rejects_wrong_payload_size_mid_batch` - `append_many_raw_surfaces_bulk_storage_error` - `frontier_append_no_root_rejects_invalid_cmx` - `commit_mmr_success` (drives the bulk-tree `commit_mmr` flush path from the commitment-tree side, where `append_many_raw` purposely leaves the overlay in place per #751's chained-batch fix). * Annotate genuinely unreachable error branches with `codecov:ignore`: TreeFull on both `append` and `append_no_root` (requires 2^32 leaves), the frontier-append-error branch in `append_many_raw` (cmx already pre-validated upstream), and the end-of-batch state-root error branch (the dense-tree write-through cache + in-session MMR cache mean a fault-injecting mock can't trip it from inside `append_many_raw`). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add coordinator tests for both halves of the clear() ordering fix: - success path empties the commitment tree and drops the account / persister registries - failure path (store reset forced to fail by removing the SQLite dir) returns Err and leaves the registries populated, so a failed clear can't silently turn future syncs into no-ops Regression guard for the partial-clear bug fixed in 115024f. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/rs-platform-wallet/src/wallet/shielded/coordinator.rs (1)
880-905: 💤 Low valuePlatform-specific test behavior.
This test relies on Unix file handle semantics:
remove_dir_allunlinks the directory but the already-open SQLite handle continues working on the orphaned inode. Whenreset_commitment_treetries to reopen the path, it fails because the path no longer exists.On Windows,
remove_dir_allwould likely fail while files are open, causing this test to behave differently or fail.Consider adding a brief comment noting this is Unix-specific, or gating the test with
#[cfg(unix)]if Windows CI is expected.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/rs-platform-wallet/src/wallet/shielded/coordinator.rs` around lines 880 - 905, The test clear_failure_preserves_registries relies on Unix-specific semantics where removing the directory (std::fs::remove_dir_all) unlinks the path while an open SQLite handle keeps working, causing reset_commitment_tree's subsequent Connection::open(path) to fail; update the test by either adding a clear comment stating it's Unix-specific and why, or gate the test with #[cfg(unix)] (or an appropriate platform check) so Windows CI doesn't run it, referencing the test name clear_failure_preserves_registries, the use of std::fs::remove_dir_all, reset_commitment_tree and coordinator.clear() so the change is applied in the correct place.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@packages/rs-platform-wallet/src/wallet/shielded/coordinator.rs`:
- Around line 880-905: The test clear_failure_preserves_registries relies on
Unix-specific semantics where removing the directory (std::fs::remove_dir_all)
unlinks the path while an open SQLite handle keeps working, causing
reset_commitment_tree's subsequent Connection::open(path) to fail; update the
test by either adding a clear comment stating it's Unix-specific and why, or
gate the test with #[cfg(unix)] (or an appropriate platform check) so Windows CI
doesn't run it, referencing the test name clear_failure_preserves_registries,
the use of std::fs::remove_dir_all, reset_commitment_tree and
coordinator.clear() so the change is applied in the correct place.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 59abb91d-622e-42f6-8ddd-08d20f32c922
📒 Files selected for processing (1)
packages/rs-platform-wallet/src/wallet/shielded/coordinator.rs
The failure-injection in clear_failure_preserves_registries depends on POSIX unlink-while-open semantics (remove the dir, the open SQLite handle survives on the orphaned inode, but a fresh Connection::open at the missing path fails). Windows can't remove a directory with open files, so gate the test with #[cfg(unix)]; the success-path test is unaffected. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Latest delta (ef371a9..3fdc1b5) is a single test-gating change (#[cfg(unix)] + explanatory comment) on the shielded coordinator's clear-failure test — no production code touched. Carried-forward prior findings: none (the ef371a9 review had none, and the test-gate delta cannot resurrect anything). Cumulative re-review surfaced two would-be blocking findings from codex agents, but both fail verification: (1) the EventHandlerCallbacks ABI-growth concern is explicitly documented as an accepted in-tree-only limitation (event_handler.rs:18-35, committed in f9f5f42); (2) the clear() 'data loss' narrative misreads the doc — 'in-memory state' refers to accounts/persisters, and the best-effort cleanup with retry-on-error is the documented design, with the subwallet BTreeMap being volatile working state rather than authoritative persistence. Prior findings status: none to reconcile (STILL VALID/FIXED/OUTDATED/DEFERRED N/A).
Remove two working-notes markdown files that slipped into the branch: - docs/shielded-seeder-performance.md — stale (still cited 500k notes; superseded by the ~1M snapshot bake) and only a tracking doc - SwiftExampleApp/docs/shielded-sync-devnet-followups.md — a live-session punch list embedding devnet infrastructure IPs Scrub the now-dangling "see docs/shielded-seeder-performance.md" pointers from the dashmate buildArgs comments and the snapshot design doc; the inline "release required for SDK_TEST_DATA seeding" note already states the actionable fact. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Latest delta (3fdc1b5..0cfb3a5) is documentation-only cleanup. Cumulative re-review at head surfaces one valid user-visible regression flagged by codex-general in the dual-progress reporting: on resumed/multi-subwallet syncs the new tree-progress ('checked') counter can begin ahead of the SDK's 'downloaded' counter because their baselines diverge (pre-stream tree_size vs aligned_start), contradicting the PR's documented Checked ≤ Downloaded ≤ total invariant. No security, consensus, or FFI defects.
🔴 1 blocking
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `packages/rs-platform-wallet/src/wallet/shielded/sync.rs`:
- [BLOCKING] packages/rs-platform-wallet/src/wallet/shielded/sync.rs:318-376: Resumed shielded sync reports `checked` ahead of `downloaded`, breaking the documented progress invariant
`leaves_committed` is seeded with the pre-stream `tree_size` (sync.rs:323) and emitted as the 'checked' value via `on_tree_progress(leaves_committed, total_target)` on every batch (sync.rs:374-376). The SDK's 'downloaded' callback, by contrast, reports `state.start_index + state.cumulative_scanned` (sync_shielded_notes.rs:312-318), where `state.start_index == aligned_start` and `aligned_start = (min(watermarks) / CHUNK_SIZE) * CHUNK_SIZE` (sync.rs:221) — i.e. the rewound MIN across subwallets, not the current shared `tree_size`.
Whenever `tree_size > aligned_start` — which is the normal case for a multi-subwallet bind (subwallet A is synced to tree_size, subwallet B binds with watermark 0, so aligned_start collapses to 0), or any resume where the shared tree has been advanced past this subwallet's chunk-aligned start by a prior pass — the very first emitted progress event has `checked = tree_size` while `downloaded = aligned_start + chunk` is much smaller. The gate `global_pos < tree_size` (sync.rs:355-357) keeps `leaves_committed` pinned at `tree_size` until the stream catches up, so the inequality `checked > downloaded` persists for most of the pass.
This directly violates the `Checked ≤ Downloaded ≤ total` invariant the PR advertises and the comment at sync_shielded_notes.rs:313-314 ("so the 'Downloaded' bar shares the 'Checked' baseline") explicitly relies on, and will render the dual progress bars incoherently on every non-cold sync. Fix by aligning the two baselines — e.g. start `leaves_committed` at `aligned_start` (or the wallet's resume position) and only count delta appends, or have the SDK report `downloaded` from `tree_size` for the gate-skipped region.
The "downloaded" progress value the SDK reports is aligned_start + scanned, where aligned_start is the rewound MIN watermark across subwallets and can sit far below the shared tree's leaf count (a second subwallet binding at watermark 0 collapses it to 0; a partial-chunk-tail resume rewinds below the tail). The "checked" value is the absolute tree_size, so over the gate-skipped re-scan region the unclamped download value read below checked and broke the advertised Checked <= Downloaded <= total invariant on every non-cold sync. Clamp the forwarded download value up to the pre-stream tree_size in the wallet consumer, which is the only place that knows both baselines. Everything already in the tree was downloaded on a prior pass, so counting it as downloaded is accurate; the clamp is a no-op past tree_size and for cold syncs (tree_size == 0). Update the SDK comment that wrongly claimed the raw value already shared the checked baseline. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Cumulative incremental review. Prior blocking finding (Checked > Downloaded on resumed multi-subwallet syncs) is FIXED at af24380 — the host-visible downloaded value is now clamped up to the pre-stream tree_size baseline before crossing into the host callback, restoring Checked ≤ Downloaded. One new in-scope suggestion in the latest delta: the Downloaded ≤ total half of the same advertised invariant can still be violated transiently because total_target is seeded at 0 and grown from batch.total_count.max(), while the numerators are floored at tree_size; a first batch proven by a lagging node with total_count < tree_size will read downloaded > total until a fresher chunk arrives. No new blockers; no new FFI, security, or consensus issues.
🟡 1 suggestion(s)
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `packages/rs-platform-wallet/src/wallet/shielded/sync.rs`:
- [SUGGESTION] packages/rs-platform-wallet/src/wallet/shielded/sync.rs:237-240: Seed `total_target` from `tree_size` so the denominator can't fall below the floored numerators on resumed syncs
The clamp added at sync.rs:276-280 fixes `Checked ≤ Downloaded`, but the `Downloaded ≤ total` half of the advertised invariant can still be violated on a resume. `total_target` starts at 0 here and is updated via `total_target = total_target.max(batch.total_count)` at line 361. Meanwhile both numerators are now floored at `tree_size`: `leaves_committed` is initialized to `tree_size` (line 341) and the forwarded `downloaded` is `downloaded.max(tree_baseline)` where `tree_baseline = tree_size` (lines 277-279).
The shielded note count is append-only, so a node that returns `batch.total_count < tree_size` is simply lagging behind the block height at which our prior sync committed `tree_size` leaves. Because the SDK explicitly allows different chunks to come back at different heights and only takes the max (sync_shielded_notes.rs:303-306), the first emitted callback in a resumed sync that races a slightly-behind node reports `checked = tree_size`, `downloaded = tree_size`, and `total = batch.total_count < tree_size` — both numerators briefly exceed the denominator until a fresher chunk arrives. Practical impact is a UI glitch (progress bar reads > 100%) rather than a correctness issue, but it directly contradicts the `Checked ≤ Downloaded ≤ total` contract the PR documents.
Seeding `total_target` from `tree_size` is safe because the on-chain count is monotonic and can never legitimately be below what we've already committed locally.
…<= total With both progress numerators floored at tree_size (the download clamp and leaves_committed's seed), a total_target that starts at 0 and grows via max(batch.total_count) can momentarily sit below the numerators when the first batch is proven by a lagging node returning total_count < tree_size, making the bar read > 100%. Seed total_target from tree_size: the on-chain shielded count is append-only so the real total can never be below what we have already committed locally. Cold sync (tree_size == 0) keeps the host's indeterminate handling until the first batch lands. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Incremental review against prior commit af24380. The single carried-forward prior finding (sync.rs:237-240 progress-denominator seeded from 0) is FIXED at head 2b312ab: total_target is now seeded from tree_size at sync.rs:247, with an expanded comment explaining the rationale, and is still raised via .max(batch.total_count) at sync.rs:368. The latest delta introduces no new in-scope findings; the only candidate (Codex FFI ptr::read on growing vtables) is a pre-existing structural FFI pattern unrelated to this PR's stated goal.
…k_test_data The shielded-pool snapshot bake/apply module is pure test-data tooling: the bake reads a pool seeded by create_sdk_test_data (it dumps an empty subtree without that cfg) and the apply runs only from that seeder's fast-path. It has no purpose in a production drive-abci binary, yet the module and the `snapshot-bake` CLI command were always compiled in. Gate `pub mod shielded_snapshot` on `cfg(any(create_sdk_test_data, test))` (the `test` arm keeps the genesis seeder's test module compiling under `cargo test`), and gate the `SnapshotBake` command, its dispatch + config branch, the `snapshot_bake` fn, and the bake-only `NoopCoreRPC` stub on `cfg(create_sdk_test_data)`. The `grovedb-commitment-tree` dep stays (it is also used by production shield/unshield validation). Verified clean builds with and without the cfg, and under `cargo check --tests`. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Reconciled the prior no-issue review at 2b312ab against current head 57af181: there are no carried-forward prior findings. The latest delta (gating shielded snapshot tooling behind create_sdk_test_data) is a clean, scope-correct refactor with no defects. Cumulative review of the current head surfaces one in-scope FFI lifecycle suggestion — platform_wallet_manager_destroy does not stop the new shielded sync thread, relying on the in-tree Swift wrapper's deinit ordering for soundness — and one minor doc-consistency nitpick. The codex coordinator clear() finding does not hold up against the documented contract and the regression test that was added with it.
🟡 1 suggestion(s)
1 additional finding(s) omitted (not in diff).
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `packages/rs-platform-wallet-ffi/src/manager.rs`:
- [SUGGESTION] packages/rs-platform-wallet-ffi/src/manager.rs:351-358: platform_wallet_manager_destroy does not stop shielded sync / identity sync / event adapter
`platform_wallet_manager_destroy` only calls `manager.platform_address_sync().stop()` and then drops the manager. The full lifecycle stop sequence lives in `PlatformWalletManager::shutdown` (manager/mod.rs:295), which also stops `identity_sync_manager`, `shielded_sync_manager`, and cancels/awaits the `event_adapter` task. The shielded sync background thread (added in this PR series) can therefore still be alive after destroy returns and can fire `on_shielded_sync_*` callbacks against an `EventHandlerCallbacks.context` pointer whose Swift owner has already been freed by a host that follows only the exported destroy contract. In practice the in-tree Swift wrapper mitigates this by stopping shielded sync in `PlatformWalletManager.deinit` before calling destroy, so the deployed consumer is safe — but the C ABI itself is unsound by convention. Calling `manager.shutdown().await` from destroy (block_on'd on the FFI runtime, mirroring the other entrypoints) closes the gap without changing the Swift call sequence.
platform_wallet_manager_destroy only stopped the platform-address sync manager, leaving the identity sync manager, the shielded sync manager, and the wallet-event adapter task alive after destroy returned. Any of those can fire a callback through the host-owned EventHandlerCallbacks.context pointer, which the host may free once destroy returns — an unsound C ABI contract even though the in-tree Swift wrapper mitigates it by stopping shielded sync in deinit first. Run the full idempotent PlatformWalletManager::shutdown() to completion on the FFI's multi-threaded runtime before dropping the manager, so no background task can outlive it. shutdown() is idempotent, so this does not conflict with the existing Swift deinit sequence. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Re: the review-summary suggestion that (This finding had no inline thread since |
Ports the fix from #3733 into this branch. Replaces the bypassable boolean `suppressShieldedCompletionEvents` gate with a lock-guarded monotonic `ShieldedSyncGenerationCounter`: the FFI completion callback snapshots the generation on the callback thread before enqueueing onto the main actor, stop/clear bump the generation, and handleShieldedSyncCompleted drops any event whose snapshot no longer matches. This closes the same-actor-turn stop->restart race where a stale completion from a prior run leaked into the new run (a boolean re-opened by start before the queued task ran). Reconciled with this branch's dual-progress work: the resetCurrentShielded Progress() calls in handleShieldedSyncCompleted/stop/clear are preserved. Adds ShieldedSyncGenerationTests covering the guard. Supersedes #3733. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Cumulative incremental review of PR #3732 at head b635ffe vs prior review at 57af181. The only latest delta is the destroy-path change in packages/rs-platform-wallet-ffi/src/manager.rs that now calls runtime().block_on(manager.shutdown()) instead of stopping only platform-address sync. Prior finding classification: STILL VALID — the new call invokes the exact code the prior reviewer suggested, but PlatformWalletManager::shutdown() only calls cancel-only stop() on the three sync managers and joins just the event-adapter task. An in-flight identity-sync or shielded-sync pass that started before destroy() can keep running and fire persister callbacks (identity_sync.rs:581 calls self.persister.store() directly, bypassing the event adapter), so the same FFI use-after-free window the prior review called out is not actually closed. Codex's separate shielded coordinator clear() finding is a false positive (registries are preserved on failure per the documented contract). One carried-forward blocking finding; no new in-scope findings in the latest delta.
🔴 1 blocking
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `packages/rs-platform-wallet-ffi/src/manager.rs`:
- [BLOCKING] packages/rs-platform-wallet-ffi/src/manager.rs:351-366: [carried-forward, STILL VALID] manager_destroy still returns before in-flight sync passes have drained their FFI callbacks
The new `runtime().block_on(manager.shutdown())` matches the prior review's suggested patch verbatim, but `PlatformWalletManager::shutdown()` (packages/rs-platform-wallet/src/manager/mod.rs:295-307) does not provide a real quiescence barrier for the sync managers. `PlatformAddressSyncManager::stop()` (platform_address_sync.rs:226-236), `IdentitySyncManager::stop()` (identity_sync.rs:431-441), and `ShieldedSyncManager::stop()` (shielded_sync.rs:278-295, which explicitly documents itself as 'Cancel-only: this requests cancellation and returns immediately. A pass already inside `sync_now` / `coordinator.sync()` keeps running to completion (including its persister-callback fan-out)') only cancel the loop's `CancellationToken` and return. `shutdown()` then awaits only `event_adapter_join`, not the sync-manager background-task handles, so a pass that was already inside `sync_now()` when `stop()` was called can finish after `shutdown()` returns. The platform-address summary callback at platform_address_sync.rs:282-283 goes through `event_manager`, which is at least drained via the event-adapter join, but identity sync calls `self.persister.store(sentinel, cs.into())` directly at identity_sync.rs:581 (and the shielded coordinator persister fan-out is likewise direct), bypassing the adapter. Per the Swift side (PlatformWalletManager.deinit calls `platform_wallet_manager_destroy(handle)` and then releases the `eventHandler`/`persistenceHandler` objects, which are passed to Rust via `Unmanaged.passUnretained(...).toOpaque()`), once `destroy` returns the host can free the persister context while an in-flight identity-sync or shielded-sync pass is still about to fire `persister.store(...)` through that pointer — exactly the use-after-free hazard the prior review flagged. To truly close the window, `shutdown()` needs to wait for in-flight passes to drain (use `ShieldedSyncManager::quiesce()` for shielded, and add equivalent quiesce/await-handle barriers for the address and identity sync managers) before returning.
manager_destroy -> shutdown() previously only cancelled the sync managers' loops (cancel-only stop()) and awaited the event adapter. A pass already inside sync_now() kept running and could call persister.store(...) (identity sync) or fire on_platform_address_sync_completed / the shielded coordinator persister fan-out directly — bypassing the event adapter — after destroy returned and the Swift host freed the persister/event-handler context. That is a use-after-free of a host-owned context pointer. Add a quiescing-gate + quiesce() barrier to IdentitySyncManager and PlatformAddressSyncManager mirroring ShieldedSyncManager::quiesce (set gate, cancel, poll is_syncing until the in-flight pass fully drains incl. its fan-out, reopen gate), and a quiescing check in each sync_now after the is_syncing CAS. shutdown() now quiesce().await's all three managers before cancelling+joining the event adapter. Also move PlatformAddressSyncManager's is_syncing.store(false) to after the completion dispatch so the drain covers the host callback (matches the shielded ordering). Adds 5 tests. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Cumulative incremental review of head 7b36f29. The latest delta is Swift-only: it replaces a boolean suppression flag with a monotonic generation counter snapshotted on the FFI thread, correctly closing the same-actor-turn restart race for the shielded completion event. However, the prior b635ffe blocking finding — manager_destroy returns before in-flight identity/shielded sync passes drain their FFI callbacks — remains STILL VALID: PlatformWalletManager::shutdown() still calls cancel-only stop() on all three sync managers instead of awaiting quiesce(), so identity_sync's direct persister.store(...) and shielded coordinator's persister fan-out can fire callbacks through Unmanaged.passUnretained Swift contexts after destroy returns and ARC releases them. One new minor finding in the latest delta: the two shielded progress callbacks do not snapshot the generation counter like the completion callback does, so a stale main-actor progress Task can repopulate the UI mirrors after stopShieldedSync/clearShielded reset them.
Reviewed commit: 7b36f29
_Note: Inline posting failed (command failed (1): python3 scripts/review_poster.py dashpay/platform 3732 7b36f29
STDOUT:
STDERR:
Traceback (most recent call last):
File "/Users/claw/.openclaw/workspace/scripts/review_poster.py", line 754, in
result = post_review(repo, pr_number, h), so I posted the same verified findings as a top-level review body._
🔴 1 blocking | 🟡 1 suggestion(s)
Verified findings
blocking: [carried-forward, STILL VALID] shutdown() is cancel-only — manager_destroy can return while in-flight sync passes still hold Swift-owned callback contexts
packages/rs-platform-wallet/src/manager/mod.rs (line 295)
Re-validation of the b635ffe prior blocking finding against current head 7b36f29: STILL VALID. The latest delta is Swift-only (a generation counter for the shielded completion event) and does not change the Rust shutdown contract.
Current code at packages/rs-platform-wallet/src/manager/mod.rs:295-307:
pub async fn shutdown(&self) {
self.platform_address_sync_manager.stop();
self.identity_sync_manager.stop();
#[cfg(feature = "shielded")]
self.shielded_sync_manager.stop();
self.event_adapter_cancel.cancel();
if let Some(handle) = self.event_adapter_join.lock().await.take() { ... }
}All three stop() methods are explicitly cancel-only:
PlatformAddressSyncManager::stop()(platform_address_sync.rs:227-236) — takes the cancel token and returns.IdentitySyncManager::stop()(identity_sync.rs:432-441) — takes the cancel token and returns.ShieldedSyncManager::stop()(shielded_sync.rs:278-295) — docstring explicitly says**Cancel-only**: this requests cancellation and returns immediately. A pass already inside sync_now / coordinator.sync() keeps running to completion (including its persister-callback fan-out).
shutdown() awaits only event_adapter_join; it never awaits any sync-manager JoinHandle and never spins on is_syncing. ShieldedSyncManager::quiesce() (shielded_sync.rs:314-321) — the real barrier — exists and is correctly used by platform_wallet_manager_shielded_sync_stop / _clear (rs-platform-wallet-ffi/src/shielded_sync.rs:86-95) and by clear_shielded (mod.rs:280-286), but is NOT used here. Identity sync and platform-address sync have no quiesce equivalent at all, even though IdentitySyncManager::sync_now calls self.persister.store(...) directly (identity_sync.rs:575-587), bypassing the wallet-event adapter that shutdown does drain.
platform_wallet_manager_destroy (packages/rs-platform-wallet-ffi/src/manager.rs:351-366) runs runtime().block_on(manager.shutdown()) and its own doc-comment claims 'no task may be left alive to fire a callback against freed memory' — but shutdown() does not enforce that for identity/shielded passes already past their cancel check.
Swift PlatformWalletManager.deinit (packages/swift-sdk/Sources/SwiftDashSDK/PlatformWallet/PlatformWalletManager.swift:151-157) calls platform_wallet_manager_destroy(handle) and then ARC releases the eventHandler / persistenceHandler objects whose pointers were passed across the FFI via Unmanaged.passUnretained(...).toOpaque() (e.g. PlatformWalletPersistenceHandler.swift:924-927, PlatformWalletManagerAddressSync.swift:40-50). Once destroy returns and ARC frees those Swift objects, an in-flight identity-sync or shielded-sync pass can dereference the now-dangling context: *mut c_void on its next persistence callback — a cross-language use-after-free at the FFI boundary, not just a stale-UI issue.
The new Swift generation gate does not address this: it only filters the @MainActor-side completion hop after the FFI callback has already executed. The unsafe access (Unmanaged.fromOpaque(context).takeUnretainedValue() and the identity-sync direct persister fan-out which never goes through the Swift event handler at all) happens earlier on the FFI thread, before any @mainactor task would run.
Fix shape: make shutdown() a real quiescence barrier. Add quiesce() to IdentitySyncManager and PlatformAddressSyncManager mirroring ShieldedSyncManager::quiesce (raise a quiescing gate, cancel, await is_syncing falling), then have shutdown() await quiesce() on all three before tearing down the event adapter. After that, manager_destroy returning genuinely means 'no Rust task will fire another callback through the host context pointer', which is the invariant the FFI's own comment block claims to provide.
suggestion: Shielded progress and tree-progress callbacks lack the generation guard the completion callback now has
packages/swift-sdk/Sources/SwiftDashSDK/PlatformWallet/PlatformWalletManagerShieldedSync.swift (line 653)
The latest delta added a monotonic generation snapshot to shieldedSyncCompletedCallback (line 646) so a trailing main-actor hop after stopShieldedSync / clearShielded is dropped. The two progress callbacks immediately below — shieldedSyncProgressCallback (656-672) and shieldedTreeProgressCallback (680-696) — still just do Task { @MainActor [weak manager] in manager?.handleShieldedSyncProgress(...) } with no generation snapshot and no guard.
Rust-side platform_wallet_manager_shielded_sync_stop correctly uses quiesce() (rs-platform-wallet-ffi/src/shielded_sync.rs:86-95), so when stopShieldedSync() returns, no further Rust-side dispatches will happen. But a progress callback that fired during the quiesce window enqueues a Task onto the main actor; that task can land AFTER stopShieldedSync() calls resetCurrentShieldedProgress() (line 247) or clearShielded() does the same (line 279), re-publishing stale currentShielded* values from the just-stopped pass and undoing the reset on the UI.
This is the same same-turn race the completion fix was written to address, just on a different surface. Apply the same pattern: snapshot handler.manager?.shieldedSyncGeneration.current() in the FFI trampoline before enqueueing, then have the @mainactor handler drop if the snapshot mismatches the current generation.
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
- [BLOCKING] In `packages/rs-platform-wallet/src/manager/mod.rs`:295-307: [carried-forward, STILL VALID] shutdown() is cancel-only — manager_destroy can return while in-flight sync passes still hold Swift-owned callback contexts
Re-validation of the b635ffe8 prior blocking finding against current head 7b36f29bb0: STILL VALID. The latest delta is Swift-only (a generation counter for the shielded completion event) and does not change the Rust shutdown contract.
Current code at packages/rs-platform-wallet/src/manager/mod.rs:295-307:
```rust
pub async fn shutdown(&self) {
self.platform_address_sync_manager.stop();
self.identity_sync_manager.stop();
#[cfg(feature = "shielded")]
self.shielded_sync_manager.stop();
self.event_adapter_cancel.cancel();
if let Some(handle) = self.event_adapter_join.lock().await.take() { ... }
}
All three stop() methods are explicitly cancel-only:
PlatformAddressSyncManager::stop()(platform_address_sync.rs:227-236) — takes the cancel token and returns.IdentitySyncManager::stop()(identity_sync.rs:432-441) — takes the cancel token and returns.ShieldedSyncManager::stop()(shielded_sync.rs:278-295) — docstring explicitly says**Cancel-only**: this requests cancellation and returns immediately. A pass already inside sync_now / coordinator.sync() keeps running to completion (including its persister-callback fan-out).
shutdown() awaits only event_adapter_join; it never awaits any sync-manager JoinHandle and never spins on is_syncing. ShieldedSyncManager::quiesce() (shielded_sync.rs:314-321) — the real barrier — exists and is correctly used by platform_wallet_manager_shielded_sync_stop / _clear (rs-platform-wallet-ffi/src/shielded_sync.rs:86-95) and by clear_shielded (mod.rs:280-286), but is NOT used here. Identity sync and platform-address sync have no quiesce equivalent at all, even though IdentitySyncManager::sync_now calls self.persister.store(...) directly (identity_sync.rs:575-587), bypassing the wallet-event adapter that shutdown does drain.
platform_wallet_manager_destroy (packages/rs-platform-wallet-ffi/src/manager.rs:351-366) runs runtime().block_on(manager.shutdown()) and its own doc-comment claims 'no task may be left alive to fire a callback against freed memory' — but shutdown() does not enforce that for identity/shielded passes already past their cancel check.
Swift PlatformWalletManager.deinit (packages/swift-sdk/Sources/SwiftDashSDK/PlatformWallet/PlatformWalletManager.swift:151-157) calls platform_wallet_manager_destroy(handle) and then ARC releases the eventHandler / persistenceHandler objects whose pointers were passed across the FFI via Unmanaged.passUnretained(...).toOpaque() (e.g. PlatformWalletPersistenceHandler.swift:924-927, PlatformWalletManagerAddressSync.swift:40-50). Once destroy returns and ARC frees those Swift objects, an in-flight identity-sync or shielded-sync pass can dereference the now-dangling context: *mut c_void on its next persistence callback — a cross-language use-after-free at the FFI boundary, not just a stale-UI issue.
The new Swift generation gate does not address this: it only filters the @MainActor-side completion hop after the FFI callback has already executed. The unsafe access (Unmanaged.fromOpaque(context).takeUnretainedValue() and the identity-sync direct persister fan-out which never goes through the Swift event handler at all) happens earlier on the FFI thread, before any @mainactor task would run.
Fix shape: make shutdown() a real quiescence barrier. Add quiesce() to IdentitySyncManager and PlatformAddressSyncManager mirroring ShieldedSyncManager::quiesce (raise a quiescing gate, cancel, await is_syncing falling), then have shutdown() await quiesce() on all three before tearing down the event adapter. After that, manager_destroy returning genuinely means 'no Rust task will fire another callback through the host context pointer', which is the invariant the FFI's own comment block claims to provide.
- [SUGGESTION] In
packages/swift-sdk/Sources/SwiftDashSDK/PlatformWallet/PlatformWalletManagerShieldedSync.swift:653-697: Shielded progress and tree-progress callbacks lack the generation guard the completion callback now has
The latest delta added a monotonic generation snapshot toshieldedSyncCompletedCallback(line 646) so a trailing main-actor hop afterstopShieldedSync/clearShieldedis dropped. The two progress callbacks immediately below —shieldedSyncProgressCallback(656-672) andshieldedTreeProgressCallback(680-696) — still just doTask { @MainActor [weak manager] in manager?.handleShieldedSyncProgress(...) }with no generation snapshot and no guard.
Rust-side platform_wallet_manager_shielded_sync_stop correctly uses quiesce() (rs-platform-wallet-ffi/src/shielded_sync.rs:86-95), so when stopShieldedSync() returns, no further Rust-side dispatches will happen. But a progress callback that fired during the quiesce window enqueues a Task onto the main actor; that task can land AFTER stopShieldedSync() calls resetCurrentShieldedProgress() (line 247) or clearShielded() does the same (line 279), re-publishing stale currentShielded* values from the just-stopped pass and undoing the reset on the UI.
This is the same same-turn race the completion fix was written to address, just on a different surface. Apply the same pattern: snapshot handler.manager?.shieldedSyncGeneration.current() in the FFI trampoline before enqueueing, then have the @mainactor handler drop if the snapshot mismatches the current generation.
</details>
The shielded completion callback snapshots the sync generation on the FFI thread and drops stale main-actor hops, but the two progress callbacks (shieldedSyncProgressCallback / shieldedTreeProgressCallback) did not. A progress Task enqueued just before stopShieldedSync()/clearShielded() could land after resetCurrentShieldedProgress() and re-publish phantom currentShielded* values over the just-reset UI mirrors. Apply the same pattern: snapshot shieldedSyncGeneration in both trampolines before enqueueing and pass it to handleShieldedSyncProgress / handleShieldedTreeProgress, which now guard on it. Adds 4 tests covering stale-dropped / current-published for both progress surfaces. (No memory safety impact — the Rust side already quiesces on stop/clear; this is the UI-coherence half of the same race.) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Re the review of 7b36f29 (both findings landed in the review body since inline posting failed — noting disposition here): Blocking — Suggestion — progress callbacks lack the generation guard: fixed in |
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
The most recent commit (0cfe6c8) cleanly fixes the prior blocking FFI use-after-free on shutdown by adding quiesce() barriers to IdentitySyncManager and PlatformAddressSyncManager and wiring them into PlatformWalletManager::shutdown ahead of the event-adapter join. Two in-scope suggestions remain: the two shielded progress trampolines on the Swift side still lack the generation snapshot that this PR's earlier commit added to the completion trampoline (asymmetry introduced by this PR), and the new shielded-snapshot header is not injective over the parent leaf's flags. No in-scope blocking issues.
🟡 2 suggestion(s)
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `packages/swift-sdk/Sources/SwiftDashSDK/PlatformWallet/PlatformWalletManagerShieldedSync.swift`:
- [SUGGESTION] packages/swift-sdk/Sources/SwiftDashSDK/PlatformWallet/PlatformWalletManagerShieldedSync.swift:656-697: Shielded progress callbacks lack the generation guard the completion callback now has
`shieldedSyncCompletedCallback` (line 638-650) snapshots `handler.manager?.shieldedSyncGeneration.current()` on the FFI thread before the @MainActor hop, and the @MainActor handler drops the event if the snapshot no longer matches — the same-actor-turn restart fix added by commit 7b36f29bb0. The two sibling trampolines `shieldedSyncProgressCallback` (656-672) and `shieldedTreeProgressCallback` (680-696) still do a bare `Task { @MainActor [weak manager] in manager?.handleShieldedSync...(...) }` with no generation snapshot, and `handleShieldedSyncProgress` / `handleShieldedTreeProgress` write to `currentShielded*` mirrors with no generation check either.
The new Rust-side `ShieldedSyncManager::quiesce()` drains in-flight passes before `platform_wallet_manager_shielded_sync_stop` / `_clear` return, so this is no longer an FFI UAF — but a progress callback that fired and enqueued its main-actor Task before quiesce observed `is_syncing == false` can still land after `resetCurrentShieldedProgress()` runs (around lines 247 / 279), republishing stale `currentShielded*` values from the just-stopped pass and undoing the UI reset. This is the same boundary-crossing race the completion fix was written to address, on a different surface.
Apply the same snapshot+guard pattern symmetrically: take `handler.manager?.shieldedSyncGeneration.current()` on the FFI thread before enqueueing the Task, pass it through to the @MainActor handler, and drop when the snapshot mismatches the current generation.
In `packages/rs-drive-abci/src/shielded_snapshot/mod.rs`:
- [SUGGESTION] packages/rs-drive-abci/src/shielded_snapshot/mod.rs:375-385: Snapshot header flags_byte is not injective over parent-leaf flags
On dump (lines 375-385) the encoding maps `None`, `Some(vec![])`, and `Some(vec![0])` all to `flags_byte = 0`; on apply (lines 534-538) `0` decodes back to `None`. That loses information from the parent `Element::CommitmentTree` leaf's `Option<Vec<u8>>` flags during serialization, so `apply_shielded_snapshot()` cannot always reconstruct the original parent leaf bytes. If the source tree ever carries a one-byte zero flags vector or an empty flags vector, the snapshot-applied genesis will write a different parent leaf than the live-seeded tree, changing the parent Merk serialization even though the subtree checksum and `combined_root` cross-validation still pass — `combined_root` covers the rebuilt CommitmentTree, not the parent-leaf flags bytes.
Impact in practice is bounded because (a) this tooling is now gated behind `create_sdk_test_data` and (b) the parent leaf's flags are realistically `None` or a single non-zero byte today, but the snapshot's stated contract is a faithful round-trip of the seeded pool. An explicit presence/length discriminator (e.g. a separate `flags_present: u8` plus the byte, or sentinel-encoded as `0xFF` for `None`) avoids overloading `0`. The existing `Inconsistent` error on `flags.len() > 1` already shows the format is intentionally narrow; this fix tightens the same narrow path without widening it.
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Incremental review at be33d41. The only delta vs 0cfe6c8 is Swift-only: both shielded progress trampolines (shieldedSyncProgressCallback, shieldedTreeProgressCallback) now snapshot shieldedSyncGeneration.current() on the FFI thread before the @mainactor hop and the handlers gate on the snapshot, mirroring the completion-trampoline pattern. Four new XCTest cases cover stale-drop and current-publish on both surfaces. The prior progress-callback finding is RESOLVED. One carried-forward suggestion on the shielded snapshot header's non-injective flags_byte encoding remains valid but is bounded to create_sdk_test_data-gated tooling. No new in-scope defects in the delta.
🟡 1 suggestion(s)
1 additional finding(s)
suggestion: Snapshot header `flags_byte` is not injective over parent-leaf flags
packages/rs-drive-abci/src/shielded_snapshot/mod.rs (line 375)
Verified at be33d41. The dump-side encoder at lines 375-385 collapses three distinct parent Element::CommitmentTree flag states — None, Some(vec![]), and Some(vec![0]) — all to flags_byte = 0. The apply-side decoder at lines 534-538 unconditionally maps 0 back to None, then feeds that into replace_commitment_tree_subtree_root at 540-554. The combined_root cross-check at line 523 only covers the rebuilt CommitmentTree subtree contents, not the parent-leaf flags bytes, so the existing validation does not catch a parent-Merk divergence introduced by this lossy round-trip.
Concrete impact: if a source pool ever carries Some(vec![]) or Some(vec![0]) for the parent leaf flags, a snapshot-booted devnet will write different parent-leaf bytes than a live-seeded one starting from the same subtree. That changes the parent Merk serialization (and therefore the platform state root) silently, despite every downstream cross-check passing. The blast radius is bounded today because (a) this tooling is gated behind cfg(create_sdk_test_data) and (b) the realistic parent-leaf flags today are None or a single non-zero byte — hence suggestion rather than blocking. But the snapshot's stated contract is a faithful round-trip of the seeded pool, and the existing Inconsistent error on flags.len() > 1 already shows the format is intentionally narrow.
A localized fix that preserves header width: bump FORMAT_VERSION and use a presence/length discriminator — e.g. sentinel-encode None as 0xFF (rejecting 0xFF as a valid flags byte), 0x00 for Some(vec![]), and 0x01..=0xFE (or a separate flags_present: u8 plus the byte) for Some([byte]). This eliminates the 0 overload before any code path can synthesize the lossy states for that leaf.
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
- [SUGGESTION] In `packages/rs-drive-abci/src/shielded_snapshot/mod.rs`:375-385: Snapshot header `flags_byte` is not injective over parent-leaf flags
Verified at be33d412. The dump-side encoder at lines 375-385 collapses three distinct parent `Element::CommitmentTree` flag states — `None`, `Some(vec![])`, and `Some(vec![0])` — all to `flags_byte = 0`. The apply-side decoder at lines 534-538 unconditionally maps `0` back to `None`, then feeds that into `replace_commitment_tree_subtree_root` at 540-554. The `combined_root` cross-check at line 523 only covers the rebuilt CommitmentTree subtree contents, not the parent-leaf flags bytes, so the existing validation does not catch a parent-Merk divergence introduced by this lossy round-trip.
Concrete impact: if a source pool ever carries `Some(vec![])` or `Some(vec![0])` for the parent leaf flags, a snapshot-booted devnet will write different parent-leaf bytes than a live-seeded one starting from the same subtree. That changes the parent Merk serialization (and therefore the platform state root) silently, despite every downstream cross-check passing. The blast radius is bounded today because (a) this tooling is gated behind `cfg(create_sdk_test_data)` and (b) the realistic parent-leaf flags today are `None` or a single non-zero byte — hence suggestion rather than blocking. But the snapshot's stated contract is a faithful round-trip of the seeded pool, and the existing `Inconsistent` error on `flags.len() > 1` already shows the format is intentionally narrow.
A localized fix that preserves header width: bump `FORMAT_VERSION` and use a presence/length discriminator — e.g. sentinel-encode `None` as `0xFF` (rejecting `0xFF` as a valid flags byte), `0x00` for `Some(vec![])`, and `0x01..=0xFE` (or a separate `flags_present: u8` plus the byte) for `Some([byte])`. This eliminates the `0` overload before any code path can synthesize the lossy states for that leaf.
Top-level exact-SHA review posted because inline posting deduped the carried-forward open thread from the previous SHA; this records coverage for be33d41 without adding a duplicate inline thread.
…jective The snapshot header encoded the parent Element::CommitmentTree flags as a single byte, collapsing None, Some(vec![]), and Some(vec![0]) all to 0, and apply decoded 0 back to None. So a parent leaf carrying Some(vec![0]) would round-trip to None, writing different parent-Merk bytes (and platform state root) than a live-seeded tree — undetected, since combined_root covers only the rebuilt subtree, not the parent-leaf flags. Encode flags as Option<u8> with a [present, byte] discriminator (None and Some(0) now distinct on the wire), reject empty/multi-byte flags vecs loudly at dump, and bump FORMAT_VERSION to 2. Snapshots are rebaked per image build so no v1 compatibility is needed. Adds header round-trip + bad-discriminator unit tests. Bounded impact (create_sdk_test_data-gated tooling) but restores the snapshot's faithful-round-trip contract. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Cumulative incremental review at head b5db278. Prior be33d412 shielded_snapshot parent-leaf flags injectivity finding is FIXED: header now encodes Option<u8> as an injective [present, byte] pair, FORMAT_VERSION is bumped 1→2 (so v1 snapshots are rejected by FormatVersionMismatch rather than silently misread), ENCODED_LEN is 63, the dumper rejects empty/multi-byte parent-leaf flag vecs with ShieldedSnapshotError::Inconsistent, the decoder rejects invalid presence discriminators, and round-trip + bad-discriminator tests pin both paths. The latest delta is a single surgical commit; no new in-scope defects identified. Codex-general's dashmate buildArgs finding describes the documented intentional contract (the source comment at generateEnvsFactory.js:118-120 explicitly states drive-abci wins on collision because the namespace is shared by design), so it's dropped as not-a-defect. Codex-rust-quality's unwrap() finding is overstated as blocking: those calls live almost entirely in the offline dump_shielded_subtree bake CLI plus one InitChain site where startup panic is operationally equivalent to a typed error — preserved only as an internal follow-up.
Issue being fixed or feature implemented
Continues the work from #3714 / #3769: make it possible to exercise wallet shielded-note sync at scale (≈1,000,000 notes) on a local devnet, and make that sync both fast and observable.
When
drive-abciis built with--cfg create_sdk_test_data, the Orchard shielded pool is pre-populated at genesis with ≈1,000,000 notes (a handful owned across two deterministic ZIP-32 test wallets). Standing that data up surfaced two further needs that this PR also addresses: cold sync of a million notes had to be made fast enough to iterate on, and the long-running sync needed real progress feedback in the wallet UI.What was done?
Genesis seeding of the shielded pool (
drive-abci,dashmate,Dockerfile,scripts/)InitChainpre-populates the Orchard pool undercreate_sdk_test_data. Filler notes (random-validcmx+ opaque ciphertext) plus a few real owned notes encrypted to fixed ZIP-32 test wallets, all driven by a single seeded RNG for byte-identical re-runs.append_many_rawwith a single end-of-bakecommit_mmr, batched for fast bake at N=1M.snapshot-bakeCLI dumps the seeded shielded subtree to a file at image-build time;InitChainapplies it via theDRIVE_SHIELDED_SNAPSHOTenv var (fast genesis), with a runtime fallback to live seeding when the snapshot is absent.create_sdk_test_dataon Devnet, not just Regtest.buildArgsmap on the docker-build schema (with a config migration to backfill it);setup_local_network.shsetsSDK_TEST_DATA=trueandCARGO_BUILD_PROFILE=release; env generation forwards build args into compose.Faster note sync (
platform-wallet,rs-sdk)max_query_chunks, v1 = 4) — up to ~8192 notes per request, ~122 requests for 1M notes.WAL+synchronous=NORMAL+temp_store=MEMORYPRAGMAs on the file-backed commitment-tree store, removing the per-cmxfsync bottleneck.max_concurrent = 16chunks concurrently across the network's nodes.Observable sync — dual progress bars (
platform-wallet,platform-wallet-ffi,rs-sdk,swift-sdk)CommitmentTreeparent element is already in the proof), so no extra round-trip is needed for the progress denominator.EventHandlerCallbacksslots plumb both signals to the host; the Swift example app renders a dualProgressViewand preserves cold-sync timing (elapsed / last / longest).Devnet wiring (
swift-sdk,rs-sdk-ffi)Correctness fixes (
platform-wallet)clear()is now fallible and all-or-nothing: it resets the store first and returns early on failure, only dropping the in-memory account/persister registries once both store resets succeed — so a failed clear can't silently turn future syncs into no-ops while the host keeps its state. The failure propagates through the FFI so the host doesn't wipe its own state on a Rust-side error.Proof verification (
drive,rs-drive-proof-verifier)verify_shielded_encrypted_notesreturns the on-chaintotal_countextracted from the same proof, with an explicit root-hash equality check between the note sub-proof and the count sub-proof.How Has This Been Tested?
drive-abcishielded genesis tests:cargo test -p drive-abci --lib create_genesis_state::test::shielded(with--cfg create_sdk_test_data) — wallet derivation, note generator, determinism, per-wallet decrypt, cross-wallet privacy, aggregate balance, and Drive integration.driveproof-verifier tests:cargo test -p drive --lib verify::shielded::verify_shielded_encrypted_notes—total_countextraction across full and subset verification modes.platform-walletcoordinatorclear()tests: success path (tree emptied + registries dropped) and the failure-path regression guard (forced store-reset failure leaves registries populated and returnsErr).cargo fmt --allandcargo clippyclean across the touched crates.SDK_TEST_DATA-seeded devnet recovers the expected per-wallet balances, with the dual progress bars climbing coherently (Checked ≤ Downloaded ≤ total) and "Clear" triggering a true cold rebuild from 0.examples/binaries and--ignoredtests under theshieldedfeature rather than in the default CI path.Breaking Changes
No consensus-breaking changes. Some non-consensus SDK/FFI surface changed (the note-fetch result now carries
total_count; new shielded progress callbacks onEventHandlerCallbacks;coordinator.clear()is now fallible). These are pre-release internal/host-facing API changes and do not warrant a!per the repo's breaking-change convention.Checklist:
For repository code-owners and collaborators only
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation
Chores