Fix stack overflow in trie sibling traversal#11
Conversation
There was a problem hiding this comment.
Build & Tests
CI: Pending (both macOS and Ubuntu). Cannot verify build/test results yet.
Local build: No Carp compiler on Pi; relying on CI.
Findings
1. sibling-find — correct. The iterative version uses a single-element stack to walk the singly linked sibling list. Each iteration pops a node, checks its label, and either returns it or pushes its sibling. Semantically equivalent to the original recursion. The stack always has 0 or 1 elements, consistent with the subtree-has-entry? pattern already used in this codebase.
2. sibling-upsert — correct. Uses the spine-collect-and-rebuild pattern:
- Forward pass: walks siblings, collecting into
spine, until the matching label is found (or siblings are exhausted). - When found: creates replacement node preserving the matched node's remaining sibling chain (
@(%node-sibling head-ref)) — correct. - When not found: sets
tailto the new child — appending at end, matching original recursive behavior. - Reverse pass: rebuilds from
spine[len-1]down tospine[0], threadingtailthrough — correct functional update of the singly linked list. - Return:
(Pair.init tail found)— the rebuilt chain plus whether it was a replacement. Matches original semantics.
3. sibling-remove — correct. Same spine-and-rebuild pattern:
- When found:
tailis set to the matched node's sibling, effectively unlinking it. Rebuild produces the chain without the removed node. - When not found: returns
@siblings-refdirectly without rebuilding — an optimization over the original (which would rebuild then discard at each level via thenot @(Pair.b &rec)check). This is correct and slightly more efficient.
4. get-in-siblings — correct, and the most interesting conversion. This function originally had two recursive calls: one for sibling traversal (same level) and one for child descent (deeper level). The iterative version handles both with cur-part/cur-index mutation:
- Sibling miss → push next sibling onto stack (same part/index)
- Sibling hit, key exhausted → result found, done
- Sibling hit, key continues → push child onto stack, advance part/index
The stack always has 0 or 1 elements because the function either descends or moves laterally, never both. When descending, cur-part and cur-index are mutated to track the new position in the key — this is safe because once you descend, the old part/index are no longer needed.
5. Formatting changes. The diff is large (~750 lines) but the vast majority is carp-fmt reformatting in the HAMT section (line wrapping, alignment). I verified the structural changes: the closing-paren consolidations (standalone ) merged into preceding lines) preserve the same total paren count. No semantic changes outside the four sibling functions and the test file.
6. Ref-counting overhead. The iterative versions create extra @ copies of Rc pointers (pushing onto stack, copying from spine). Since these are Rc-wrapped, copies are just refcount increments — cheap. The dropped copies decrement when the stack/spine are cleaned up. Marginally more overhead than the recursive version for non-overflow cases, but necessary for the overflow fix.
7. New test is good. build-wide-trie inserts 10k single-key entries (10k siblings at one level), then verifies get/remove operations. This directly exercises the fixed code path. The existing 100k-deep key test now works instead of crashing.
8. No missed edge case. I looked for:
- Empty sibling list → handled (
Maybe.Nothingsetsdone = trueor skips) - Single sibling → works (one iteration)
- Key not present →
get-in-siblingsreturnsNothing,sibling-removereturns unchanged - Zero-length key → handled by
get-in-siblings's>= next-i (Array.length key-ref)check
Verdict: revise
The code is correct and well-tested. However, CI is still pending — both macOS and Ubuntu checks have not completed. Given the extensive formatting changes alongside functional changes, CI confirmation is important before merging. Once CI is green, this is ready for merge.
|
We need to park this since I’m rewriting the library right now. After the rewrite goes live (depends on carp-lang/Carp#1570), this can be rewritten and integrated. |
|
Rewrite is live, you can rebase if it’s still relevant. |
sibling-find, sibling-upsert, sibling-remove and get-in-siblings walked the trie's singly linked sibling chain — and, in get-in-siblings, descended through child links — via mutual recursion, overflowing the C stack on wide (many siblings at one node) or deep tries. Convert all four to explicit stack/spine-based iteration, mirroring the existing subtree-has-entry? pattern already used in this module. sibling-upsert/sibling-remove collect a spine and rebuild the chain in reverse (functional update of a singly linked list); get-in-siblings now handles both sibling traversal and child descent iteratively. Add a wide-trie test (10k single-part keys → 10k siblings at one level) exercising get/insert/remove on the fixed paths. Re-derives the original claude/iterative-sibling-traversal fix against the post-rewrite generic Rc-based Persistent.TrieNode.
2c80d80 to
1e809a3
Compare
|
Rebased onto The four sibling-chain functions were still mutually recursive after the rewrite, so the stack overflow this PR fixes still reproduced. I re-derived the iterative versions against the new
The algorithm is unchanged from the version you reviewed — only the old On verification: the Pi runner's Carp binary predates carp#1570, so it can't compile the rewritten library ( Scope is unchanged from the original PR: this converts the four sibling-chain functions. |
There was a problem hiding this comment.
Build & Tests
CI: Green on both lanes now (test (ubuntu-latest), test (macos-latest)), which run the full scripts/validate.sh matrix — sanitized O1+ASan, release O2, and optimizer-stress O3 — across all 10 data structures including the trie suite. This resolves the sole blocker from the prior review round.
Local build: I attempted it to confirm the "can't build on the Pi" claim rather than take it on faith. carp --check test/persistent_trie.carp fails at the rc dependency with rc.carp: Can't find symbol 'recursive', cascading into the expected couldn't instantiate the generic type (Rc a) / undefined-TrieNode.init errors across the rewritten library. This Pi's Carp predates carp-lang/Carp#1570 (which introduced the recursive form the Rc-based rewrite relies on), so a local build is genuinely impossible — CI on current Carp is the authoritative validation, and it is green.
Prior feedback
The previous review traced all four functions and found them correct, returning revise only because CI was pending. CI has since completed green on both OSes, so that blocker is cleared. No code changes were requested and none were needed; the force-push to 1e809a3 is the rebased version now under CI.
Findings
I re-verified the four conversions independently against the new Rc-based Persistent.TrieNode:
sibling-find/sibling-upsert/sibling-remove— iterative walk over a 0/1-element stack.upsert's spine-collect-then-reverse-rebuild andremove's unlink-and-rebuild both preserve singly-linked sibling order and the append-at-end / replace-in-place / not-found semantics. Checked append-at-end, replace-first, replace-middle, remove-first/middle/last, and not-found (returns@siblings-refunchanged) — all match the original recursion.get-in-siblings— correctly folds the two original recursive calls (lateral sibling hop + child descent) into one loop withcur-part/cur-indextracking; the>= next-i (Array.length key-ref)terminal check and the child-Nothing→ not-found path match the recursion. The 0/1-element stack invariant holds (each iteration pops one and pushes at most one).- New 10k-wide-trie test genuinely stresses
sibling-find/upsert/removeandget-in-siblingsat 10k siblings on a single level (10k single-part keys → get/insert/remove), and the existing 100k-deep test exercises the iterativeget-in-siblingsdescent plusbuild-branch. Both are in the CI-run trie suite.
Scope note (not a defect): insert-node/remove-node still recurse on key depth. The 100k-deep insert avoids overflow only because a fresh deep branch is built iteratively via build-branch, and the deep test does insert+get but not a deep remove — a deep remove could still recurse. That's unchanged from before and explicitly out of scope for this PR; worth a follow-up someday, not a blocker here.
No correctness issues found.
Verdict: merge
The four conversions are correct, the only prior blocker (pending CI) is now resolved with all lanes green, and the local-build failure is a compiler-version constraint rather than a problem with this PR.
Summary
sibling-find,sibling-upsert,sibling-remove,get-in-siblings) to iterative loops using stack-based traversalsibling-findandget-in-siblingsuse a single-element Array as a stack to walk the linked listsibling-upsertandsibling-removeadditionally collect a spine Array and rebuild in reverse (standard functional-update pattern for singly linked lists)get-in-siblingsnow handles both sibling traversal AND child descent iteratively, fixing stack overflow on deep tries (e.g. 100k-part keys)subtree-has-entry?in this codebaseWhat was broken
The trie's sibling chain is a singly linked list traversed by mutual recursion. With many siblings at one node (wide trie) or deep key descent through
get-in-siblings, the C stack overflows. The existing test with a 100k-part key already demonstrated this — it crashed withAddressSanitizer: stack-overflowinget-in-siblings.Test plan
carp-fmtandanglerclean on changed filesOpened by the carpentry-org heartbeat agent (Claude). Veit has not reviewed this yet.