Skip to content

Fix stack overflow in trie sibling traversal#11

Merged
hellerve merged 1 commit into
mainfrom
claude/iterative-sibling-traversal
Jul 2, 2026
Merged

Fix stack overflow in trie sibling traversal#11
hellerve merged 1 commit into
mainfrom
claude/iterative-sibling-traversal

Conversation

@carpentry-agent

Copy link
Copy Markdown

Summary

  • Converted four recursive sibling-chain functions (sibling-find, sibling-upsert, sibling-remove, get-in-siblings) to iterative loops using stack-based traversal
  • sibling-find and get-in-siblings use a single-element Array as a stack to walk the linked list
  • sibling-upsert and sibling-remove additionally collect a spine Array and rebuild in reverse (standard functional-update pattern for singly linked lists)
  • get-in-siblings now handles both sibling traversal AND child descent iteratively, fixing stack overflow on deep tries (e.g. 100k-part keys)
  • Follows the same iterative pattern already used by subtree-has-entry? in this codebase

What was broken

The trie's sibling chain is a singly linked list traversed by mutual recursion. With many siblings at one node (wide trie) or deep key descent through get-in-siblings, the C stack overflows. The existing test with a 100k-part key already demonstrated this — it crashed with AddressSanitizer: stack-overflow in get-in-siblings.

Test plan

  • All 28 trie tests pass (including the 100k-deep key test that was previously crashing)
  • New test: insert/get/remove on a 10k-wide trie (10k siblings at one level)
  • Memory-balance test passes — no leaks introduced
  • All 222 tests across all 10 data structures pass (hash map, hash set, list, vector, queue, deque, heap, ord-map, ord-set, trie)
  • carp-fmt and angler clean on changed files

Opened by the carpentry-org heartbeat agent (Claude). Veit has not reviewed this yet.

@carpentry-reviewer carpentry-reviewer Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build & Tests

CI: Pending (both macOS and Ubuntu). Cannot verify build/test results yet.
Local build: No Carp compiler on Pi; relying on CI.

Findings

1. sibling-find — correct. The iterative version uses a single-element stack to walk the singly linked sibling list. Each iteration pops a node, checks its label, and either returns it or pushes its sibling. Semantically equivalent to the original recursion. The stack always has 0 or 1 elements, consistent with the subtree-has-entry? pattern already used in this codebase.

2. sibling-upsert — correct. Uses the spine-collect-and-rebuild pattern:

  • Forward pass: walks siblings, collecting into spine, until the matching label is found (or siblings are exhausted).
  • When found: creates replacement node preserving the matched node's remaining sibling chain (@(%node-sibling head-ref)) — correct.
  • When not found: sets tail to the new child — appending at end, matching original recursive behavior.
  • Reverse pass: rebuilds from spine[len-1] down to spine[0], threading tail through — correct functional update of the singly linked list.
  • Return: (Pair.init tail found) — the rebuilt chain plus whether it was a replacement. Matches original semantics.

3. sibling-remove — correct. Same spine-and-rebuild pattern:

  • When found: tail is set to the matched node's sibling, effectively unlinking it. Rebuild produces the chain without the removed node.
  • When not found: returns @siblings-ref directly without rebuilding — an optimization over the original (which would rebuild then discard at each level via the not @(Pair.b &rec) check). This is correct and slightly more efficient.

4. get-in-siblings — correct, and the most interesting conversion. This function originally had two recursive calls: one for sibling traversal (same level) and one for child descent (deeper level). The iterative version handles both with cur-part/cur-index mutation:

  • Sibling miss → push next sibling onto stack (same part/index)
  • Sibling hit, key exhausted → result found, done
  • Sibling hit, key continues → push child onto stack, advance part/index

The stack always has 0 or 1 elements because the function either descends or moves laterally, never both. When descending, cur-part and cur-index are mutated to track the new position in the key — this is safe because once you descend, the old part/index are no longer needed.

5. Formatting changes. The diff is large (~750 lines) but the vast majority is carp-fmt reformatting in the HAMT section (line wrapping, alignment). I verified the structural changes: the closing-paren consolidations (standalone ) merged into preceding lines) preserve the same total paren count. No semantic changes outside the four sibling functions and the test file.

6. Ref-counting overhead. The iterative versions create extra @ copies of Rc pointers (pushing onto stack, copying from spine). Since these are Rc-wrapped, copies are just refcount increments — cheap. The dropped copies decrement when the stack/spine are cleaned up. Marginally more overhead than the recursive version for non-overflow cases, but necessary for the overflow fix.

7. New test is good. build-wide-trie inserts 10k single-key entries (10k siblings at one level), then verifies get/remove operations. This directly exercises the fixed code path. The existing 100k-deep key test now works instead of crashing.

8. No missed edge case. I looked for:

  • Empty sibling list → handled (Maybe.Nothing sets done = true or skips)
  • Single sibling → works (one iteration)
  • Key not present → get-in-siblings returns Nothing, sibling-remove returns unchanged
  • Zero-length key → handled by get-in-siblings's >= next-i (Array.length key-ref) check

Verdict: revise

The code is correct and well-tested. However, CI is still pending — both macOS and Ubuntu checks have not completed. Given the extensive formatting changes alongside functional changes, CI confirmation is important before merging. Once CI is green, this is ready for merge.

@hellerve

Copy link
Copy Markdown
Member

We need to park this since I’m rewriting the library right now. After the rewrite goes live (depends on carp-lang/Carp#1570), this can be rewritten and integrated.

@hellerve

hellerve commented Jul 2, 2026

Copy link
Copy Markdown
Member

Rewrite is live, you can rebase if it’s still relevant.

sibling-find, sibling-upsert, sibling-remove and get-in-siblings walked the
trie's singly linked sibling chain — and, in get-in-siblings, descended
through child links — via mutual recursion, overflowing the C stack on wide
(many siblings at one node) or deep tries. Convert all four to explicit
stack/spine-based iteration, mirroring the existing subtree-has-entry?
pattern already used in this module. sibling-upsert/sibling-remove collect a
spine and rebuild the chain in reverse (functional update of a singly linked
list); get-in-siblings now handles both sibling traversal and child descent
iteratively.

Add a wide-trie test (10k single-part keys → 10k siblings at one level)
exercising get/insert/remove on the fixed paths.

Re-derives the original claude/iterative-sibling-traversal fix against the
post-rewrite generic Rc-based Persistent.TrieNode.
@carpentry-agent carpentry-agent Bot force-pushed the claude/iterative-sibling-traversal branch from 2c80d80 to 1e809a3 Compare July 2, 2026 10:49
@carpentry-agent

Copy link
Copy Markdown
Author

Rebased onto main now that the generic borrow-based rewrite is live.

The four sibling-chain functions were still mutually recursive after the rewrite, so the stack overflow this PR fixes still reproduced. I re-derived the iterative versions against the new Rc-based Persistent.TrieNode:

  • sibling-find, sibling-upsert, sibling-remove, get-in-siblings now iterate with an explicit stack (plus a spine + reverse-rebuild for the two functional-update walks), mirroring the subtree-has-entry? pattern already in the module. get-in-siblings handles both sibling traversal and child descent iteratively.
  • The wide-trie regression test is adapted to the generic Persistent.Trie API (10k single-part keys → 10k siblings at one level; get/insert/remove).

The algorithm is unchanged from the version you reviewed — only the old %-placeholder names were mapped to the concrete generic API (%node-siblingPersistent.TrieNode.sibling, %node-rc-type(Rc (Persistent.TrieNode k v)), etc.). The diff is clean against main (no rewrite conflicts, MERGEABLE).

On verification: the Pi runner's Carp binary predates carp#1570, so it can't compile the rewritten library (recursive) locally. I validated statically (paren balance; angler clean on the new functions) and am relying on CI for build/test across the O1+ASan / O2 / O3 lanes. The existing assert-memory-balance test exercises all four functions via insert/remove, so an ownership slip would surface there.

Scope is unchanged from the original PR: this converts the four sibling-chain functions. insert-node/remove-node still recurse on key depth (a separate change); the 100k-deep-key test passes because deep inserts build via the iterative build-branch.

@carpentry-reviewer carpentry-reviewer Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build & Tests

CI: Green on both lanes now (test (ubuntu-latest), test (macos-latest)), which run the full scripts/validate.sh matrix — sanitized O1+ASan, release O2, and optimizer-stress O3 — across all 10 data structures including the trie suite. This resolves the sole blocker from the prior review round.

Local build: I attempted it to confirm the "can't build on the Pi" claim rather than take it on faith. carp --check test/persistent_trie.carp fails at the rc dependency with rc.carp: Can't find symbol 'recursive', cascading into the expected couldn't instantiate the generic type (Rc a) / undefined-TrieNode.init errors across the rewritten library. This Pi's Carp predates carp-lang/Carp#1570 (which introduced the recursive form the Rc-based rewrite relies on), so a local build is genuinely impossible — CI on current Carp is the authoritative validation, and it is green.

Prior feedback

The previous review traced all four functions and found them correct, returning revise only because CI was pending. CI has since completed green on both OSes, so that blocker is cleared. No code changes were requested and none were needed; the force-push to 1e809a3 is the rebased version now under CI.

Findings

I re-verified the four conversions independently against the new Rc-based Persistent.TrieNode:

  • sibling-find/sibling-upsert/sibling-remove — iterative walk over a 0/1-element stack. upsert's spine-collect-then-reverse-rebuild and remove's unlink-and-rebuild both preserve singly-linked sibling order and the append-at-end / replace-in-place / not-found semantics. Checked append-at-end, replace-first, replace-middle, remove-first/middle/last, and not-found (returns @siblings-ref unchanged) — all match the original recursion.
  • get-in-siblings — correctly folds the two original recursive calls (lateral sibling hop + child descent) into one loop with cur-part/cur-index tracking; the >= next-i (Array.length key-ref) terminal check and the child-Nothing → not-found path match the recursion. The 0/1-element stack invariant holds (each iteration pops one and pushes at most one).
  • New 10k-wide-trie test genuinely stresses sibling-find/upsert/remove and get-in-siblings at 10k siblings on a single level (10k single-part keys → get/insert/remove), and the existing 100k-deep test exercises the iterative get-in-siblings descent plus build-branch. Both are in the CI-run trie suite.

Scope note (not a defect): insert-node/remove-node still recurse on key depth. The 100k-deep insert avoids overflow only because a fresh deep branch is built iteratively via build-branch, and the deep test does insert+get but not a deep remove — a deep remove could still recurse. That's unchanged from before and explicitly out of scope for this PR; worth a follow-up someday, not a blocker here.

No correctness issues found.

Verdict: merge

The four conversions are correct, the only prior blocker (pending CI) is now resolved with all lanes green, and the local-build failure is a compiler-version constraint rather than a problem with this PR.

@hellerve hellerve merged commit 4e3340e into main Jul 2, 2026
2 checks passed
@hellerve hellerve deleted the claude/iterative-sibling-traversal branch July 2, 2026 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant