Skip to content

Fix silent acceptance of invalid escape sequences in string parser#2

Merged
hellerve merged 1 commit into
mainfrom
claude/fix-invalid-escape-sequences
Jun 10, 2026
Merged

Fix silent acceptance of invalid escape sequences in string parser#2
hellerve merged 1 commit into
mainfrom
claude/fix-invalid-escape-sequences

Conversation

@carpentry-agent

Copy link
Copy Markdown

Summary

The string parser silently accepted invalid escape sequences instead of returning parse errors:

  • Unknown escapes like \z, \q, \0 emitted raw backslash + character bytes, masking bugs in user code
  • \u and \U escapes only checked that enough bytes were available but didn't validate that those bytes were hex digits — \uGHIJ silently produced a wrong codepoint via read-hex

Changes

  • The fallthrough case in the escape handler now sets an error instead of emitting raw bytes
  • \u and \U handlers now validate all required digits are hex before calling read-hex
  • Changed err from Bool to String so each error site carries a descriptive message (e.g. "valid escape sequence", "hex digits in \u escape")
  • Added tests for invalid escapes (\z, \q, \0, \x without hex, \xZZ, \uGHIJ, \U0000GHIJ) and valid escapes (\t, \x41, \u0041)
  • CHANGELOG updated

Opened by the carpentry-org heartbeat agent (Claude). hellerve has not reviewed this yet.

The string parser silently accepted unknown escape sequences (e.g.
\z, \0) by emitting raw backslash + character bytes. It also accepted
non-hex characters in \u and \U escapes, silently producing wrong
codepoints via read-hex.

Now all three cases return a parse error with a descriptive message:
- Unknown escapes: "expected: valid escape sequence"
- \x without hex digits: "expected: hex digits after \x"
- \u/\U with non-hex chars: "expected: hex digits in \u/\U escape"

Changed err from Bool to String to carry per-error messages.

@carpentry-reviewer carpentry-reviewer Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build & Tests

Build passes. All 60 tests pass (macOS + Ubuntu CI green, verified locally). The 9 new tests cover valid escapes (\t, \x41, \u0041), invalid unknowns (\z, \q, \0), and hex-digit validation for \x, \u, and \U.

Findings

The changes are correct across all three escape categories:

  • Unknown escapes (\z, \q, \0): the cond fallthrough now sets err @"valid escape sequence" instead of silently emitting raw bytes. The error message reads as "expected valid escape sequence" via ParseErr.expected-at — correct for the Parsec-style format.

  • \u / \U hex validation: the new all-hex loop correctly validates all required digits before calling read-hex. Previously \uGHIJ would silently produce a bogus codepoint.

  • \x handler: already had hex-digit validation via the while loop with hex-digit? — the PR only changes its error message from (set! err true) to (set! err @"hex digits after \\x"). No logic change needed here.

  • err type change from Bool to String: the check (Int.= 0 (String.length &err)) is equivalent to the old (not err), and (Int.> (String.length &err) 0) is equivalent to err. Correct.

One minor note (not blocking): invalid Unicode codepoints beyond U+10FFFF (e.g. \U00110000) pass hex validation and reach push-utf8-bytes, which would produce invalid UTF-8. This is pre-existing behavior and out of scope for this PR.

CHANGELOG is updated.

Verdict: merge

Solid bug fix with good test coverage. The silent acceptance of invalid escapes was a real problem — \0 producing literal \0 bytes instead of erroring is the kind of thing that hides bugs for months.

@hellerve hellerve merged commit 2abee60 into main Jun 10, 2026
2 checks passed
@hellerve hellerve deleted the claude/fix-invalid-escape-sequences branch June 10, 2026 07:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant