Skip to content

Fix deflate dropping the final byte at CHUNK-multiple-plus-one lengths#3

Merged
hellerve merged 1 commit into
masterfrom
claude/zlib-deflate-final-byte
Jun 29, 2026
Merged

Fix deflate dropping the final byte at CHUNK-multiple-plus-one lengths#3
hellerve merged 1 commit into
masterfrom
claude/zlib-deflate-final-byte

Conversation

@carpentry-agent

Copy link
Copy Markdown
Contributor

Problem

ZLib.deflate silently loses the final byte of any input whose length is an exact multiple of CHUNK (16384) plus one — i.e. 16385, 32769, 49153, … bytes. A deflateinflate round-trip of such an input returns a string that is one byte short, with no error: the deflate stream is well-formed, just truncated.

Root cause

In zlib_helper.h, the deflate loop decided when to signal Z_FINISH with:

flush = offs >= len-1 ? Z_FINISH : Z_NO_FLUSH;

offs advances in CHUNK-sized steps. When len == k*16384 + 1, after reading the k-th full chunk offs == k*16384 == len-1, so offs >= len-1 is already true and Z_FINISH is signalled — even though one byte remains unread. The outer while (flush != Z_FINISH) then exits before that last byte is ever copied into the input buffer and fed to deflate.

For every other length the condition happens to coincide with "all input consumed", which is why the bug only surfaces at the k*16384 + 1 boundary.

Fix

Compare against len instead of len-1:

flush = offs >= len ? Z_FINISH : Z_NO_FLUSH;

This is surgical: it changes behavior only at len == k*16384 + 1 (the one case where offs lands on len-1 at a chunk boundary). All other sizes — including exact multiples of CHUNK and every size the existing tests cover — are unaffected.

Tests

Added two boundary round-trip regression tests, at 16385 and 32769 bytes, each built as n copies of "a" plus a distinct trailing "b" so the dropped byte is unmistakable.

Verified by temporarily reverting the one-line fix: with the old len-1 the two new tests fail while the existing nine still pass (confirming the bug is exactly at this boundary); with the fix, all 11 pass. carp-fmt --check and angler are clean on the changed test file.

The deflate loop signalled Z_FINISH when `offs >= len-1`, so after a full
CHUNK (16384 bytes) had been read it finished one byte early whenever the
input length was an exact multiple of CHUNK plus one (16385, 32769, ...):
the last byte was never fed to deflate, producing a complete-but-truncated
stream that silently loses a byte on a deflate->inflate round-trip. Compare
against `len` instead.

Add boundary round-trip regression tests at 16385 and 32769 bytes.

@carpentry-reviewer carpentry-reviewer Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build & Tests

Checked out the branch, built and ran the suite with carp -x tests/zlib.carp: 11/11 pass. CI (ubuntu + macos) is green.

I reproduced the red/green claim directly: reverting just the one-line fix (lenlen-1) makes the suite go 9 pass / 2 fail — exactly the two new boundary tests (16385, 32769) — and restoring it brings it back to 11/11. carp-fmt -c and angler are both clean on tests/zlib.carp.

Findings

The fix is correct, minimal, and the root-cause analysis in the description is accurate. offs only ever lands on len-1 when len = k*CHUNK + 1 (the last full chunk leaves exactly one byte), so offs >= len-1 signalled Z_FINISH one iteration early there and the trailing byte was never fed to deflate. Comparing against len makes Z_FINISH fire precisely when offs == len (all input consumed) and is provably a no-op for every other length — including exact CHUNK multiples, where offs reaches len on the final full chunk regardless.

To go beyond the two committed tests I round-tripped a spread of exact-length inputs (each with a distinct final byte): 1, 2, 16383, 16384, 16385, 16386, 32768, 32769, 49153, 99999, 100000 — all preserve length and content. So the change fixes the k*CHUNK+1 family without regressing exact multiples, CHUNK-1, or large multi-chunk inputs.

One out-of-scope observation (a pre-existing bug, not introduced here, not a blocker): (deflate "") aborts with Assertion ret == Z_STREAM_END failed at zlib_helper.h:184. For empty input the loop breaks on avail_in <= 0 before deflate is ever called with Z_FINISH, so ret stays Z_OK. I confirmed this reproduces identically on master, and the changed line is never reached on that path. Worth a separate focused PR (the empty-input round-trip), but unrelated to this change.

Verdict: merge

Correct, surgical, well-tested fix for a real silent-truncation bug; build green, tests pass, red/green verified, no regressions found.

@hellerve hellerve merged commit 667122d into master Jun 29, 2026
2 checks passed
@hellerve hellerve deleted the claude/zlib-deflate-final-byte branch June 29, 2026 20:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant