Add configurable gzip compression level#1674
Open
gdesrosiers1805 wants to merge 1 commit into
Open
Conversation
stevedlawrence
approved these changes
May 28, 2026
Member
stevedlawrence
left a comment
There was a problem hiding this comment.
+1 👍 just a bunch of minor comments
4845611 to
8572e72
Compare
The gzip layer previously used GZIPOutputStream with the JDK's default compression level (level 6), with no way for schemas to choose a different level. This change exposes the speed-vs-size trade-off that the gzip format already supports, giving users control over how their data is compressed during unparsing. As a side benefit, this fix also addresses cross-zlib test failures observed on Fedora 42. Fedora's OpenJDK links against zlib-ng while Temurin and most other distributions bundle stock zlib. Both produce valid gzip output, but the deflate token streams differ due to different match-finding heuristics, causing tests with hardcoded byte baselines to fail on Fedora. For the small test data used in Daffodil's gzip tests, level 9 happens to produce identical output on both implementations, letting the tests pass regardless of which zlib is linked. This convergence at level 9 is empirical, not guaranteed. Changes: - Add `compressionLevel` DFDL variable to the gzip layer schema, defaulting to Deflater.DEFAULT_COMPRESSION (-1). Schemas can override via `dfdl:newVariableInstance` or `dfdl:setVariable`, and users can set the value externally without having to modify the schema. - Add ConfigurableGZIPOutputStream, a GZIPOutputStream subclass that allows the compression level to be set via constructor argument. - Update GZipLayer to accept the compressionLevel variable via setLayerVariableParameters and use it when constructing the output stream. - Remove GZIPFixedOutputStream and the associated fixIsNeeded() method. These existed to work around a pre-Java-16 bug where GZIPOutputStream wrote 0x00 instead of the spec-compliant 0xFF for the gzip OS header byte. Since Daffodil's minimum supported Java version is now 17, this workaround is no longer needed. - Add `-parameters` to javac options This preserves Java parameter names in bytecode, which is required by Daffodil's reflection-based layer parameter resolution. Without this, Java setters appear with parameters named arg0/arg1/... and cannot be matched to schema variables. - Update test schemas (exampleGzipLayer.dfdl.xsd, exampleGzipLayer2.dfdl.xsd) to set compressionLevel=9 via newVariableInstance. - Update TestGzipErrors.makeGZIPData to generate test input data at level 9 for byte-stable output across JVMs, with comments documenting the empirical nature of the convergence. DAFFODIL-3082
8572e72 to
cf5db56
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The gzip layer previously used GZIPOutputStream with the JDK's default
compression level (level 6), with no way for schemas to choose a different
level. This change exposes the speed-vs-size trade-off that the gzip format
already supports, giving users control over how their data is compressed
during unparsing.
As a side benefit, this fix also addresses cross-zlib test failures
observed on Fedora 42. Fedora's OpenJDK links against zlib-ng while
Temurin and most other distributions bundle stock zlib. Both produce valid
gzip output, but the deflate token streams differ due to different
match-finding heuristics, causing tests with hardcoded byte baselines to
fail on Fedora. For the small test data used in Daffodil's gzip tests,
level 9 happens to produce identical output on both implementations,
letting the tests pass regardless of which zlib is linked. This
convergence at level 9 is empirical, not guaranteed.
Changes:
Add
compressionLevelDFDL variable to the gzip layer schema, defaultingto Deflater.DEFAULT_COMPRESSION (-1). Schemas can override via
dfdl:newVariableInstanceordfdl:setVariable, and users can set thevalue externally without having to modify the schema.
Add ConfigurableGZIPOutputStream, a GZIPOutputStream subclass that allows
the compression level to be set via constructor argument.
Update GZipLayer to accept the compressionLevel variable via
setLayerVariableParameters and use it when constructing the output stream.
Remove GZIPFixedOutputStream and the associated fixIsNeeded() method.
These existed to work around a pre-Java-16 bug where GZIPOutputStream
wrote 0x00 instead of the spec-compliant 0xFF for the gzip OS header
byte. Since Daffodil's minimum supported Java version is now 17, this
workaround is no longer needed.
Add
-parametersto javac options This preserves Java parameter names inbytecode, which is required by Daffodil's reflection-based layer parameter
resolution. Without this, Java setters appear with parameters named arg0/arg1/...
and cannot be matched to schema variables.
Update test schemas (exampleGzipLayer.dfdl.xsd, exampleGzipLayer2.dfdl.xsd)
to set compressionLevel=9 via newVariableInstance.
Update TestGzipErrors.makeGZIPData to generate test input data at level 9
for byte-stable output across JVMs, with comments documenting the
empirical nature of the convergence.
DAFFODIL-3082