Add configurable gzip compression level #1674

Open
gdesrosiers1805 wants to merge 1 commit into apache:main from gdesrosiers1805:feature/daffodil-3082

Conversation

@gdesrosiers1805
Contributor

@gdesrosiers1805 gdesrosiers1805 commented May 28, 2026

The gzip layer previously used GZIPOutputStream with the JDK's default
compression level (level 6), with no way for schemas to choose a different
level. This change exposes the speed-vs-size trade-off that the gzip format
already supports, giving users control over how their data is compressed
during unparsing.

As a side benefit, this change also addresses cross-zlib test failures
observed on Fedora 42. Fedora's OpenJDK links against zlib-ng while
Temurin and most other distributions bundle stock zlib. Both produce valid
gzip output, but the deflate token streams differ due to different
match-finding heuristics, causing tests with hardcoded byte baselines to
fail on Fedora. For the small test data used in Daffodil's gzip tests,
level 9 happens to produce identical output on both implementations,
letting the tests pass regardless of which zlib is linked. This
convergence at level 9 is empirical, not guaranteed.
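
To make the level mechanics concrete, here is a minimal sketch of compressing at a chosen level with only JDK classes. The class and method names (`GzipLevelSketch`, `LevelGZIPOutputStream`, `gzip`, `gunzip`) are hypothetical and are not the code in this PR; the sketch assumes the common idiom of subclassing `GZIPOutputStream` and calling `setLevel` on the inherited protected `def` deflater. It checks only that each level round-trips, since the exact compressed bytes can differ between zlib implementations.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipLevelSketch {

    /** Hypothetical name; illustrates the idea, not the PR's actual class. */
    static class LevelGZIPOutputStream extends GZIPOutputStream {
        LevelGZIPOutputStream(OutputStream out, int level) throws IOException {
            super(out);
            // 'def' is the protected Deflater inherited from DeflaterOutputStream.
            // setLevel rejects values outside 0..9 other than DEFAULT_COMPRESSION (-1).
            def.setLevel(level);
        }
    }

    /** Compresses data as gzip at the given deflate level. */
    public static byte[] gzip(byte[] data, int level) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream gz = new LevelGZIPOutputStream(sink, level)) {
            gz.write(data);
        }
        return sink.toByteArray();
    }

    /** Decompresses gzip data back to the original bytes. */
    public static byte[] gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] input = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        int[] levels = {Deflater.DEFAULT_COMPRESSION, Deflater.BEST_SPEED, Deflater.BEST_COMPRESSION};
        for (int level : levels) {
            byte[] out = gzip(input, level);
            boolean roundTrips = Arrays.equals(input, gunzip(out));
            System.out.println("level " + level + ": " + out.length
                + " bytes, round-trips=" + roundTrips);
        }
    }
}
```

A test that compares against a hardcoded byte baseline is sensitive to the linked zlib; a round-trip check like the one above is not, which is why the level-9 convergence is described here as empirical.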

Changes:

  • Add compressionLevel DFDL variable to the gzip layer schema, defaulting
    to Deflater.DEFAULT_COMPRESSION (-1). Schemas can override via
    dfdl:newVariableInstance or dfdl:setVariable, and users can set the
    value externally without having to modify the schema.

  • Add ConfigurableGZIPOutputStream, a GZIPOutputStream subclass that allows
    the compression level to be set via constructor argument.

  • Update GZipLayer to accept the compressionLevel variable via
    setLayerVariableParameters and use it when constructing the output stream.

  • Remove GZIPFixedOutputStream and the associated fixIsNeeded() method.
    These existed to work around a pre-Java-16 bug where GZIPOutputStream
    wrote 0x00 instead of the spec-compliant 0xFF for the gzip OS header
    byte. Since Daffodil's minimum supported Java version is now 17, this
    workaround is no longer needed.

  • Add -parameters to the javac options. This preserves Java parameter names
    in bytecode, which Daffodil's reflection-based layer parameter resolution
    requires. Without it, Java setters appear with parameters named
    arg0/arg1/... and cannot be matched to schema variables.

  • Update test schemas (exampleGzipLayer.dfdl.xsd, exampleGzipLayer2.dfdl.xsd)
    to set compressionLevel=9 via newVariableInstance.

  • Update TestGzipErrors.makeGZIPData to generate test input data at level 9
    for byte-stable output across JVMs, with comments documenting the
    empirical nature of the convergence.
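
The effect of the `-parameters` flag mentioned in the list above can be observed with plain reflection. The following is a sketch under stated assumptions: `ParameterNameSketch` and `DemoLayer` are hypothetical, and the setter signature shown is illustrative rather than the actual one in `GZipLayer`. It uses only `java.lang.reflect.Parameter`, whose `isNamePresent()` reports whether the class file carries real parameter names.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;

public class ParameterNameSketch {

    /** Hypothetical layer-like class; not Daffodil's GZipLayer. */
    public static class DemoLayer {
        // Illustrative signature only: one int parameter named after the variable.
        public void setLayerVariableParameters(int compressionLevel) {
        }
    }

    /** Returns the reflected name of the setter's first parameter. */
    public static String firstParameterName() throws NoSuchMethodException {
        Method setter = DemoLayer.class.getMethod("setLayerVariableParameters", int.class);
        Parameter first = setter.getParameters()[0];
        return first.getName();
    }

    /** Reports whether the compiler recorded real parameter names. */
    public static boolean namesPresent() throws NoSuchMethodException {
        Method setter = DemoLayer.class.getMethod("setLayerVariableParameters", int.class);
        return setter.getParameters()[0].isNamePresent();
    }

    public static void main(String[] args) throws NoSuchMethodException {
        // Compiled with 'javac -parameters', the real name is reported;
        // without the flag, reflection synthesizes a name such as arg0.
        System.out.println("name=" + firstParameterName() + ", present=" + namesPresent());
    }
}
```

Compiling this once with and once without `-parameters` shows why a name-based match against a schema variable only works when the flag is set.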

DAFFODIL-3082

Member

@stevedlawrence stevedlawrence left a comment


+1 👍 just a bunch of minor comments

Comment thread daffodil-core/src/main/java/org/apache/daffodil/layers/runtime1/GZipLayer.java Outdated
Comment thread daffodil-core/src/main/java/org/apache/daffodil/layers/runtime1/GZipLayer.java Outdated
@gdesrosiers1805 gdesrosiers1805 force-pushed the feature/daffodil-3082 branch from 4845611 to 8572e72 Compare May 29, 2026 04:34
Comment thread daffodil-core/src/main/java/org/apache/daffodil/layers/runtime1/GZipLayer.java Outdated
@gdesrosiers1805 gdesrosiers1805 force-pushed the feature/daffodil-3082 branch from 8572e72 to cf5db56 Compare May 29, 2026 17:28
@gdesrosiers1805 gdesrosiers1805 changed the title from "Add configurable gzip compression level to fix cross-zlib test failures" to "Add configurable gzip compression level" May 29, 2026