✨ RFC-HDFG-2026-001: String-based filter configuration API #6470

Open
brtnfld wants to merge 15 commits into HDFGroup:develop from brtnfld:6153

@brtnfld brtnfld commented Jun 19, 2026

Collaborator

Summary

  • Adds a human-readable key=value parameter string API for HDF5 filters, alongside the existing integer cd_values arrays: H5Pappend_filter, H5Pget_filter_params_by_idx, H5Zconfig_get_int/double/bool/str, H5Z_filter_id_by_name, H5Zget_filter_info2.
  • Extends H5Z_class3_t with name, description, set_config/get_config callbacks (plus reserved blob-callback placeholders), and threads dxpl_id/scaled[]/ndims through H5Z_pipeline so v3 filter callbacks have full context. All six built-in filters (deflate, shuffle, fletcher32, nbit, szip, scaleoffset) implement set_config/get_config.
  • Vendors a TOML subset parser (tomlc17, MIT) in src/tomlc17/, built unconditionally into libhdf5 with hidden visibility to avoid symbol collisions.
  • No new on-disk pipeline version: parameter strings are converted to cd_values at H5Pappend_filter time and stored in the existing v2 pipeline message, so read compatibility with older files/libraries is preserved.
  • Adds Fortran, C++, and Java bindings, plus h5dump/h5repack support for the new parameter strings.
  • Fixes a tfilter2 regression: skips the new string-max-boundary deflate test on builds without zlib.

Fixes #6153.

github-actions Bot commented Jun 19, 2026

Contributor

Review Checklist

This PR touches the following areas. Each needs a sign-off
from its listed owners before merging.

@brtnfld brtnfld requested a review from gheber as a code owner June 19, 2026 17:59
Comment thread release_docs/CHANGELOG.md Outdated
@github-project-automation github-project-automation Bot moved this from To be triaged to In progress in HDF5 - TRIAGE & TRACK Jun 19, 2026
Comment thread src/tomlc17/tomlc17.c Dismissed
Comment thread tools/src/h5repack/h5repack_parse.c Fixed
@brtnfld brtnfld changed the title RFC-HDFG-2026-001: String-based filter configuration API ✨ RFC-HDFG-2026-001: String-based filter configuration API Jun 19, 2026
@brtnfld brtnfld added the HDFG-internal Internally coded for use by the HDF Group label Jun 19, 2026
@brtnfld brtnfld removed request for gheber and glennsong09 June 20, 2026 01:01
hyoklee previously approved these changes Jun 20, 2026
brtnfld and others added 7 commits June 22, 2026 09:27
Adds a human-readable key=value parameter string API for HDF5 filters,
alongside the existing integer cd_values arrays.

New C API:
- H5Pappend_filter(plist, filter_id, flags, params) — appends a filter
  using either a key=value string or raw cd_values (H5Z_params_t)
- H5Pget_filter_params_by_idx(plist, idx, buf, buf_size, content_len) —
  retrieves the parameter string for a filter by pipeline index
- H5Zconfig_get_int/double/bool/str — typed accessors for use inside
  filter set_config callbacks
- H5Z_filter_id_by_name(name) — look up a filter id by registered name
- H5Zget_filter_info2(id, info) — extended filter info including v3 fields

New H5Z_class3_t fields: name, description, set_config, get_config,
and reserved blob-callback placeholders (write_blob/read_blob/close_blob).
H5Z_pipeline gains dxpl_id, scaled[], and ndims arguments threaded
through from all call sites so v3 filter callbacks have full context.

All six built-in filters (deflate, shuffle, fletcher32, nbit, szip,
scaleoffset) implement set_config/get_config callbacks.

TOML subset parser: tomlc17 (MIT) vendored in src/tomlc17/ and compiled
unconditionally into libhdf5. Hex-float literals are transparently
rewritten to decimal before parsing. tomlc17 symbols are hidden via
-fvisibility=hidden to prevent namespace collisions.
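
The hex-float rewrite mentioned above could look roughly like the sketch below. This is not the code from the PR: `hexfloat_to_decimal` is a hypothetical helper, and it assumes the rewrite works one literal at a time and relies on C99 `strtod`, which accepts hex-float input such as `0x1.8p1`.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the PR): rewrite a single hex-float literal
 * as a decimal literal that round-trips a double.
 * Returns 0 on success, -1 if `text` is not one complete number or if
 * `out` is too small. */
int hexfloat_to_decimal(const char *text, char *out, size_t out_size)
{
    char  *end   = NULL;
    double value = strtod(text, &end); /* C99 strtod parses hex floats */
    int    n;

    if (end == text || *end != '\0')
        return -1; /* nothing parsed, or trailing characters */

    /* 17 significant digits are enough to round-trip an IEEE double. */
    n = snprintf(out, out_size, "%.17g", value);
    if (n < 0 || (size_t)n >= out_size)
        return -1;

    /* "%g" prints 3.0 as "3"; append ".0" so the result still reads as a
     * float rather than an integer. Strings containing '.', 'e' or 'n'
     * (inf, nan) are left alone. */
    if (strpbrk(out, ".en") == NULL) {
        if ((size_t)n + 3 > out_size)
            return -1;
        memcpy(out + n, ".0", 3); /* copies the terminating NUL as well */
    }
    return 0;
}
```

Keeping the decimal point matters for a typed parser: a bare `3` would be read as an integer, which is probably not what the author of `0x1.8p1` intended.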

On-disk format: no new pipeline version. Parameter strings are converted
to cd_values by set_config at H5Pappend_filter time and stored using the
existing v2 pipeline message. On read, get_config reconstructs the string.
Full backward read compatibility is preserved.

Fortran, C++, and Java bindings added. Tests in test/tfilter2.c
(~2300 lines) and testpar/t_filters_parallel.c (par-01–par-04).
h5dump displays filter parameter strings; h5repack accepts TOML-form
UD= filter specs.

Code-review fixes included: tomlc17 visibility, H5Pget_filter_params_by_idx
arg validation and true-length two-pass contract, flags re-validation after
set_config, H5Z_register3 runtime plugin validation, Java two-pass protocol
and h5libraryError() consistency, CHANGELOG corrections.
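
A "true-length two-pass contract" is commonly implemented as shown below. The sketch does not call HDF5: `get_params_string` is a stand-in with assumed semantics (the true length is always reported through `content_len`, even when the copy is truncated), and `"level=6"` in the usage note is only a placeholder string, not a confirmed deflate parameter name.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in getter, NOT the HDF5 function. Assumed contract:
 *  - *content_len always receives the full string length (no NUL),
 *  - if buf is non-NULL, at most buf_size-1 bytes are copied and the
 *    result is NUL-terminated. */
int get_params_string(const char *stored, char *buf, size_t buf_size,
                      size_t *content_len)
{
    size_t full = strlen(stored);

    if (content_len == NULL)
        return -1;
    *content_len = full; /* true length, even when the copy is truncated */

    if (buf != NULL && buf_size > 0) {
        size_t n = full < buf_size - 1 ? full : buf_size - 1;
        memcpy(buf, stored, n);
        buf[n] = '\0';
    }
    return 0;
}

/* Caller side: pass 1 asks for the length, pass 2 fetches the string.
 * The caller owns the returned buffer and must free() it. */
char *fetch_params_string(const char *stored)
{
    size_t len = 0;
    char  *buf;

    if (get_params_string(stored, NULL, 0, &len) < 0)
        return NULL;

    buf = malloc(len + 1); /* +1 for the terminating NUL */
    if (buf == NULL)
        return NULL;

    if (get_params_string(stored, buf, len + 1, &len) < 0) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

With this shape, `fetch_params_string("level=6")` returns a heap copy of the string, while a call with a 4-byte buffer yields `"lev"` and still reports a length of 7, so the caller can detect truncation and retry.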

Fixes GitHub issue HDFGroup#6153

…ib absent

Two bugs caused H5TEST-tfilter2 to fail on BSD and non-zlib CI builds:

1. H5Zconfig.c: The review changed strlen(params) > H5Z_CONFIG_STRING_MAX to >=,
   which incorrectly rejected strings of exactly H5Z_CONFIG_STRING_MAX bytes.
   The public contract (and test_config_string_max_boundary) accepts strings up to
   and including H5Z_CONFIG_STRING_MAX characters.  Reverted to >.

2. test/tfilter2.c: test_config_string_max_boundary called H5Pappend_filter with
   H5Z_FILTER_DEFLATE without first checking availability.  On CI builds without
   zlib the filter lookup fails with "filter not found" before the length check
   runs.  Added H5Zfilter_avail guard; the test is now SKIPPED on non-zlib builds.
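
The off-by-one in item 1 is easy to see in isolation. The sketch below is illustrative only: `CONFIG_STRING_MAX` is a hypothetical small stand-in for `H5Z_CONFIG_STRING_MAX` (whose real value is not given here), and the two functions are not the library's code.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical limit, kept tiny so the boundary case is easy to write out. */
#define CONFIG_STRING_MAX 8

/* Intended contract: strings of up to and including CONFIG_STRING_MAX
 * characters are accepted. */
bool length_ok(const char *params)
{
    return !(strlen(params) > CONFIG_STRING_MAX);
}

/* The regressed check: ">=" also rejects a string of exactly
 * CONFIG_STRING_MAX characters. */
bool length_ok_regressed(const char *params)
{
    return !(strlen(params) >= CONFIG_STRING_MAX);
}
```

For the 8-character string `"12345678"`, `length_ok` returns true and `length_ok_regressed` returns false; both versions agree on every other length, which is why only the boundary test caught the change.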
h5dump -p was printing PARAMS_STRING and DESCRIPTION as flat siblings of
FILTERS{} rather than inside the entry of the filter they describe,
which is ambiguous (or outright misleading) once more than one filter
is applied to a dataset. Move them inside each filter's own block,
opening one for SHUFFLE/FLETCHER32/NBIT only when there's something to
attach.

Update the affected h5dump DDL fixtures and add DDLBNF220.dox
documenting the new <filter_extra> nesting, bumping the \ref DDLBNF200
cross-references to DDLBNF220.

Two test failures from the h5dump nesting fix (2604f5d):

1. tools/test/h5repack/expected/deflate_limit.h5repack_layout.h5.ddl and
   h5repack_layout.h5-plugin_test.ddl had PARAMS_STRING (and DESCRIPTION)
   as flat siblings of the filter block rather than nested inside it —
   the h5dump fixture files were updated in 2604f5d but these two
   h5repack expected files were missed.

2. java/src-jni/jni/h5zImp.c called CALL_CONSTRUCTOR with a 6-arg array
   and signature "(IILjava/lang/String;Ljava/lang/String;ZZ)V" but
   H5Z_class_info_t's constructor takes 7 args (adds has_blob_callbacks).
   GetMethodID failed at runtime with the wrong arity.  Add args[6] =
   JNI_FALSE and update the descriptor to ZZZ)V.

- java/test/TestH5Z.java (FFM): fix wrong method names
  - H5Pget_filter2 (private) -> H5Pget_filter (public), pass new int[1]
    instead of null for filter_config
  - H5Zconfig_get_int/double/bool/str -> H5Zconfig_get_param (overloaded)
  - Update assertion messages to match corrected method names
- release_docs/CHANGELOG.md: replace em-dashes with hyphens per style
- tools/src/h5repack/h5repack_parse.c: restructure UD= legacy numeric
  loop from for-loop to while-loop to avoid CodeQL "loop counter modified
  in body" warning; add bounds guard after comma-skip u++
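
The for-to-while restructuring described in the last item might look like the following. This is a hypothetical parser written for illustration, not the h5repack code: the function name, arguments, and error convention are all assumptions. The index `u` is advanced only by explicit statements in the loop body, and a bounds check follows the comma skip.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical sketch (not h5repack_parse.c): read comma-separated
 * unsigned decimal values from str[0..len) into out[].
 * Returns the number of values, or -1 on malformed input or when more
 * than max_out values are present. Overflow of `unsigned` is not
 * checked in this sketch. */
int parse_uint_list(const char *str, size_t len, unsigned *out, size_t max_out)
{
    size_t u     = 0; /* no for-loop increment; advanced only below */
    size_t count = 0;

    while (u < len) {
        unsigned value = 0;
        size_t   start = u;

        while (u < len && isdigit((unsigned char)str[u])) {
            value = value * 10u + (unsigned)(str[u] - '0');
            u++;
        }
        if (u == start || count == max_out)
            return -1; /* empty field, or no room left in out[] */
        out[count++] = value;

        if (u < len) {
            if (str[u] != ',')
                return -1; /* unexpected separator */
            u++;           /* skip the comma */
            if (u >= len)
                return -1; /* bounds guard: nothing after the comma */
        }
    }
    return (int)count;
}
```

Because there is no increment clause, a static analyzer sees a single, explicit place where the index moves, and the guard after `u++` makes the trailing-comma case an error instead of a read past the end.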
brtnfld and others added 6 commits June 22, 2026 09:27
…rocedures

The RFC renamed the PRIVATE generic dispatch procedures from
h5zget_filter_info1_f/h5zget_filter_info2_f to
h5zget_filter_info_flags_f/h5zget_filter_info_class_f to reflect
their actual roles. Update hdf5_fortrandll.def.in to export the
new mangled names so Intel ifx on Windows links successfully.

H5Pappend_filter is always available in this version and is the
standard way to detect the string-based filter config API. The
TOMLC17 macro is redundant.

…okup

SymbolLookup.loaderLookup() only finds symbols loaded via
System.loadLibrary(); jextract loads libhdf5 via its own mechanism so
loaderLookup() never finds the RFC symbols at runtime.

Replace all SymbolLookup.loaderLookup()+MethodHandle patterns in the RFC
methods with direct hdf5_h.* calls, consistent with every other method in
H5.java:
  H5Pappend_filter (both overloads)
  H5Pget_filter_params_by_idx
  H5Zget_filter_info2
  H5Zconfig_has_key
  H5Zconfig_get_param (long[], double[], boolean[], String[])

H5Pappend_filter with CDVALUES is documented as identical to H5Pset_filter.
Routing through H5Pset_filter avoids constructing an H5Z_params_t struct
manually in FFM heap memory, which was silently producing cd_nelmts=0.

Three bugs in H5Pget_filter2:
1. cd_nelmts_segment was allocated as JAVA_INT (4 bytes) but size_t* needs
   8 bytes on 64-bit — the C write overflowed into cd_values_segment.
2. cd_nelmts[0] was read back from cd_values_segment (wrong) instead of
   cd_nelmts_segment.
3. cd_values and flags were never copied back from their native segments.

Fix: allocate cd_nelmts_segment as JAVA_LONG, seed it with the caller's
capacity on input, and copy all three output arrays back correctly.
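
Seen from the C side, the fix amounts to honoring an in/out `size_t` count. The sketch below uses a stand-in getter, not H5Pget_filter2 itself; the function names are hypothetical, and reporting the stored element count on output is an assumption of the sketch.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for a C getter with an in/out element count; NOT H5Pget_filter2.
 * On input *nelmts is the capacity of values[]; on output it is the number
 * of stored elements (an assumption of this sketch). */
int get_filter_values(const unsigned *stored, size_t stored_n,
                      size_t *nelmts, unsigned *values)
{
    size_t n_copy;

    if (nelmts == NULL)
        return -1;

    /* Never copy more than the caller said it can hold. */
    n_copy = stored_n < *nelmts ? stored_n : *nelmts;
    if (values != NULL && n_copy > 0)
        memcpy(values, stored, n_copy * sizeof *values);

    /* This store is sizeof(size_t) bytes wide (8 on typical 64-bit
     * targets), so the caller's storage must be at least that large. */
    *nelmts = stored_n;
    return 0;
}

/* Caller pattern: size_t-sized count, seeded with the capacity, then read
 * back from the same variable. Returns how many entries of out[] are valid. */
size_t read_filter_values(const unsigned *stored, size_t stored_n,
                          unsigned *out, size_t capacity)
{
    size_t nelmts = capacity; /* seed with capacity on input */

    if (get_filter_values(stored, stored_n, &nelmts, out) < 0)
        return 0;
    return nelmts < capacity ? nelmts : capacity;
}
```

A binding that reserves only 4 bytes for `nelmts` lets the final store spill into whatever sits next to it, which matches the overflow into the adjacent segment described in bug 1.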
github-actions Bot and others added 2 commits June 22, 2026 16:43
…thods

The FFM TestH5Z.java gained 8 new test methods covering the new filter
string-config API (H5Pappend_filter, H5Pget_filter_params_by_idx,
H5Zconfig_get_param_*), but the expected-output reference file was left
at 5 tests, causing the JUnit-TestH5Z CTest comparison to fail.

Labels

HDFG-internal Internally coded for use by the HDF Group

Projects

Status: In progress

Development

Successfully merging this pull request may close these issues.

✨ [Feature Request] String-Based Configuration Interface for Filters

3 participants