Skip to content

Fix float truncation in min/max derived field scripts#1292

Draft
jwils wants to merge 1 commit into
mainfrom
joshuaw/fix-min-max-float-truncation
Draft

Fix float truncation in min/max derived field scripts#1292
jwils wants to merge 1 commit into
mainfrom
joshuaw/fix-min-max-float-truncation

Conversation

@jwils

@jwils jwils commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator

Why

min_value/max_value derived fields sourced from Float fields silently corrupt data: the generated painless function coerces every incoming number (and the stored value) to a long, so fractional values are stored truncated (2.9 becomes 2) and compared truncated (2.4 vs 2.9 compare as equal). The coercion was added in 8d3d664 to fix a ClassCastException on mixed Integer/Long values, but it over-coerces.

What

  • Replaced the coerce-then-Collections.min/max approach with a loop that tracks the winning element, using a new numeric-aware minOrMaxValue_compareValues painless function: integral values compare via Long.compare (preserving full precision beyond 2^53 for JsonSafeLong/LongString sources), comparisons involving a floating point value use Double.compare (handling JSON producing mixed Integer/Double values for the same Float field), and non-numeric values (Date/DateTime/LocalTime strings) keep their natural compareTo.
  • The original (un-coerced) winning value is what gets stored, so indexed values are never altered.
  • Regenerated schema artifacts; the update_WidgetCurrency_from_Widget_* script ID changed since the script contents changed.
  • Added a Float-sourced min/max derived field to the test schema plus acceptance coverage for fractional values and mixed integral/fractional values (in both indexing orders), and unit coverage for the generated function text.

Risk Assessment

Low. The derived-field update script contents (and therefore the MD5-derived script ID) change, so deployments will pick up the new script when schema artifacts are dumped and cluster configuration is applied. Behavior for existing integral and string-valued min/max fields is unchanged (verified by the existing regression tests).

The min/max derived field painless function coerced every incoming
Number (and the stored value) to a long before comparing, to avoid a
ClassCastException on mixed Integer/Long values. That truncated
floating point values: a Float-sourced min_value/max_value field
compared truncated values (so 2.4 vs 2.9 looked equal) and stored the
truncated result (2.9 became 2), silently corrupting derived indices.

Replace the coerce-then-Collections.min/max approach with a loop that
tracks the winning element and a numeric-aware comparison function:
integral values are compared via Long.compare (preserving full
precision beyond 2^53), comparisons involving a floating point value
use Double.compare (preserving fractional parts, including when JSON
produces mixed Integer/Double values for the same field), and
non-numeric values keep using their natural compareTo. The original
(un-coerced) winning value is what gets stored.

Also add a Float-sourced min/max derived field to the test schema and
acceptance coverage for fractional and mixed integral/fractional
values.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant