Skip to content

[WIP] Use Kahan (Neumaier) summation for float sums#8373

Closed
dimitarvdimitrov wants to merge 3 commits into
developfrom
mitko/kahan-float-aggregations
Closed

[WIP] Use Kahan (Neumaier) summation for float sums#8373
dimitarvdimitrov wants to merge 3 commits into
developfrom
mitko/kahan-float-aggregations

Conversation

@dimitarvdimitrov

@dimitarvdimitrov dimitarvdimitrov commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Stacked on #8367 so the CodSpeed run shows the cost of compensated summation against the freshly-added benchmarks.

In draft because we probably don't want to eat the perf cost

What this pr does

  • SumState::Float now holds a KahanSum (running sum + compensation term) implementing Neumaier's variant of Kahan compensated summation. When an addition produces a non-finite value the compensation update is skipped, preserving the existing overflow-to-infinity and inf + -inf => NaN (saturated) semantics.
  • Float sum partials are a struct {sum: f64, compensation: f64} instead of a plain f64, so precision survives partial flush/combine boundaries (chunked arrays, kernel partials, accumulator merges):
    • Sum::combine_partials accepts both the struct partial and a plain f64.
    • The legacy stats bridge in Accumulator::accumulate falls through on a partial dtype mismatch instead of erroring, and Sum::try_accumulate consumes the cached f64 Stat::Sum for floats so the stats short-circuit is preserved.

Add f64 cases to the aggregate_sum and aggregate_grouped benchmarks to
establish a baseline for upcoming float summation changes.

Signed-off-by: Dimitar Dimitrov <dimitar@spiraldb.com>
SumState::Float now carries a KahanSum (running sum plus compensation
term) instead of a bare f64, so float sum aggregates accumulate with
Neumaier's variant of Kahan summation. The compensation update is
skipped when the addition produces a non-finite value, keeping the
existing overflow-to-infinity and inf + -inf => NaN (saturated)
semantics.

Signed-off-by: Dimitar Dimitrov <dimitar@spiraldb.com>
Float sum partials are now a struct {sum: f64, compensation: f64}
instead of a plain f64, so compensated-summation precision survives
partial flush/combine boundaries (chunked arrays, kernel partials,
accumulator merges).

- Sum::partial_dtype diverges from return_dtype for floats;
  to_scalar emits the struct, finalize/finalize_scalar collapse it
  back to sum + compensation.
- Sum::combine_partials accepts both the struct partial and a plain
  f64 (constant-multiply path).
- The legacy stats bridge in Accumulator::accumulate now falls
  through on a partial dtype mismatch instead of erroring, and
  Sum::try_accumulate consumes the cached f64 Stat::Sum for floats.

Signed-off-by: Dimitar Dimitrov <dimitar@spiraldb.com>
@codspeed-hq

codspeed-hq Bot commented Jun 11, 2026

Copy link
Copy Markdown

Merging this PR will degrade performance by 20.43%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 2 improved benchmarks
❌ 7 regressed benchmarks
✅ 1527 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation sum_f64_all_valid 294.2 µs 509.8 µs -42.28%
Simulation sum_f64_nulls_clustered 629.1 µs 1,033.5 µs -39.13%
Simulation sum_f64 711.9 µs 1,105.9 µs -35.63%
Simulation sum_f64_clustered_nulls 553.3 µs 768.5 µs -28%
Simulation decompress_rd[f64, (100000, 0.0)] 845.5 µs 1,024.6 µs -17.48%
Simulation decompress_rd[f32, (100000, 0.0)] 499.1 µs 587.2 µs -15%
Simulation bitwise_not_vortex_buffer_mut[128] 244.4 ns 273.6 ns -10.66%
Simulation chunked_varbinview_canonical_into[(100, 100)] 309 µs 272.8 µs +13.24%
Simulation chunked_varbinview_into_canonical[(100, 100)] 361 µs 326.3 µs +10.63%

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing mitko/kahan-float-aggregations (86623b1) with mitko/sum/f64-bench (4c8059c)

Open in CodSpeed

Base automatically changed from mitko/sum/f64-bench to develop June 11, 2026 17:32
Some(match &return_dtype {
// Float partials carry the Kahan compensation term so that precision is preserved
// when partial sums are flushed and re-combined.
DType::Primitive(ptype, _) if ptype.is_float() => float_partial_dtype(),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CC @joseph-isaacs - but this would be a breaking change for serialized SUM statistics if we had started to serialize them in this way.

Do other engines use this?

@gatesn gatesn Jun 11, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, we should treat Vortex as a collection of physical kernels that some query engine may resolve to based on the engine's own semantics.

Therefore, this should be its own aggregate function. And DuckDB / DataFusion aggregate push-down can pick this one if it needs to have matching execution semantics.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Datafusion supports reporting Sum statistic, that was blocked though until 54 that released this week since sum stats were very slow

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we spoke with Joe offline too. Keeping backwards compat would make this change more complicated.

the only one I could find was duckdb which has a dedicated fsum (aka sum_kahan) function. Because of that I think it's ok to keep a less accurate sum aggregation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants