feat(vortex-geo): native Point extension type and GeoDistance scalar function#8372

Merged
HarukiMoriarty merged 12 commits into develop from nemo/geo-point
Jun 12, 2026
@HarukiMoriarty

Contributor

Summary

This PR adds a native point type to vortex-geo. Points are by far the most common geometry in analytical datasets, and a columnar representation makes their coordinates directly accessible without parsing WKB.

It also adds the GeoDistance scalar function: point-to-point distance with PostGIS ST_Distance semantics (planar/Euclidean, with results in CRS units).

API Changes

Adds the following to vortex-geo, all registered through vortex_geo::initialize:

  • Extension type Point (vortex.geo.point): a location stored as Struct<x, y, z?, m?> of non-nullable f64, where z? is an optional elevation and m? an optional measure.
  • Coordinate: the internal value a point scalar unpacks to.
  • Scalar function GeoDistance (vortex.geo.distance): per-row distance between two equal-length point columns; either or both operands may be constant, in which case the query point is decoded once and broadcast.
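
To make the operand shapes concrete, here is a minimal, self-contained sketch of planar distance over a struct-of-arrays point column with constant broadcast. It is not the vortex-geo implementation: `PointColumn`, `Operand`, and `geo_distance` are hypothetical names, only x and y are modeled, and returning a single value for the constant-to-constant case is an assumption.

```rust
/// Hypothetical struct-of-arrays point column (XY only); not the vortex-geo API.
/// Assumes `x` and `y` have the same length.
#[derive(Debug, Clone, PartialEq)]
pub struct PointColumn {
    pub x: Vec<f64>,
    pub y: Vec<f64>,
}

/// A distance operand: either a full column or a single constant point.
pub enum Operand<'a> {
    Column(&'a PointColumn),
    Constant { x: f64, y: f64 },
}

impl Operand<'_> {
    /// Row count, or `None` for a constant (which adapts to the other side).
    fn len(&self) -> Option<usize> {
        match self {
            Operand::Column(c) => Some(c.x.len()),
            Operand::Constant { .. } => None,
        }
    }

    /// Coordinates at row `i`; a constant yields the same point for every row,
    /// so it is never materialized into one identical row per output element.
    fn get(&self, i: usize) -> (f64, f64) {
        match self {
            Operand::Column(c) => (c.x[i], c.y[i]),
            Operand::Constant { x, y } => (*x, *y),
        }
    }
}

/// Planar (Euclidean) distance per row, in the units of the CRS.
pub fn geo_distance(a: &Operand, b: &Operand) -> Result<Vec<f64>, String> {
    let len = match (a.len(), b.len()) {
        (Some(n), Some(m)) if n != m => {
            return Err(format!("length mismatch: {n} vs {m}"));
        }
        (Some(n), _) | (None, Some(n)) => n,
        // Assumption: constant-to-constant produces a single value.
        (None, None) => 1,
    };
    Ok((0..len)
        .map(|i| {
            let (ax, ay) = a.get(i);
            let (bx, by) = b.get(i);
            (ax - bx).hypot(ay - by)
        })
        .collect())
}
```

Because the coordinates live in separate `x` and `y` vectors, the loop reads them directly, with no per-row WKB parsing.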

Testing

Unit tests cover dtype validation for every GeoArrow dimension (and rejection of invalid storage), round-tripping a point column through scalar execution back to the original coordinates, WKT display for all four dimensions, and distance over all operand shapes: column-to-constant (either side), column-to-column, and constant-to-constant.
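
For reference, the four WKT forms mentioned above can be sketched as follows. This is an illustration under assumptions, not the code in this PR: the `Coordinate` struct here is a hypothetical stand-in, and `None` means the dimension is absent for the point's dimension, not a null ordinate.

```rust
use std::fmt;

/// Hypothetical coordinate value; `z` and `m` exist only in some dimensions.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Coordinate {
    pub x: f64,
    pub y: f64,
    pub z: Option<f64>,
    pub m: Option<f64>,
}

impl fmt::Display for Coordinate {
    /// Formats the coordinate as WKT, tagging the dimension as Z, M, or ZM.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let Coordinate { x, y, z, m } = *self;
        match (z, m) {
            (None, None) => write!(f, "POINT ({x} {y})"),
            (Some(z), None) => write!(f, "POINT Z ({x} {y} {z})"),
            (None, Some(m)) => write!(f, "POINT M ({x} {y} {m})"),
            (Some(z), Some(m)) => write!(f, "POINT ZM ({x} {y} {z} {m})"),
        }
    }
}
```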


Supersedes #8342 (same change, moved from my fork to an in-repo branch).

Adds a GeoArrow-style `Point` extension type (Struct<x,y,[z],[m]>, dimension-ready)
and the planar `GeoDistance` scalar function between two point columns.

Signed-off-by: Nemo Yu <zyu379@wisc.edu>
… point

GeoDistance computes the planar distance from each point in a column to a
single constant query point (e.g. `ST_Distance(column, point)`). The second
operand must be a constant: it is decoded once and broadcast over the column
rather than materialized to one identical row per output element. Column-to-
column distance is unsupported and errors.

`try_new_array` now infers the output length from the point column instead of
taking it as an explicit parameter.

Signed-off-by: Nemo Yu <zhenghong@spiraldb.com>
…field types

Signed-off-by: Nemo Yu <zyu379@wisc.edu>
…s on construction

Signed-off-by: Nemo Yu <zyu379@wisc.edu>
@codspeed-hq

codspeed-hq Bot commented Jun 11, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 2 improved benchmarks
❌ 3 regressed benchmarks
✅ 1523 untouched benchmarks
⏩ 10 skipped benchmarks [1]

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | varbinview_large | 112.9 µs | 131.5 µs | -14.17% |
| Simulation | decompress_rd[f64, (100000, 0.01)] | 845.9 µs | 981.5 µs | -13.82% |
| Simulation | decompress_rd[f64, (100000, 0.1)] | 845.9 µs | 981.5 µs | -13.82% |
| Simulation | decompress_rd[f64, (100000, 0.0)] | 1,024.6 µs | 845.8 µs | +21.14% |
| Simulation | decompress_rd[f32, (100000, 0.0)] | 586.8 µs | 499.3 µs | +17.53% |

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing nemo/geo-point (58febd5) with develop (f67b594)

Footnotes

  1. 10 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Comment thread vortex-geo/src/extension/point.rs Outdated
Comment thread vortex-geo/src/extension/coordinate.rs Outdated
let DType::Struct(fields, _) = dtype else {
    vortex_bail!("coordinate storage must be a Struct, was {dtype}");
};
let names: Vec<&str> = fields.names().iter().map(|n| n.as_ref()).collect();

Contributor

why alloc?

Contributor Author

Removed. The names are now staged in a stack buffer inside from_field_names so the slice-pattern match still works, and coordinate_dimension zips names with fields directly instead of collecting.
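
A minimal sketch of the allocation-free approach described in this reply, under stated assumptions: the `Dimension` enum, the signature of `from_field_names`, and the error type are hypothetical and may differ from the merged code. Names are staged in a fixed-size stack array, and the filled prefix is matched with slice patterns.

```rust
/// The four GeoArrow coordinate dimensions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Dimension {
    Xy,
    Xyz,
    Xym,
    Xyzm,
}

impl Dimension {
    /// Hypothetical stand-in for `from_field_names`: stages up to four names
    /// in a stack buffer (no `Vec`) so slice patterns can still be used.
    pub fn from_field_names<'a, I>(names: I) -> Result<Self, String>
    where
        I: IntoIterator<Item = &'a str>,
    {
        let mut buf: [&str; 4] = [""; 4];
        let mut len = 0;
        for name in names {
            if len == buf.len() {
                return Err("too many coordinate fields (expected at most 4)".to_string());
            }
            buf[len] = name;
            len += 1;
        }
        // Match only the filled prefix of the buffer.
        match &buf[..len] {
            ["x", "y"] => Ok(Dimension::Xy),
            ["x", "y", "z"] => Ok(Dimension::Xyz),
            ["x", "y", "m"] => Ok(Dimension::Xym),
            ["x", "y", "z", "m"] => Ok(Dimension::Xyzm),
            other => Err(format!("unsupported coordinate fields: {other:?}")),
        }
    }
}
```

Since at most four field names are valid, a `[&str; 4]` buffer is enough, and anything longer is rejected before the match.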

vortex_ensure!(
    matches!(
        field,
        DType::Primitive(PType::F64, Nullability::NonNullable)

Contributor

I thought that two fields are Nullable?

Contributor Author

z/m are optional fields (per dimension), not nullable ones — the GeoArrow spec requires coordinate fields to be non-nullable, with "only the outer level allowed to have nulls". So a point can be missing entirely, but a present point can't have a null ordinate.

Ref: https://geoarrow.org/format.html
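
The optional-versus-nullable distinction can be sketched with a toy validator. `FieldType` and `validate_ordinates` are hypothetical stand-ins, not Vortex's `DType` or the check in this PR: the set of fields may vary by dimension, but every field that is present must be a non-nullable f64.

```rust
/// Toy stand-in for a field dtype; Vortex's real `DType` is richer.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FieldType {
    F64 { nullable: bool },
    Other,
}

/// Accepts any number of fields (z and m may simply be absent), but rejects
/// a present field that is nullable or not f64.
pub fn validate_ordinates(fields: &[(&str, FieldType)]) -> Result<(), String> {
    for (name, ty) in fields {
        if !matches!(ty, FieldType::F64 { nullable: false }) {
            return Err(format!("field `{name}` must be non-nullable f64, was {ty:?}"));
        }
    }
    Ok(())
}
```

Nullability of a whole point would be tracked one level up, on the struct itself, which this sketch does not model.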

Contributor

oh maybe you should make it Struct<x, y, ?z, ?m> instead, I forgot that ? also means possibly null, not just optional. Or you can do Struct<x, y, {z}, {m}>

HarukiMoriarty and others added 2 commits June 11, 2026 13:21
Co-authored-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Nemo Yu <83347615+HarukiMoriarty@users.noreply.github.com>
Signed-off-by: Nemo Yu <zyu379@wisc.edu>
HarukiMoriarty and others added 2 commits June 12, 2026 10:54
…lability confusion

In Vortex DType display syntax a ? suffix means nullable, so z: f64?
read as a nullable field when the optional z/m fields are required to
be non-nullable f64 when present. Use usage-string brackets for
optional fields and anchor the storage shape to the four GeoArrow
dimensions XY, XYZ, XYM, XYZM.

Signed-off-by: "Nemo Yu" <zhenghong@spiraldb.com>
@HarukiMoriarty HarukiMoriarty enabled auto-merge (squash) June 12, 2026 15:02
@HarukiMoriarty HarukiMoriarty disabled auto-merge June 12, 2026 15:02
ScalarFnArray::try_new lost its len parameter in #8378 and now infers
the length from the first child, which matches the old behavior of
passing a.len().

DCO Remediation Commit for Nemo Yu <zyu379@wisc.edu>

I, Nemo Yu <zyu379@wisc.edu>, hereby add my Signed-off-by to this commit: 88cb5a2
I, Nemo Yu <zyu379@wisc.edu>, hereby add my Signed-off-by to this commit: 8789430
I, Nemo Yu <zyu379@wisc.edu>, hereby add my Signed-off-by to this commit: 549f5c4
I, Nemo Yu <zyu379@wisc.edu>, hereby add my Signed-off-by to this commit: ec95875
I, Nemo Yu <zyu379@wisc.edu>, hereby add my Signed-off-by to this commit: 874c171

Signed-off-by: Nemo Yu <zyu379@wisc.edu>
@HarukiMoriarty HarukiMoriarty merged commit 7b53625 into develop Jun 12, 2026
62 of 64 checks passed
@HarukiMoriarty HarukiMoriarty deleted the nemo/geo-point branch June 12, 2026 15:22
HarukiMoriarty added a commit that referenced this pull request Jun 12, 2026
## Summary

This PR adds support for import/export to Arrow for the `Point` extension type, as the `geoarrow.point` Arrow extension with separated (struct) coordinates.

Stacked on #8372.

## Testing

Unit tests are added to exercise both code paths, plus a Vortex → Arrow → Vortex round-trip.

Signed-off-by: Nemo Yu <zyu379@wisc.edu>
Labels

changelog/feature A new feature


3 participants