feat(vortex-geo): native Point extension type and GeoDistance scalar function #8372
Adds a GeoArrow-style `Point` extension type (Struct<x,y,[z],[m]>, dimension-ready) and the planar `GeoDistance` scalar function between two point columns. Signed-off-by: Nemo Yu <zyu379@wisc.edu>
… point GeoDistance computes the planar distance from each point in a column to a single constant query point (e.g. `ST_Distance(column, point)`). The second operand must be a constant: it is decoded once and broadcast over the column rather than materialized to one identical row per output element. Column-to-column distance is unsupported and errors. `try_new_array` now infers the output length from the point column instead of taking it as an explicit parameter. Signed-off-by: Nemo Yu <zhenghong@spiraldb.com>
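The broadcast described in this commit message can be sketched in plain Rust. This is only an illustration under stated assumptions: `QueryPoint` and `planar_distance_to_constant` are hypothetical names, the column is modelled as two separated `f64` slices, and none of this is the actual `vortex-geo` API.

```rust
/// Hypothetical stand-in for the constant query point after it has been
/// decoded once (XY only, for brevity).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct QueryPoint {
    pub x: f64,
    pub y: f64,
}

/// Planar (Euclidean) distance from every point in a column to one query point.
///
/// `xs` and `ys` model the separated coordinate buffers of the point column.
/// The output length is inferred from the column rather than passed in, and
/// the query point is reused for every row instead of being materialized as
/// a column of identical rows.
pub fn planar_distance_to_constant(
    xs: &[f64],
    ys: &[f64],
    query: QueryPoint,
) -> Result<Vec<f64>, String> {
    if xs.len() != ys.len() {
        return Err(format!("x has {} values but y has {}", xs.len(), ys.len()));
    }
    Ok(xs
        .iter()
        .zip(ys)
        // `hypot` computes sqrt(dx^2 + dy^2) without intermediate overflow.
        .map(|(&x, &y)| (x - query.x).hypot(y - query.y))
        .collect())
}
```

Results are in the units of the coordinates themselves, which matches the "results in CRS units" wording used for planar distance elsewhere in this PR.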
…field types Signed-off-by: Nemo Yu <zyu379@wisc.edu>
…s on construction Signed-off-by: Nemo Yu <zyu379@wisc.edu>
Merging this PR will not alter performance
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | varbinview_large | 112.9 µs | 131.5 µs | -14.17% |
| ❌ | Simulation | decompress_rd[f64, (100000, 0.01)] | 845.9 µs | 981.5 µs | -13.82% |
| ❌ | Simulation | decompress_rd[f64, (100000, 0.1)] | 845.9 µs | 981.5 µs | -13.82% |
| ⚡ | Simulation | decompress_rd[f64, (100000, 0.0)] | 1,024.6 µs | 845.8 µs | +21.14% |
| ⚡ | Simulation | decompress_rd[f32, (100000, 0.0)] | 586.8 µs | 499.3 µs | +17.53% |
Comparing nemo/geo-point (58febd5) with develop (f67b594)
Footnotes
1. 10 benchmarks were skipped, so the baseline results were used instead.
    let DType::Struct(fields, _) = dtype else {
        vortex_bail!("coordinate storage must be a Struct, was {dtype}");
    };
    let names: Vec<&str> = fields.names().iter().map(|n| n.as_ref()).collect();
Removed. The names are now staged in a stack buffer inside `from_field_names`, so the slice-pattern match still works, and `coordinate_dimension` zips names with fields directly instead of collecting.
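One way the stack-buffer approach from this reply could look is sketched below. It is a guess at the shape rather than the PR's code: the `Dimension` enum, its variant names, and this signature of `from_field_names` are assumptions made for illustration.

```rust
/// The four GeoArrow coordinate dimensions (hypothetical enum for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Dimension {
    Xy,
    Xyz,
    Xym,
    Xyzm,
}

impl Dimension {
    /// Infer the dimension from struct field names without allocating a Vec.
    ///
    /// At most four names are valid, so they are staged in a fixed-size array
    /// on the stack and then matched with a slice pattern.
    pub fn from_field_names<'a, I>(names: I) -> Option<Self>
    where
        I: IntoIterator<Item = &'a str>,
    {
        let mut buf: [&'a str; 4] = [""; 4];
        let mut len = 0;
        for name in names {
            if len == buf.len() {
                // A fifth field can never form a valid coordinate struct.
                return None;
            }
            buf[len] = name;
            len += 1;
        }
        match &buf[..len] {
            ["x", "y"] => Some(Self::Xy),
            ["x", "y", "z"] => Some(Self::Xyz),
            ["x", "y", "m"] => Some(Self::Xym),
            ["x", "y", "z", "m"] => Some(Self::Xyzm),
            _ => None,
        }
    }
}
```

The slice pattern keeps the four accepted shapes visible in one place, and field order is checked for free because `["y", "x"]` does not match any arm.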
    vortex_ensure!(
        matches!(
            field,
            DType::Primitive(PType::F64, Nullability::NonNullable)
I thought the two fields were nullable?
z/m are optional fields (per dimension), not nullable ones — the GeoArrow spec requires coordinate fields to be non-nullable, with "only the outer level allowed to have nulls". So a point can be missing entirely, but a present point can't have a null ordinate.
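The distinction between a missing point and a null ordinate can be illustrated with a small sketch. The types here (`PointXy`, `PointColumn`) are hypothetical and much simpler than a real Vortex array; the only thing they are meant to show is that nullability lives at the outer level while the coordinate buffers stay plain `f64`.

```rust
/// A present point: both ordinates are plain, non-nullable f64 values.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct PointXy {
    pub x: f64,
    pub y: f64,
}

/// Hypothetical column layout: separated coordinate buffers plus one
/// validity flag per row at the outer (point) level.
pub struct PointColumn {
    xs: Vec<f64>,
    ys: Vec<f64>,
    validity: Vec<bool>,
}

impl PointColumn {
    pub fn from_options(points: &[Option<PointXy>]) -> Self {
        let mut col = PointColumn {
            xs: Vec::with_capacity(points.len()),
            ys: Vec::with_capacity(points.len()),
            validity: Vec::with_capacity(points.len()),
        };
        for p in points {
            // A missing point still occupies a slot in each buffer; the
            // placeholder 0.0 is never observed because validity is false.
            let (x, y, valid) = match p {
                Some(pt) => (pt.x, pt.y, true),
                None => (0.0, 0.0, false),
            };
            col.xs.push(x);
            col.ys.push(y);
            col.validity.push(valid);
        }
        col
    }

    /// Returns `None` for an out-of-range index or a null point.
    pub fn get(&self, i: usize) -> Option<PointXy> {
        if *self.validity.get(i)? {
            Some(PointXy { x: self.xs[i], y: self.ys[i] })
        } else {
            None
        }
    }
}
```

With this layout there is no way to express "x is present but y is null", which is the property the reply attributes to the GeoArrow spec.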
Oh, maybe you should write it as `Struct<x, y, ?z, ?m>` instead. I forgot that `?` also means possibly null, not just optional. Or you could use `Struct<x, y, {z}, {m}>`.
Co-authored-by: Joe Isaacs <joe.isaacs@live.co.uk> Signed-off-by: Nemo Yu <83347615+HarukiMoriarty@users.noreply.github.com>
Signed-off-by: Nemo Yu <zyu379@wisc.edu>
…lability confusion In Vortex DType display syntax a ? suffix means nullable, so z: f64? read as a nullable field when the optional z/m fields are required to be non-nullable f64 when present. Use usage-string brackets for optional fields and anchor the storage shape to the four GeoArrow dimensions XY, XYZ, XYM, XYZM. Signed-off-by: "Nemo Yu" <zhenghong@spiraldb.com>
7f564a9 to 874c171
ScalarFnArray::try_new lost its len parameter in #8378 and now infers the length from the first child, which matches the old behavior of passing a.len(). DCO Remediation Commit for Nemo Yu <zyu379@wisc.edu> I, Nemo Yu <zyu379@wisc.edu>, hereby add my Signed-off-by to this commit: 88cb5a2 I, Nemo Yu <zyu379@wisc.edu>, hereby add my Signed-off-by to this commit: 8789430 I, Nemo Yu <zyu379@wisc.edu>, hereby add my Signed-off-by to this commit: 549f5c4 I, Nemo Yu <zyu379@wisc.edu>, hereby add my Signed-off-by to this commit: ec95875 I, Nemo Yu <zyu379@wisc.edu>, hereby add my Signed-off-by to this commit: 874c171 Signed-off-by: Nemo Yu <zyu379@wisc.edu>
13001ef to 58febd5
## Summary

This PR adds support for import/export to Arrow for the `Point` extension type, as the `geoarrow.point` Arrow extension with separated (struct) coordinates. Stacked on #8372.

## Testing

Unit tests are added to exercise both code paths, plus a Vortex → Arrow → Vortex round-trip.

Signed-off-by: Nemo Yu <zyu379@wisc.edu>
Summary
This PR adds a native point type to `vortex-geo`. Points are by far the most common geometry in analytical datasets, and a columnar representation makes their coordinates directly accessible without parsing WKB.

It also adds the `GeoDistance` scalar function: point-to-point distance with PostGIS `ST_Distance` semantics (planar/Euclidean, results in CRS units).

API Changes
Adds to `vortex-geo`, all registered through `vortex_geo::initialize`:
- `Point` (`vortex.geo.point`): a location stored as `Struct<x, y, z?, m?>` of non-nullable `f64`, where `z?` is an optional elevation and `m?` an optional measure.
- `Coordinate`: the internal value a point scalar unpacks to.
- `GeoDistance` (`vortex.geo.distance`): per-row distance between two equal-length point columns; either or both operands may be constant, in which case the query point is decoded once and broadcast.

Testing
Unit tests cover dtype validation for every GeoArrow dimension (and rejection of invalid storage), round-tripping a point column through scalar execution back to the original coordinates, WKT display for all four dimensions, and distance over all operand shapes: column-to-constant (either side), column-to-column, and constant-to-constant.
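For readers unfamiliar with the WKT forms mentioned in the testing notes, the four dimensions could be rendered roughly as follows. This is a minimal sketch: the `Coord` enum is hypothetical (the PR's own `Coordinate` type is not shown on this page), and ordinates are formatted with Rust's default `f64` display.

```rust
use std::fmt;

/// Hypothetical coordinate value covering the four GeoArrow dimensions.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Coord {
    Xy { x: f64, y: f64 },
    Xyz { x: f64, y: f64, z: f64 },
    Xym { x: f64, y: f64, m: f64 },
    Xyzm { x: f64, y: f64, z: f64, m: f64 },
}

impl fmt::Display for Coord {
    /// Formats the coordinate as a WKT point, using the `Z`, `M`, and `ZM`
    /// tags to disambiguate three- and four-ordinate points.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match *self {
            Coord::Xy { x, y } => write!(f, "POINT ({} {})", x, y),
            Coord::Xyz { x, y, z } => write!(f, "POINT Z ({} {} {})", x, y, z),
            Coord::Xym { x, y, m } => write!(f, "POINT M ({} {} {})", x, y, m),
            Coord::Xyzm { x, y, z, m } => {
                write!(f, "POINT ZM ({} {} {} {})", x, y, z, m)
            }
        }
    }
}
```

The explicit `M` tag matters because an XYM point and an XYZ point both carry three ordinates and would otherwise be indistinguishable in text form.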
Supersedes #8342 (same change, moved from my fork to an in-repo branch).