Skip to content

Fix UNWIND [null] causing count() to miscount null rows (#2383)#2411

Open
crprashant wants to merge 1 commit intoapache:masterfrom
crprashant:fix/2383-unwind-null-count
Open

Fix UNWIND [null] causing count() to miscount null rows (#2383)#2411
crprashant wants to merge 1 commit intoapache:masterfrom
crprashant:fix/2383-unwind-null-count

Conversation

@crprashant
Copy link
Copy Markdown
Contributor

Fixes #2383.

Problem

Per Neo4j / openCypher semantics, count() must ignore nulls. On Apache AGE:

UNWIND [null] AS x RETURN count(x)   -- returns 1 (buggy), should be 0

The control cases already returned the correct 0:

RETURN count(null)
WITH null AS x RETURN count(x)

Root cause

age_unnest() in src/backend/utils/adt/agtype.c materialized every top-level array element into a heap tuple with nulls[0] = false, even when the element's agtype_value.type == AGTV_NULL. The row that reached count() therefore carried a non-NULL datum wrapping an agtype JSON null, so the aggregate's strict null check never skipped it.

Fix

A naïve fix inside age_unnest() breaks other callers. construct_age_function_name() in src/backend/parser/cypher_expr.c rewrites unnestage_unnest, so list comprehensions and the predicate functions (all / any / none / single) also route through age_unnest. Those callers deliberately rely on the quirk that AGE's agtype_null > agtype_integer evaluates to agtype_null (truthy), not SQL NULL. Changing the SRF broke five predicate_functions expected results.

Introduce a dedicated SRF age_unwind() with UNWIND-specific null semantics:

  • Top-level AGTV_NULL elements → SQL NULL rows so null-strict operators (count, IS NULL, WHERE x IS NOT NULL, …) match WITH null AS x.
  • Non-null elements are returned unchanged.
  • Nested nulls inside arrays/objects are preserved as agtype-null.

transform_cypher_unwind() is the only call site changed from age_unnest to age_unwind. List comprehensions and predicate functions continue to use age_unnest and their behavior is untouched.

Before / after

-- before
UNWIND [null] AS x RETURN count(x)                  -- 1
UNWIND [1, null, 2, null, 3] AS x RETURN count(x)   -- 5
UNWIND [null] AS x RETURN x IS NULL                 -- false

-- after
UNWIND [null] AS x RETURN count(x)                  -- 0 ✓
UNWIND [1, null, 2, null, 3] AS x RETURN count(x)   -- 3 ✓
UNWIND [null] AS x RETURN x IS NULL                 -- true ✓

Regression tests (regress/sql/cypher_unwind.sql)

  • UNWIND [null] RETURN count(x) = 0
  • UNWIND [1, null, 2, null, 3] RETURN count(x) = 3
  • UNWIND [null] RETURN x IS NULL
  • UNWIND [1, null, 2] WITH x WHERE x IS NOT NULL RETURN count(x) = 2
  • UNWIND [null] RETURN count(*) = 1 (row still produced)
  • UNWIND [[null], {k: null}] RETURN count(x) = 2 (nested nulls preserved)
  • UNWIND [null] RETURN x — asserts the row value itself is SQL NULL

Files changed

  • src/backend/utils/adt/agtype.c — new age_unwind SRF
  • src/backend/parser/cypher_clause.c — UNWIND calls age_unwind
  • sql/agtype_typecast.sql — declare age_unwind
  • age--1.7.0--y.y.y.sql — add age_unwind for in-place upgrades
  • regress/sql/cypher_unwind.sql + regress/expected/cypher_unwind.out

Test results

make installcheck with PostgreSQL 18 on Ubuntu 24.04:

  • 32/33 tests pass.
  • age_upgrade fails identically on apache/age master HEAD (pre-existing, unrelated; same age_prepare_pg_upgrade already exists error).
  • cypher_unwind, predicate_functions, list_comprehension all pass.

@crprashant crprashant force-pushed the fix/2383-unwind-null-count branch from 33d6650 to 290105d Compare April 23, 2026 01:15
@crprashant
Copy link
Copy Markdown
Contributor Author

Heads-up for reviewers — just noticed @MuhammadTahaNaveed's #2406 fixes the same root cause (agtype AGTV_NULL not being emitted as SQL NULL from age_unnest) with a broader scope that also covers single() / all() / any() / none() three-valued semantics. That direction was my first instinct too, but I backed off after it altered the predicate_functions regression; #2406 goes further and reworks those predicates to match Neo4j three-valued logic, which is the right call. #2406 subsumes this PR.

Happy to close #2411 in favor of #2406 once it lands. The 7 regression cases in regress/sql/cypher_unwind.sql here specifically cover UNWIND [null] AS x RETURN count(x) = 0 (issue #2383) and a few UNWIND-only boundaries (nested lists, mixed nulls, DISTINCT over UNWIND) that #2406's tests don't exercise — glad to open a small follow-up PR porting just those tests on top of #2406 if that would be useful.

Thanks for the fast work on the null-propagation family of bugs.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes Apache AGE’s UNWIND null-row semantics so count(expr) and other null-strict SQL operators correctly ignore top-level JSON null elements produced by UNWIND, aligning behavior with Neo4j/openCypher (issue #2383).

Changes:

  • Added a dedicated SRF age_unwind() that emits SQL NULL rows for top-level AGTV_NULL elements while preserving nested agtype nulls.
  • Updated Cypher UNWIND transformation to call age_unwind instead of routing through age_unnest.
  • Added regression coverage for UNWIND/count/IS NULL behavior and updated expected outputs; added SQL declarations for installation/upgrade.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/backend/utils/adt/agtype.c Introduces age_unwind() SRF with UNWIND-specific top-level null → SQL NULL behavior.
src/backend/parser/cypher_clause.c Switches UNWIND clause transformation to invoke age_unwind.
sql/agtype_typecast.sql Declares ag_catalog.age_unwind(agtype) RETURNS SETOF agtype.
age--1.7.0--y.y.y.sql Adds age_unwind to the upgrade template for in-place upgrades.
regress/sql/cypher_unwind.sql Adds regression queries validating UNWIND null semantics and strict-null operator behavior.
regress/expected/cypher_unwind.out Updates expected results for the new regression coverage.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/backend/utils/adt/agtype.c Outdated
Comment thread src/backend/utils/adt/agtype.c
Cypher's count() aggregate follows Neo4j/openCypher semantics and must
ignore nulls. Previously,

    UNWIND [null] AS x RETURN count(x)

returned 1 instead of 0 because age_unnest() materialized an agtype
JSON null as a non-NULL datum wrapping AGTV_NULL, so the count()
strictness check never skipped the row. The reference paths

    RETURN count(null)
    WITH null AS x RETURN count(x)

already returned 0.

Rather than change age_unnest() -- which is also reached by list
comprehensions and the predicate functions all/any/none/single via the
construct_age_function_name('unnest') -> 'age_unnest' rewrite and whose
callers deliberately rely on agtype-null flowing through -- introduce
a dedicated SRF age_unwind() with UNWIND-specific null semantics:

  * top-level AGTV_NULL elements are emitted as SQL NULL rows so
    null-strict operators (count, IS NULL, WHERE x IS NOT NULL, ...)
    behave like they do for WITH null AS x,
  * non-null elements are returned unchanged, and
  * nested nulls inside arrays/objects are preserved as agtype-null
    so container semantics are unchanged.

transform_cypher_unwind() now calls age_unwind(); list comprehensions
and predicate functions continue to go through age_unnest(), so their
existing AGE-specific null behavior is untouched.

Adds regression tests covering:
  * UNWIND [null] RETURN count(x) = 0
  * UNWIND [1, null, 2, null, 3] RETURN count(x) = 3
  * UNWIND [null] RETURN x IS NULL = true
  * UNWIND [null] RETURN x produces an SQL NULL row value
  * WHERE x IS NOT NULL filtering after UNWIND
  * count(*) still sees the row produced by UNWIND [null]
  * nested nulls inside containers survive unchanged

Fixes apache#2383.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@crprashant crprashant force-pushed the fix/2383-unwind-null-count branch from 290105d to 50b4253 Compare April 30, 2026 23:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

UNWIND [null] may produce a row whose value prints as null, but count(x) still counts it as 1.

2 participants