Skip to content

GH-50351: [C++] Fix KeyValueMetadata::DeleteMany crash with duplicate indices#50354

Open
AdvancedUno wants to merge 3 commits into
apache:mainfrom
AdvancedUno:gh-50351-deletemany-duplicate-indices
Open

GH-50351: [C++] Fix KeyValueMetadata::DeleteMany crash with duplicate indices#50354
AdvancedUno wants to merge 3 commits into
apache:mainfrom
AdvancedUno:gh-50351-deletemany-duplicate-indices

Conversation

@AdvancedUno

@AdvancedUno AdvancedUno commented Jul 3, 2026

Copy link
Copy Markdown

Rationale for this change

KeyValueMetadata::DeleteMany crashes (out-of-bounds write) when the indices vector
contains duplicates, e.g. DeleteMany({0, 0}). The compaction loop increments its shift
counter once per entry, so duplicates cause writes at negative offsets.

Additionally, out-of-range indices were only checked by DCHECK, so in release builds
DeleteMany({size}) silently deleted the last element and negative indices were UB.

What changes are included in this PR?

  • DeleteMany now validates indices after sorting and returns Status::IndexError for any index out of [0, size), matching the behavior and message format of Delete. Metadata is left unmodified on error.
  • Duplicate indices are removed with std::unique before compaction, so deleting the same index twice is equivalent to deleting it once.

Are these changes tested?

Yes, I added cases to KeyValueMetadataTest.Delete covering duplicate indices ({0,0}, {1,3,1,3,1}), out-of-bounds positive and negative indices (exact error messages), and that metadata is unchanged after a failed call.

Are there any user-facing changes?

Yes, minor behavioral fix: DeleteMany now returns IndexError for out-of-range indices
instead of crashing (duplicates) or silently corrupting state (release builds). No API change.

@AdvancedUno AdvancedUno requested a review from pitrou as a code owner July 3, 2026 06:43
@github-actions

github-actions Bot commented Jul 3, 2026

Copy link
Copy Markdown

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

@AdvancedUno AdvancedUno changed the title Fix KeyValueMetadata::DeleteMany crash with duplicate indices GH-50351: [C++] Fix KeyValueMetadata::DeleteMany crash with duplicate indices Jul 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant