
Fix parallel retrieve cursor check timeout never going quiet #1796

Open
gfphoenix78 wants to merge 1 commit into apache:main from gfphoenix78:fix-cursor-timer


@gfphoenix78
Contributor

GP_PARALLEL_RETRIEVE_CURSOR_CHECK_TIMEOUT was armed by every DECLARE PARALLEL RETRIEVE CURSOR but had no path to ever be disabled: the handler re-armed it unconditionally in the !DoingCommandRead branch, and no portal-drop path ever called disable_timeout(). The result was a SIGALRM firing every 10 seconds for the rest of the backend's life even when no parallel retrieve cursors remained, which interfered with unrelated code paths (e.g. truncating pg_usleep-based fault sleeps that tests rely on).
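To illustrate why a stray periodic SIGALRM matters to unrelated code, here is a minimal standalone sketch of a signal cutting a sleep short. It uses plain POSIX `nanosleep()` and `setitimer()` as stand-ins; it does not call `pg_usleep` or any backend code, and the helper name `sleep_was_truncated` is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* No-op handler: it only has to exist so that SIGALRM interrupts the
 * sleep instead of terminating the process. */
static void on_alarm(int signo) { (void) signo; }

/*
 * Sleep for sleep_ms while a one-shot SIGALRM is scheduled alarm_ms from now
 * (alarm_ms == 0 means no alarm is scheduled).
 * Returns 1 if the sleep was interrupted by the signal, 0 if it ran to
 * completion, and -1 if the signal or timer setup failed.
 */
static int sleep_was_truncated(long sleep_ms, long alarm_ms)
{
    struct sigaction sa;
    struct itimerval it;
    struct timespec req;
    int truncated;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    /* Arm a one-shot real-time timer; a zero it_value leaves it disarmed. */
    memset(&it, 0, sizeof(it));
    it.it_value.tv_sec = alarm_ms / 1000;
    it.it_value.tv_usec = (alarm_ms % 1000) * 1000;
    if (setitimer(ITIMER_REAL, &it, NULL) != 0)
        return -1;

    req.tv_sec = sleep_ms / 1000;
    req.tv_nsec = (sleep_ms % 1000) * 1000000L;

    /* nanosleep() returns -1 with errno == EINTR when a handler runs. */
    truncated = (nanosleep(&req, NULL) == -1 && errno == EINTR);

    /* Disarm the timer so nothing fires after we return. */
    memset(&it, 0, sizeof(it));
    setitimer(ITIMER_REAL, &it, NULL);
    return truncated;
}
```

Calling `sleep_was_truncated(200, 20)` models a sleep that overlaps a timer firing, while `sleep_was_truncated(50, 0)` models the quiet case this PR aims for once no cursors remain.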

Tighten this in two ways:

  • The handler now only re-arms when GetNumOfParallelRetrieveCursors() is still positive, so the timer naturally quiets after the last cursor is gone.
  • PortalCleanup() proactively calls disable_timeout() when the last parallel retrieve cursor in this backend is being torn down, avoiding even the single stray firing that the handler fallback would otherwise still produce.
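The two changes above can be sketched as a small state model. This is not the patch itself: the variables and functions below are hypothetical stand-ins for the backend's cursor count and timeout state, the `!DoingCommandRead` condition is left out, and the exact ordering inside `PortalCleanup()` is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state, not real backend variables. */
static int  num_cursors = 0;      /* models GetNumOfParallelRetrieveCursors() */
static bool timer_armed = false;  /* models a pending check timeout */
static int  firings = 0;          /* how many times the handler has run */

static void reset_model(void)
{
    num_cursors = 0;
    timer_armed = false;
    firings = 0;
}

/* Models DECLARE PARALLEL RETRIEVE CURSOR arming the timeout. */
static void declare_cursor(void)
{
    num_cursors++;
    timer_armed = true;
}

/* Models one expiry of the timeout and the handler's re-arm decision. */
static void timeout_fired(void)
{
    if (!timer_armed)
        return;                   /* nothing pending, so nothing fires */
    timer_armed = false;          /* the pending expiry is consumed */
    firings++;
    if (num_cursors > 0)
        timer_armed = true;       /* re-arm only while cursors remain */
}

/*
 * Models portal cleanup. With proactive == true the timeout is disabled
 * as the last cursor goes away; with false, only the handler check above
 * eventually quiets the timer.
 */
static void close_cursor(bool proactive)
{
    assert(num_cursors > 0);
    if (proactive && num_cursors == 1)
        timer_armed = false;      /* models disable_timeout() */
    num_cursors--;
}
```

In this model, closing the last cursor with `proactive` set to false leaves one more firing pending before the handler declines to re-arm, which corresponds to the single stray firing the second bullet avoids.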



Comment thread src/backend/utils/init/postinit.c