Fix leader-switch race in v_chunk_id IPC publish within create_shard #445

Closed
JacksonYao287 with Copilot wants to merge 2 commits into stable/v4.x from copilot/fix-code-review-comment

Copilot AI commented Jun 25, 2026

The v_chunk_id publish step in create_shard() used run_on_pg_leader(), which has a time-of-check to time-of-use (TOCTOU) gap. If the leader changes between the get_stats() call and the lambda, no replica executes the lambda and auxiliary_uint64_id stays at UINT64_MAX. The subsequent v_chunk_id equality assertions then fail intermittently.

Change

  • Replace run_on_pg_leader() with run_on_pg_leader_with_retry() for the v_chunk_id IPC publish step, using auxiliary_uint64_id != UINT64_MAX as the done_check. This is consistent with how create_shard itself was already fixed.
// Before
run_on_pg_leader(pg_id, [&]() {
    auto v_chunkID = _obj_inst->get_shard_v_chunk_id(shard_id);
    RELEASE_ASSERT(v_chunkID.has_value(), "failed to get shard v_chunk_id");
    g_helper->set_auxiliary_uint64_id(v_chunkID.value());
});

// After
run_on_pg_leader_with_retry(pg_id,
    [&] { return g_helper->get_auxiliary_uint64_id() != UINT64_MAX; },
    [&]() -> bool {
        auto v_chunkID = _obj_inst->get_shard_v_chunk_id(shard_id);
        RELEASE_ASSERT(v_chunkID.has_value(), "failed to get shard v_chunk_id");
        g_helper->set_auxiliary_uint64_id(v_chunkID.value());
        return true;
    });
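
For readers unfamiliar with the pattern, here is a minimal single-process sketch of why a done_check-driven retry closes the gap. It is not the actual run_on_pg_leader_with_retry() implementation: `FakeHelper`, `run_on_leader_with_retry_sketch`, `demo_publish_after_leader_switch`, the `is_leader` callback, the `max_rounds` bound, and the value 42 are all hypothetical names and values chosen for illustration. It only assumes, as described above, that UINT64_MAX marks the slot as "not published yet".

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical stand-in for the test helper's shared slot.
// UINT64_MAX means "no leader has published a value yet".
struct FakeHelper {
    uint64_t auxiliary_uint64_id = UINT64_MAX;
};

// Sketch of a retrying leader-only runner (not the real helper).
// Leadership is re-evaluated every round, so a leader switch delays the
// task instead of silently dropping it. done_check is the sole source of
// truth for completion; the task's own return value is ignored here.
// Returns true once done_check() observes the result, false if
// max_rounds is exhausted first.
bool run_on_leader_with_retry_sketch(const std::function<bool()>& is_leader,
                                     const std::function<bool()>& done_check,
                                     const std::function<bool()>& task,
                                     int max_rounds) {
    for (int round = 0; round < max_rounds; ++round) {
        if (done_check()) { return true; }  // some leader already published
        if (is_leader()) { task(); }        // run only while we are the leader
    }
    return done_check();
}

// Simulates a leader switch: this replica is not the leader during the
// first two rounds, then becomes the leader and publishes the value.
uint64_t demo_publish_after_leader_switch() {
    FakeHelper helper;
    int leadership_checks = 0;
    const uint64_t v_chunk_id = 42;  // made-up value for illustration

    const bool ok = run_on_leader_with_retry_sketch(
        [&] { return ++leadership_checks > 2; },
        [&] { return helper.auxiliary_uint64_id != UINT64_MAX; },
        [&] { helper.auxiliary_uint64_id = v_chunk_id; return true; },
        /*max_rounds=*/10);
    assert(ok);
    return helper.auxiliary_uint64_id;
}
```

A one-shot runner would have given up in the first round of this simulation, leaving the slot at UINT64_MAX. The retrying version keeps going until the sentinel is overwritten, which is the property the done_check in this change relies on.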

Copilot AI changed the title from "[WIP] Fix code according to review comment" to "Fix leader-switch race in v_chunk_id IPC publish within create_shard" Jun 25, 2026
Copilot AI requested a review from JacksonYao287 June 25, 2026 04:34