docs(adr): leader-priority raft config change basement #437

Open
xiaoxichen wants to merge 1 commit into eBay:stable/v4.x from xiaoxichen:xiaoxi-adr-leader-priority
@xiaoxichen

Collaborator

Proposes the internal design for a CM-driven leader-priority change API: PG sb as source of truth for CM intent, a new HS_CTRL_SET_LEADER_PRIORITY journal entry to propagate intent to every replica, and an enhanced reconcile_leader as the convergence engine driving NuRaft cluster_config to match PG sb. Includes a defense-in-depth fix for the replace_member PG-sb divergence (on_pg_start_replace_member propagates out's priority to in to preserve leader intent across membership swaps).

Status: Proposed
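
To make the replace_member fix concrete, here is a minimal sketch of the priority propagation it describes. It assumes the PG sb priorities can be modeled as a plain map from peer id to priority. `PriorityMap` and `propagate_priority_on_replace` are hypothetical names for illustration, not identifiers from the ADR or from HomeObject, and the sketch assumes the outgoing member stays in the map until the replacement completes.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical in-memory model of the per-PG superblock priorities:
// peer id -> leader priority. Not the real PG sb layout.
using PriorityMap = std::map<std::string, int32_t>;

// Copy member_out's priority onto member_in so the leader intent recorded
// for the outgoing member survives the membership swap.
// Returns false and leaves the map untouched if member_out is unknown.
// Assumption: member_out is removed later, when the replacement completes.
inline bool propagate_priority_on_replace(PriorityMap& sb_priorities,
                                          const std::string& member_out,
                                          const std::string& member_in) {
    const auto it = sb_priorities.find(member_out);
    if (it == sb_priorities.end()) { return false; }
    sb_priorities[member_in] = it->second;
    return true;
}
```

If the outgoing member held the elevated priority, the incoming member inherits it, so a later reconcile would not fall back to a default.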

Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>

A new `journal_type_t::HS_CTRL_SET_LEADER_PRIORITY` value. The header payload
carries:
- `task_id` (string) — for caller-side idempotency and tracking, matching the

Contributor

Do we also need a task sb to persist intent?

| Failure point | Outcome |
| --- | --- |
| Leader crash before Piece 2 propose returns | CM retries (idempotent on `task_id`). No state change. |
| Leader crash after journal entry committed on majority but before on_commit handler runs on the leader | The entry is replayed during recovery; on_commit fires; new leader's reconcile_leader converges NuRaft. |
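
The first two rows rely on the commit handler being safe to run more than once, whether from a CM retry with the same `task_id` or from log replay after a crash. A rough sketch of that property follows. `SetLeaderPriorityEntry`, `PgSuperblockModel`, and `apply_on_commit` are hypothetical names, the `priorities` field is an assumed payload shape, and storing the last `task_id` next to the priorities is only one possible way to persist intent.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical journal payload: only task_id is confirmed by the ADR excerpt;
// the priorities field is an assumption made for this sketch.
struct SetLeaderPriorityEntry {
    std::string task_id;
    std::map<std::string, int32_t> priorities;  // peer id -> desired priority
};

// Hypothetical in-memory model of the PG sb fields touched by the handler.
struct PgSuperblockModel {
    std::string last_task_id;
    std::map<std::string, int32_t> priorities;
};

// Apply a committed entry. Returns true if the superblock changed, false if
// the same entry had already been applied (CM retry or log replay).
inline bool apply_on_commit(PgSuperblockModel& sb, const SetLeaderPriorityEntry& entry) {
    if (sb.last_task_id == entry.task_id && sb.priorities == entry.priorities) {
        return false;  // already applied, nothing to persist
    }
    sb.last_task_id = entry.task_id;
    sb.priorities = entry.priorities;
    return true;
}
```

Because the handler overwrites state instead of accumulating it, replaying the entry during recovery leaves the PG sb in the same state as a single application.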

Contributor

join_group is called after log replay, so I'm not sure whether triggering reconcile_leader from on_commit works during recovery.

| Failure point | Outcome |
| --- | --- |
| Leader crash before Piece 2 propose returns | CM retries (idempotent on `task_id`). No state change. |
| Leader crash after journal entry committed on majority but before on_commit handler runs on the leader | The entry is replayed during recovery; on_commit fires; new leader's reconcile_leader converges NuRaft. |
| Leader crash between on_commit (Piece 3) and reconcile_leader completion (Piece 4) | PG sb is durable on majority; NuRaft cluster_config may be partly unconverged. Any subsequent reconcile_leader trigger (HTTP, CM retry, future periodic thread) re-derives desired state from PG sb and finishes the job. Transient inconsistency is bounded by reconcile trigger cadence. |
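
The last row depends on reconcile being a pure function of durable state: each trigger re-derives the desired priorities from PG sb and changes only what still differs. A minimal sketch under that assumption follows. `PriorityChange` and `plan_reconcile` are hypothetical names, and both inputs are modeled as plain maps, not the real PG sb or NuRaft cluster_config types.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One priority update that reconcile would still have to issue.
struct PriorityChange {
    std::string peer;
    int32_t from;
    int32_t to;
};

// Compare desired priorities (modeling PG sb) with live priorities (modeling
// NuRaft cluster_config) and return the updates still outstanding.
// Peers missing from the live config are skipped: membership changes are
// assumed to be handled elsewhere and are out of scope for this sketch.
inline std::vector<PriorityChange> plan_reconcile(
        const std::map<std::string, int32_t>& desired,
        const std::map<std::string, int32_t>& live) {
    std::vector<PriorityChange> plan;
    for (const auto& [peer, want] : desired) {
        const auto it = live.find(peer);
        if (it == live.end()) { continue; }
        if (it->second != want) { plan.push_back({peer, it->second, want}); }
    }
    return plan;
}
```

If a leader crashes after applying only part of a plan, the next trigger computes a shorter plan from the same PG sb, and an empty plan means the two views have converged.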

Contributor

How do we observe the pg sb priority and cluster_config priority?

4. Transient divergence between durable intent and live NuRaft state is bounded and
recoverable without operator intervention.

This basement is the substrate for a separate gRPC API (`CmLeaderChange` →

Contributor

Actually, we need to define a new gRPC. CmLeaderChange already exists to notify about a CM leader change, although it is not currently in use.
