improve raft test framework by JacksonYao287 · Pull Request #444 · eBay/HomeObject

JacksonYao287 · 2026-06-25T03:31:55Z

Occasionally, CI gets stuck in homestore_test_pg/shard/blob. The root cause is an unexpected leader switch.
The followers wait for something to happen, but the leader thinks it is no longer the leader (because of the leader switch) and does not schedule the op. All members then sync and wait at some point, so the UT hangs.

This PR tries to add more retries to avoid this case.
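
The hang described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `LeaderTimeline` and `schedule_op` are hypothetical names invented for illustration and are not part of HomeObject or the test fixture. Three replicas take turns asking "am I the leader?"; with a single pass, a leader switch at the wrong moment means nobody schedules the op, while further rounds (retries) let the new leader pick it up.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model, not HomeObject code: leader_at[t] is the replica id
// that is leader at tick t. The vector must be non-empty; its last entry is
// assumed to hold for all later ticks.
struct LeaderTimeline {
    std::vector<int> leader_at;
    int at(std::size_t tick) const {
        return leader_at[tick < leader_at.size() ? tick : leader_at.size() - 1];
    }
};

// Replicas 0, 1, 2 check leadership one per tick and schedule the op if they
// are leader at that tick. Returns the tick at which the op was scheduled,
// or -1 if no replica scheduled it within max_rounds rounds.
inline int schedule_op(const LeaderTimeline& tl, int max_rounds) {
    std::size_t tick = 0;
    for (int round = 0; round < max_rounds; ++round) {
        for (int replica = 0; replica < 3; ++replica, ++tick) {
            if (tl.at(tick) == replica) return static_cast<int>(tick);
        }
    }
    return -1;
}
```

With the timeline `{1, 0, 0}`, replica 0 checks while replica 1 is leader, and replicas 1 and 2 check after leadership has moved to replica 0, so a single round schedules nothing. Allowing more rounds lets replica 0 schedule the op on its next check.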

codecov-commenter · 2026-06-25T04:12:02Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (stable/v4.x@b891e86). Learn more about missing BASE report.

Additional details and impacted files

@@              Coverage Diff               @@
##             stable/v4.x     #444   +/-   ##
==============================================
  Coverage               ?   54.29%           
==============================================
  Files                  ?       36           
  Lines                  ?     5424           
  Branches               ?      684           
==============================================
  Hits                   ?     2945           
  Misses                 ?     2179           
  Partials               ?      300

Copilot

Pull request overview

This PR aims to reduce replication-unit-test hangs caused by unexpected Raft leader switches by adding retry-aware “run on leader” helpers and updating test fixture operations to be resilient to leadership churn.

Changes:

Introduces run_on_pg_leader_with_retry and “not leader” error classification helpers to retry leader-only ops until completion or timeout.
Updates shard/blob test-fixture operations (create shard, seal shard, put/delete blobs) to use retry logic and idempotent completion checks.
Bumps Conan package version to 4.1.23.
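
As a rough illustration of the retry pattern summarized above, here is a minimal sketch. It is not the PR's `run_on_pg_leader_with_retry`; the names `OpStatus` and `run_with_leader_retry`, the callback signatures, and the backoff parameter are all assumptions made for this example. The helper retries while an attempt reports "not leader", stops on completion or on a non-retryable failure, and checks an idempotent completion predicate first so that an op already applied by another replica is not repeated.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical outcome of one attempt at a leader-only operation.
enum class OpStatus { Ok, NotLeader, Failed };

// Retries `attempt` until it succeeds, fails with a non-retryable error,
// or `timeout` elapses. `is_done` is an idempotent completion check
// (e.g. "does the blob already exist?") evaluated before every attempt.
inline bool run_with_leader_retry(const std::function<OpStatus()>& attempt,
                                  const std::function<bool()>& is_done,
                                  std::chrono::milliseconds timeout,
                                  std::chrono::milliseconds backoff) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (true) {
        if (is_done()) return true;                  // already applied somewhere
        const OpStatus st = attempt();
        if (st == OpStatus::Ok) return true;
        if (st == OpStatus::Failed) return false;    // not a leadership error
        if (std::chrono::steady_clock::now() >= deadline) return false;
        std::this_thread::sleep_for(backoff);        // wait for a new leader
    }
}
```

Separating the "not leader" classification from other failures keeps real errors visible instead of hiding them behind retries, and the timeout bounds how long a test can wait.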

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `src/lib/homestore_backend/tests/homeobj_fixture.hpp` | Adds leader-retry helper + updates shard/blob fixture operations to tolerate leader switches and avoid deadlocks. |
| `conanfile.py` | Version bump to 4.1.23. |

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

xiaoxichen · 2026-06-25T05:24:06Z

TL;DR:
Is there any possibility of using the HO API to reconcile leadership?

JacksonYao287 · 2026-06-25T07:48:25Z

It's not related to reconciling leadership. What I want in this PR is for the op (for example, put_blob) to eventually be executed by a replica and not be missed, even if an unexpected leader switch happens.

From another perspective, if all three replicas think they are not the leader and are stuck waiting for some op (for example, blob_exist), who should schedule the leadership reconciliation, and how would it be done (the replicas are all stuck by then)?

JacksonYao287 requested a review from Copilot June 25, 2026 04:16

Copilot started reviewing on behalf of JacksonYao287 June 25, 2026 04:17 View session

Copilot AI reviewed Jun 25, 2026

Comment thread src/lib/homestore_backend/tests/homeobj_fixture.hpp Outdated

Comment thread src/lib/homestore_backend/tests/homeobj_fixture.hpp

JacksonYao287 requested a review from Copilot June 25, 2026 04:23

Copilot started reviewing on behalf of JacksonYao287 June 25, 2026 04:23 View session

Copilot AI reviewed Jun 25, 2026

Comment thread src/lib/homestore_backend/tests/homeobj_fixture.hpp Outdated

Comment thread src/lib/homestore_backend/tests/homeobj_fixture.hpp Outdated

Copilot AI mentioned this pull request Jun 25, 2026

Fix leader-switch race in v_chunk_id IPC publish within create_shard #445

Closed

JacksonYao287 force-pushed the improve-raft-test-framework branch from 491c05f to 63dcbed Compare June 25, 2026 06:12

JacksonYao287 marked this pull request as draft June 25, 2026 08:14

JacksonYao287 marked this pull request as ready for review June 25, 2026 09:19

xiaoxichen previously approved these changes Jun 26, 2026

improve raft test framework

66213d6

JacksonYao287 dismissed xiaoxichen’s stale review via 66213d6 June 29, 2026 10:06

JacksonYao287 force-pushed the improve-raft-test-framework branch from 63dcbed to 66213d6 Compare June 29, 2026 10:06

JacksonYao287 requested a review from xiaoxichen June 29, 2026 14:47

xiaoxichen approved these changes Jul 2, 2026

JacksonYao287 merged commit 67ecb3f into eBay:stable/v4.x Jul 3, 2026
25 checks passed

JacksonYao287 deleted the improve-raft-test-framework branch July 3, 2026 02:11

JacksonYao287 merged 1 commit into eBay:stable/v4.x from JacksonYao287:improve-raft-test-framework
