Skip to content

SDSTOR-22799 feat: chunk ownership tracking, remove GC shard header/footer I/O, blob commit validation#446

Open
Hooper9973 wants to merge 2 commits into
eBay:dev/refactor_create_seal_shardfrom
Hooper9973:SDSTOR-22799-opt
Open

SDSTOR-22799 feat: chunk ownership tracking, remove GC shard header/footer I/O, blob commit validation#446
Hooper9973 wants to merge 2 commits into
eBay:dev/refactor_create_seal_shardfrom
Hooper9973:SDSTOR-22799-opt

Conversation

@Hooper9973

Copy link
Copy Markdown
Contributor
  • ChunkSelector tracks which shard owns each chunk; select/release enforce ownership and reject wrong-owner calls

  • GC no longer writes physical shard header/footer blocks

  • Blob commit validates chunk state matches shard metadata

@Hooper9973 Hooper9973 force-pushed the SDSTOR-22799-opt branch 2 times, most recently from f8e31f5 to 1237535 Compare June 30, 2026 09:25
@codecov-commenter

codecov-commenter commented Jun 30, 2026

Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 64.00000% with 54 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (dev/refactor_create_seal_shard@340788e). Learn more about missing BASE report.

Files with missing lines Patch % Lines
src/lib/homestore_backend/gc_manager.cpp 62.68% 16 Missing and 9 partials ⚠️
src/lib/homestore_backend/heap_chunk_selector.cpp 62.16% 8 Missing and 6 partials ⚠️
src/lib/homestore_backend/hs_blob_manager.cpp 50.00% 7 Missing and 2 partials ⚠️
src/lib/homestore_backend/heap_chunk_selector.h 66.66% 4 Missing ⚠️
src/lib/homestore_backend/hs_shard_manager.cpp 85.71% 2 Missing ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@                        Coverage Diff                        @@
##             dev/refactor_create_seal_shard     #446   +/-   ##
=================================================================
  Coverage                                  ?   52.63%           
=================================================================
  Files                                     ?       36           
  Lines                                     ?     5348           
  Branches                                  ?      672           
=================================================================
  Hits                                      ?     2815           
  Misses                                    ?     2236           
  Partials                                  ?      297           

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…ooter

I/O, blob commit validation

- ChunkSelector tracks which shard owns each chunk;
select/release enforce ownership and reject wrong-owner calls

- GC no longer writes physical shard header/footer blocks

- Blob commit validates chunk state matches shard metadata
@@ -292,9 +292,6 @@ class GCManager {
pdev_gc_metrics& metrics() { return metrics_; }

private:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this private

// 2 update the chunk state of move_from_chunk, now it is a reserved chunk
move_from_vchunk->m_pg_id = std::nullopt;
move_from_vchunk->m_v_chunk_id = std::nullopt;
move_from_vchunk->m_owner_shard_id = std::nullopt;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line will revert the change in line 361

@Hooper9973 Hooper9973 Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line(371) is to clear move_from_vchunk.
line 361 is to set move_to_vchunk.

}
bool res = chunk_selector_->recover_pg_chunks_states(pair.first, excluding_chunks);
// After restart, each chunk retains at most one shard log, because every
// seal-shard commit forces a cp flush.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, we only flush dc_lsn, not trigger a cp flush as of now

Comment on lines +248 to +277
// check the blob's pchunk id and shard's pchunk id for this blob
if (blob_info.pbas.chunk_num() != sb_pchunk_id) {
BLOGE(tid, blob_info.shard_id, blob_info.blob_id,
"commit-time pchunk id mismatch: blob pchunk={} shard pchunk={}", blob_info.pbas.chunk_num(),
sb_pchunk_id);
RELEASE_ASSERT(false, "traceID={}, commit-time pchunk id mismatch: blob pchunk={} shard pchunk={}", tid,
blob_info.pbas.chunk_num(), sb_pchunk_id);
}
// check shard meta consistency
const auto ext_chunk = chunk_selector()->get_pg_vchunk(pg_id, sb_vchunk_id);
if (ext_chunk == nullptr) {
BLOGE(tid, blob_info.shard_id, blob_info.blob_id,
"chunk not found at commit time: pg={} v_chunk={} shard=0x{:x} blob={}", pg_id, sb_vchunk_id,
blob_info.shard_id, blob_info.blob_id);
RELEASE_ASSERT(false, "traceID={}, chunk not found at commit time: pg={} v_chunk={} shard=0x{:x} blob={}",
tid, pg_id, sb_vchunk_id, blob_info.shard_id, blob_info.blob_id);
} else if (!ext_chunk->valid() || ext_chunk->m_state != ChunkState::INUSE ||
ext_chunk->m_owner_shard_id != blob_info.shard_id || ext_chunk->get_chunk_id() != sb_pchunk_id) {
BLOGE(tid, blob_info.shard_id, blob_info.blob_id,
"commit-time chunk state invalid: pg={} v_chunk={} blob={} state={} owner=0x{:x} pchunk={} "
"sb_pchunk={}",
pg_id, sb_vchunk_id, blob_info.blob_id, ext_chunk->m_state, ext_chunk->m_owner_shard_id.value_or(0),
ext_chunk->get_chunk_id(), sb_pchunk_id);
RELEASE_ASSERT(
false,
"traceID={}, commit-time chunk state invalid: pg={} v_chunk={} blob={} state={} owner=0x{:x} "
"pchunk={} sb_pchunk={}",
tid, pg_id, sb_vchunk_id, blob_info.blob_id, ext_chunk->m_state,
ext_chunk->m_owner_shard_id.value_or(0), ext_chunk->get_chunk_id(), sb_pchunk_id);
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better consolidate BLOGE and RELEASE_ASSERT into only one RELEASE_ASSERT for all the logs here

…ted. keep the chunk state clean(initialized)

- enhance GC chunk ownership handling, distinguish between normal and recovery
- CMakeList add properties to avoid running test in parallel
--override_config hs_backend_config.gc_scan_interval_sec=5
--gtest_filter=HomeObjectFixture.StalePChunkRouteAfterGC:HomeObjectFixture.StaleBlobRouteAfterSealAndGC)

# All tests share disk, port, and shared-memory resources, so run them serially to avoid parallel conflicts.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any issue without this change?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

I met one time on my env, two case runs in parallel.

LOGDEBUGMOD(gcmgr, "EGC recovery: set move_from_chunk={} owner to shard=0x{:x} for replay",
move_from_chunk, open_shard.value());
} else {
LOGWARNMOD(gcmgr, "EGC recovery: no open shard found in move_from_chunk={} or move_to_chunk={}",

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we assert here? this is an unexpected case

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants