firefly: preserve all IoVec segments in intra-node LoopReq (#2686)#2687
Conversation
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
…tor#2686) The intra-node loopback delivery path truncated multi-segment sends, causing ProcessQueuesState::copyIoVec() to abort on assert(copied == len) when an Allgather-class collective runs with multiple ranks bound to a single endpoint NIC. Root cause ---------- For on-node peers, processSendLoop() packs the MatchHdr plus *every* data segment of the sender's I/O vector into a single vector (vec[0] = MatchHdr, vec[1..N] = data segments). The LoopReq constructor, however, kept only the first segment (vec[1]) and discarded vec[2..N], while the MatchHdr it carries still reports the *total* byte count across all segments. At delivery, copyIoVec() is asked to copy the full multi-segment length but only has the first segment as source, so copied stalls below len and the assertion fires (or, under NDEBUG, a short/incorrect receive buffer is produced silently). Multiple segments arise because Allgather::initIoVec() coalesces only physically contiguous chunks: a recursive-doubling stage whose send window wraps the modular buffer is split into two or more non-contiguous runs. This is why the failure reproduces reliably at high PPN (almost every neighbour is on-node) and even at 2 ranks/node with a large payload, but never for point-to-point or small single-segment collectives. Fix --- * LoopReq (ctrlMsgProcessQueuesState.h): copy every data segment (vec[1..N]) instead of only vec[1], so multi-segment intra-node sends are delivered intact. * copyIoVec (ctrlMsgProcessQueuesState.cc): harden as defence-in-depth. Bound the copy loops on `rV < dst.size()` in the loop conditions (instead of an assert that is compiled out under NDEBUG), and replace the bare assert(copied == len) with a dbg().fatal() that reports copied, len, src/dst segment counts and total byte counts. A mismatch is an internal defect between sender and receiver of the same collective algorithm, so it is surfaced loudly rather than silently truncated. Verified by reproducing the abort with the original code at numCores >= 8, then confirming clean completion after the fix across numCores = 2/8/16/56 and count = 1/131072 (Allgather). Fixes sstsimulator#2686
|
Just noticed that this PR is agains the master branch. Can you move it over to target devel? |
|
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
|
Status Flag 'Pre-Test Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED by label AT: PRE-TEST INSPECTED! Autotester is Removing Label; this inspection will remain valid until a new commit to source branch is performed. |
|
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: SST__Autotest_sst-test_OMPI-4.1.4_PY3.9_sst-elements
Build InformationTest Name: SST__Autotest_sst-test_OMPI-4.1.4_PY3.9_sst-elements_Make-Dist
Build InformationTest Name: SST__Autotest_sst-test_OMPI-4.1.4_PY3.9_sst-elements_MT-2
Build InformationTest Name: SST__Autotest_sst-test_OMPI-4.1.4_PY3.9_sst-elements_MR-2
Build InformationTest Name: SST__Autotest_OSX-15-XC16_OMPI-4.1.6_PY3.10_sst-elements
Using Repos:
Pull Request Author: deanchester |
|
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: SST__Autotest_sst-test_OMPI-4.1.4_PY3.9_sst-elements
Build InformationTest Name: SST__Autotest_sst-test_OMPI-4.1.4_PY3.9_sst-elements_Make-Dist
Build InformationTest Name: SST__Autotest_sst-test_OMPI-4.1.4_PY3.9_sst-elements_MT-2
Build InformationTest Name: SST__Autotest_sst-test_OMPI-4.1.4_PY3.9_sst-elements_MR-2
Build InformationTest Name: SST__Autotest_OSX-15-XC16_OMPI-4.1.6_PY3.10_sst-elements
|
|
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ feldergast ]! |
|
Status Flag 'Pull Request AutoTester' - Pull Request MUST BE MERGED MANUALLY BY Project Team - This Repo does not support Automerge |
5 similar comments
|
Status Flag 'Pull Request AutoTester' - Pull Request MUST BE MERGED MANUALLY BY Project Team - This Repo does not support Automerge |
|
Status Flag 'Pull Request AutoTester' - Pull Request MUST BE MERGED MANUALLY BY Project Team - This Repo does not support Automerge |
|
Status Flag 'Pull Request AutoTester' - Pull Request MUST BE MERGED MANUALLY BY Project Team - This Repo does not support Automerge |
|
Status Flag 'Pull Request AutoTester' - Pull Request MUST BE MERGED MANUALLY BY Project Team - This Repo does not support Automerge |
|
Status Flag 'Pull Request AutoTester' - Pull Request MUST BE MERGED MANUALLY BY Project Team - This Repo does not support Automerge |
The intra-node loopback delivery path truncated multi-segment sends, causing ProcessQueuesState::copyIoVec() to abort on assert(copied == len) when an Allgather-class collective runs with multiple ranks bound to a single endpoint NIC.
Root cause
For on-node peers, processSendLoop() packs the MatchHdr plus every data segment of the sender's I/O vector into a single vector (vec[0] = MatchHdr, vec[1..N] = data segments). The LoopReq constructor, however, kept only the first segment (vec[1]) and discarded vec[2..N], while the MatchHdr it carries still reports the total byte count across all segments. At delivery, copyIoVec() is asked to copy the full multi-segment length but only has the first segment as source, so copied stalls below len and the assertion fires (or, under NDEBUG, a short/incorrect receive buffer is produced silently).
Multiple segments arise because Allgather::initIoVec() coalesces only physically contiguous chunks: a recursive-doubling stage whose send window wraps the modular buffer is split into two or more non-contiguous runs. This is why the failure reproduces reliably at high PPN (almost every neighbour is on-node) and even at 2 ranks/node with a large payload, but never for point-to-point or small single-segment collectives.
Fix
LoopReq (ctrlMsgProcessQueuesState.h): copy every data segment (vec[1..N]) instead of only vec[1], so multi-segment intra-node sends are delivered intact.
copyIoVec (ctrlMsgProcessQueuesState.cc): harden as defence-in-depth. Bound the copy loops on
rV < dst.size()in the loop conditions (instead of an assert that is compiled out under NDEBUG), and replace the bare assert(copied == len) with a dbg().fatal() that reports copied, len, src/dst segment counts and total byte counts. A mismatch is an internal defect between sender and receiver of the same collective algorithm, so it is surfaced loudly rather than silently truncated.Verified by reproducing the abort with the original code at numCores >= 8, then confirming clean completion after the fix across numCores = 2/8/16/56 and count = 1/131072 (Allgather).
Fixes #2686