Enable user to take ownership of MPI#722
Conversation
…unction name to comm_getMpiComm
…ing for subcomm compiled
…for SubComm, because of course it enforces CXX
…declare that they take ownership of MPI
… updated setComm for init only workflow
…ything but it makes MPI library implementers less nervous
…ith user owned MPI
|
On the plus side, a distributed QFT benchmark circuit runs perfectly correctly: |
| */ | ||
| void initCustomQuESTEnv(int useDistrib, int useGpuAccel, int useMultithread); | ||
|
|
||
| void initCustomMpiQuESTEnv(int useDistrib, int userOwnsMpi, int useGpuAccel, int useMultithread); |
There was a problem hiding this comment.
Do we not want a new initialiser so its really clear where there is a different setup? Also this probably braked compatibility for older software.
There was a problem hiding this comment.
That is a new initialiser! This is effectively a C interface, so no overloading allowed.
There was a problem hiding this comment.
Apologies I had missread this I thought it was replacing initCustomQueSTEnv hence the concern. Once again reading comprehension issue must do better.
There was a problem hiding this comment.
The difference is more subtle than I might have liked, but the alternative would be an initialiser that was way too long!
There was a problem hiding this comment.
initCustomMpiCommQuESTEnv?
|
Looks good not run yet. My only comment is above and raises a wider point where we need to think about how we managed change in the programming interface as we probably want to stabilise as much as possible and only add to existing functionality. |
Agreed, If we're being proper about it, breaking changes to the API require a new major version release, and I don't think we have the time or the inclination currently. |
|
Remaining testing required:
|
iarejula-bsc
left a comment
There was a problem hiding this comment.
Overall, looks good to me, really like the idea! I was wondering if this design would allow running quregs in parallel, each with their own communicator (different envs for each one, maybe?), or if that would be impossible with the current implementation. I think that would be a highly interesting feature. As my team is trying to apply malleability to QuEST, that would be a huge win for us.
A few small comments below, but note this is an untested review.
…o avoid unexpected test failure due to range checking when compild in Debug configuration
Co-authored-by: iarejula-bsc <inigo.arejula@bsc.es>
We now check |
|
I note from some testing that the Edit: ye olde print debugging indicates the |
|
Sorry for my own delay! A horrifying week and another on the way, but I have most of today to rip through the outstanding stuff for v4.3 (🤞 🤞 )
I originally misinterpreted you and typed the below response - which I now realise you already understand. Kept it for posterity! I note the autodeployer already relaxes the requirement that all
Ooh that's a super interesting idea; similar to an automatic choosing of a subcommunicator for a
That would be the communicator that QuEST first uses if you do not override it (like right now, but ofc we don't use the world group), which includes all available nodes. Just a way for a user to stop setting their custom comm upon subsequently created
Oh interesting - so we would let users have "full MPI control" at QuESTEnv initialisation, but only limited MPI control of individual
Probably different comm
Ok that's fair enough! Especially if it all ends up in a dedicated MPI-specific header. Totally agreed the more general functionality ( When I get a chance, I will comb through the different diff with some nits 😉 |
Thanks Tyson! Good to know that we already have a concept of incompatible Quregs, so it wouldn't be a totally new concept within the codebase. @iarejula-bsc: Let us know how you would like to be recorded (by name or username) in the contribution file for your contribution here! |
|
Hi, i have been really busy and i didnt take a look over the last discussion, is there anything i should help?
You can use my full name: Íñigo Aréjula Aísa |
|
Would have been handy to know this is ready for review before being merged! Will note some minor comments now we can address in the future. Can we add an example in |
There was a problem hiding this comment.
I will address these when I get a sec, but here's some feedback! (marking my progress with 👀 )
(addressed most in #762)
| int isMultithreaded; | ||
| int isGpuAccelerated; | ||
| int isDistributed; | ||
| bool userOwnsMpi; |
There was a problem hiding this comment.
This should be an int, or the other fields updated to bool, since they are all on equal footing. The existing is* fields are always set to 0 or 1. Possibly there was confusion with the arguments of initCustomQuESTEnv() which accepts similarly named arguments as -1 in order for the auto-deployer to override them - that never reaches the QuESTEnv.
|
|
||
| #if QUEST_COMPILE_MPI && QUEST_COMPILE_SUBCOMM | ||
|
|
||
| #include <stdbool.h> |
There was a problem hiding this comment.
Why is the C bool being imported here? This is an internal C++ file and can use the inbuilt C++ bool
| } | ||
|
|
||
| // initialise QuEST around that communicator | ||
| initCustomMpiQuESTEnv(useDistrib, userOwnsMpi, useGpuAccel, useMultithread); |
There was a problem hiding this comment.
This function (initCustomMpiCommQuESTEnv) doesn't validate the input parameters, and is calling another API function (initCustomMpiQuESTEnv) which does validate. When the latter throws, it will use the latter's validation error message; __func__ will report the latter threw the error, not the former. So the user will see the name of a function they did not call
|
|
||
| #if QUEST_COMPILE_MPI | ||
| MPI_Comm comm_getMpiComm() { | ||
| return mpiCommQuest; |
There was a problem hiding this comment.
This should probably check that mpiCommQuest != MPI_COMM_NULL and otherwise throw an internal error, to protect against an internal caller prematurely accessing it, before it is overwritten from its default
| #if QUEST_COMPILE_MPI | ||
| #include <mpi.h> | ||
|
|
||
| static MPI_Comm mpiCommQuest = MPI_COMM_NULL; |
There was a problem hiding this comment.
This is a global singleton and should use the convention of beginning with global or global_.
Also, the Quest suffix seems redundant given it's a variable private to this internal QuEST function
| void error_commInvalidMpiComm() { | ||
|
|
||
| raiseInternalError("The supplied MPI communicator was MPI_COMM_NULL, or duplication failed."); | ||
| } |
There was a problem hiding this comment.
Duplication failure fine as an internal error, but we can check for MPI_COMM_NULL at user-input validation stage
| void error_commDoubleSetMpiComm() { | ||
|
|
||
| raiseInternalError("An attempt was made to set mpiCommQuest after it had already been set, as indicated by mpiCommQuest != MPI_COMM_NULL."); | ||
| } |
There was a problem hiding this comment.
Ideally caught by user validation
| REQUIRE( (env.isMultithreaded == 0 || env.isMultithreaded == 1) ); | ||
| REQUIRE( (env.isGpuAccelerated == 0 || env.isGpuAccelerated == 1) ); | ||
| REQUIRE( (env.isDistributed == 0 || env.isDistributed == 1) ); | ||
| REQUIRE( (env.userOwnsMpi == 0 || env.userOwnsMpi == 1) ); |
There was a problem hiding this comment.
This is pointless given userOwnsMpi is a bool
There was a problem hiding this comment.
Ah yes, artifact from when it wasn't!
|
|
||
| #include "quest/include/config.h" | ||
|
|
||
| #if QUEST_COMPILE_MPI && QUEST_COMPILE_SUBCOMM |
There was a problem hiding this comment.
I'm not thrilled about guarding the entire header, and the contextualising of exposing subcommunicator as "compiling". Semantics like "expose subcommunicator" might have been better, since all the internal trickery of "don't expose internal subcommunicator-specific functions to other TUs" is brittle when QUEST_COMPILE_MPI already safely guards exposure to MPI types.
This is a half-thought for a future me
| raiseInternalError("A function attempted to communicate via more messages than permitted (since there would be more uniquely-tagged messages than the tag upperbound)."); | ||
| } | ||
|
|
||
| void error_commDoubleSetMpiComm() { |
There was a problem hiding this comment.
This name is hard to parse (double?!), and asymmetric with the existing error_commAlreadyInit. Could be clarified to:
error_commAlreadyHasSetMpiComm`| * Advanced initialiser which lets the user positively declare that they take responsibility for MPI. | ||
| * This means we assume they have called MPI_Init, and that they will call MPI_Finalize. | ||
| */ | ||
| void initCustomMpiQuESTEnv(int useDistrib, bool userOwnsMpi, int useGpuAccel, int useMultithread); |
There was a problem hiding this comment.
Given the user needs to resolve MPI in order to own it, it seems fair to move this into the same "optional advanced-MPI user" module needed by subcommunicator. I think relegating this to an alternate header is better give this is anticipated to be tentatively, and is opaquely named. It can then live in a dedicated doc group, e.g. advanced_env, or similarly
Closes #712.
MPI_COMM_WORLDeverywhere. Instead it usesmpiCommQuest(which will very often just be a duplicate ofMPI_COMM_WORLD). This aligns with best practise for library developers, as it helps avoid avoid clashes with user MPI calls.mpiCommQueststuff was broken essentially any QuEST program would also be broken.MPI_COMM_WORLD. This interface is guarded behind compile-time flags as it requires bringing the MPI header into the interface, which means all consumers of libQuEST need to include MPI as well. This interface is not yet documented or tested. I think this is tolerable in the short term as it should only appeal to advanced users.Speaking of tests! Test results on my laptop:
and min_example output:
There is a caveat here: distributed tests on Cirrus keep failing, but this appears to be true of devel as well as this branch, so I'll do some more digging and raise it as a separate issue once I have a better of idea of what is going on.