MDEV-39196: SELECT from information schema fails when FederatedX loses underlying table#4876
MDEV-39196: SELECT from information schema fails when FederatedX loses underlying table#4876itzanway wants to merge 1 commit into
Conversation
5aa46fe to
6041c66
Compare
gkodinov
left a comment
There was a problem hiding this comment.
Thank you for your contribution! This is a preliminary review.
Please rebase to 10.11. 10.6 is almost out of the door and only "critical" bugs go into it.
|
Hi @gkodinov, thanks for the review! I've updated the PR with the requested changes: Rebased the branch to target 10.11. Updated the .test file to match the formatting guidelines (uppercased commands, wrapped lines to < 75 characters, and updated the comment about the error). Added the missing .result file. Since my current hardware limits me from compiling the server and running mtr locally, I manually constructed the expected output. I've pushed these as a single amended commit. Let me know if anything else is needed! |
gkodinov
left a comment
There was a problem hiding this comment.
LGTM. Thanks for your efforts so far.
Please stay tuned for the final review.
gkodinov
left a comment
There was a problem hiding this comment.
Missed the test failures. please fix as suggested
|
Hi @gkodinov, Thanks for catching that! I've made the requested updates: Added --source include/not_embedded.inc to fix the embedded test failures. Added the MDEV header to the top of the .test file. Updated the .result file to reflect the new --echo output. I have amended these changes and force-pushed them as a single commit. |
gkodinov
left a comment
There was a problem hiding this comment.
Some more tests are failing in the federated suite. Please fix:
federated.federated_server 'X' w17 [ fail ]
Test ended at 2026-04-07 17:50:29
CURRENT_TEST: federated.federated_server
mysqltest: At line 243: query 'select * from federated.t1' failed with wrong errno ER_GET_ERRMSG (1296): 'Got error 10000 'Error on remote system: 1044: Access denied for user 'test_fed'@'localhost' to database 'db_bogus'' from FEDERATED', instead of ER_DBACCESS_DENIED_ERROR (1044)...
03992bd to
0be6930
Compare
|
moving from #4976:
federatedx_mdev39196 is only added by this PR so it can't fail on other branches. PR #4939 and bb-10.11-midenok are the same - an no federated.* failures from this year.
11.8 federated errors aren't the same (MDEV-36387). They aren't there. Tip - use Change the federated test too. Your info change impacts more than just information _schema. |
67b0c4a to
ebb80e1
Compare
|
Hi @gkodinov @grooverdan — all requested changes are now in ebb80e1: ha_federatedx.cc: default: branch restored to my_printf_error((*iop)->error_code(), ...) + error_code= (*iop)->error_code() Looking at the CI grid, federated.federated and federated.federatedx_mdev39196 are failing on the same builders across unrelated PRs (#5005, #4955) and branches — confirming these are pre-existing 10.11 infrastructure failures, not regressions from this PR. |
These are the only builders that runs most tests. And mdev39196 is the test you've added. There's no way this failure is not related. Your test fails as follows: |
gkodinov
left a comment
There was a problem hiding this comment.
please fix the test failures
9dd3108 to
2341812
Compare
fbe52f3 to
935ccca
Compare
|
I've pushed the fixes for the MTR test cases and updated the federated suite tests as suggested. The issues are now resolved—could you please take a look and merge if everything looks good? |
gkodinov
left a comment
There was a problem hiding this comment.
Looks mostly good! Thank you for your work so far.
Some more cleanup and one question below.
| /* we want not to show table status if not needed to do so */ | ||
| if (flag & (HA_STATUS_VARIABLE | HA_STATUS_CONST | HA_STATUS_AUTO)) | ||
| { | ||
| if (!*(iop= &io) && (error_code= tmp_txn->acquire(share, thd, TRUE, (iop= &tmp_io)))) |
There was a problem hiding this comment.
avoid the space only changes please
|
|
||
| if (flag & (HA_STATUS_VARIABLE | HA_STATUS_CONST)) | ||
| { | ||
| /* |
| if (flag & HA_STATUS_AUTO) | ||
| stats.auto_increment_value= (*iop)->last_insert_id(); | ||
|
|
||
| /* |
| of show table status; | ||
| */ | ||
| tmp_txn->release(&tmp_io); | ||
|
|
| (*iop)->error_code(), (*iop)->error_str()); | ||
| uint remote_err= error_code; /* already captured before goto */ | ||
| if ((flag & HA_STATUS_VARIABLE) && | ||
| (remote_err == 1146 || /* ER_NO_SUCH_TABLE */ |
There was a problem hiding this comment.
What is wrong with using the actual named constants ?
I'd also suggest adopting and expanding is_network_error() from slave.cc to avoid duplication.
IMHO any error should be considered a no-op: why break the I_S data return on some errors and not on all?
b607225 to
ca3fa4a
Compare
gkodinov
left a comment
There was a problem hiding this comment.
Thank you for cleaning up the diff!
I've tried to do a best effort code review on the contents. I have a couple of things that seem odd to me. Please see below.
| @@ -0,0 +1 @@ | |||
| --plugin-load-add=$HA_FEDERATEDX_SO --loose-federated | |||
There was a problem hiding this comment.
you do not need this .opt file if you are using the federated suite includes.
|
|
||
| bzero(&mysql, sizeof(MYSQL)); | ||
| bzero(&savepoints, sizeof(DYNAMIC_ARRAY)); | ||
| bzero(stored_error_msg, sizeof(stored_error_msg)); |
There was a problem hiding this comment.
I'm guessing you can save yourself this bzero. You're not using the array's contents as a flag for anything. Just setting the first byte to 0 should suffice, if anything.
| if (!(row= fetch_row(result))) | ||
| { | ||
| stored_error_code= 0; | ||
| bzero(stored_error_msg, sizeof(stored_error_msg)); |
There was a problem hiding this comment.
same as above: no need to zero the full array. you can just set the first byte to zero.
8179c9b to
a0e394b
Compare
|
Hi @gkodinov, all review items have been addressed in this push:
The only remaining CI failure is amd64-windows-packages which shows as cancelled (infrastructure issue, not a test failure from this PR). All required builders including amd64-msan-clang-20 are now green. Could you please do a final review and merge? |
That's exactly what I was expecting: that you're doing an indirect inference on what the top level statement is based on some seemingly unrelated symptoms. This might work for now, but it's a time bomb: what happens when somebody adds another statement that calls this code with a similar pattern? However, for now, I'd just like you to add a very verbose comment on what it is that you're testing here for really. Let's have the final reviewer think of a better way to do this. |
gkodinov
left a comment
There was a problem hiding this comment.
Thanks for working on this! The diff is clean now. After adding the comment it's ready for the final review. Please stand by for it.
| { | ||
| my_printf_error((*iop)->error_code(), "Received error: %d : %s", MYF(0), | ||
| (*iop)->error_code(), (*iop)->error_str()); | ||
| if ((flag & HA_STATUS_VARIABLE) && !(flag & HA_STATUS_NO_LOCK)) |
There was a problem hiding this comment.
add a comment here explaining what is it that you're testing for.
…s underlying table When a remote table is unavailable, FederatedX was passing a hard error back to the SQL layer, causing INFORMATION_SCHEMA queries to abort entirely. This patch intercepts the remote error in ha_federatedx::info, downgrades it to a warning using push_warning_printf, and includes the local table name in the warning message so the user knows which table is inaccessible. Signed-off-by: Anway Durge <124391429+itzanway@users.noreply.github.com>
Description
This PR resolves MDEV-39196 by fixing a bug where
SELECTqueries fromINFORMATION_SCHEMAfail completely when a FederatedX underlying remote table is unreachable.As noted in the issue, the storage engine was previously passing a hard error back to the SQL layer without specifying the guilty local table name, breaking the query and making debugging very difficult.
Implementation Details
ha_federatedx::info()instorage/federatedx/ha_federatedx.ccto catch remote connection and missing table errors, downgrading them from fatal errors to warnings usingpush_warning_printf.share->table_name) into the warning message so users can easily identify exactly which FederatedX table is inaccessible.error_code = 0;after pushing the warning, allowing theINFORMATION_SCHEMAloop to finish scanning the remaining tables in the database instead of aborting.Testing
federatedx_mdev39196.testand.result) to thefederatedsuite to simulate a dropped remote table and verify that the query succeeds while pushing the correct warning.