
MDEV-39261 MariaDB crash on startup in presence of indexed virtual columns #4953

Merged: Thirunarayanan merged 2 commits into 10.11 from 10.11-MDEV-39261 on Apr 28, 2026

@Thirunarayanan (Member)

Problem:

A single InnoDB purge worker thread can process undo logs from different tables within the same batch. However, get_purge_table() and open_purge_table() incorrectly assume a 1:1 relationship between a purge worker thread and a table within a single batch. Based on this wrong assumption, InnoDB attempts to reuse TABLE objects cached in thd->open_tables for virtual column computation.

  1. A purge worker opens Table A and caches the TABLE pointer in thd->open_tables.
  2. The same purge worker moves to Table B in the same batch; get_purge_table() retrieves the cached pointer for Table A instead of opening Table B.
  3. Because innobase::open() is skipped for Table B, the virtual column template is never initialized.
  4. Virtual column computation for Table B aborts the server.
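To make the failure mode concrete, here is a simplified C++ model. It is not MariaDB code: `Table`, `FlawedWorkerCache` and `KeyedWorkerCache` are hypothetical. `FlawedWorkerCache` mirrors the 1:1 worker/table assumption by returning whatever the worker cached first in the batch, so a request for Table B yields Table A. `KeyedWorkerCache` is shown only as a contrast (a lookup keyed by table id); the fix in this pull request instead moves table opening to the purge coordinator.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>

// Hypothetical stand-in for an opened table handle (not MariaDB's TABLE).
struct Table {
  uint64_t id;
};

// Models the wrong 1:1 assumption: whatever the worker cached first in the
// batch is returned for every later lookup, whichever table was requested.
class FlawedWorkerCache {
  std::unique_ptr<Table> cached_;

 public:
  Table* get(uint64_t table_id) {
    if (cached_) return cached_.get();  // BUG: table_id is ignored
    cached_.reset(new Table{table_id});
    return cached_.get();
  }
};

// Contrast only: a per-worker lookup keyed by table id.
class KeyedWorkerCache {
  std::unordered_map<uint64_t, std::unique_ptr<Table>> tables_;

 public:
  Table* get(uint64_t table_id) {
    std::unique_ptr<Table>& slot = tables_[table_id];
    if (!slot) slot.reset(new Table{table_id});
    return slot.get();
  }
};
```

With the flawed cache, asking for table 2 after table 1 returns table 1, which is exactly the "Table A instead of Table B" situation from step 2.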

Solution:

  • Introduced a purge_table class, which stores either a TABLE* (for tables with indexed virtual columns) or an MDL_ticket* (for tables without) in a single union, using the least significant bit (LSB) as a flag.
    For tables with indexed virtual columns: opens the TABLE* and accesses the MDL_ticket* via TABLE->mdl_ticket.
    For tables without indexed virtual columns: stores only the MDL_ticket*.
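The LSB-tagging idea behind purge_table can be sketched as follows, assuming both pointee types are at least 2-byte aligned so that bit 0 of a valid pointer is always zero. The class name `purge_table_sketch` and the stub `TABLE`/`MDL_ticket` structs are hypothetical stand-ins; the real class and its member names may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the server types; only their alignment matters.
struct MDL_ticket { int dummy; };
struct TABLE { MDL_ticket* mdl_ticket; };

// Pointer-sized holder for either a TABLE* or an MDL_ticket*, with the
// least significant bit used as the type tag (1 = MDL_ticket*).
class purge_table_sketch {
  static constexpr uintptr_t MDL_FLAG = 1;
  static_assert(alignof(TABLE) >= 2 && alignof(MDL_ticket) >= 2,
                "bit 0 must be free for tagging");
  uintptr_t bits_ = 0;

 public:
  purge_table_sketch() = default;
  explicit purge_table_sketch(TABLE* t)
      : bits_(reinterpret_cast<uintptr_t>(t)) {}
  explicit purge_table_sketch(MDL_ticket* m)
      : bits_(reinterpret_cast<uintptr_t>(m) | MDL_FLAG) {}

  // True only when a non-null TABLE* is stored.
  bool is_table() const { return bits_ != 0 && !(bits_ & MDL_FLAG); }

  TABLE* table() const {
    return is_table() ? reinterpret_cast<TABLE*>(bits_) : nullptr;
  }

  // For a TABLE*, the ticket is reached through the table itself;
  // otherwise the tag bit is stripped to recover the stored ticket.
  MDL_ticket* mdl_ticket() const {
    if (TABLE* t = table()) return t->mdl_ticket;
    return reinterpret_cast<MDL_ticket*>(bits_ & ~MDL_FLAG);
  }
};
```

The appeal of this layout is that each per-table slot stays one machine word: no separate discriminator field is needed, because the tag lives in a bit that aligned pointers never use.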

trx_purge_attach_undo_recs(): The coordinator opens both dict_table_t* and TABLE* with proper MDL protection. Workers access the cached table pointers from purge_node_t->tables without opening their own handles.
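A rough model of this coordinator/worker split follows, under the assumption that the coordinator finishes opening every table referenced by a batch before any worker runs, so workers only ever read the shared map. `TableHandle`, `UndoRec`, `TableMap`, `coordinator_open_tables()` and `worker_process()` are hypothetical names; in the patch the cached pointers live in purge_node_t->tables.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for an opened table and an undo log record.
struct TableHandle { uint64_t id; };
struct UndoRec { uint64_t table_id; };

using TableMap = std::unordered_map<uint64_t, std::unique_ptr<TableHandle>>;

// Coordinator: open every table referenced by the batch exactly once,
// before dispatching work. Returns how many tables were opened.
int coordinator_open_tables(const std::vector<UndoRec>& batch,
                            TableMap& tables) {
  int opens = 0;
  for (const UndoRec& rec : batch) {
    std::unique_ptr<TableHandle>& slot = tables[rec.table_id];
    if (!slot) {
      slot.reset(new TableHandle{rec.table_id});
      ++opens;
    }
  }
  return opens;
}

// Worker: only looks up handles prepared by the coordinator and never opens
// a table itself. Counts records that resolved to their own table.
int worker_process(const std::vector<UndoRec>& recs, const TableMap& tables) {
  int matched = 0;
  for (const UndoRec& rec : recs) {
    const TableHandle* t = tables.at(rec.table_id).get();
    if (t->id == rec.table_id) ++matched;
  }
  return matched;
}
```

Because workers take the map by const reference and it is not modified while they run, every record resolves to its own table no matter how many different tables one worker touches.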

purge_sys.coordinator_thd: Distinguish the coordinator from workers in the cleanup logic. Skip innobase_reset_background_thd() for the coordinator thread to prevent premature table closure during batch processing. Workers still call cleanup to release their thread-local resources.
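The cleanup distinction reduces to a pointer comparison against the recorded coordinator THD. In this sketch, `THD_stub`, `purge_sys_sketch` and `end_of_task_cleanup()` are hypothetical stand-ins, and resetting a counter stands in for releasing thread-local resources.

```cpp
#include <cassert>

// Hypothetical stand-in for a server thread handle.
struct THD_stub {
  int open_tables = 0;
};

// Hypothetical stand-in for purge_sys, holding the coordinator's THD.
struct purge_sys_sketch {
  THD_stub* coordinator_thd = nullptr;
};

// Called when a purge task finishes on some thread.
void end_of_task_cleanup(const purge_sys_sketch& sys, THD_stub* thd) {
  if (thd == sys.coordinator_thd)
    return;  // coordinator: keep tables open until the batch is closed
  thd->open_tables = 0;  // worker: release thread-local resources
}
```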

trx_purge_close_tables():
Rewritten for the purge coordinator thread:

  1. Get the MDL_tickets for the opened tables.
  2. Call close_thread_tables() once for all TABLE* objects.
  3. Close the tables and release the MDL.
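The ordering of these three steps can be checked with a small simulation that records each action in a log. `Ticket`, `OpenedTable`, `close_all_thread_tables()`, `close_batch()` and `close_log` are hypothetical; the point is only that the tickets are collected first, all handles are closed with one call, and each metadata lock is released after its table is closed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical resources; each action is recorded in close_log.
struct Ticket { int id; };
struct OpenedTable { int id; Ticket* ticket; };

std::vector<std::string> close_log;

// Stands in for a single close_thread_tables() call covering all handles.
void close_all_thread_tables(std::vector<OpenedTable>& tables) {
  close_log.push_back("close_thread_tables x" +
                      std::to_string(tables.size()));
  tables.clear();
}

void close_batch(std::vector<OpenedTable>& tables) {
  // 1. Remember tables and their MDL tickets before the handles go away.
  std::vector<OpenedTable> pending(tables);
  // 2. Close all table handles with one call.
  close_all_thread_tables(tables);
  // 3. Close each dictionary table, then release its metadata lock.
  for (const OpenedTable& t : pending) {
    close_log.push_back("close_dict " + std::to_string(t.id));
    close_log.push_back("release_mdl " + std::to_string(t.ticket->id));
  }
}
```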

Added table->lock_mutex protection when reading or writing vc_templ->mysql_table and mysql_table_query_id. Clear cached TABLE* pointers before closing tables to prevent stale pointer access.
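A minimal sketch of this locking discipline, assuming a std::mutex in place of table->lock_mutex and a simplified validity rule in which the cached TABLE* is only handed out to a caller with a matching query id. `TABLE_stub`, `VcTemplSketch` and `DictTableSketch` are hypothetical types, not InnoDB's.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

struct TABLE_stub { int id; };  // hypothetical stand-in for TABLE

// Hypothetical model of the two vc_templ fields shared between threads.
struct VcTemplSketch {
  TABLE_stub* mysql_table = nullptr;
  uint64_t mysql_table_query_id = 0;
};

struct DictTableSketch {
  std::mutex lock_mutex;  // stands in for table->lock_mutex
  VcTemplSketch vc_templ;

  // Both fields are written together under the mutex.
  void set_mysql_table(TABLE_stub* t, uint64_t query_id) {
    std::lock_guard<std::mutex> guard(lock_mutex);
    vc_templ.mysql_table = t;
    vc_templ.mysql_table_query_id = query_id;
  }

  // Both fields are read together under the mutex; the cached TABLE is
  // returned only if it was stored for the same query id.
  TABLE_stub* get_mysql_table(uint64_t query_id) {
    std::lock_guard<std::mutex> guard(lock_mutex);
    return vc_templ.mysql_table_query_id == query_id ? vc_templ.mysql_table
                                                     : nullptr;
  }

  // Call this before the TABLE is closed, so no reader can obtain a
  // dangling pointer afterwards.
  void clear_mysql_table() { set_mysql_table(nullptr, 0); }
};
```

Reading and writing the pointer and its query id under one lock keeps them consistent with each other; clearing before closing ensures the window in which a stale pointer could be observed never opens.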

Declared open_purge_table() and close_thread_tables() in trx0purge.cc. Declared reset_thd() in row0purge.cc and dict0stats_bg.cc. Removed innobase_reset_background_thd().

Thirunarayanan requested a review from dr-m on April 17, 2026 11:53

Thirunarayanan (Member, Author) commented on Apr 17, 2026

This is the 10.11 version of PR #4914.

Thirunarayanan force-pushed the 10.11-MDEV-39261 branch 2 times, most recently from 8b2b4dd to 3570558, on April 27, 2026 16:55

dr-m (Contributor) left a comment


I confirmed with

diff -I^@@ -I^index <(git diff 9f6152b141c9feffbdb389acf412d0d01a1ed762{~2,}) <(git diff 357055866d52af7da0fe47f75c29bf6b2b0ad7f3{~2,})

that this is equivalent to #4914, which was already merged after my approval. The differences are necessitated by the following:

  • row_purge_reset_trx_id() was removed in ef2f3d2
  • dict_table_close() was refactored in 6e6a1b3
  • THD::mdl_context became directly accessible in 9aca89f

…lumns

Problem:
========
A single InnoDB purge worker thread can process undo logs from different
tables within the same batch. However, get_purge_table() and
open_purge_table() incorrectly assume a 1:1 relationship between a purge
worker thread and a table within a single batch. Based on this wrong
assumption, InnoDB attempts to reuse TABLE objects cached in
thd->open_tables for virtual column computation.

1) A purge worker opens Table A and caches the TABLE pointer in
thd->open_tables.
2) The same purge worker moves to Table B in the same batch;
get_purge_table() retrieves the cached pointer for Table A instead of
opening Table B.
3) Because innobase::open() is skipped for Table B, the virtual column
template is never initialized.
4) Virtual column computation for Table B aborts the server.

Solution:
========
- Introduced a purge_table class, which stores either a TABLE* (for
tables with indexed virtual columns) or an MDL_ticket* (for tables
without) in a single union, using the LSB as a flag.
For tables with indexed virtual columns: opens the TABLE* and accesses
the MDL_ticket* via TABLE->mdl_ticket.
For tables without indexed virtual columns: stores only the MDL_ticket*.

trx_purge_attach_undo_recs(): The coordinator opens both dict_table_t*
and TABLE* with proper MDL protection. Workers access the cached
table pointers from purge_node_t->tables without opening
their own handles.

purge_sys.coordinator_thd: Distinguish the coordinator from workers
in the cleanup logic. Skip innobase_reset_background_thd() for the
coordinator thread to prevent premature table closure during
batch processing. Workers still call cleanup to release their
thread-local resources.

trx_purge_close_tables():
Rewritten for the purge coordinator thread:
1) Close all dict_table_t* objects first
2) Call close_thread_tables() once for all TABLE* objects
3) Release MDL tickets last, after tables are closed

Added table->lock_mutex protection when reading or writing
vc_templ->mysql_table and mysql_table_query_id. Clear cached
TABLE* pointers before closing tables to prevent stale pointer
access.

Declared open_purge_table() and close_thread_tables() in trx0purge.cc.
Declared reset_thd() in row0purge.cc and dict0stats_bg.cc.
Removed innobase_reset_background_thd().
…lumns

Problem:
========
Purge threads computing virtual columns could crash due to:
1. Stale TABLE* pointers when tables are flushed/rebuilt during purge
2. open_purge_table() called close_thread_tables() on failure, making
MDL tickets invalid before purge could release them
3. The purge coordinator opened TABLE* but workers accessed it with the
wrong TABLE->in_use
4. No retry mechanism when open_purge_table() failed due to concurrent
FLUSH TABLES, BACKUP STAGE, or ALTER TABLE operations

Solution:
========
1. Removed close_thread_tables() from open_purge_table(). The purge
coordinator thread should close tables explicitly in close_and_reopen().

2. Added retry logic: when open_purge_table() returns NULL due to a
table flush/rebuild, set the must_wait() flag and retry in
close_and_reopen().

3. Updated close_and_reopen() with a purge_table parameter to
close the failed table, and pass it to trx_purge_close_tables()
for proper cleanup.

4. Properly set and reset TABLE::in_use during purge operations:
- Set to coordinator_thd in row_purge_parse_undo_rec() when opening
- Reset in trx_purge_close_table() when closing

5. The auto_increment initialization now happens unconditionally for
purge threads, ensuring the auto_increment counter is always
properly initialized when purge opens tables with virtual columns.
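The retry and ownership rules from points 2 and 4 can be sketched together. `FakeOpener` is a hypothetical opener that fails a configurable number of times, standing in for open_purge_table() returning NULL during a concurrent FLUSH TABLES, BACKUP STAGE or ALTER TABLE; the retry counter stands in for setting must_wait() and going through close_and_reopen(). All type and function names here are hypothetical.

```cpp
#include <cassert>

struct THD_stub { int id; };                          // hypothetical
struct TABLE_stub { THD_stub* in_use = nullptr; };    // hypothetical

// Hypothetical opener: returns nullptr for the first busy_attempts calls,
// as if a concurrent flush/rebuild were in progress.
struct FakeOpener {
  int busy_attempts;
  TABLE_stub table;

  TABLE_stub* open() {
    if (busy_attempts > 0) {
      --busy_attempts;
      return nullptr;
    }
    return &table;
  }
};

// On failure, retry instead of giving up. The loop is unbounded for
// brevity; each iteration models "must wait, then close and reopen".
TABLE_stub* open_with_retry(FakeOpener& opener, THD_stub* coordinator,
                            int& retries) {
  retries = 0;
  for (;;) {
    if (TABLE_stub* t = opener.open()) {
      t->in_use = coordinator;  // owned by the coordinator while purging
      return t;
    }
    ++retries;
  }
}

// Closing resets the ownership marker.
void close_purge_table(TABLE_stub* t) { t->in_use = nullptr; }
```

Keeping in_use pointed at the coordinator for the whole batch, and clearing it only on close, matches the split where the coordinator owns the handles and workers merely borrow them.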
Thirunarayanan merged commit 0152c61 into 10.11 on Apr 28, 2026
16 of 17 checks passed
Thirunarayanan deleted the 10.11-MDEV-39261 branch on April 28, 2026 07:19