Speed up native AST materialization #388

Merged
adamziel merged 8 commits into codex/native-parser-object-grammar-cache from codex/native-parser-bulk-materialization on Apr 30, 2026

adamziel (Collaborator) commented Apr 28, 2026

Summary

  • reuse one WP_MySQL_Parser instance inside the SQLite driver and reset its token stream per query
  • add reset_tokens() to the PHP parser polyfill and the Rust native parser
  • restore native parser-node accessor fast paths in WP_MySQL_Native_Parser_Node, while keeping PHP child materialization for mutation
  • fix the local native extension build helper for Nix/libclang bindgen by undefining __SSE2__ during binding generation
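
The reuse pattern from the first two bullets can be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: `Token`, `Grammar`, `Parser`, `next_token()`, `rule_count()`, and `tokenize()` are hypothetical stand-ins, and only the `reset_tokens()` name comes from this PR. The idea is that state built once (here, a shared grammar) survives across queries, while the token stream and cursor are swapped per query.

```rust
use std::rc::Rc;

/// Toy token; the real extension's token layout is not shown in this PR.
#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub text: String,
}

/// Hypothetical stand-in for state that is costly to build and shared across queries.
pub struct Grammar {
    pub rule_count: usize,
}

/// Hypothetical parser that keeps its grammar and replaces only the token stream.
pub struct Parser {
    grammar: Rc<Grammar>,
    tokens: Vec<Token>,
    position: usize,
}

impl Parser {
    pub fn new(grammar: Rc<Grammar>, tokens: Vec<Token>) -> Self {
        Parser { grammar, tokens, position: 0 }
    }

    /// Swap in a new token stream and rewind the cursor; the grammar is untouched.
    pub fn reset_tokens(&mut self, tokens: Vec<Token>) {
        self.tokens = tokens;
        self.position = 0;
    }

    /// Return the next token and advance, or None at the end of the stream.
    pub fn next_token(&mut self) -> Option<Token> {
        let token = self.tokens.get(self.position).cloned();
        if token.is_some() {
            self.position += 1;
        }
        token
    }

    pub fn rule_count(&self) -> usize {
        self.grammar.rule_count
    }
}

/// Toy whitespace tokenizer, used only to feed the sketch above.
pub fn tokenize(sql: &str) -> Vec<Token> {
    sql.split_whitespace()
        .map(|word| Token { text: word.to_string() })
        .collect()
}
```

A driver following this pattern would construct the parser once and call `reset_tokens(tokenize(query))` before each parse, rather than building a new parser per query.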

Stack

This is the top PR in the native MySQL lexer/parser stack. The stack is split so each GitHub diff shows one reviewable concern:

  1. #384 Extract MySQL lexer and parser polyfills

    • trunk -> codex/native-parser-php-facade
    • extraction-only PHP refactor
    • moves the existing PHP lexer/parser implementations into polyfill classes
    • keeps public WP_MySQL_Lexer and WP_MySQL_Parser as thin PHP subclasses
  2. #385 Add optional native parser routing

    • codex/native-parser-php-facade -> codex/native-parser-class-routing
    • adds fallback WP_MySQL_Native_* PHP classes
    • routes the public lexer/parser classes through native classes when the Rust extension provides them
    • adds the minimal PHP grammar-export bridge for the native parser
  3. #386 Add lazy native parser node facade

    • codex/native-parser-class-routing -> codex/native-parser-node-facade
    • keeps WP_Parser_Node as the plain PHP tree node
    • adds WP_MySQL_Native_Parser_Node extends WP_Parser_Node for native-backed lazy AST nodes
    • keeps native AST handles and native accessor delegation out of the base node class
  4. #381 Add lazy native AST facade

    • codex/native-parser-node-facade -> codex/native-lazy-ast-facade
    • implements the Rust lexer/parser extension and lazy native AST facade
    • makes the Rust extension instantiate WP_MySQL_Native_Parser_Node
    • adds native-extension CI coverage for the SQLite driver and WordPress PHPUnit tests
    • includes the local SQLite facade smoke benchmark
  5. #387 Cache native grammar on parser grammar object

    • codex/native-lazy-ast-facade -> codex/native-parser-object-grammar-cache
    • restores the object-attached native grammar cache
    • adds only WP_Parser_Grammar::$native_grammar on the PHP side
    • removes the Rust content-hash cache that walked the whole exported grammar on every parser construction
  6. This PR, #388 Speed up native AST materialization

    • codex/native-parser-object-grammar-cache -> codex/native-parser-bulk-materialization
    • optimizes native-to-PHP AST access after the grammar-cache performance restoration
    • reuses the SQLite driver's parser instance instead of constructing it per query

Why

The native lexer/parser itself is fast, but the PHP-facing path can lose that benefit if each query repeatedly rebuilds native parser state or forces full PHP AST materialization. On the current stack, #387 already removes the large grammar export/hash cost. This PR removes the remaining per-query parser construction churn and restores the native AST accessor path for descendant-heavy SQLite driver workloads.

Measurements

Environment: local PHP 8.2 via the native build helper, release Rust extension, current top of this PR.

Focused constructor/reset benchmark over 5000 unique SELECT queries:

| Phase | Time |
| --- | --- |
| native tokenize | 22.62 us/query |
| fresh native parser constructor only | 2.31 us/query |
| reusable parser reset_tokens() only | 0.32 us/query |
| reusable parser reset + parse + get_descendants() | 157.06 us/query |
| constructor/reset ratio | 7.3x |

The previously reported ~622 us/query constructor cost does not reproduce on this stack because #387 already caches the native grammar on the PHP grammar object. Parser reuse still removes most of the remaining constructor overhead.

SQLite facade smoke workload:

Command:

TMP_TEST_NATIVE_QUERY_COUNT=250 ./tmp-test-native/run.sh

| Workload | PHP fallback | Native extension | Speedup |
| --- | --- | --- | --- |
| 250 generated queries, including 1 x 2000-row insert | 4.060s | 0.525s | 7.73x |

Testing

  • cargo fmt --check
  • git diff --check
  • composer run check-cs
  • composer run test from packages/mysql-on-sqlite
  • php -d extension=packages/mysql-on-sqlite/ext/wp-mysql-parser/target/release/libwp_mysql_parser.so packages/mysql-on-sqlite/vendor/bin/phpunit -c packages/mysql-on-sqlite/phpunit.xml.dist
  • TMP_TEST_NATIVE_QUERY_COUNT=250 ./tmp-test-native/run.sh

@adamziel adamziel force-pushed the codex/native-parser-object-grammar-cache branch from b7a83a9 to a6f897d Compare April 29, 2026 00:46
@adamziel adamziel force-pushed the codex/native-parser-bulk-materialization branch from eaa3b8c to 38ebab5 Compare April 29, 2026 00:46
@adamziel adamziel force-pushed the codex/native-parser-object-grammar-cache branch from a6f897d to 826dd24 Compare April 29, 2026 01:09
@adamziel adamziel force-pushed the codex/native-parser-bulk-materialization branch from 38ebab5 to 00e0ac8 Compare April 29, 2026 01:11
@adamziel adamziel force-pushed the codex/native-parser-object-grammar-cache branch from 826dd24 to 74f1c3f Compare April 29, 2026 09:16
@adamziel adamziel force-pushed the codex/native-parser-bulk-materialization branch from 00e0ac8 to 9925824 Compare April 29, 2026 09:17
@adamziel adamziel force-pushed the codex/native-parser-object-grammar-cache branch from 74f1c3f to 1bce38a Compare April 30, 2026 11:52
@adamziel adamziel force-pushed the codex/native-parser-bulk-materialization branch from 4c45ada to b62b39b Compare April 30, 2026 11:52
@adamziel adamziel force-pushed the codex/native-parser-object-grammar-cache branch from 1bce38a to dcf5c96 Compare April 30, 2026 12:00
@adamziel adamziel force-pushed the codex/native-parser-bulk-materialization branch from b62b39b to 859da96 Compare April 30, 2026 12:00
@adamziel adamziel force-pushed the codex/native-parser-object-grammar-cache branch from dcf5c96 to e9a0923 Compare April 30, 2026 12:16
@adamziel adamziel force-pushed the codex/native-parser-bulk-materialization branch from 859da96 to 4bf2305 Compare April 30, 2026 12:16
@adamziel adamziel changed the base branch from codex/native-parser-object-grammar-cache to trunk April 30, 2026 12:22
@adamziel adamziel force-pushed the codex/native-parser-bulk-materialization branch from 4bf2305 to a516b67 Compare April 30, 2026 12:24
@adamziel adamziel changed the base branch from trunk to codex/native-parser-object-grammar-cache April 30, 2026 12:24
@adamziel adamziel force-pushed the codex/native-parser-object-grammar-cache branch from f0bb626 to 56e2f94 Compare April 30, 2026 12:37
@adamziel adamziel force-pushed the codex/native-parser-bulk-materialization branch from a516b67 to a0a093d Compare April 30, 2026 12:37
@adamziel adamziel force-pushed the codex/native-parser-object-grammar-cache branch from 56e2f94 to f333a57 Compare April 30, 2026 12:41
@adamziel adamziel force-pushed the codex/native-parser-bulk-materialization branch from a0a093d to 780e349 Compare April 30, 2026 12:41
@adamziel adamziel force-pushed the codex/native-parser-object-grammar-cache branch from f333a57 to c1dc9b3 Compare April 30, 2026 12:43
@adamziel adamziel force-pushed the codex/native-parser-bulk-materialization branch from 780e349 to 0ff98d2 Compare April 30, 2026 12:43
@adamziel adamziel force-pushed the codex/native-parser-object-grammar-cache branch from c1dc9b3 to 76aebba Compare April 30, 2026 12:51
@adamziel adamziel force-pushed the codex/native-parser-bulk-materialization branch from 0ff98d2 to 3123e7a Compare April 30, 2026 12:51
@adamziel adamziel force-pushed the codex/native-parser-object-grammar-cache branch from 76aebba to b0af9dc Compare April 30, 2026 12:54
@adamziel adamziel force-pushed the codex/native-parser-bulk-materialization branch from 3123e7a to 2371b7a Compare April 30, 2026 12:54
@adamziel adamziel force-pushed the codex/native-parser-object-grammar-cache branch from b0af9dc to 31a2778 Compare April 30, 2026 12:55
@adamziel adamziel force-pushed the codex/native-parser-bulk-materialization branch from 2371b7a to b92dc18 Compare April 30, 2026 12:55
@adamziel adamziel force-pushed the codex/native-parser-object-grammar-cache branch from 31a2778 to 468d8ce Compare April 30, 2026 13:08
@adamziel adamziel force-pushed the codex/native-parser-bulk-materialization branch from b92dc18 to c1f5074 Compare April 30, 2026 13:08
The previous helper name (has_unmaterialized_native_ast) implied a runtime
check for native-extension presence. It's actually a per-instance state
flag tracking whether this node's children have been copied into PHP.
was_mutated() reads that intent more directly.
When a native parser is in use, expose query results through a node
class that defers child materialization until callers actually walk the
tree. The base WP_Parser_Node::$children visibility is loosened to
protected so the facade can populate it on demand.
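
A rough sketch of that deferred materialization, under stated assumptions: `NativeAst`, `LazyNode`, `child_names()`, and `append_child()` are hypothetical names invented for illustration, and the flat index-based tree is a guess at a convenient layout, not the extension's real representation. Only `was_mutated()` is named in the commit messages above. Reads go straight to the native tree until a mutation forces a private copy of the children.

```rust
/// Hypothetical flat "native" AST: node i has a name and a list of child indices.
pub struct NativeAst {
    pub names: Vec<&'static str>,
    pub children: Vec<Vec<usize>>,
}

/// Facade node that copies children out of the native tree only when it must.
pub struct LazyNode<'a> {
    ast: &'a NativeAst,
    id: usize,
    /// None while children still live only in the native tree.
    materialized: Option<Vec<String>>,
}

impl<'a> LazyNode<'a> {
    pub fn new(ast: &'a NativeAst, id: usize) -> Self {
        LazyNode { ast, id, materialized: None }
    }

    /// True once this node's children have been copied out of the native tree.
    pub fn was_mutated(&self) -> bool {
        self.materialized.is_some()
    }

    /// Fast path: read from the native tree unless a local copy exists.
    pub fn child_names(&self) -> Vec<String> {
        match &self.materialized {
            Some(children) => children.clone(),
            None => self.ast.children[self.id]
                .iter()
                .map(|&child| self.ast.names[child].to_string())
                .collect(),
        }
    }

    /// Mutation path: copy the native children once, then edit the copy.
    pub fn append_child(&mut self, name: &str) {
        if self.materialized.is_none() {
            let copied = self.child_names();
            self.materialized = Some(copied);
        }
        if let Some(children) = self.materialized.as_mut() {
            children.push(name.to_string());
        }
    }
}
```

In this sketch, read-only callers never pay for the copy, and the native tree stays immutable because edits land only in the node's own list.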
@adamziel adamziel force-pushed the codex/native-parser-object-grammar-cache branch from 468d8ce to 09b9c1a Compare April 30, 2026 13:18
@adamziel adamziel force-pushed the codex/native-parser-bulk-materialization branch from c1f5074 to 57e5c3e Compare April 30, 2026 13:18
@adamziel adamziel force-pushed the codex/native-parser-bulk-materialization branch from 57e5c3e to 12e984f Compare April 30, 2026 13:50
@adamziel adamziel marked this pull request as ready for review April 30, 2026 14:07
@adamziel adamziel merged commit 076b7e5 into codex/native-parser-object-grammar-cache Apr 30, 2026
21 checks passed
@adamziel adamziel deleted the codex/native-parser-bulk-materialization branch April 30, 2026 14:07