[#11672] improvement(optimizer): optimize trigger-expr evaluation with table metadata & partition short-circuit #11664

Open
roryqi wants to merge 3 commits into apache:main from qqqttt123:trigger-expr-optimization

roryqi commented Jun 16, 2026

Contributor

What changes were proposed in this pull request?

Two related optimizations to the maintenance/optimizer recommender's trigger-expr / score-expr evaluation:

  1. Extend evaluation context with table metadata — the trigger-expr context now exposes column_count, partition_count, sort_order_count, and table properties (numeric values parsed to long, others kept as string), in addition to partition and table statistics. Both partitioned and non-partitioned tables now evaluate against partition statistics (when present), table statistics, and table metadata. The trigger-expr string representation is unchanged.
  2. Speed up partitioned-table evaluation (port of Pinterest gravitino-pinterest#249):
    • Short-circuit: evaluate the expression with table-level context only; if it resolves without referencing partition variables, skip the per-partition loop (relies on the QL engine's left-to-right && / || short-circuiting).
    • Precompute the table-level context once per initialize() instead of rebuilding it for every partition.
    • Cache compiled hyphen-to-underscore regex patterns in QLExpressionEvaluator to avoid Pattern.compile on every evaluation.
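    • Add ExpressionEvaluator#tryToEvaluateBool, which returns Optional<Boolean>; an empty result means the expression could not be resolved, so the caller falls back to per-partition evaluation.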
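
The short-circuit in item 2 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: ShortCircuitSketch, its nested Evaluator interface, and shouldTrigger are hypothetical names, and the rule "trigger if any partition matches" is an assumption made only to keep the example complete.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; names and the any-partition rule are assumptions.
final class ShortCircuitSketch {

  /** Stand-in evaluator: returns Optional.empty() when the context cannot decide the result. */
  interface Evaluator {
    Optional<Boolean> tryToEvaluateBool(String expression, Map<String, Object> context);
  }

  static boolean shouldTrigger(
      Evaluator evaluator,
      String expression,
      Map<String, Object> tableContext, // precomputed once: table stats + metadata
      List<Map<String, Object>> partitionStats) {
    // 1. Try with table-level context only.
    Optional<Boolean> tableOnly = evaluator.tryToEvaluateBool(expression, tableContext);
    if (tableOnly.isPresent()) {
      return tableOnly.get(); // decided without touching any partition
    }
    // 2. Fall back to the per-partition loop, layering partition stats on the table context.
    for (Map<String, Object> stats : partitionStats) {
      Map<String, Object> context = new HashMap<>(tableContext);
      context.putAll(stats);
      if (evaluator.tryToEvaluateBool(expression, context).orElse(false)) {
        return true;
      }
    }
    return false;
  }
}
```

With an evaluator that can decide from table metadata alone, the loop never runs, which is where the saving for large partitioned tables comes from.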

Why are the changes needed?

Trigger expressions previously could only reference partition/table statistics, limiting the rules users can write. They also re-evaluated every partition even when a table-level expression already decided the outcome, which is costly for large partitioned tables.

Fixes #11672

Does this PR introduce any user-facing change?

No API changes. Trigger-expr authors gain new referenceable variables (column_count, partition_count, sort_order_count, and table properties).
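
To illustrate what the new variables look like to an expression, here is a minimal sketch of building such a context. TableMetadataContextSketch and its methods are hypothetical, the property names in the usage note are made up, and the sketch uses plain Long.parseLong rather than whatever parsing the PR finally settles on. It also assumes metadata keys are written after properties, so they win on a name collision; the PR may decide otherwise.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the PR's TableMetadataTriggerExpressionUtils.
final class TableMetadataContextSketch {

  static Map<String, Object> buildContext(
      int columnCount, int partitionCount, int sortOrderCount, Map<String, String> properties) {
    Map<String, Object> context = new LinkedHashMap<>();
    // Table properties: numeric values become Long, everything else stays a String.
    properties.forEach((key, value) -> context.put(key, parseValue(value)));
    // Metadata counters (assumption: these override same-named properties).
    context.put("column_count", (long) columnCount);
    context.put("partition_count", (long) partitionCount);
    context.put("sort_order_count", (long) sortOrderCount);
    return context;
  }

  static Object parseValue(String value) {
    try {
      return Long.parseLong(value);
    } catch (NumberFormatException e) {
      return value; // not a long: keep the original string
    }
  }
}
```

For example, with properties {"target_file_size": "536870912", "format": "parquet"}, a trigger expression could compare target_file_size numerically while format remains a string.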

How was this patch tested?

New/extended unit tests: TestTableMetadataTriggerExpressionUtils, TestQLExpressionEvaluator, and TestCompactionStrategyHandler. ./gradlew :maintenance:optimizer:test passes locally.

roryqi and others added 2 commits June 16, 2026 02:10
Extend the trigger-expr/score-expr evaluation context so that, in addition
to partition and table statistics, table metadata is available:
column_count, partition_count, sort_order_count, and table properties
(numeric values parsed to long, others kept as string).

Both partitioned and non-partitioned tables now evaluate against partition
statistics (when present), table statistics, and table metadata. The
representation of trigger-expr is unchanged (a string in the policy rules).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Port of Pinterest gravitino-pinterest#249:
1. Short-circuit partitioned-table evaluation: try the trigger expression
   with table-level context only (table stats + metadata + rules); if it
   resolves without referencing partition variables, skip the per-partition
   loop. Relies on the QL engine's left-to-right && / || short-circuiting.
2. Precompute the table-level context once per initialize() instead of
   rebuilding it for every partition.
3. Cache compiled hyphen-to-underscore regex patterns in
   QLExpressionEvaluator to avoid Pattern.compile on every evaluation.

Adds ExpressionEvaluator#tryToEvaluateBool returning Optional<Boolean>.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
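
The pattern caching in item 3 can be sketched as below. This is an assumption-laden illustration: the commit message does not show what the hyphen-to-underscore patterns look like, so PatternCacheSketch, its lookaround-based pattern, and normalize are all hypothetical. The point is only that each pattern is compiled once and reused.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the PR's QLExpressionEvaluator.
final class PatternCacheSketch {
  // One compiled pattern per hyphenated variable name, built on first use.
  static final Map<String, Pattern> CACHE = new ConcurrentHashMap<>();

  static Pattern patternFor(String name) {
    // Match the name only when it is not part of a longer identifier.
    return CACHE.computeIfAbsent(
        name, n -> Pattern.compile("(?<![\\w-])" + Pattern.quote(n) + "(?![\\w-])"));
  }

  /** Rewrites hyphenated variable names in the expression to their underscore form. */
  static String normalize(String expression, Iterable<String> variableNames) {
    String result = expression;
    for (String name : variableNames) {
      if (name.indexOf('-') < 0) {
        continue; // nothing to rewrite
      }
      String replacement = Matcher.quoteReplacement(name.replace('-', '_'));
      result = patternFor(name).matcher(result).replaceAll(replacement);
    }
    return result;
  }
}
```

Repeated evaluations over the same variable names then hit the cache instead of calling Pattern.compile again.
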
roryqi requested review from bharos and yuqi1129 June 16, 2026 03:08
github-actions Bot commented Jun 16, 2026

Code Coverage Report

Overall Project   73.14% (+1.05%) 🟢
Files changed     89.62% 🟢

Module Coverage
Module          Coverage
jobs            66.17% 🟢
optimizer       83.17% (+0.57%) 🟢
optimizer-api   21.95% 🔴

Files
Module      File                                       Coverage
optimizer   TableMetadataTriggerExpressionUtils.java   100.0% 🟢
            QLExpressionEvaluator.java                 97.01% 🟢
            BaseExpressionStrategyHandler.java         83.33% 🟢
            ExpressionEvaluator.java                   0.0% 🔴

roryqi changed the title from "[MINOR] improvement(optimizer): optimize trigger-expr evaluation with table metadata & partition short-circuit" to "[#11672] improvement(optimizer): optimize trigger-expr evaluation with table metadata & partition short-circuit" on Jun 16, 2026

        .forEach(
            (k, v) -> {
              try {
                context.put(k, Long.parseLong(v));

Contributor

I would suggest we use NumberUtils.isCreatable to replace this one.

      return expressionEvaluator.tryToEvaluateBool(expression, context);
    } catch (RuntimeException e) {
      LOG.warn("Failed to evaluate expression '{}' with context {}", expression, context, e);
      return Optional.of(false);

Contributor

It should be Optional.empty() according to the Javadoc on ExpressionEvaluator.
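
For reference, a minimal sketch of the distinction at stake in this thread, using only java.util.Optional. Optional.empty() means "no result", Optional.of(false) means "evaluated to false", and Optional.of(null) is not a way to express absence because it throws NullPointerException. The class OptionalContractSketch and its methods are hypothetical.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch of a "try to evaluate" contract; not the PR's code.
final class OptionalContractSketch {

  /** Returns the evaluation result, or Optional.empty() when evaluation fails. */
  static Optional<Boolean> tryEvaluate(Supplier<Boolean> evaluation) {
    try {
      return Optional.ofNullable(evaluation.get());
    } catch (RuntimeException e) {
      // "Could not decide" is distinct from "decided false".
      return Optional.empty();
    }
  }

  /** Demonstrates that Optional.of(null) throws instead of producing an empty Optional. */
  static boolean ofNullThrows() {
    try {
      Optional.<Boolean>of(null);
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }
}
```

A caller that receives Optional.empty() can fall back to another strategy, whereas Optional.of(false) would be taken as a final answer.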

- TableMetadataTriggerExpressionUtils: replace try/catch around
  Long.parseLong with NumberUtils.isCreatable, converting via
  createNumber(v).longValue() to preserve Long semantics.
- BaseExpressionStrategyHandler: return Optional.empty() (not
  Optional.of(false)) on evaluation failure, matching the
  ExpressionEvaluator#tryToEvaluateBool contract so the caller falls
  back to per-partition evaluation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

[Improvement] Optimize optimizer trigger-expr evaluation with table metadata & partition short-circuit

2 participants