[feature](catalog) Unify ES catalog scan through FILE_SCAN path, eliminating PluginDrivenEsScanNode #62602
… member, delete ES_SCAN enum
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: apache#62602
Problem Summary:
Address P2 code quality issues identified in PR review:
- P2-A: Four inline `new ObjectMapper()` calls in PluginDrivenScanNode create unnecessary garbage; replaced with static OBJECT_MAPPER + MAP_TYPE_REF fields
- P2-B: Unused `_es_properties` member in EsHttpReader.h wastes memory
- P2-D: `ES_SCAN` enum value in ConnectorScanRangeType is unreferenced after ES was unified into FILE_SCAN path; deleted to avoid confusion
- P2-E: Thrift comment for es_params listed wrong key names (`es.index` etc.); corrected to actual keys (`index`, `type`, `shard_id`, `host_port`, `es_hosts`)
### Release note
None
### Check List (For Author)
- Test: Regression test (all 7 ES suites pass) / FE + BE build verified
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
9aea9da to
33a0e72
Compare
…DrivenScanNode
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: apache#62602
Problem Summary:
PR apache#62602 unified ES catalog scanning through FILE_SCAN path (EsHttpReader) but lost the terminate_after limit pushdown optimization that existed in the old EsScanOperatorX. Without this optimization, queries like `SELECT * FROM es_table LIMIT 10` use scroll mode to fetch all data from ES instead of a single _search request with terminate_after=10, causing significant performance regression.
### Release note
Restore ES terminate_after limit pushdown optimization: when a LIMIT clause is present, all predicates are pushed down to ES, and the limit fits within one batch, Doris now sends a single ES _search request with terminate_after instead of scrolling all results. EXPLAIN output shows "ES terminate_after: N" when this optimization is active.
### Check List (For Author)
- Test: Manual test / Regression test needed
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
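The eligibility rule in the release note can be sketched as a small decision function. This is only an illustration of the three stated conditions; the class name, method name, and parameters are hypothetical and are not taken from PluginDrivenScanNode.

```java
import java.util.OptionalLong;

// Hypothetical sketch of the terminate_after eligibility rule described above.
final class TerminateAfterSketch {
    /**
     * Returns the terminate_after value to send in a single _search request,
     * or empty when the scan should fall back to scroll mode.
     * A limit <= 0 is treated as "no LIMIT clause" (a convention of this sketch only).
     */
    static OptionalLong terminateAfter(long limit, boolean allPredicatesPushed, long batchSize) {
        boolean hasLimit = limit > 0;
        // If some predicate is still evaluated by the engine, cutting ES off after
        // N documents could leave fewer than N rows after filtering, so the
        // shortcut is only safe when every predicate was pushed down.
        if (hasLimit && allPredicatesPushed && limit <= batchSize) {
            return OptionalLong.of(limit);
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        System.out.println(terminateAfter(10, true, 4096).isPresent());      // eligible
        System.out.println(terminateAfter(10, false, 4096).isPresent());     // residual filter
        System.out.println(terminateAfter(100_000, true, 4096).isPresent()); // exceeds one batch
    }
}
```

The batch size of 4096 in `main` is an arbitrary value chosen for the example.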
203c4d3 to
329df2b
Compare
329df2b to
3689a41
Compare
… scan path
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Add Thrift definitions to support routing ES scans through the unified FILE_SCAN_NODE path (same as JDBC). This is the foundation for eliminating the separate PluginDrivenEsScanNode/ES_HTTP_SCAN_NODE code path.
Changes:
- Add FORMAT_ES_HTTP = 19 to TFileFormatType enum
- Add es_params (field 12) to TTableFormatFileDesc for per-shard ES parameters
- Add es_properties (field 31), es_docvalue_context (field 32), and es_fields_context (field 33) to TFileScanRangeParams for per-node ES parameters
None
- Test: FE and BE compilation verified
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…path
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Create EsHttpReader, a native C++ GenericReader subclass that wraps existing ESScanReader and ScrollParser to support ES scanning through the unified FILE_SCAN_NODE path (same path as JDBC, Parquet, etc.).
Key design points:
- Extends GenericReader directly (no JNI overhead)
- Reuses existing ESScanReader (HTTP scroll API) and ScrollParser (JSON parsing)
- Reads ES parameters from TFileScanRangeParams (es_properties, es_docvalue_context, es_fields_context) and TTableFormatFileDesc (es_params)
- Builds query DSL via ESScrollQueryBuilder::build() in init_reader()
- fill_all_columns() returns true (fills all columns from ES response)
### Release note
None
### Check List (For Author)
- Test: BE compilation verified
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…patch
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Register the EsHttpReader in FileScanner::_get_next_reader() dispatch so that FORMAT_ES_HTTP scan ranges are handled by the unified FILE_SCAN path. Also fixes forward declarations and destructor visibility for EsHttpReader.
Changes:
- Add FORMAT_ES_HTTP case in FileScanner::_get_next_reader() switch
- Add es_http_reader.h include to file_scanner.cpp
- Fix forward declaration tags (class vs struct) for TFileScanRangeParams
- Move EsHttpReader destructor to .cpp (unique_ptr needs complete type)
None
- Test: BE compilation verified
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…pushdown metadata
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
The existing ConnectorScanPlanProvider.getScanNodeProperties() returns only a Map<String,String>, forcing connectors like ES to encode filter pushdown metadata (not-pushed conjunct indices) as serialized strings in the properties map. This is fragile and couples the engine to a magic key convention.
This commit introduces:
1. ScanNodePropertiesResult - a typed wrapper that carries both the properties map and a Set<Integer> of not-pushed conjunct indices
2. ConnectorScanPlanProvider.getScanNodePropertiesResult() - a new default method that wraps getScanNodeProperties() for backward compatibility
Connectors that perform fine-grained conjunct pushdown (ES query DSL) can override getScanNodePropertiesResult() to return structured pushdown results. The engine (PluginDrivenScanNode) will call this method instead of parsing magic string keys.
### Release note
None
### Check List (For Author)
- Test: No need to test - SPI interface addition with default implementation
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
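A rough sketch of the typed-wrapper idea follows, assuming a simplified shape for the result class. The nested type names, constructor, and sample property values are hypothetical; the `hasConjunctTracking` flag follows the wording of the PR description, and the real Doris signatures may differ.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified stand-ins for ScanNodePropertiesResult and
// ConnectorScanPlanProvider; not the actual Doris classes.
final class ScanNodePropertiesDemo {
    static final class PropertiesResult {
        final Map<String, String> properties;
        final Set<Integer> notPushedConjunctIndices;
        // false means "connector does not track conjuncts", which is different
        // from "tracking is on and every conjunct was pushed".
        final boolean hasConjunctTracking;

        PropertiesResult(Map<String, String> properties, Set<Integer> notPushed,
                         boolean hasConjunctTracking) {
            this.properties = Collections.unmodifiableMap(new HashMap<>(properties));
            this.notPushedConjunctIndices = Collections.unmodifiableSet(new TreeSet<>(notPushed));
            this.hasConjunctTracking = hasConjunctTracking;
        }
    }

    interface PlanProvider {
        Map<String, String> getScanNodeProperties();

        // Default wrapper keeps existing connectors working unchanged.
        default PropertiesResult getScanNodePropertiesResult() {
            return new PropertiesResult(getScanNodeProperties(), Collections.emptySet(), false);
        }
    }

    // A connector that only implements the legacy map-returning method.
    static PlanProvider legacyProvider() {
        return () -> Collections.singletonMap("file_format_type", "jdbc");
    }

    // A connector that reports which conjuncts it could not push down.
    static PlanProvider trackingProvider() {
        return new PlanProvider() {
            @Override
            public Map<String, String> getScanNodeProperties() {
                return Collections.singletonMap("file_format_type", "es_http");
            }

            @Override
            public PropertiesResult getScanNodePropertiesResult() {
                return new PropertiesResult(getScanNodeProperties(), Collections.singleton(1), true);
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(legacyProvider().getScanNodePropertiesResult().hasConjunctTracking);
        System.out.println(trackingProvider().getScanNodePropertiesResult().notPushedConjunctIndices);
    }
}
```

With this shape the engine reads a `Set<Integer>` directly instead of parsing a serialized index list out of the properties map.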
…ilter pushdown
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
ES connector used ConnectorScanRangeType.ES_SCAN which required a separate
PluginDrivenEsScanNode and distinct Thrift path (TEsScanRange / ES_HTTP_SCAN_NODE).
This prevented unification with the JDBC/file-based scan path.
This commit changes EsScanRange and EsScanPlanProvider to route through FILE_SCAN:
1. EsScanRange: getRangeType() returns FILE_SCAN, adds getPath() ("es://index/shard"),
getTableFormatType() ("es"), getFileFormat() ("es_http")
2. EsScanPlanProvider: getScanRangeType() returns FILE_SCAN, adds file_format_type=es_http
to properties for PluginDrivenScanNode.getFileFormatType() mapping
3. Replaces string-serialized _not_pushed_conjunct_indices with structured
getScanNodePropertiesResult() returning ScanNodePropertiesResult with Set<Integer>
4. Updates test to expect FILE_SCAN instead of ES_SCAN
### Release note
None
### Check List (For Author)
- Test: Regression test (unit test updated)
- Behavior changed: No (ES connector not yet wired through PluginDrivenScanNode)
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…compatible keys
### What problem does this PR solve?
Issue Number: close #xxx
Problem Summary:
After changing EsScanRange.getProperties() to use BE-compatible keys (index, type, shard_id, host_port), the PROP_* constants and unit tests still referenced the old es.* prefixed keys, causing EsNodeInfoAndScanRangeTest to fail. Updated PROP_INDEX/PROP_TYPE/PROP_SHARD_ID to match new key names, replaced PROP_HOSTS with PROP_HOST_PORT, and fixed all test assertions.
### Release note
None
### Check List (For Author)
- Test: Unit Test (EsNodeInfoAndScanRangeTest passes)
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Previously EsScanRange only sent the first ES host (host_port) to BE, losing the full list of shard replica hosts. This prevented BE from selecting a local ES node for data locality, and removed failover capability when a single ES node is down.
The old BE code in es_scan_operator.cpp had a get_host_and_port() function that preferred the local host via BackendOptions::get_localhost() from a list of candidate hosts. The new unified scan path was missing this locality-aware selection.
### Changes
FE side:
- Added PROP_ES_HOSTS constant to EsScanRange
- getProperties() now includes "es_hosts" key with comma-separated full host list alongside the existing "host_port" (first host as fallback)
- Updated unit test to assert PROP_ES_HOSTS value
BE side:
- Added _select_host() method to EsHttpReader that parses the es_hosts comma-separated list and prefers the host matching the local backend via BackendOptions::get_localhost()
- Added KEY_ES_HOSTS static constant
- init_reader() now calls _select_host() instead of directly using host_port, then writes the selected host back to properties
### Release note
None
### Check List (For Author)
- Test: Unit Test (EsNodeInfoAndScanRangeTest) / Regression test (all 7 ES suites pass)
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
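The locality-aware selection described above lives in the C++ BE; the Java sketch below only illustrates the selection rule (prefer the candidate whose host matches the local backend, otherwise fall back to `host_port`). The class, method names, and sample addresses are hypothetical, and the host parsing here deliberately ignores IPv6 literals.

```java
// Hypothetical illustration of the "prefer local host" rule; the actual logic
// is EsHttpReader::_select_host() in C++ and may differ in detail.
final class EsHostSelectionSketch {
    /** Extracts the host part of "host:port" or "http://host:port" (no IPv6 handling). */
    static String hostOf(String hostPort) {
        int scheme = hostPort.indexOf("://");
        String rest = scheme >= 0 ? hostPort.substring(scheme + 3) : hostPort;
        int colon = rest.lastIndexOf(':');
        return colon >= 0 ? rest.substring(0, colon) : rest;
    }

    /**
     * Picks the candidate from the comma-separated es_hosts list whose host equals
     * localHost; when none matches (or the list is empty), returns fallbackHostPort.
     */
    static String selectHost(String esHosts, String fallbackHostPort, String localHost) {
        if (esHosts == null || esHosts.isEmpty()) {
            return fallbackHostPort;
        }
        for (String candidate : esHosts.split(",")) {
            String trimmed = candidate.trim();
            if (!trimmed.isEmpty() && hostOf(trimmed).equals(localHost)) {
                return trimmed;
            }
        }
        return fallbackHostPort;
    }

    public static void main(String[] args) {
        String hosts = "10.0.0.1:9200,10.0.0.2:9200";
        System.out.println(selectHost(hosts, "10.0.0.1:9200", "10.0.0.2"));
        System.out.println(selectHost(hosts, "10.0.0.1:9200", "10.0.0.9"));
    }
}
```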
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Two issues in PluginDrivenScanNode:
1. Double computation: pruneConjunctsFromNodeProperties() called getScanNodePropertiesResult() which recomputed properties from scratch, while getOrLoadScanNodeProperties() had its own separate cache via getScanNodeProperties(). Both methods built column handles and filters redundantly.
2. Conjunct index mismatch: buildRemainingFilter() filters out CAST expressions when the connector does not support CAST pushdown. The connector then returns not-pushed indices relative to the filtered list, but pruneConjunctsFromNodeProperties() used those indices against the original conjuncts list, which could cause wrong conjuncts to be pruned.
- Added getOrLoadPropertiesResult() that caches a single ScanNodePropertiesResult, used by both getOrLoadScanNodeProperties() and pruneConjunctsFromNodeProperties()
- Added filteredToOriginalIndex mapping in buildRemainingFilter() that tracks which original conjunct indices correspond to the filtered (CAST-removed) list
- pruneConjunctsFromNodeProperties() now translates connector indices back to original indices via the mapping, and retains CAST conjuncts that were never sent to the connector
None
- Test: Regression test (all 7 ES suites pass)
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
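The index mismatch is easier to see with a tiny worked example. The sketch below models conjuncts as plain strings and detects CAST with a substring check, both of which are simplifications made for illustration; only the name `filteredToOriginalIndex` comes from the commit message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the filtered-to-original index translation.
final class ConjunctIndexMappingSketch {
    /** For each conjunct that survives CAST filtering, records its original index. */
    static List<Integer> filteredToOriginalIndex(List<String> conjuncts) {
        List<Integer> mapping = new ArrayList<>();
        for (int i = 0; i < conjuncts.size(); i++) {
            if (!conjuncts.get(i).contains("CAST(")) {
                mapping.add(i);
            }
        }
        return mapping;
    }

    /** Original indices of conjuncts the engine must still evaluate itself. */
    static Set<Integer> retainedOriginalIndices(int originalCount,
                                                List<Integer> filteredToOriginal,
                                                Set<Integer> notPushedFilteredIndices) {
        Set<Integer> sent = new HashSet<>(filteredToOriginal);
        Set<Integer> retained = new TreeSet<>();
        // Conjuncts never sent to the connector (the CAST ones) are always retained.
        for (int i = 0; i < originalCount; i++) {
            if (!sent.contains(i)) {
                retained.add(i);
            }
        }
        // Connector-reported indices refer to the filtered list; translate them back.
        for (int filteredIndex : notPushedFilteredIndices) {
            retained.add(filteredToOriginal.get(filteredIndex));
        }
        return retained;
    }

    public static void main(String[] args) {
        List<String> conjuncts = Arrays.asList("a = 1", "CAST(b AS INT) = 2", "c > 3");
        List<Integer> mapping = filteredToOriginalIndex(conjuncts); // [0, 2]
        Set<Integer> notPushed = new HashSet<>(Arrays.asList(1));   // filtered index 1 is "c > 3"
        System.out.println(retainedOriginalIndices(conjuncts.size(), mapping, notPushed)); // prints [1, 2]
    }
}
```

Applying the connector's index 1 directly to the original list would have kept only the CAST conjunct and pruned `c > 3`, which is the wrong-pruning case the commit describes.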
…ization
### What problem does this PR solve?
Problem Summary:
PluginDrivenScanNode contains ~350 lines of format-specific Thrift translation code dispatched via if-else chains. This violates OCP and makes the class a "god class". To fix this, we add SPI interface methods that let each connector handle its own Thrift param construction, EXPLAIN output, and table serialization.
This commit adds 4 new default methods to the SPI interfaces:
- ConnectorScanRange.populateRangeParams() - per-range Thrift construction
- ConnectorScanPlanProvider.populateScanLevelParams() - scan-level Thrift params
- ConnectorScanPlanProvider.appendExplainInfo() - connector EXPLAIN output
- ConnectorScanPlanProvider.getSerializedTable() - serialized table for BE
All methods have default implementations preserving current behavior. No existing logic is changed.
### Release note
None
### Check List (For Author)
- Test: No need to test - pure interface additions with default implementations
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
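As a loose illustration of how a default SPI method removes the if-else dispatch, the sketch below uses a plain `Map<String, String>` as a stand-in for the Thrift range descriptor. The interface shape, class names, and method signatures are assumptions made for the example; only the method name `populateRangeParams`, the `es://index/shard` path form, and the `index` / `shard_id` keys appear in the PR text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each range type fills in its own parameters, so the
// engine-side loop contains no connector-specific branches.
final class RangeParamsSpiSketch {
    interface ScanRange {
        String path();

        // Default is a no-op, so existing connectors keep their current behavior.
        default void populateRangeParams(Map<String, String> rangeDesc) {
        }
    }

    static final class PlainFileRange implements ScanRange {
        private final String path;

        PlainFileRange(String path) {
            this.path = path;
        }

        @Override
        public String path() {
            return path;
        }
    }

    static final class EsRange implements ScanRange {
        private final String index;
        private final int shardId;

        EsRange(String index, int shardId) {
            this.index = index;
            this.shardId = shardId;
        }

        @Override
        public String path() {
            return "es://" + index + "/" + shardId;
        }

        @Override
        public void populateRangeParams(Map<String, String> rangeDesc) {
            rangeDesc.put("index", index);
            rangeDesc.put("shard_id", Integer.toString(shardId));
        }
    }

    /** Engine-side loop: identical for every connector. */
    static List<Map<String, String>> buildRangeDescs(List<ScanRange> ranges) {
        List<Map<String, String>> descs = new ArrayList<>();
        for (ScanRange range : ranges) {
            Map<String, String> desc = new LinkedHashMap<>();
            desc.put("path", range.path());
            range.populateRangeParams(desc);
            descs.add(desc);
        }
        return descs;
    }

    public static void main(String[] args) {
        List<ScanRange> ranges = Arrays.asList(new PlainFileRange("/data/part-0"), new EsRange("logs", 3));
        System.out.println(buildRangeDescs(ranges));
    }
}
```

Adding a new connector in this model means adding a new `ScanRange` implementation, with no change to `buildRangeDescs`.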
…o, Hive connectors
### What problem does this PR solve?
Problem Summary:
As part of P2-F SPI anti-bloat refactoring, each connector needs its own populateRangeParams() override so that format-specific Thrift construction logic lives in the connector module rather than in PluginDrivenScanNode (god class pattern).
This commit adds:
- EsScanRange.populateRangeParams(): sets es_params on TTableFormatFileDesc
- MaxComputeScanRange.populateRangeParams(): constructs TMaxComputeFileDesc and mutates rangeDesc path/offset/size
- TrinoScanRange.populateRangeParams(): constructs TTrinoConnectorFileDesc with JSON options parsing
- HiveScanRange.populateRangeParams(): no-op for plain hive, constructs TTransactionalHiveDesc for transactional_hive
- EsScanPlanProvider.populateScanLevelParams(): builds es_properties map, deserializes docvalue_context and fields_context JSON
- EsScanPlanProvider.appendExplainInfo(): ES-specific EXPLAIN output
- Added fe-thrift (provided scope) and fe-connector-api deps to Hive, MaxCompute, Trino connector pom.xml files
### Release note
None
### Check List (For Author)
- Test: FE compilation verified
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ectors
### What problem does this PR solve?
Problem Summary:
Continue P2-F SPI anti-bloat refactoring for the two most complex connectors: Hudi (format downgrade logic) and Paimon (deletion file, count pushdown, JNI/native branching).
This commit adds:
- HudiScanRange.populateRangeParams(): dynamic JNI→native format downgrade, full JNI metadata (instant_time, serde, delta_logs etc.), partition values
- PaimonScanRange.populateRangeParams(): JNI vs native branching, deletion file sub-struct, row count pushdown, partition values with null tracking
- PaimonScanPlanProvider.populateScanLevelParams(): predicate + options_json
- PaimonScanPlanProvider.getSerializedTable(): returns serialized_table prop
- Added fe-thrift (provided) and fe-connector-api deps to Hudi and Paimon connector pom.xml files
### Release note
None
### Check List (For Author)
- Test: FE compilation verified
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…connector SPI
### What problem does this PR solve?
Problem Summary:
PluginDrivenScanNode contained ~370 lines of format-specific Thrift construction code (setMaxComputeParams, setTrinoConnectorParams, setHiveParams, setTransactionalHiveParams, setHudiParams, setPaimonParams, setEsParams, setPaimonScanLevelParams, setEsScanLevelParams, copyIfPresent) plus ES-specific EXPLAIN logic and Paimon-specific getSerializedTable. This "god class" pattern violated separation of concerns and made adding new connectors require modifying fe-core.
### Release note
None
### Check List (For Author)
- Test: FE compilation verified (sh build.sh --fe)
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?
Problem Summary:
After rebasing onto latest master, GenericReader was refactored with Template Method pattern: get_next_block is now non-virtual (calls _do_get_next_block), get_columns is now non-virtual (calls _get_columns_impl), and fill_all_columns was removed. EsHttpReader still used the old virtual method signatures causing compilation failure.
### Release note
None
### Check List (For Author)
- Test: BE compilation verified, all 7 ES regression tests pass
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…format/table/es/
### What problem does this PR solve?
Problem Summary:
After unifying ES scans through FILE_SCAN_NODE, the old ES_HTTP_SCAN_NODE scan path (EsScanner, EsScanOperatorX, pipeline registration) became dead code. Additionally, ES-related reading code was scattered across be/src/exec/es/ and be/src/format/table/.
Changes:
- Deleted old scan path: EsScanner, EsScanOperatorX (4 files, ~520 lines)
- Removed ES_SCAN_NODE/ES_HTTP_SCAN_NODE case from pipeline_fragment_context
- Removed EsScanLocalState template instantiations and operator declarations
- Moved all ES reading code (ESScanReader, ScrollParser, ESScrollQueryBuilder, EsHttpReader) from be/src/exec/es/ and be/src/format/table/ to be/src/format/table/es/
- Updated all include paths accordingly
### Release note
None
### Check List (For Author)
- Test: BE compilation verified, all 7 ES regression tests pass
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ate_after)
### What problem does this PR solve?
Problem Summary:
Adds regression tests to verify the ES terminate_after limit pushdown optimization works correctly after the fix in PluginDrivenScanNode. Tests cover both ES7 and ES8 catalogs.
### Release note
None
### Check List (For Author)
- Test: Regression test (PASSED - all cases pass against live ES7/ES8)
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Restore missing RuntimeProfile counters for EsHttpReader so ES HTTP scans expose read, materialize, batch, and row metrics.
### Release note
None
### Check List (For Author)
- Test: BE build and unit test
- Unit Test: ./run-be-ut.sh --run --filter=EsHttpReaderTest.* -j 8
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Fix Elasticsearch connector IPv6 host parsing and formatting across FE scan planning and BE host selection so IPv6 nodes are preserved correctly.
### Release note
Fix ES connector host parsing for IPv6 addresses in FE scan planning and BE host selection.
### Check List (For Author)
- Test: BE build and FE build with targeted unit tests
- Unit Test: ./run-be-ut.sh --run --filter=EsHttpReaderTest.* -j 8
- Unit Test: cd fe/fe-connector/fe-connector-es && mvn -q -Dtest=EsNodeInfoAndScanRangeTest,EsScanPlanProviderTest test
- Manual test: ./build.sh --fe
- Behavior changed: Yes (ES connector now preserves IPv6 literals and formats IPv6 host:port values with brackets.)
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
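The bracket convention mentioned in the IPv6 fix can be sketched as a pair of helpers. This is a minimal illustration of formatting and parsing `host:port` values when the host may be an IPv6 literal; the class and method names are hypothetical and this is not the connector's actual parsing code.

```java
// Hypothetical helpers illustrating bracketed IPv6 host:port handling.
final class HostPortFormatSketch {
    /** Formats host and port, wrapping bare IPv6 literals in brackets. */
    static String format(String host, int port) {
        boolean bareIpv6 = host.indexOf(':') >= 0 && !host.startsWith("[");
        return (bareIpv6 ? "[" + host + "]" : host) + ":" + port;
    }

    /**
     * Splits "host:port" or "[ipv6]:port" into {host, port}; the port element is
     * an empty string when no port is present.
     */
    static String[] parse(String hostPort) {
        if (hostPort.startsWith("[")) {
            int close = hostPort.indexOf(']');
            if (close < 0) {
                throw new IllegalArgumentException("unterminated IPv6 literal: " + hostPort);
            }
            String host = hostPort.substring(1, close);
            boolean hasPort = close + 1 < hostPort.length() && hostPort.charAt(close + 1) == ':';
            return new String[] {host, hasPort ? hostPort.substring(close + 2) : ""};
        }
        int first = hostPort.indexOf(':');
        int last = hostPort.lastIndexOf(':');
        // Zero colons: host only. Several colons without brackets: a bare IPv6 literal.
        if (first < 0 || first != last) {
            return new String[] {hostPort, ""};
        }
        return new String[] {hostPort.substring(0, last), hostPort.substring(last + 1)};
    }

    public static void main(String[] args) {
        System.out.println(format("fe80::1", 9200));
        System.out.println(format("es-node", 9200));
        System.out.println(String.join(" | ", parse("[fe80::1]:9200")));
    }
}
```

Splitting on the last colon alone, as is common for IPv4 and hostnames, would cut an unbracketed IPv6 literal in the wrong place, which is why the bracketed form is needed.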
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Fix the ES connector explain property key mismatch so EXPLAIN output includes the Elasticsearch index name again.
### Release note
Fix EXPLAIN output for ES scans to display the index name.
### Check List (For Author)
- Test: FE build and targeted unit test
- Unit Test: cd fe/fe-connector/fe-connector-es && mvn -q -Dtest=EsScanPlanProviderTest test
- Manual test: ./build.sh --fe
- Behavior changed: Yes (EXPLAIN for ES scans now shows the ES index name.)
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Remove the short-lived provider-local ES metadata cache, which does not provide real reuse on the current call path and couples query-scoped field context to an index-only cache key.
### Release note
None
### Check List (For Author)
- Test: FE build and targeted unit test
- Unit Test: cd fe/fe-connector/fe-connector-es && mvn -q -Dtest=EsScanPlanProviderTest test
- Manual test: ./build.sh --fe
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
3689a41 to
72c0ece
Compare
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #62183
Problem Summary:
Before this change, ES catalog queries went through a dedicated `PluginDrivenEsScanNode` → `ES_HTTP_SCAN_NODE` → `EsScanner` code path, while JDBC catalogs used `PluginDrivenScanNode` → `FILE_SCAN_NODE` → `FileScanner`. The two paths had completely separate plan-generation logic, Thrift structures, and BE execution flows, making the connector SPI harder to maintain and extend.
This PR routes ES catalog scans through the same `FILE_SCAN_NODE` path that JDBC already uses, achieving a single unified scan-node implementation for all plugin-driven connectors. The key changes are:
#### Thrift layer
- Added `FORMAT_ES_HTTP = 19` to `TFileFormatType`
- Added `es_params` (per-shard map) to `TTableFormatFileDesc`
- Added `es_properties`, `es_docvalue_context`, `es_fields_context` to `TFileScanRangeParams` (shared across shards)
#### BE: EsHttpReader (new, native C++)
- `GenericReader` subclass in `be/src/format/table/es_http_reader.{h,cpp}`
- Wraps `ESScanReader` (HTTP scroll) and `ScrollParser` (JSON→columns)
- Registered in `FileScanner::_get_next_reader()` for `FORMAT_ES_HTTP`
- `get_columns()` and `fill_all_columns() = true`
#### FE: Connector SPI extension
- `ScanNodePropertiesResult` class: typed wrapper carrying both scan properties and a `notPushedConjunctIndices` set with explicit `hasConjunctTracking` flag (distinguishes "no tracking" from "all pushed")
- Added `getScanNodePropertiesResult()` to `ConnectorScanPlanProvider`
#### FE: ES connector adaptation
- `EsScanRange`: changed to `FILE_SCAN` range type, added `getPath()`, `getTableFormatType()`, `getFileFormat()`, BE-compatible property keys
- `EsScanPlanProvider`: builds scan node properties with query_dsl, doc_values_mode, auth, docvalue/fields context serialization
- `EsConnectorMetadata`: added `getColumnHandles()` using `NamedColumnHandle`
#### FE: PluginDrivenScanNode enhancement
- File format mapping (`es_http` → `FORMAT_ES_HTTP`)
- `setEsParams()` / `setEsScanLevelParams()` for ES Thrift fields
- Conjunct pruning via `ScanNodePropertiesResult` (removes pushed-down filters including `esquery()` fake function)
#### FE: Cleanup
- Deleted `PluginDrivenEsScanNode.java` (390 lines)
- `PhysicalPlanTranslator`: removed ES-specific branch, all plugin-driven connectors now go through `PluginDrivenScanNode`
### Release note
ES catalog scans now use the unified FILE_SCAN execution path shared with JDBC and other plugin-driven connectors. This is an internal architectural change; query behavior and results are unchanged. The `esquery()` function and doc-value optimization continue to work as before.
### Check List (For Author)