Fix aggregation flow with remote initiator #1872
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e72ebfe177
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
if (getClusterName(context).empty() // Not cluster request
    && !(context->getSettingsRef()[Setting::object_storage_remote_initiator] // Not request with remote initiator
        && !context->getSettingsRef()[Setting::object_storage_remote_initiator_cluster].value.empty()))
Keep unsupported storages at FetchColumns
When object_storage_remote_initiator is set for a configuration that does not support cluster mode, such as icebergLocal or other local data-lake storages where getClusterName is empty because isClusterSupported is false, this new exception makes getQueryProcessingStage report WithMergeableState. IStorageCluster::read still immediately falls back to pure_storage for those configurations, and StorageObjectStorage::read ignores processed_stage, so aggregate queries can be planned as if partial aggregation happened remotely even though only raw rows were read. Please only return the distributed stage when the subsequent read path will actually use the remote/cluster execution path.
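The rule requested above (report the distributed stage only when the read path will really go remote) can be sketched as follows. This is a minimal illustration, not the actual ClickHouse code: `Stage`, `RequestInfo`, `willReadRemotely`, and `chooseStage` are hypothetical names, and the fallback condition is only an assumption based on the description in this comment.

```cpp
#include <string>

// Hypothetical stand-in for QueryProcessingStage::Enum (only the two values discussed).
enum class Stage { FetchColumns, WithMergeableState };

// Hypothetical summary of the inputs mentioned in the review comment.
struct RequestInfo
{
    bool cluster_supported;               // assumed to mirror isClusterSupported
    std::string cluster_name;             // assumed to mirror getClusterName(context)
    bool remote_initiator;                // object_storage_remote_initiator
    std::string remote_initiator_cluster; // object_storage_remote_initiator_cluster
};

// True only when the subsequent read would use the remote/cluster path.
bool willReadRemotely(const RequestInfo & r)
{
    if (!r.cluster_supported)
        return false; // assumed: read falls back to the plain storage and returns raw rows
    if (!r.cluster_name.empty())
        return true;  // ordinary cluster request
    return r.remote_initiator && !r.remote_initiator_cluster.empty();
}

// The reported stage follows the same predicate, so planning and reading stay consistent.
Stage chooseStage(const RequestInfo & r)
{
    return willReadRemotely(r) ? Stage::WithMergeableState : Stage::FetchColumns;
}
```

Keeping the stage decision and the read decision behind one predicate is the design point: a storage that cannot run in cluster mode then stays at FetchColumns even when the remote-initiator settings are present.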
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 437a47dc4a
is_remote = settings[Setting::object_storage_remote_initiator]
    && (!settings[Setting::object_storage_cluster].value.empty()
        || !settings[Setting::object_storage_remote_initiator_cluster].value.empty());
Preserve validation for missing remote initiator cluster
When object_storage_remote_initiator=1 is set on the fallback s3/object-storage table functions but both object_storage_cluster and object_storage_remote_initiator_cluster are empty, this now makes is_remote false and routes the query through BaseSimple. That bypasses the existing IStorageCluster::read validation that raises BAD_ARGUMENTS for this invalid configuration, so a misconfigured query silently runs locally instead of failing or using a remote initiator.
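One way to keep the invalid combination from running locally is to validate at the point where `is_remote` is computed. The sketch below is only an illustration under stated assumptions: `RemoteSettings` and `resolveIsRemote` are hypothetical names, and `std::invalid_argument` stands in for the BAD_ARGUMENTS exception mentioned above.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the three settings discussed in this comment.
struct RemoteSettings
{
    bool remote_initiator = false;        // object_storage_remote_initiator
    std::string cluster;                  // object_storage_cluster
    std::string remote_initiator_cluster; // object_storage_remote_initiator_cluster
};

// Returns true when the request should take the remote path.
// Throws instead of returning false when the flag is set but no cluster is named,
// so a misconfigured request cannot silently fall through to local execution.
bool resolveIsRemote(const RemoteSettings & s)
{
    if (!s.remote_initiator)
        return false; // not a remote-initiator request at all

    if (s.cluster.empty() && s.remote_initiator_cluster.empty())
        throw std::invalid_argument(
            "object_storage_remote_initiator is set but no cluster name is provided");

    return true;
}
```

An alternative with the same effect would be to route every request with the flag set through the cluster path and let the existing validation there reject it.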
437a47d to 0878f8f
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0878f8f10d
is_remote = settings[Setting::object_storage_remote_initiator]
    && (!settings[Setting::object_storage_cluster].value.empty()
        || !settings[Setting::object_storage_remote_initiator_cluster].value.empty());
Preserve validation for bare remote initiator requests
When object_storage_remote_initiator=1 is set on the fallback s3/object-storage table function without object_storage_cluster or object_storage_remote_initiator_cluster, this new guard leaves is_remote false, so executeImpl builds the simple storage and bypasses the validation added in StorageObjectStorageCluster/IStorageCluster. That makes an invalid remote-initiator request run locally instead of raising the documented exception, which is easy to hit when the companion cluster setting is omitted or misspelled.
object_storage_remote_initiator without a cluster name or object_storage_remote_initiator_cluster already raised an exception before this change.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix aggregation flow with remote initiator
Documentation entry for user-facing changes
With object_storage_remote_initiator set but without the object_storage_cluster setting, StorageObjectStorageCluster::getQueryProcessingStage returned QueryProcessingStage::Enum::FetchColumns. As a result, the nodes sent all rows to the initiator, and aggregation was executed on the initiator. Now the method returns QueryProcessingStage::Enum::WithMergeableState in the proper cases, and pre-aggregation is executed on the nodes.
CI/CD Options
Exclude tests:
Regression jobs to run: