feat!: id-based outer-reference resolution (Substrait 0.89.0)#982
Draft
nielspardon wants to merge 2 commits into
Draft
feat!: id-based outer-reference resolution (Substrait 0.89.0)#982nielspardon wants to merge 2 commits into
nielspardon wants to merge 2 commits into
Conversation
Update the substrait submodule to v0.89.0 and model its new id-based outer-reference mechanism alongside the existing offset-based one: - RelCommon.rel_anchor -> Rel.getRelAnchor() / Rel.withRelAnchor(...) - OuterReference.rel_reference -> FieldReference.outerReferenceRelReference() Both proto converters handle the OuterReference oneof (steps_out and rel_reference) in each direction, preferring the id-based form when producing so a reference carrying both encodings serializes as rel_reference. Add OuterReferenceConverter to translate a plan between the two encodings (toIdBased / toStepsOut). Assigning plan-wide unique rel_anchors requires whole-plan context, so this conversion lives in core rather than being duplicated per integration. BREAKING CHANGE: Rel gains an abstract getRelAnchor(); non-Immutables Rel implementations must implement it. withRelAnchor(...) is added as a throwing default method that the generated Immutables override. Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Wrap isthmus conversion with the core OuterReferenceConverter: - Producing: convert offset-based outer references (steps_out) emitted by the depth-based visitor into the id-based form (rel_anchor / rel_reference). - Consuming: normalize id-based plans back to steps_out before the existing depth-based correlation logic runs, so both encodings are accepted while that logic stays unchanged. Offset-based plans pass through untouched. Calcite's correlation model is itself id-based (CorrelationId), so this aligns isthmus' external interface with the direction Substrait is migrating towards without touching its correlation wiring. SubstraitRepeatRel, a non-Immutables test Rel, now implements the new Rel.getRelAnchor() accessor. Signed-off-by: Niels Pardon <par@zurich.ibm.com>
08eae97 to
4e98b39
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Updates the
substraitsubmodule to v0.89.0 and adds support for its new id-based outer-reference resolution (RelCommon.rel_anchor+OuterReference.rel_reference) alongside the existing offset-basedsteps_outmechanism, then migrates the isthmus integration onto it.Implements #869.
Core (
feat(core)!)Rel.getRelAnchor()/Rel.withRelAnchor(...)andFieldReference.outerReferenceRelReference().OuterReferenceoneof in both directions. When a reference carries both encodings, the producer prefers the id-basedrel_reference(the two share a protobufoneof, so only one is emitted); existing offset-only producers keep emittingsteps_outand stay readable by older consumers, per the breaking change policy.OuterReferenceConvertertranslates a plan between the two encodings (toIdBased/toStepsOut). Assigning plan-wide-uniquerel_anchors requires whole-plan context, so this lives in core rather than being duplicated per integration.Isthmus (
feat(isthmus))Calcite's correlation model is itself id-based (
CorrelationId), so isthmus now exposes id-based references at its boundaries by wrapping conversion with the core converter, leaving its depth-based correlation wiring untouched:steps_outoutput is converted to id-based (rel_anchor/rel_reference).steps_out, so both encodings are accepted and the existing logic is unchanged. Offset-based plans pass through untouched.Breaking change
Relgains an abstractgetRelAnchor(); non-ImmutablesRelimplementations must implement it.withRelAnchor(...)is added as a throwing default method that the generated Immutables override.Known limitation
OuterReferenceConvertersupports outer references whose binding relation is the input of a single-input host (Filter/Project), which covers correlated scalar/IN/EXISTS subqueries. A correlated reference binding directly into a multi-input (join/set) condition scope, or a shared subtree (ReferenceRel), throwsUnsupportedOperationException. Isthmus does not produce those shapes today; extending the converter to multi-input scopes is a possible follow-up.Testing
rel_anchor, andOuterReferenceConverter(round-trip identity, nesting, shared-scope dedup, unsupported cases).SubqueryPlanTest(produce now id-based) and added an id-based consume test toSubqueryConversionTest.:core:test,:isthmus:test(incl. TPC-H / TPC-DS correlated-subquery round-trips),spotlessCheck, and Spark/examples compilation all pass.🤖 Generated with AI