
feat!: id-based outer-reference resolution (Substrait 0.89.0) #982

Draft
nielspardon wants to merge 2 commits into substrait-io:main from nielspardon:feat/id-based-outer-references


@nielspardon

Member

Summary

Updates the substrait submodule to v0.89.0 and adds support for its new id-based outer-reference resolution (RelCommon.rel_anchor + OuterReference.rel_reference) alongside the existing offset-based steps_out mechanism, then migrates the isthmus integration onto it.

Implements #869.

Core (feat(core)!)

  • Model: Rel.getRelAnchor() / Rel.withRelAnchor(...) and FieldReference.outerReferenceRelReference().
  • Proto converters handle the OuterReference oneof in both directions. When a reference carries both encodings, the producer prefers the id-based rel_reference (the two share a protobuf oneof, so only one is emitted); existing offset-only producers keep emitting steps_out and stay readable by older consumers, per the breaking change policy.
  • OuterReferenceConverter translates a plan between the two encodings (toIdBased / toStepsOut). Assigning plan-wide-unique rel_anchors requires whole-plan context, so this lives in core rather than being duplicated per integration.
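
To illustrate why the translation needs whole-plan context, here is a minimal sketch of the two directions on a toy model. It is not the actual `OuterReferenceConverter`: `Scope`, `AnchorAssigner`, and both method names are hypothetical, and it assumes `steps_out = 1` refers to the innermost enclosing scope. Anchors come from a single counter shared across the plan, and a scope referenced more than once keeps the anchor it was first given.

```java
import java.util.List;

/** Hypothetical stand-in for a relation that outer references can bind to. */
final class Scope {
  final String name;
  Integer relAnchor; // stays null until some reference needs this scope

  Scope(String name) {
    this.name = name;
  }
}

/** Hypothetical helper; one instance is shared by the whole plan. */
final class AnchorAssigner {
  private int nextAnchor = 1; // plan-wide counter keeps anchors unique

  /**
   * Offset-based to id-based. `enclosing` lists the scopes around the
   * reference, outermost first; stepsOut = 1 is assumed to mean the
   * innermost enclosing scope.
   */
  int toRelReference(List<Scope> enclosing, int stepsOut) {
    if (stepsOut < 1 || stepsOut > enclosing.size()) {
      throw new IllegalArgumentException("steps_out out of range: " + stepsOut);
    }
    Scope target = enclosing.get(enclosing.size() - stepsOut);
    if (target.relAnchor == null) {
      target.relAnchor = nextAnchor++; // first reference assigns the anchor
    }
    return target.relAnchor; // later references to the same scope reuse it
  }

  /** Id-based to offset-based: search outwards from the innermost scope. */
  int toStepsOut(List<Scope> enclosing, int relReference) {
    for (int i = enclosing.size() - 1; i >= 0; i--) {
      Integer anchor = enclosing.get(i).relAnchor;
      if (anchor != null && anchor == relReference) {
        return enclosing.size() - i;
      }
    }
    throw new IllegalArgumentException("no enclosing rel_anchor " + relReference);
  }
}
```

In this sketch the counter is the only state that has to span the whole plan, which is the part an individual integration could not easily own by itself.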

Isthmus (feat(isthmus))

Calcite's correlation model is itself id-based (CorrelationId), so isthmus now exposes id-based references at its boundaries by wrapping conversion with the core converter, leaving its depth-based correlation wiring untouched:

  • Produce: steps_out output is converted to id-based (rel_anchor / rel_reference).
  • Consume: incoming plans are normalized back to steps_out, so both encodings are accepted and the existing logic is unchanged. Offset-based plans pass through untouched.
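
The wrapping described above can be pictured roughly as follows. This is a toy sketch, not isthmus code: `ToyPlan`, `Encoding`, and `BoundaryWrapper` are hypothetical stand-ins, and the two converter methods only flip a tag where the real converter would rewrite references.

```java
/** Hypothetical tag for the outer-reference encoding a plan uses. */
enum Encoding { STEPS_OUT, ID_BASED }

/** Hypothetical plan stand-in; the body is opaque here. */
final class ToyPlan {
  final String body;
  final Encoding encoding;

  ToyPlan(String body, Encoding encoding) {
    this.body = body;
    this.encoding = encoding;
  }
}

final class BoundaryWrapper {
  /** Stand-in for the core converter's offset-to-id direction. */
  static ToyPlan toIdBased(ToyPlan plan) {
    return new ToyPlan(plan.body, Encoding.ID_BASED);
  }

  /** Stand-in for the id-to-offset direction; offset-based plans pass through. */
  static ToyPlan toStepsOut(ToyPlan plan) {
    return plan.encoding == Encoding.STEPS_OUT
        ? plan
        : new ToyPlan(plan.body, Encoding.STEPS_OUT);
  }

  /** Produce: the depth-based logic runs unchanged, then its output is converted. */
  static ToyPlan produce(String query) {
    ToyPlan offsetBased = new ToyPlan(query, Encoding.STEPS_OUT); // depth-based visitor stand-in
    return toIdBased(offsetBased);
  }

  /** Consume: normalize first, so the existing logic only ever sees steps_out. */
  static String consume(ToyPlan incoming) {
    ToyPlan normalized = toStepsOut(incoming);
    if (normalized.encoding != Encoding.STEPS_OUT) {
      throw new IllegalStateException("existing logic expects steps_out");
    }
    return normalized.body; // existing depth-based consumer stand-in
  }
}
```

Keeping the conversion at the two boundaries means the code between them never has to know that a second encoding exists.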

Breaking change

Rel gains an abstract getRelAnchor(); non-Immutables Rel implementations must implement it. withRelAnchor(...) is added as a throwing default method that the generated Immutables override.
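
For implementers, the shape of the change looks roughly like the sketch below. The types are hypothetical stand-ins rather than the real `Rel` hierarchy, and the `Optional<Integer>` return type and `int` parameter are assumptions made for illustration; check the actual signatures in the diff.

```java
import java.util.Optional;

/** Hypothetical stand-in for Rel: abstract accessor, throwing default "wither". */
interface ToyRel {
  Optional<Integer> getRelAnchor(); // every implementation must now provide this

  default ToyRel withRelAnchor(int relAnchor) {
    // Implementations that cannot copy themselves inherit this behaviour.
    throw new UnsupportedOperationException(
        getClass().getSimpleName() + " does not support withRelAnchor");
  }
}

/** A hand-written implementation: adding the accessor is enough to compile. */
final class HandWrittenRel implements ToyRel {
  @Override
  public Optional<Integer> getRelAnchor() {
    return Optional.empty();
  }
}

/** Plays the role of a generated immutable class that overrides the default. */
final class CopyingRel implements ToyRel {
  private final Integer relAnchor; // null means "no anchor"

  CopyingRel(Integer relAnchor) {
    this.relAnchor = relAnchor;
  }

  @Override
  public Optional<Integer> getRelAnchor() {
    return Optional.ofNullable(relAnchor);
  }

  @Override
  public ToyRel withRelAnchor(int newAnchor) {
    return new CopyingRel(newAnchor); // returns a copy, original stays unchanged
  }
}
```

With this pattern a hand-written implementation keeps compiling after adding one method, but fails at runtime if something tries to assign it an anchor.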

Known limitation

OuterReferenceConverter supports outer references whose binding relation is the input of a single-input host (Filter/Project), which covers correlated scalar/IN/EXISTS subqueries. A correlated reference binding directly into a multi-input (join/set) condition scope, or a shared subtree (ReferenceRel), throws UnsupportedOperationException. Isthmus does not produce those shapes today; extending the converter to multi-input scopes is a possible follow-up.

Testing

  • New core round-trip tests for both outer-reference encodings, rel_anchor, and OuterReferenceConverter (round-trip identity, nesting, shared-scope dedup, unsupported cases).
  • Updated isthmus SubqueryPlanTest (produce now id-based) and added an id-based consume test to SubqueryConversionTest.
  • :core:test, :isthmus:test (incl. TPC-H / TPC-DS correlated-subquery round-trips), spotlessCheck, and Spark/examples compilation all pass.

🤖 Generated with AI

Update the substrait submodule to v0.89.0 and model its new id-based
outer-reference mechanism alongside the existing offset-based one:

- RelCommon.rel_anchor  -> Rel.getRelAnchor() / Rel.withRelAnchor(...)
- OuterReference.rel_reference -> FieldReference.outerReferenceRelReference()

Both proto converters handle the OuterReference oneof (steps_out and
rel_reference) in each direction, preferring the id-based form when
producing so a reference carrying both encodings serializes as
rel_reference.

Add OuterReferenceConverter to translate a plan between the two encodings
(toIdBased / toStepsOut). Assigning plan-wide unique rel_anchors requires
whole-plan context, so this conversion lives in core rather than being
duplicated per integration.

BREAKING CHANGE: Rel gains an abstract getRelAnchor(); non-Immutables Rel
implementations must implement it. withRelAnchor(...) is added as a
throwing default method that the generated Immutables override.

Signed-off-by: Niels Pardon <par@zurich.ibm.com>

Wrap isthmus conversion with the core OuterReferenceConverter:

- Producing: convert offset-based outer references (steps_out) emitted by
  the depth-based visitor into the id-based form (rel_anchor /
  rel_reference).
- Consuming: normalize id-based plans back to steps_out before the
  existing depth-based correlation logic runs, so both encodings are
  accepted while that logic stays unchanged. Offset-based plans pass
  through untouched.

Calcite's correlation model is itself id-based (CorrelationId), so this
aligns isthmus' external interface with the direction Substrait is
migrating towards without touching its correlation wiring.

SubstraitRepeatRel, a non-Immutables test Rel, now implements the new
Rel.getRelAnchor() accessor.

Signed-off-by: Niels Pardon <par@zurich.ibm.com>
nielspardon force-pushed the feat/id-based-outer-references branch from 08eae97 to 4e98b39 on July 2, 2026 15:44