sql: collapse FROM-less correlated existence subqueries to a filter#37442
Draft
antiguru wants to merge 1 commit into
0ef6d98 to 75f4564
A FROM-less correlated existence subquery is a pure predicate on the outer row, but decorrelation lowers it into a semijoin or antijoin that the MIR transforms do not collapse back to a filter. This leaves avoidable joins in the plan for the shapes reported in database-issues#2613 (`1 IN (SELECT 1 WHERE p)`) and database-issues#2969 (`NOT EXISTS (SELECT 1 WHERE p)`).

Add an HIR simplification pass, `simplify_from_less_existence_subqueries`, that runs after `try_simplify_quantified_comparisons` (which already normalizes both `IN` and `EXISTS` shapes to an `Exists` node). When the `Exists` body is a FROM-less correlated chain of `Map`/`Project`/`Filter` over a single-row constant, it rewrites `EXISTS(chain, preds)` to `(preds) IS TRUE`, inlining inner column references and shifting outer ones down one level as the predicate leaves the subquery.

The `IS TRUE` wrapper is load-bearing for null safety. `NOT EXISTS` keeps the outer row when the subquery is empty, which is when the predicate is FALSE or NULL. `NOT ((p) IS TRUE)` = `p IS NOT TRUE` is true for both, matching. A plain `NOT p` would drop the NULL row.

The pass is gated behind the `enable_simplify_from_less_existence` feature flag, default off in production and on in CI and tests. The guard fires only on the FROM-less correlated pure-existence shape, so subqueries with a FROM clause remain genuine anti/semi-joins.

Tests: subquery.slt covers both issues with EXPLAIN before/after in the flag-off and flag-on states, plus an explicit NULL-row test for the antijoin rewrite. not-null-propagation.slt is unchanged with the flag defaulting off.

Fixes database-issues#2613

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
75f4564 to 6df045a
A FROM-less correlated existence subquery is a pure predicate on the outer row.
Decorrelation lowers it into a semijoin or antijoin that the MIR transforms do not collapse back to a filter.
This leaves avoidable joins in the plan for the shapes reported in database-issues#2613 (`1 IN (SELECT 1 WHERE p)`) and database-issues#2969 (`NOT EXISTS (SELECT 1 WHERE p)`).

Add an HIR simplification pass, `simplify_from_less_existence_subqueries`, that runs after `try_simplify_quantified_comparisons` (which already normalizes both `IN` and `EXISTS` shapes to an `Exists` node).
When the `Exists` body is a FROM-less correlated chain of `Map`/`Project`/`Filter` over a single-row constant, it rewrites `EXISTS(chain, preds)` to `(preds) IS TRUE`, inlining inner column references and shifting outer ones down one level as the predicate leaves the subquery.

The `IS TRUE` wrapper is load-bearing for null safety.
`NOT EXISTS` keeps the outer row when the subquery is empty, which is when the predicate is FALSE or NULL.
`NOT ((p) IS TRUE)` = `p IS NOT TRUE` is true for both, matching.
A plain `NOT p` would drop the NULL row.
The guard fires only on the FROM-less correlated pure-existence shape, so subqueries with a FROM clause remain genuine anti/semi-joins.
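
The null-safety argument can be checked with a small truth-table model. This is only an illustrative sketch in Python, not code from the PR: `None` stands in for SQL NULL, and all helper names (`sql_not`, `is_true`, `filter_keeps`, `not_exists_where`, `outcomes`) are hypothetical. It compares which outer rows survive under the original `NOT EXISTS (SELECT 1 WHERE p)`, under `NOT ((p) IS TRUE)`, and under a plain `NOT p`.

```python
# Toy model of SQL three-valued logic; None stands in for NULL.
# All names here are hypothetical and exist only for this illustration.

def sql_not(p):
    """SQL NOT: NULL stays NULL."""
    return None if p is None else (not p)

def is_true(p):
    """SQL `p IS TRUE`: never NULL, and NULL maps to FALSE."""
    return p is True

def filter_keeps(cond):
    """A filter keeps a row only when its condition is TRUE."""
    return cond is True

def not_exists_where(p):
    """NOT EXISTS (SELECT 1 WHERE p): true when the subquery is empty."""
    subquery_rows = [1] if filter_keeps(p) else []
    return len(subquery_rows) == 0

def outcomes(p):
    """Whether the outer row is kept under each formulation."""
    return {
        "not_exists": filter_keeps(not_exists_where(p)),
        "is_not_true": filter_keeps(sql_not(is_true(p))),
        "plain_not": filter_keeps(sql_not(p)),
    }

for p in (True, False, None):
    print(p, outcomes(p))
```

For `p = None`, the `not_exists` and `is_not_true` columns both keep the row, while `plain_not` drops it, which is the discrepancy the `IS TRUE` wrapper avoids.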
The pass is gated behind the `enable_simplify_from_less_existence` feature flag, default off in production and on in CI and tests.

Tests: `subquery.slt` covers both issues with EXPLAIN before/after in the flag-off and flag-on states, plus an explicit NULL-row test for the antijoin rewrite.

Fixes database-issues#2613
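
To make the "inline inner references, shift outer ones down one level" step concrete, here is a minimal toy model in Python. It is a sketch under stated assumptions, not the pass itself: the `Col`, `Lit`, and `Call` classes and the `lift` and `rewrite_exists` functions are hypothetical stand-ins, and the real pass operates on HIR expressions. The model assumes a column reference carries a `level` (0 for the subquery's own scope, 1 for the immediately enclosing query) and that the single-row constant has no columns, so inner column `i` is the `i`-th `Map` expression.

```python
from dataclasses import dataclass

# Hypothetical toy expression types; not the real HIR definitions.

@dataclass(frozen=True)
class Col:
    level: int   # 0 = subquery scope, 1 = immediately enclosing query
    column: int

@dataclass(frozen=True)
class Lit:
    value: object

@dataclass(frozen=True)
class Call:
    func: str
    args: tuple

def lift(expr, inner_defs):
    """Move `expr` out of one subquery level.

    Level-0 references are replaced by their (already lifted) defining
    expressions; references to outer scopes drop one level.
    """
    if isinstance(expr, Col):
        if expr.level == 0:
            return inner_defs[expr.column]
        return Col(expr.level - 1, expr.column)
    if isinstance(expr, Call):
        return Call(expr.func, tuple(lift(a, inner_defs) for a in expr.args))
    return expr  # literals are unaffected

def rewrite_exists(map_exprs, preds):
    """Model of EXISTS(chain, preds) -> (preds) IS TRUE."""
    inner_defs = []
    for e in map_exprs:
        # A Map expression may refer to earlier Map columns.
        inner_defs.append(lift(e, inner_defs))
    lifted = [lift(p, inner_defs) for p in preds]
    conj = lifted[0] if lifted else Lit(True)
    for p in lifted[1:]:
        conj = Call("and", (conj, p))
    return Call("is_true", (conj,))

# Roughly the shape of `1 IN (SELECT 1 WHERE t.a > 0)` after normalization:
# the subquery maps the literal 1 and filters on an outer column.
map_exprs = [Lit(1)]
preds = [
    Call("gt", (Col(1, 0), Lit(0))),  # outer reference, level 1
    Call("eq", (Col(0, 0), Lit(1))),  # inner reference to the mapped column
]
result = rewrite_exists(map_exprs, preds)
print(result)
```

In the result, the outer reference appears as `Col(level=0, column=0)` because the predicate now lives in the outer query, and the inner reference has been replaced by the literal it was defined as.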
Motivation
Fixes a plan-quality gap: FROM-less correlated existence subqueries leave avoidable semi/antijoins.
Checklist
`$T ⇔ Proto$T` mapping (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label.